Computer scientists learn to predict which photos will 'go viral' on Facebook

By Tom Abate | Stanford Engineering | 22 Apr 2014

It's hard to predict which of the many millions of photos on Facebook will spring from obscurity and 'go viral.' But Stanford researchers have found some hints by studying "cascades," the term used to describe photos or videos being shared multiple times.

"It wasn't clear whether information cascades could be predicted because they happen so rarely," said Jure Leskovec, assistant professor of computer science.

According to data provided by Facebook scientists in a recent collaboration with university scientists, only 1 in 20 photos posted on the social network is shared even once. And just 1 in 4,000 gets more than 500 shares – a lot but hardly an epidemic.

"It is very hard to quantify what going viral means," said Leskovec. "Anyone would say 'Gangnam Style' went viral, but that's a singular event," he said, referring to the YouTube video that has been viewed almost 2 billion times.

In a paper being presented at the International World Wide Web Conference, the team will describe how it accurately predicted, eight times out of 10, whether a photo cascade would double in shares. The team includes Leskovec, Stanford doctoral student Justin Cheng, Facebook researchers Lada Adamic and P. Alex Dow, and Cornell University computer scientist Jon Kleinberg. The question they asked: If a photo got 10 shares, would it get 20? If it got 500, would it reach 1,000?
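To make the doubling question concrete, here is a minimal Python sketch of how such a prediction could be scored. It is an illustration only, not the researchers' code; the function names `doubles` and `accuracy` are hypothetical, and the numbers in the usage lines are made up.

```python
from typing import Sequence


def doubles(observed_shares: int, final_shares: int) -> bool:
    """Return True if a cascade seen at `observed_shares` ended with at least twice as many."""
    if observed_shares <= 0:
        raise ValueError("observed_shares must be positive")
    if final_shares < observed_shares:
        raise ValueError("final_shares cannot be smaller than observed_shares")
    return final_shares >= 2 * observed_shares


def accuracy(predictions: Sequence[bool], outcomes: Sequence[bool]) -> float:
    """Fraction of predictions that match what actually happened."""
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(predictions)


# Made-up example: a photo observed at 10 shares that ended at 20 has doubled.
example_doubled = doubles(10, 20)
# Made-up example: four correct guesses out of five.
example_accuracy = accuracy(
    [True, False, True, True, False],
    [True, False, False, True, False],
)
```

Under this framing, "8 out of 10" corresponds to an accuracy of 0.8 on the yes-or-no doubling question.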

They began by analyzing 150,000 Facebook photos, each of which had been shared at least five times. The data were stripped of names and identifiers to protect privacy.

 
Jure Leskovec, assistant professor of computer science at Stanford, explains how researchers can use the speed and pattern of photo sharing events to predict, eight times out of 10, when a photo will 'go viral' on Facebook. (Tom Abate)

A preliminary analysis of those photos revealed that, at any given point in a cascade, there was a 50-50 chance that the number of shares would double.

The scientists then looked for variables that might help them predict doubling events more accurately than a coin toss, including the rate and speed at which photos were shared, and the structure of sharing (photos reposted in multiple networks proved to create stronger cascades).

After factoring several criteria into their analysis, the computer scientists were able to accurately predict doubling events almost 80 percent of the time.

Their algorithm became more accurate the more times a photo was shared. For photos shared hundreds of times, their accuracy rate approached 88 percent.

The speed of sharing was the best predictor of cascade growth. Simply analyzing how quickly a cascade unfolded predicted doublings 78 percent of the time.

"Slow, persistent cascades don't really double in size," Leskovec said.
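The idea of using speed alone can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model described in the paper: the function names, the timestamps and the one-hour cutoff are all hypothetical, and a real predictor would learn its cutoff from data.

```python
from typing import Sequence


def time_to_k_shares(share_times: Sequence[float], k: int) -> float:
    """Seconds after the original post at which the k-th share arrived."""
    if k < 1 or len(share_times) < k:
        raise ValueError("need at least k share timestamps and k >= 1")
    return sorted(share_times)[k - 1]


def predict_doubling(share_times: Sequence[float], k: int, max_seconds: float) -> bool:
    """Guess that a cascade will double if its first k shares arrived quickly.

    `max_seconds` is a hypothetical cutoff chosen for illustration.
    """
    return time_to_k_shares(share_times, k) <= max_seconds


# Made-up share timestamps, in seconds since the photo was posted.
fast_cascade = [60, 120, 200, 340, 500]
slow_cascade = [3600, 9000, 20000, 50000, 90000]

ONE_HOUR = 3600  # hypothetical cutoff, not a figure from the study

fast_guess = predict_doubling(fast_cascade, k=5, max_seconds=ONE_HOUR)
slow_guess = predict_doubling(slow_cascade, k=5, max_seconds=ONE_HOUR)
```

In this sketch the fast cascade is flagged as likely to double and the slow one is not, which mirrors the intuition in Leskovec's remark about slow, persistent cascades.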

How a photo was shared – scientists call this the structure of the cascade – was the next best predictive factor. Photos that spread among different friendship networks or fan groups indicated a breadth of interest. Structure proved 67.1 percent accurate in predicting doubling when used alone.

But the researchers found no simple trick to ensure widespread sharing.

"Even if you have the best cat picture ever, it could work for your network but not for my boring academic friends," Leskovec said, adding, "You have to understand your network."

Tom Abate is the associate director of communications at Stanford Engineering.