Background

Fluctuation in social systems

The notorious unpredictability and variability of social systems have achieved new levels of prominence, and possibly new extremes, with the rise of the internet. The viral dynamics of social media vary widely in both magnitude and impact, with little apparent connection between the two. Consider:

  1. On the 26th of February 2015, a low-quality photograph of a dress of indeterminate color sparked a battle on the internet that garnered 16 million views within 6 hours on Buzzfeed alone. [101]
  2. A 61-million-person experiment on peer recommendations by Facebook found that strategically applied viral peer-to-peer systems can mobilize citizens politically on a massive scale. The authors estimate that they were able to garner 280,000 extra votes in the election using this system - enough to strongly influence the outcome of federal elections in the US. [17]
  3. The religious militant organization Islamic State of Iraq and Syria (ISIS) exploits viral meme propagation to recruit volunteers and attract funding for its military campaigns by peer recommendation on YouTube and Twitter. []

Understanding how, when, and why this kind of viral propagation takes place is crucial to understanding the function of modern society. Why did that particular dress photograph have such an impact? For that matter, as impressive as the scale of the voter experiment is, it took the backing of a multi-billion dollar corporation to produce this effect, whereas the viral dress photo was simply a thoughtless photograph from a cheap phone camera. And, as we see from ISIS, the dynamics of these peer-to-peer systems are implicated in global life-and-death struggles and violent political upheaval.

Learning to understand the dynamics of these systems is economically and politically important. Thanks to the quantification of communication on the internet, it is also potentially feasible.

One piece of the puzzle of such systems, which I explore here, is the use of models of self-exciting systems. In such systems, activity may be understood to be partly exogenous, triggered by influences from the outside world, and partly endogenous, triggered by the system's own past activity. [104, 39, 31] Concretely, this stylized description is the kind of dynamic we observe in, for example, financial markets, where (exogenous) news about a certain company might trigger movement in the price of its stock, but movement in the price of a company's stock could itself trigger further movements as traders attempt to surf the tide. In social systems, the mysterious popularity of the photograph of a dress viewed 16 million times in a single day is a paradigmatic example of endogenous triggering; there is no plausible news content attached to it.

The particular self-exciting system that I use here is the linear Hawkes process. This model has been applied to such diverse systems as earthquakes [88], product popularity [39, 65], financial markets [57, 45], social media [31], crime [82], neural firing [18] and many others [107].
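For orientation, one common way of writing the conditional intensity of a linear Hawkes process is sketched below. The notation is an assumption made for this illustration and is not taken from the sources cited above: \mu is the exogenous (background) rate, \phi is a kernel normalized to integrate to one, \eta is the branching ratio (the mean number of events directly triggered by each event), and the t_i are the past event times.

```latex
\lambda(t) = \mu + \eta \sum_{t_i < t} \phi(t - t_i),
\qquad \phi(s) \ge 0, \qquad \int_0^\infty \phi(s)\,\mathrm{d}s = 1,
\qquad 0 \le \eta < 1 .
```

In this form the first term carries the exogenous activity and the sum carries the endogenous activity, with the condition on \eta keeping the process from exploding.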

If we can successfully explain the dynamics of the data using the Hawkes process model, then we are a step closer to quantitative predictions of the process behavior, and to quantifying its future unpredictability, by measuring and predicting the relative importance of the endogenous and exogenous components of such systems.
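To make the endogenous/exogenous split concrete, the following is a minimal simulation sketch, not the estimation method used in this work. It assumes an exponential kernel and uses the cluster (immigrant and offspring) construction of a linear Hawkes process: exogenous events arrive as a Poisson process, and every event triggers a Poisson number of endogenous children. The function names and the parameters mu, eta, decay and horizon are hypothetical and chosen only for this illustration.

```python
import random


def sample_poisson(rng, mean):
    """Draw a Poisson(mean) count by counting unit-rate arrivals in [0, mean]."""
    count = 0
    total = rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_hawkes(mu, eta, decay, horizon, seed=0):
    """Simulate a linear Hawkes process on [0, horizon).

    mu      -- exogenous (background) event rate, assumed constant
    eta     -- branching ratio: mean number of children per event (< 1)
    decay   -- rate of the exponential kernel (an assumption of this sketch)
    Returns a time-sorted list of (time, is_endogenous) pairs.
    """
    if not 0.0 <= eta < 1.0:
        raise ValueError("eta must lie in [0, 1) for a stationary process")
    rng = random.Random(seed)

    # Exogenous events: a homogeneous Poisson process with rate mu.
    pending = []
    t = rng.expovariate(mu)
    while t < horizon:
        pending.append((t, False))
        t += rng.expovariate(mu)

    # Every event, exogenous or not, triggers Poisson(eta) children whose
    # delays follow the exponential kernel; children beyond the horizon
    # are discarded.
    events = []
    while pending:
        time, endogenous = pending.pop()
        events.append((time, endogenous))
        for _ in range(sample_poisson(rng, eta)):
            child_time = time + rng.expovariate(decay)
            if child_time < horizon:
                pending.append((child_time, True))

    events.sort()
    return events


def endogenous_fraction(events):
    """Share of events that were triggered by earlier events."""
    if not events:
        return 0.0
    return sum(1 for _, endogenous in events if endogenous) / len(events)


if __name__ == "__main__":
    simulated = simulate_hawkes(mu=1.0, eta=0.5, decay=2.0, horizon=2000.0, seed=1)
    print(len(simulated), endogenous_fraction(simulated))
```

Over a long observation window the endogenous share of events in this construction should approach eta, which is why the branching ratio is a natural summary of how self-excited a system is. In real data the labels are of course not observed, so this quantity has to be inferred rather than counted.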

YouTube

The data that I use were collected from YouTube, the social video sharing website. YouTube is owned by Google and headquartered in the USA. It was founded in February 2005 and officially launched in November of the same year.

The distribution of video popularity on YouTube is often claimed to exhibit the classic "heavy-tailed" behavior that would be consistent with certain kinds of self-exciting processes. For example, in 2011 a YouTube software engineer was reported to have revealed that 30% of videos accounted for 99% of views on the site. [120] 1

[1] This often-cited statistic was published in the British newspaper The Telegraph without references, and I have been unable to find primary sources for its claims. Nonetheless, as I will show later, it is plausible given my dataset.

Shortly before the time that this dataset was collected, YouTube reported that each day it served 100 million videos and accepted more than 65,000 uploads. [95] As of January 2012, it reported approximately 4 billion daily video views [91], and there are individual videos with more than 2 billion views. [122]

YouTube seems, in other words, to be a perfect test bed for experimenting with self-exciting models, if we can get the right sort of data about it and the right methods to analyze them. This brings me to the question of inspecting the data.