Conclusions

The YouTube data is both promising and troubling as a means of revealing the secrets of endogenously triggered dynamics. The kinds of problems and promise that it shows are, I believe, of general importance, and I encountered many of them in the course of this thesis.

First, I presented the seismically inspired Hawkes self-exciting process as a potential model for viral dynamics in social media, and mentioned how its parameters might quantify endogenous dynamics in the system. I then presented the methods to estimate such a model, which are uncontroversial.
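As a reminder of what is being estimated, the following is a minimal sketch of the conditional intensity and log-likelihood of a homogeneous Hawkes process. It assumes an exponential kernel purely for brevity, not the Omori law kernel discussed elsewhere, and the names `mu` (background rate), `eta` (branching ratio) and `kappa` (kernel decay rate) are illustrative choices rather than the notation or code used in this thesis.

```python
import math

def hawkes_intensity(t, events, mu, eta, kappa):
    """Conditional intensity at time t given earlier event times.

    lambda(t) = mu + eta * sum over t_i < t of kappa * exp(-kappa * (t - t_i))
    The exponential kernel integrates to 1, so eta is the branching ratio.
    """
    excitation = sum(
        kappa * math.exp(-kappa * (t - ti)) for ti in events if ti < t
    )
    return mu + eta * excitation

def hawkes_log_likelihood(events, T, mu, eta, kappa):
    """Log-likelihood of event times observed on the window [0, T].

    log L = sum_i log lambda(t_i) - integral of lambda(s) over [0, T]
    """
    log_term = sum(
        math.log(hawkes_intensity(ti, events, mu, eta, kappa)) for ti in events
    )
    # Each event contributes eta times the kernel mass that falls inside [t_i, T].
    compensator = mu * T + eta * sum(
        1.0 - math.exp(-kappa * (T - ti)) for ti in events
    )
    return log_term - compensator
```

Maximizing `hawkes_log_likelihood` over `mu`, `eta` and `kappa` with a generic numerical optimizer gives the usual maximum likelihood estimate. The naive double loop costs quadratic time in the number of events, which is acceptable for illustration only.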

Ultimately, however, I was not able to recommend tenable estimates for the parameters of this model, for two reasons.

Firstly, the data is sparsely observed, and the ad hoc interpolation scheme used to approximate the missing information destroys some kinds of timing information, removing our ability to estimate kernel parameters in the important Omori law case.

Secondly, inhomogeneity in the data leads to extremely poor model identification for the estimator, and the distribution of the estimates so compiled is not a credible predictor of "true" parameters. Using the homogeneous model may give good results for earthquake modeling, where there is no exogenous influence to control for. But where we are concerned with the interaction of endogenous and exogenous factors, these methods are not flexible enough. We cannot meaningfully find the "true" values of the parameters when the model is too ill-specified for the estimates of those values to be informative.

At the same time, the model behind the branching process is much more general than the version we typically fit using off-the-shelf estimators, and I have shown that estimation procedures can be extended to estimate these more general models. My attempt to extend the estimator is not the only such attempt, and there are many diverse ways that it might be done. I have demonstrated that this method can solve certain problems, removing the bias due to large spikes, and potentially identifying exogenous triggers from noisy data. There are clearly other issues to solve. Nonetheless, the method of penalized regression I have proposed is flexible and could provide the basis for many other such methods through different choices of kernel parameters, penalty functions and so on.

There remains much work to be done: not only could the estimator be generalized, but it would also benefit from analysis of its sampling distribution, stability and model selection procedures. A practical demonstration, such as the one I give here, is nonetheless the best justification for investing in such work.

In short, whilst I cannot tell you right now that I have identified the governing process of YouTube, I have presented a novel way that you could eventually do so, with a little more work.