# A quick introduction to point process theory¶

I construct the point process theory necessary for the model here informally; More formal introductions to the field may be found in e.g. [34, 35].

I have already introduced basic notation. I return to it here for context.

## Univariate temporal point processes¶

A temporal point process is a stochastic model for the random placement of points in time.
The \(N(t)\) function that I discussed in the context of video view counter is the obvious example.
If \(N(t)\) counts the number of views of a video, and it increments by one every time someone finishes watching a video then we may associate with each video such a counting function. I call each increment of the counting function an occurrences. ^{1}
When I consider the generating process to be a stochastic model i will refer to specific time series as *realizations* generated by that model.

I may equivalently represent the information in that function by the list of occurrence times \(0=t_1,t_2,...,t_N=:t_{1:N}\), taken to be ordered.

We can see the equivalence by examining
\(N: \mathbb{R}\mapsto\mathbb{Z}^+\) such that
\(N(t)\equiv \sum_{i=1}^N\mathbb{I}_{\{t_i<t\}}\).
Here we will only be considering *simple* processes, which means that for all event indices \(i\), \(\mathrm{Pr}(t_i=t_{i+!})=0\) - so the increments of the series have size one almost surely.

[1] | Referred to in the literature also as events, which is ambiguous,
or arrivals, which is an awkward term to describe video views,
or epochs, which sounds like it has something to do
with the extinctions of dinosaurs. Occurrences, as used in seismology,
seems the least confusing to me. |

The Poisson process is the stochastic process whose inter-occurrence times are identically and independently distributed such that \(t_i-t_{i-1}\sim\textrm{Exp}(1/\lambda)\). By convention, we set all \(t_i\geq 0\) and thence \(N(0)=0\) a.s. our counting representation. It is easy to show that \(N(t)\sim\textrm{Pois}(\lambda t)\).

It is a standard result, that increments of such process have a Poisson distribution. For \(t_j \geq t_i\):

## Conditional intensity processes¶

Note also the standard result that

This suggests that we could generalise the Poisson process to have a variable rate by choosing \(\lambda\) to be a function of time, or even a stochastic process itself.
This is indeed possible.
In the former case, we have an inhomogeneous Poisson process, and in the latter, a *doubly stochastic* or *Cox* process, where we might condition the event upon some \(\sigma\) algebra, \(\mathcal{S}\).
We call \(\lambda(t|\mathcal{S})\) the *conditional intensity process*.

The Hawkes process is a particular case of the doubly stochastic Poisson process: it is a linear self-exciting process. Its conditional intensity process has a particular form which depends upon the previous values of the process itself. Specifically, given the left-continuous filtration \(\mathcal{F}^-_{(-\infty,t)}\) generated by the occurences of \(\{N(s)\}_{s\lt t}\), its conditional intensity is given, up to an additive constant background rate \(\mu\), by the convolution of the path of the process with an interaction kernel \(\phi\) and an "branching ratio" \(\eta\). The interaction kernel \(\phi\) is taken to be a probability density absolutely dominated by the Lebesque measure - i.e. it has no atoms. To keep the process causal, we require that the interaction kernel has only positive support. \(\phi(t)=0 \quad\forall t \lt 0\)

(Since we only deal with left-continuous filtrations in temporal point process, I will suppress the "-" superscript henceforth.)

The interpretation here is that each occurrence increases the probability of another occurrence in the near future, or, equivalently, momentarily increases the rate of new occurrences. There are several equivalent ways of thinking about this. One is the formulation in terms of rate - and this is the basis of the goodness of fit test used for this model [88].

Another is as a branching process [60], much like the Galton-Watson model of reproduction. In this case the branching ratio \(\eta\) is the expected number of offspring that any given occurrence will have. The offspring may in turn have, on average, \(\eta\) offspring of their own, and so on.

In this branching formulation, \(\mu\) is the "immigration rate", and reflect the rate of new arrivals to our population of occurrences. The system approaches a stationary distribution if \(\mu>0\) and \(1\gt\eta\geq 0\). [59]

The importance of this model in the current context is that these models gives us the possibility that observed occurrence in a point process is exogenously driven, e.g. it is an immigrant, or endogenously driven - it was the offspring of a previous occurrence. For Youtube, we could think of Youtube views driven by advertising, or views driven by the recommendations of your peers.

The key parameter in this sense is the branching ratio. Using the usual branching process generating function arguments, one can show that the expected number of occurrences due to a single immigrant is \(1/(1-\eta)\). As \(\eta\to 1\) the proportion of occurrences attributed to the endogenous dynamics of the system increases rapidly, until, when it passes criticality such that \(\eta>1\) the system diverges to infinity with positive probability.

Where we consider realizations on the half line, meaning with no events before time \(0\), we usually take by convention \(t_0=0\) Then we have

and we abbreviate the whole thing to \(\lambda(t|\mathcal{F}_t)\), or even \(\lambda^*(t)\).

This is interpred in the same way as the intensity of the Poisson process, but instanteously: The greater the intensity, the more likely another occurence in the immediate future.

Inspecting the defintion of intensity for the process the Hawkes process, this means that, as we had hoped, the more occurrences we've had recently, the more we are likely to have soon.

## Kernels¶

I have left the kernel unspecified up to now. Apart from demanding "causality", normalization, and continuity, We have in principle the freedom to choose here, and even to non parametrically estimate an interaction kernel. [82, 6, 57, 8]

There are some classic and widely-supported favorites, however, and I restrict myself to these here as they are the ones that my software supports.

### Exponential kernel¶

The simplest kernel, and the fastest to calculate, is the exponential.

Such a kernel gives the Hawkes process a number of convenient properties, such as a closed-form linear estimator for the process. [34] (chapter 8.5), computationally-efficent parameter estimation [92], [89], and a Markovian representation [86].

When comapring the influence of this kernel with other kernel shapes we can quantify the "time scale" of this kernel in various ways. One choice is the "mean delay", in the sense that if we take interpret the kernel as a probability density for some random variable \(X\sim\phi_\mathrm{Exp}(\kappa)\), then its expected value is \(EX=\kappa\). We could alternatively choose the median, which is given by \(\log(2)\kappa\).

### “Basic” power-law kernel families¶

The *Omori law* is a widely used kernel, famous for its long history in earthquake modeling, e.g [112].

In the current context, I use the modified Omori law with the following parameterization, recycling \(\kappa\) as a kernel parameter to economise on limited supplied of greek letters.

The notable feature of this kernel is that for many parameter values it has a *power law tail* with shape controlled by the \(\theta\) parameter.

Interpreting the Omori law as an interaction kernel, we can expect long-rage interactions to be comparatively more important than for exponential kernels with the same branching ratio. If we interpret this kernel as a probability density we can see that variables draw from this distribution do not necessarily have finite moments of any order.

The "mean delay" \(X\sim\phi_\mathrm{Omori}, \theta \gt 1\Rightarrow EX\kappa/(\theta-1)\). When \(\theta \leq 1\) the expectation does not exist. A little calculus reveals that the median point for an Omori-law distributed variable is always defined, and given by \(\kappa(2^{1/\theta}-1)\).

This long range interaction effect, as well as high branching ratio, is implicated in surprisingly variable behavior in various dynamical systems, so we woudl like to know if our model had this kind of kernel. [39, 53]

## The Hawkes Process in action¶

Having presented the model I present *why* --- I wish to understand how much of the behaviour of a time series is generated by endogenous dyanamics, and what these dynamics are.

To this end, the branching ratio \(\eta\) of the Hawkes process is a plausible choice to partly quantify this, as it tells me about the criticality and explosive kind of behaviour that we can expect, and in particular, the ratio between endogenous and exogenously triggered behavior in the system.

I might also be concerned about the timescale and rate of these dynamics, in which case I will also want to estimate the type and parameters of the influence kernel. \(\phi\) This will tell me, loosely, how *rapidly* these dynamics work, and, to an extent, what kind of evolution we can expect in the system. [39].

The background rate, \(\mu\) seems to be of more instrumental interest. if we truly regard it as an exogenous factor, then it is a "given" whose effects we wish to understand.
Nonetheless, we might wish to idenfity it to determin, for example, *why* something went viral, or did not, or identify the effect of a change in background rate.