# Simulations for the inhomogeneous estimator¶

## Empirical validation on simulated data¶

To summarize the proposed scheme:

My composite estimator combines the interpolated estimator used in the previous chapter with a $$\ell_1$$ penalized estimate of inhomogeneous background rate. The single hyper-parameter, the regularization penalty $$\pi$$ is chosen by AIC or AICc, depending which works better. The kernel boundaries are chosen to match the sampling boundaries in the data.

It is time to see if this estimates anything like what we would wish. It turns out that this procedure performs well on simulated data, as I will demonstrate with examples, and with aggregate statistics from simulations.

I begin with a single time series to visualize the kind of output we should expect.

The parameters of the test model are: $$\mu_t=2+8\mathbb{I}_{149\lt t\leq150},\, \eta=0.95,\, \kappa=1.0$$. I simulate this time series over 300 ttme units, quantize this time series into unit intervals, and fit on randomly interpolated time series based on these intervals.

First, consider the following realization of our generating process, one where it has nearly homogeneous parameters. $$\mu_t$$ and a realization of the resulting quantized intensity process. $$\eta=0.95,\, \kappa=1.0$$.

In this case the homogenous fit is good, returning $$\hat{\mu}=1.945,\, \hat{\eta}=0.956,\, \hat{\kappa}=1.010$$

In this situation it is not a priori clear that we want to bother with inhomogeneous estimators, and indeed that it could be risky to do so, introducing increased estimation variance for no benefit.

Fitting this model using the vanilla AIC indeed results in a significant over-fitting of background rate fluctuations, and also poor matches to the homogeneous parameters, with a concomitant underestimation of the branching ratio. $$\mu_t$$ recovery under vanilla AIC. $$\eta=0.95,\, \kappa=1.0$$. $$\hat{\eta}_\mathrm{AIC}=0.688,\, \hat{\kappa}_\mathrm{AIC}=3.358$$

On the other hand, the finite-sample AICc correction chooses a model which not only recovers the original parameters well, but also estimates the background rate with high accuracy. $$\mu_t$$ recovery under finite-sample-penalized AICc. $$\eta=0.95,\, \kappa=1.0$$. $$\hat{\eta}_\mathrm{AICc}=0.953,\, \hat{\kappa}_\mathrm{AICc}=1.051$$

Based on this particular graph, and ensemble simulations, it seems that AICc generally performs acceptably in selecting the model, and certainly performs better than the AIC itself. For the remainder of this section I will continue to use it in preference to the AIC.

I reiterate, these point estimates are presented without guarantees or confidence intervals at this stage. It is not immediately clear what the confidence intervals of the complete estimator are, especially in combination with the AIC based model selection.

A single graph is not representative; by changing the seed value I can get results both more and less optimistic than this. In this case, for example, there was a random decline in unconditional intensity in this realization immediately before the spike which intuitively should make the spike "easier" to detect. We will need more comprehensive tests to be persuasive.

I construct a simulation test that looks somewhat like kind of time series I find in the Youtube data set. I choose $$\mu_t=2+398\mathbb{I}_{149\lt t\leq 150},\, \eta=0.8,\, \kappa=1.0$$ This corresponds to $$\mu=2,\omega_{149}=398,$$ and $$\omega_i=0 \forall i\ne 149$$. I repeat the simulation 100 times, and compare error in estimates. 1

  For this comparison to be fair, I would have used the same parameters as with the "lead balloon" test, i.e. 200 steps, initial spike, and 300 repetitions. The reason for the discrepancy is that the simulations were too CPU intensive to repeat once I had entered the wrong parameters. Fixing this in future work should be trivial, however.

Performance is variable but generally superior to the inhomogenous test on the same data.

This inhomogeneous rate estimator still only captures the true value a small fraction of the time, and yet clear it is less biassed than the homogenous estimator.

Whether this is what we want is context dependent. We might be especially concerned with "true" values of the parameters of the generating process, or we might be concerned with recovering the "true" background rate. It's hard to plot estimates for whole functions. Instead I will plot the error functional using $$L_1$$ norm - specifically $$\mathrm{Err}\|\hat{\mu}_t-\mu_t\|_1/T$$ . Note that since the homogeneous estimator assumes that $$\mu_t\equiv \mu$$ but the true generating process is not constant, that the homogeneous estimate cannot attain zero error by this metric. Overall error in background rate estimate $$\mathrm{Err}\|\hat{\mu}_t-\mu_t\|_1/T$$

The last one is most disappointing; it seems that with this estimator the background rate estimates are sometimes worse, when better background rate estimation was a selling point of this estimator.

Alternatively, consider the possibility that the estimator is identifying background rate peaks in terms of size, but it placing them at the incorrect location - say, $$\hat{\mu}_t=2+398\mathbb{I}_{148\lt t\leq 149}$$ instead of $$\mu_t=2+398\mathbb{I}_{149\lt t\leq 150}$$. In the the $$L_1$$ penalty, this is more heavily penalized than $$\hat{mu}_t=2$$, which may not be desired behavior.

To diagnose this, I try an alternative error metric for this background rate, based on comparing the background rate and estimate using some smoothing kernel $$\delta$$.

$\mathrm{Err}_\mathrm{s}(\hat{mu}_t):= \|\hat{\mu}_t*\delta-\mu_t*\delta\|_1/T$

I pick the arbitrary value

$\delta(t) := \frac 1 5 \mathbb{I}_{[-5/2,5/2)}$

Indeed, this smoothed loss function shows the inhomogeneous estimator in a better light, performing nearly always better than the homogeneous competitor. Whether the smoothed version is a more appropriate error metric will depend upon the purpose. Error in smoothed background rate $$\mathrm{Err}_\mathrm{s}(\hat{mu}_t)$$

It seems that the inhomogeneous estimator is in fact selecting "good" parameters for this model. To understand what is happening here, I show the estimation paths for all the models in the simulation set for two of the parameters, once again with $$\mu_t=2+398\mathbb{I}_{149\lt t\leq 150},\, \eta=0.8,\, \kappa=1.0$$ Estimated parameter values and AICc for different penalty parameters

As the penalty parameter increases, the estimates of the branching parameters are gradually perturbed over their ranges. Using the AICc selection criteria corresponds to presuming that the model's estimates are optimal at the minimum of this AICc estimate. The graph might reassure us of this.

It also shows a potential problem with this procedure, which is that the AICc minimum for any given estimation path is very flat - hardly visible to the naked eye at this scale. It is also not clear whether the minimum necessarily corresponds to the oracle estimates in general. We could possibly do better by choosing some method of choosing the penalty, such as cross validation. For now, the AICc functions well enough for my purposes, and I will consider that question no further.

Now, let us consider the case that was especially poorly handled by the homogeneous estimator - the case of a non-branching, purely heterogeneous process, the "lead balloon" I set $$\eta=0$$, and rerun the previous batch of simulations. $$\kappa$$ recovery under finite-sample-penalized AICc. Estimation error in smoothed background rate $$\mathrm{Err}_\mathrm{s}(\hat{\mu}_t)$$

Once again, the approximation to the oracle values due to the inhomogeneous estimator are imperfect, but superior to the homogenous ones.

The negative values estimated for $$\hat{\eta}$$ and $$\hat{\eta}$$ are indications that I should constrain the estimator values to meaningful ranges, which is am omission in my coding.

More, I might improve this result by embedding this estimate in a selection procedure that will reject the Hawkes model entirely when there is no branching, much as in the homogeneous case we could reject the Hawkes mode in favor of a constant-rate Poisson process.

However, this is enough to begin.

There are many more steps in evaluating a new estimator than these simulations. I should also test the performance on data generated by non sparse-background rate processes $$\mu_t$$, and variable bin width, and investigate eh influence of observation interval, test alternative kernels and interpolation schemes, and so on.

For now, I have demonstrated that unlike the homogenous estimator, there is at least some hope for Youtube-type data, and I move on.