# Generalized Galton-Watson processes

This needs a better intro, but the Galton-Watson process is the archetype here.

There are many standard expositions. Two good ones:

• Gesine Reinert’s Introduction to Branching Processes: Parts 1 and 2.

• Steven Lalley’s intro.

Working through some generalisations of the Galton-Watson process as an INAR process. That is, this is something like the Galton-Watson process, but

Consider

• van Harn & Steutel’s work on “F-stable branching processes.” Also bounded influence kernel?

• Lee, Hopcraft, Jakeman and Williams on discrete stable processes. Discrete state, continuous time - How do these differ from the usual Hawkes processes, if at all?

## Long Memory Galton-Watson

For my own edification and amusement I would like to walk through the construction of a particular analogue of the continuous time Hawkes point process on a discrete index set.

Specifically, I want a non-Markovian generalisation of the Galton-Watson process which still operates in quantised time, but has interesting, possibly unbounded influence kernels, like the Hawkes process.

I denote a realisation of the process by $$\{N_t\}_{t\in\mathbb{N}}$$, the associated non-negative increment process by $$\{X_t\}\equiv\{N_t-N_{t-1}\}$$, and a conditional non-negative pseudo-intensity process by $$\lambda_t\equiv g(\{N_s\}_{s < t})$$, adapted to the whole history $$\{N_s\}_{s < t}$$. By “pseudo-intensity” I mean that the innovation law $$X_t\sim\mathcal{L}_t$$ is parameterised (solely, for now) by the scalar-valued process $$\lambda_t$$. That is, $$X_t|\{N_s\}_{s < t}\sim \mathcal{L}(\lambda_t)$$. For the moment I will take this law to be Poisson. This is also close to clustering, and indeed there are lots of papers noticing the connection. To complete the analogy with the Hawkes process I choose the dependence on the past values of the process to be linear, with influence kernel $$\phi$$:

$\lambda_t\equiv \phi * X$

Then a linear conditional intensity process $$\lambda_t$$ would be

$\lambda_t := \mu + \eta\sum_{0 \leq s <t} \phi(t-s-1)X_s$

The $$-1$$ in $$\phi(t-s-1)$$ ensures that the most recent step, $$s=t-1$$, meets $$\phi(0)$$, so that our influence kernel is defined on $$\mathbb{N}_0$$, which is convenient for typical count distribution functions.
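As a minimal sketch of this recursion, assuming Poisson innovations, the increments $$X_s$$ as the triggering events, and the lag convention in which the step immediately before $$t$$ meets $$\phi(0)$$, one could simulate the process as follows. The function name, the truncated geometric kernel and the parameter values are illustrative assumptions, not part of the construction above.

```python
import numpy as np

def simulate_linear_count_process(mu, eta, phi, n_steps, seed=None):
    """Simulate X_t ~ Poisson(lambda_t) with
    lambda_t = mu + eta * sum_{s<t} phi(t-s-1) * X_s.

    phi is a finite array; phi[k] is the weight given to the increment
    k + 1 steps in the past (an assumed truncation of the kernel).
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    x = np.zeros(n_steps, dtype=np.int64)
    lam = np.zeros(n_steps)
    for t in range(n_steps):
        n_lags = min(t, len(phi))
        # Reverse the recent history so past[0] = X_{t-1}, past[1] = X_{t-2}, ...
        past = x[t - n_lags:t][::-1]
        lam[t] = mu + eta * np.dot(phi[:n_lags], past)
        x[t] = rng.poisson(lam[t])
    return x, lam

# Illustrative values: a geometric kernel truncated at 20 lags.
phi = 0.5 ** np.arange(1, 21)
x, lam = simulate_linear_count_process(mu=1.0, eta=0.5, phi=phi, n_steps=500, seed=0)
counts = np.cumsum(x)  # the counting process N_t
```

With these values the kernel sums to roughly 1, so $$\eta=0.5$$ keeps the process comfortably subcritical; pushing $$\eta\sum_i\phi(i)$$ towards 1 should make the simulated counts grow much burstier.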

If the kernel has bounded support such that

$s\geq p\Rightarrow\phi(s)=0$

then we have an autoregressive count process of order p. More on that in a moment.

What influence kernel shape will we use?

Geometric distributions are natural, although the kernel doesn’t have to be strictly monotonic, or even unimodal. Poisson or negative binomial would also work. We could in general use any probability mass function as the influence kernel, or use a nonparametric form. One flexible parametric choice is a mixture of exponentials, that is, of geometric decays on the integers:

$\phi_\text{Exp}(i) = \sum_{0 \leq k <K} b_ke^{a_ki}$

for some $$\{a_k, b_k\}$$, with each $$a_k<0$$ so that the kernel is summable.

If we expect to be using sparsifying lasso penalties for such a kernel, we probably want to decompose the kernel in a way that minimises correlation between mixture components, to improve our odds of correctly identifying dependency at different scales. If we constrain the components to be non-negative, the only way for them to be completely orthogonal is to have disjoint supports.

Intermediately, we could choose a Poisson mixture

$\phi_\text{Pois}(i) = \sum_{0 \leq k <K} \frac{a_k^i}{i!} e^{-a_k}$
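To make the two parametric families concrete, here is a small sketch that evaluates both mixtures on the lags $$0,\dots,n-1$$. The function names and the idea of truncating at a finite number of lags are assumptions for illustration; the formulas are the ones written above.

```python
import numpy as np

def exp_mixture_kernel(n_lags, a, b):
    """phi_Exp(i) = sum_k b_k * exp(a_k * i) for i = 0, ..., n_lags - 1."""
    i = np.arange(n_lags)[:, None]
    a = np.asarray(a, dtype=float)[None, :]
    b = np.asarray(b, dtype=float)[None, :]
    return np.sum(b * np.exp(a * i), axis=1)

def poisson_mixture_kernel(n_lags, a):
    """phi_Pois(i) = sum_k a_k**i / i! * exp(-a_k) for i = 0, ..., n_lags - 1.

    Requires every a_k > 0. Computed on the log scale to avoid overflow in i!.
    """
    i = np.arange(n_lags)
    # log(i!) as a cumulative sum of logs, with log(0!) = 0
    log_factorial = np.concatenate([[0.0], np.cumsum(np.log(np.arange(1, n_lags)))])
    a = np.asarray(a, dtype=float)[None, :]
    log_pmf = i[:, None] * np.log(a) - a - log_factorial[:, None]
    return np.sum(np.exp(log_pmf), axis=1)

phi_exp = exp_mixture_kernel(30, a=[-0.5, -2.0], b=[0.3, 0.7])
phi_pois = poisson_mixture_kernel(60, a=[1.0, 5.0])
```

Note that the Poisson mixture as written carries no component weights, so over all lags it sums to $$K$$ rather than 1; any overall scaling then has to be absorbed by $$\eta$$.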

There is a subtlety here with regard to the filtration - do we set up the kernel strictly to regard triggering events at previous timesteps? If so, no problem. If we want to allow same-day triggering, we might allow the exogenous events to also contribute to the kernel, in which case we might have to estimate an extra influence parameter, or find some principled way to include it in the kernel weights.

🏗 unconditional distribution using, e.g., generating functions.

## Autoregressive characterisation

Steutel and van Harn (1979) characterised this process. (Wait - is it strictly true that we can make this work with a thinning operator? There are many related definitions here, muddying the waters.)

We need their binomial thinning operator $$\odot$$, which is defined for some count RV $$X$$ by

$\alpha\odot X = \sum_{i=1}^X N_i$

for $$N_i$$ independent $$\operatorname{Bernoulli}(\alpha)$$ RVs.

In terms of generating functions,

$$G_{\alpha\odot X}(s)=G_{X}(1-\alpha+\alpha s)$$
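A quick numerical sanity check of this identity, as a sketch: thinning a count $$X$$ amounts to drawing a $$\operatorname{Binomial}(X,\alpha)$$ variable, and for Poisson $$X$$ the generating function is $$G_X(u)=e^{r(u-1)}$$. The function name, the rate `rate`, and the evaluation point `s` are arbitrary choices made here for illustration.

```python
import numpy as np

def binomial_thin(alpha, x, rng):
    """alpha ⊙ x: keep each of the x counted items independently with prob. alpha."""
    return rng.binomial(x, alpha)

rng = np.random.default_rng(1)
alpha, rate, s = 0.3, 4.0, 0.6

x = rng.poisson(rate, size=200_000)
thinned = binomial_thin(alpha, x, rng)

# Monte Carlo estimate of G_{alpha ⊙ X}(s) = E[s ** (alpha ⊙ X)]
pgf_empirical = np.mean(s ** thinned)
# G_X(1 - alpha + alpha * s) with G_X(u) = exp(rate * (u - 1)) for Poisson X
pgf_formula = np.exp(rate * ((1 - alpha + alpha * s) - 1))
```

The two numbers should agree up to Monte Carlo error, and the sample mean of `thinned` should sit near `alpha * rate`.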

There are many generalisations of this operator.

Anyway, you can use this thinning operator to construct an autoregressive time series model driven by thinned versions of its history.
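The simplest instance of that idea is the first-order case $$X_t=\alpha\odot X_{t-1}+\varepsilon_t$$ with Poisson innovations $$\varepsilon_t$$, usually called INAR(1). The sketch below assumes exactly that form; the function name and parameter values are hypothetical, and this is the textbook special case rather than the long-memory model discussed above.

```python
import numpy as np

def simulate_inar1(alpha, mu, n_steps, seed=None):
    """Simulate X_t = alpha ⊙ X_{t-1} + eps_t with eps_t ~ Poisson(mu)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps, dtype=np.int64)
    # Start from the stationary marginal, which is Poisson(mu / (1 - alpha))
    x[0] = rng.poisson(mu / (1 - alpha))
    for t in range(1, n_steps):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning of the last count
        x[t] = survivors + rng.poisson(mu)         # plus fresh exogenous arrivals
    return x

x = simulate_inar1(alpha=0.5, mu=2.0, n_steps=50_000, seed=3)
sample_mean = x.mean()
lag1_autocorr = np.corrcoef(x[:-1], x[1:])[0, 1]
```

Under these assumptions the sample mean should be close to $$\mu/(1-\alpha)$$ and the lag-one autocorrelation close to $$\alpha$$. Note that the thinned term is binomial given the past, so the conditional law differs from the conditionally Poisson model above even when the conditional means match.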

(Maybe it would be simpler to use Fokianos’ GLM characterisation? I think they are equivalent or nearly equivalent in this case - certainly with stable distributions they are.)

## Estimation of parameters

Well studied for finite-order GINAR(p) processes.
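One common approach for finite order is conditional least squares, which only uses the linear conditional mean $$\mathbb{E}[X_t\mid\text{past}]=\mu+\sum_{j=1}^p\alpha_jX_{t-j}$$. The following is a rough sketch of that estimator on simulated INAR(1) data; `fit_inar_cls` is a hypothetical helper name, and nothing here addresses standard errors or the constraint that the coefficients be non-negative.

```python
import numpy as np

def fit_inar_cls(x, p):
    """Conditional least squares for E[X_t | past] = mu + sum_j alpha_j * X_{t-j}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column j-1 holds X_{t-j} for t = p, ..., n-1
    lagged = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lagged])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef[0], coef[1:]  # (mu_hat, alpha_hat)

# Simulated INAR(1) data with known parameters (illustrative values).
rng = np.random.default_rng(2)
alpha_true, mu_true, n = 0.5, 2.0, 20_000
x = np.zeros(n, dtype=np.int64)
for t in range(1, n):
    x[t] = rng.binomial(x[t - 1], alpha_true) + rng.poisson(mu_true)

mu_hat, alpha_hat = fit_inar_cls(x, p=1)
```

Because it only matches first moments, the same regression applies unchanged to the conditionally Poisson model with a bounded-support kernel; a likelihood-based fit would be needed to exploit the full innovation law.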

## Influence kernels

Hardiman et al. propose multiple-scale exponential kernels to simultaneously estimate decays and branching ratios. Bacry et al. (2012) have a related nonparametric method based on estimating the kernel in the spectral domain. Convergence properties are unclear.

We are also free to use a sum-of-exponentials kernel, possibly calculating both the branching ratio and some measure of tail-heaviness from the fitted kernel alone.
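For the branching ratio this is a geometric series: with a kernel $$\sum_k b_ke^{a_ki}$$ on $$i\in\mathbb{N}_0$$ and every $$a_k<0$$, the total mass is $$\sum_k b_k/(1-e^{a_k})$$, and multiplying by the scale $$\eta$$ gives the expected number of offspring per event. The sketch below assumes that definition of the branching ratio; the function name is made up for illustration.

```python
import numpy as np

def branching_ratio_exp_mixture(eta, a, b):
    """eta * sum_{i>=0} sum_k b_k * exp(a_k * i), in closed form."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if np.any(a >= 0):
        raise ValueError("every a_k must be negative for the kernel to be summable")
    # Geometric series: sum_{i>=0} exp(a_k * i) = 1 / (1 - exp(a_k))
    return eta * np.sum(b / (1.0 - np.exp(a)))

ratio = branching_ratio_exp_mixture(eta=0.4, a=[-0.5, -2.0], b=[0.3, 0.7])
```

A value below 1 corresponds to the subcritical regime in which a stationary version of the process can exist.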

Possibly Smooth-lasso (penalises component CHANGE)

## Endo-exo models

Note that we can still recover the endo-exo model with this by simply calculating the projected ratio between exogenous and endogenous events. It would be interesting to derive the properties of this as a single parameter of interest.
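As a rough worked version of that ratio, assume the process is stationary with mean $$m=\mathbb{E}X_t$$, that the kernel acts on the increments $$X$$, and write $$\rho:=\eta\sum_{i\geq 0}\phi(i)$$ for the branching ratio (a symbol introduced here for illustration). Taking expectations of the linear intensity gives

$m = \mu + \rho\, m \quad\Rightarrow\quad m = \frac{\mu}{1-\rho}, \qquad \rho<1$

so the expected share of exogenous events is $$\mu/m=1-\rho$$, the endogenous share is $$\rho$$, and the endo-to-exo ratio is $$\rho/(1-\rho)$$. This is only a first-moment sketch under the stationarity assumption; it says nothing about the sampling distribution of an estimated ratio.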

## References

Al-Osh, M. A., and A. A. Alzaid. 1987. Journal of Time Series Analysis 8 (3): 261–75.
Al-Osh, Mohamed A., and Emad-Eldin A. A. Aly. 1992. Communications in Statistics - Theory and Methods 21 (9): 2483–92.
Aly, Emad-Eldin A. A., and Nadjib Bouzar. 2005. International Journal of Mathematics and Mathematical Sciences 2005 (1): 1–18.
Alzaid, A., and M. Al-Osh. 1988. Statistica Neerlandica 42 (1): 53–61.
Aragón, Tomás J. 2012. Applied Epidemiology Using R. MedEpi Publishing. http://www.medepi.net/epir/index.html.
Barndorff-Nielsen, O. E., and M. Sørensen. 1994. International Statistical Review / Revue Internationale de Statistique 62 (1): 133–65.
Bhat, B. R., and S. R. Adke. 1981. Advances in Applied Probability 13 (3): 498–509.
Bhattacharjee, M. C. 1987. Probability in the Engineering and Informational Sciences 1 (03): 265–78.
Bibby, Bo Martin, and Michael Sørensen. 1995. Bernoulli 1 (1/2): 17–39.
Böckenholt, Ulf. 1998. Journal of Econometrics 89 (1–2): 317–38.
Cui, Yunwei, and Robert Lund. 2009. Biometrika 96 (4): 781–92.
Drost, Feike C., Ramon van den Akker, and Bas J. M. Werker. 2009. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 467–85.
Dwass, Meyer. 1969. Journal of Applied Probability 6 (3): 682–86.
Eichler, Michael, Rainer Dahlhaus, and Johannes Dueck. 2016. Journal of Time Series Analysis, January, n/a–.
Fokianos, Konstantinos. 2011. Statistics 45 (1): 49–58.
Freeland, R. K., and B. P. M. McCabe. 2004. Journal of Time Series Analysis 25 (5): 701–22.
Gehler, Peter V., Alex D. Holub, and Max Welling. 2006. In Proceedings of the 23rd International Conference on Machine Learning, 337–44. ICML ’06. New York, NY, USA: ACM.
Geiger, Jochen, and Lars Kauffmann. 2004. Random Struct. Algorithms 25 (3): 311–35.
Hall, Andreia, Manuel Scotto, and João Cruz. 2009. TEST 19 (2): 359–74.
Harn, K. van, and F. W. Steutel. 1993. Stochastic Processes and Their Applications 45 (2): 209–30.
Harn, K. van, F. W. Steutel, and W. Vervaat. 1982. Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 61 (1): 97–118.
Hawkes, Alan G., and David Oakes. 1974. Journal of Applied Probability 11 (3): 493.
Kedem, Benjamin, and Konstantinos Fokianos. 2002. Regression models for time series analysis. Chichester; Hoboken, NJ: John Wiley & Sons.
Kratz, Peter, and Etienne Pardoux. 2016. arXiv:1602.02803 [Math], February.
Kraus, Andrea, and Victor M. Panaretos. 2014. Biometrika 101 (1): 141–54.
Kvitkovičová, Andrea, and Victor M. Panaretos. 2011. Advances in Applied Probability 43 (4): 1166–90.
Laredo, Catherine, Olivier David, and Aurélie Garnier. 2009. arXiv:0902.4520 [Stat], February.
Latour, Alain. 1998. Journal of Time Series Analysis 19 (4): 439–55.
Lee, W. H., K. I. Hopcraft, and E. Jakeman. 2008. Physical Review E 77 (1): 011109.
McKenzie, Ed. 1986. Advances in Applied Probability 18 (3): 679–705.
———. 1988. Advances in Applied Probability 20 (4): 822–35.
McKenzie, Eddie. 2003. In Handbook of Statistics, edited by C. R. Rao and D. N. Shanbhag, 21:573–606. Stochastic Processes: Modelling and Simulation. Elsevier.
Monteiro, Magda, Manuel G. Scotto, and Isabel Pereira. 2012. Communications in Statistics - Theory and Methods 41 (15): 2717–37.
Nanthi, K., and M.T. Wasan. 1984. Stochastic Processes and Their Applications 18 (2): 189.
Pardoux, Etienne, and Brice Samegni-Kepgnou. 2016. arXiv:1606.01619 [Math], June.
———. 2017. Journal of Applied Probability 54 (3): 905–20.
Sandkühler, J., and A. A. Eblen-Zajjur. 1994. Neuroscience 61 (4): 991–1006.
Soltani, A. R., A. Shirvani, and F. Alqallaf. 2009. Statistics & Probability Letters 79 (14): 1608–14.
Steutel, F. W., and K. van Harn. 1979. The Annals of Probability 7 (5): 893–99.
Turkman, Kamil Feridun, Manuel González Scotto, and Patrícia de Zea Bermudez. 2014. “Models for Integer-Valued Time Series.” In Non-Linear Time Series, 199–244. Springer International Publishing.
Wei, C. Z., and J. Winnicki. 1990. The Annals of Statistics 18 (4): 1757–73.
Weiß, Christian H. 2008. Advances in Statistical Analysis 92 (3): 319–41.
———. 2009. Communications in Statistics - Theory and Methods 38 (4): 447–60.
Winnicki, J. 1991. Probability Theory and Related Fields 88 (1): 77–106.
Zeger, Scott L. 1988. Biometrika 75 (4): 621–29.
Zeger, Scott L., and Bahjat Qaqish. 1988. Biometrics 44 (4): 1019–31.
Zheng, Haitao, and Ishwar V. Basawa. 2008. Statistics & Probability Letters 78 (1): 1–9.
Zheng, Haitao, Ishwar V. Basawa, and Somnath Datta. 2007. Journal of Statistical Planning and Inference 137 (1): 212–29.
