## Estimating survival rates

Here's the set-up: looking at a data set of individuals' lifespans, you would like to infer the distribution of those lifespans; that is, to analyse when people die, or things break, etc. The statistical problem of estimating how long people's lives are is complicated somewhat by the particular structure of the data (loosely, "every person dies at most one time"), and certain characteristic difficulties arise, such as right-censoring. (If you are looking at data from an experiment and not all your subjects have died yet, they presumably die later, but you don't know when.)

Handily, the tools one invents to solve this kind of problem end up being useful to solve other problems, such as point process inference.

So let's say you have a random variable \(X\) with positive support, according to which the lifetimes of your people (components, machines, whatever) are distributed, and which possesses a pdf \(f_X(t)\) and cdf \(F_X(t)\), abbreviated \(f\) and \(F\) below.

We define several useful functions:

- The survival function (which is also the right tail CDF): \(S(t):=1-F(t)\)
- The hazard function: \(\lambda(t):=f(t)/S(t)\)
- The cumulative hazard function: \(\Lambda(t) :=\int_0^t\lambda(s) \textrm{d} s\)

Why? Because it happens to come out nicely if we do that, and these functions acquire intuitive interpretations once we squint at them a bit. The survival function \(S(t)\) is the probability that an individual survives beyond time \(t\). The hazard function will turn out to be the instantaneous rate of death at time \(t\) *given that death has not yet occurred.*

Using the chain rule we can find the following useful relation:

\[S(t)=\exp[-\Lambda (t)]={\frac {f(t)}{\lambda (t)}}\]
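For completeness, the chain-rule step can be spelled out as follows. This uses only the definitions above, plus the observation that \(S'(t)=-f(t)\) and that \(S(0)=1\) because \(X\) has positive support:

\[\begin{aligned} \lambda(t) &= \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{\textrm{d}}{\textrm{d}t}\log S(t),\\ \Lambda(t) &= \int_0^t\lambda(s)\,\textrm{d}s = \log S(0)-\log S(t) = -\log S(t). \end{aligned}\]

Exponentiating the second line gives \(S(t)=\exp[-\Lambda(t)]\), and rearranging the definition of \(\lambda\) gives \(S(t)=f(t)/\lambda(t)\).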

The hazard function can be pretty much any non-negative function of non-negative support, provided its integral over \([0,\infty)\) diverges if everyone is to die eventually (or more generally, a Schwartz distribution, but let's ignore that possibility for the moment.)
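As a quick numerical sanity check of these definitions, here is a minimal sketch using a Weibull lifetime. The choice of distribution, the parameter values, and the function names are all illustrative assumptions, not something taken from the text above. It integrates the hazard numerically and compares \(\exp[-\Lambda(t)]\) with the closed-form survival function.

```python
import math

def weibull_pdf(t, shape, scale):
    """Density f(t) of a Weibull(shape, scale) lifetime, for t > 0."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_survival(t, shape, scale):
    """Survival function S(t) = 1 - F(t)."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale):
    """Hazard lambda(t) = f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)

def cumulative_hazard(t, shape, scale, n_steps=10_000):
    """Lambda(t): integrate the hazard over [0, t] by the midpoint rule."""
    dt = t / n_steps
    return sum(hazard((k + 0.5) * dt, shape, scale) for k in range(n_steps)) * dt

shape, scale, t = 1.5, 2.0, 3.0  # arbitrary illustrative values
Lambda_t = cumulative_hazard(t, shape, scale)
# The two printed numbers should agree up to integration error.
print(math.exp(-Lambda_t), weibull_survival(t, shape, scale))
```

With `shape = 1` the hazard is constant and the Weibull reduces to the exponential distribution, which is a handy special case to keep in mind.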

### Life table method

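In this section we write \(h\), \(H\) and \(\chi\) for the hazard, cumulative hazard and survival functions, which were denoted \(\lambda\), \(\Lambda\) and \(S\) above. Over intervals of time \([t,u]\) we define the cumulative hazard increment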

\[ H(t,u) :=\int_t^u h (s) \textrm{d} s = H(u)-H(t) \]

and the survival increment, which is the probability of surviving to time \(u\) given survival to time \(t\),

\[ \chi(t,u) :=\frac{\chi(u)}{\chi(t)} \]

The following relations are useful

\[ \chi(t)=\exp[-H (t)]={\frac {f(t)}{h (t)}}. \]

and

\[ \chi(t,u)=\frac{\exp[-H (u)]}{\exp[-H (t)]}=\exp[H (t)-H (u)]=\exp[-H (t,u)] \]

and so

\[-\log\chi(t,u)=H (t,u).\]

We estimate hazard via the *life table* method. Given a time interval \([t_{i}, t_{i+1})\) and survivor counts \(N(t_{i})\) and \(N(t_{i+1})\) at the beginning and end of that interval respectively, and assuming no immigration, the life table estimate of a survival increment is

\[\hat{\chi}(t_i, t_{i+1}):= \frac{N(t_{i+1})}{N(t_{i})}\]

Plugging this in, we obtain cumulative hazard increment estimates

\[\begin{aligned} \hat{H} (t_i, t_{i+1})&=-\log \hat{\chi}(t_i, t_{i+1})\\ &=\log \frac{ N(t_{i}) }{ N(t_{i+1}) } \end{aligned}\]

From this we construct further point estimates of \(H\) at the knots \(t\in\{0, t_1, t_2,\dots\}\), writing \(t_0=0\), as

\[\hat{H} (t)=\sum_{i:\,t_{i+1}\leq t}\hat{H}(t_{i},t_{i+1}).\]

By introducing assumptions on the functional form, we can estimate the entire hazard function. For example, we can take \(h (t)\) to be piecewise constant, so that

\[\begin{aligned} h (t)=\sum_i\mathbb{I}\{t_{i}<t\leq t_{i+1}\} h_i \end{aligned}\]

This corresponds to the assumption that \(H\) is piecewise linear and continuous, so we construct a piecewise linear interpolant: \(\hat{H}\) for \(t\in[0,t_M]\) is a first order polynomial spline with knots \(0,t_1,t_2,\dots, t_M\) and values \(\hat{H}(0), \hat{H}(t_1), \hat{H}(t_2), \dots,\hat{H}(t_M).\) For \(t\in(t_i,t_{i+1}],\) the hazard estimate is then the slope of that spline, \(\hat{h}_i=\hat{H}(t_i,t_{i+1})/(t_{i+1}-t_i).\)
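The life table recipe above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `life_table` and `interpolate_H` are hypothetical, the counts are made up, and it assumes every count is strictly positive so that the logarithms are defined.

```python
import math

def life_table(times, counts):
    """Life table estimates from survivor counts.

    times:  knots t_0 = 0 < t_1 < ... < t_M
    counts: number still alive N(t_i) at each knot (non-increasing, positive)
    Returns (H_hat, h_hat): the cumulative hazard at each knot, and the
    piecewise-constant hazard on each interval (t_i, t_{i+1}].
    """
    H_hat = [0.0]
    h_hat = []
    for i in range(len(times) - 1):
        increment = math.log(counts[i] / counts[i + 1])  # H_hat(t_i, t_{i+1})
        H_hat.append(H_hat[-1] + increment)
        h_hat.append(increment / (times[i + 1] - times[i]))
    return H_hat, h_hat

def interpolate_H(t, times, H_hat):
    """Piecewise linear interpolant of H_hat for t in [t_0, t_M]."""
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            w = (t - times[i]) / (times[i + 1] - times[i])
            return (1 - w) * H_hat[i] + w * H_hat[i + 1]
    raise ValueError("t lies outside [t_0, t_M]")

times = [0.0, 1.0, 2.0, 4.0]   # made-up interval endpoints
counts = [100, 80, 60, 30]     # made-up survivor counts
H_hat, h_hat = life_table(times, counts)
print(H_hat)
print(h_hat)
# exp(-H_hat) at the last knot recovers the overall surviving fraction.
print(math.exp(-H_hat[-1]), counts[-1] / counts[0])
```

Because the increments telescope, \(\exp[-\hat{H}(t_M)]\) is exactly \(N(t_M)/N(0)\), which is a useful check on any implementation.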

### Nelson-Aalen estimates

a.k.a. Empirical Cumulative Hazard Function estimator.

The original Aalen paper on this is famously beautiful because of its clever construction of a life point process and an associated martingale. It is clear and worth reading. Spoiler: despite the elegant derivation, the actual estimator is something a high-school student could discover by guessing.
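For concreteness, the estimator in its usual textbook form sums, over the observed death times, the number of deaths divided by the number still at risk just before that time. The sketch below follows that standard definition. The function name `nelson_aalen` and the data are made up for illustration, and individuals censored at exactly a death time are assumed to be still at risk at that time.

```python
def nelson_aalen(durations, observed):
    """Nelson-Aalen estimate of the cumulative hazard.

    durations: follow-up time for each individual
    observed:  True if the death was seen at that time, False if the
               individual was right-censored then
    Returns a list of (time, H_hat) pairs, one per distinct death time.
    """
    death_times = sorted({t for t, d in zip(durations, observed) if d})
    estimate, H_hat = [], 0.0
    for t in death_times:
        # Individuals still under observation just before time t.
        at_risk = sum(1 for s in durations if s >= t)
        deaths = sum(1 for s, d in zip(durations, observed) if d and s == t)
        H_hat += deaths / at_risk
        estimate.append((t, H_hat))
    return estimate

durations = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]           # made-up data
observed = [True, True, False, True, False, True]    # False = right-censored
for t, H in nelson_aalen(durations, observed):
    print(t, H)
```

Unlike the life table estimate, this handles right-censored individuals directly: they count towards the at-risk set until they leave observation but never contribute a death.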

TBC.

## Other reliability stuff

Reliawiki has handy stuff, e.g. comprehensive docs on the Weibull law. It's in support of some software package they are trying to sell, I think?

We can calculate an "effective age" if we want an intuitive risk measure (Brenner, Gefeller, and Greenland 1993).

## Tools

- For Python: sebp/scikit-survival, survival analysis built on top of scikit-learn.
- For R: Emily Zabor's tutorial is a good intro to the large R survival ecosystem.

## References

*The Annals of Statistics* 6 (4): 701–26.

*Survival and Event History Analysis: A Process Point of View*. Statistics for Biology and Health. New York, NY: Springer.

*Statistical models based on counting processes*. Corr. 2. print. Springer series in statistics. New York, NY: Springer.

*Wiley StatsRef: Statistics Reference Online*. American Cancer Society.

*Wiley StatsRef: Statistics Reference Online*, 1–14. American Cancer Society.

*Applied Survival Analysis*, 355–58. John Wiley & Sons, Ltd.

*Epidemiology (Cambridge, Mass.)* 4 (3): 229–36.

*Journal of the Royal Statistical Society: Series B (Methodological)* 34 (2): 187–202.

*Analysis of Survival Data.*

*Journal of Chronic Diseases* 8 (6): 699–712.

*Wiley StatsRef: Statistics Reference Online*. American Cancer Society.

*Journal of the American Statistical Association* 83 (402): 414–25.

*Gastroenterology & Hepatology* 2 (5): 380–83.

*Journal of Machine Learning Research* 12 (32): 1185–1224.

*The Annals of Statistics* 18 (3): 1259–94.

*International Statistical Review / Revue Internationale de Statistique* 60 (3): 355–87.

*Survival Analysis: State of the Art*, edited by John P. Klein and Prem K. Goel, 211–36. Nato Science 211. Springer Netherlands.

*Applied Survival Analysis: Regression Modeling of Time to Event Data*. Wiley Series in Probability and Statistics. New York: Wiley.

*Applied Survival Analysis: Regression Modeling of Time-to-Event Data*. Wiley Series in Probability and Statistics. Hoboken, NJ, USA: John Wiley & Sons, Inc.

*Wiley StatsRef: Statistics Reference Online*. American Cancer Society.

*Survival Analysis: A Self-Learning Text*. Statistics for Biology and Health 1.0. Springer.

*Journal of the American Statistical Association* 76 (374): 231–40.

*Biometrika* 99 (3): 717–31.

*Journal of Quality Technology* 1 (1): 27–52.

*Technometrics* 42 (1): 12–25.

*arXiv:1905.09690 [Cs, Stat]*, January.

*Applied Survival Analysis*, 244–85. John Wiley & Sons, Ltd.

*Annual Review of Statistics and Its Application* 8 (1): 413–37.

*Journal of the American Statistical Association* 72 (360): 854–58.

*Journal of Machine Learning Research* 21 (212): 1–6.

*Journal of the American Statistical Association* 98 (464): 789–95.

*Journal of Applied Probability* 25 (3): 501–9.

*Proceedings of the 10th International Conference on Neural Information Processing Systems*, 661–67. NIPS'97. Cambridge, MA, USA: MIT Press.

*Journal of Statistical Software* 39 (5).

*Biometrics* 56 (1): 227–36.

*International Journal of Forecasting*, April.

*Statistics in Medicine* 16 (4): 385–95.

