## Estimating survival rates

Here’s the set-up: given a data set of individuals’ lifespans, you would like to infer the distribution of those lifespans, that is, to analyse when people die, or when things break. Estimating how long lives last is complicated somewhat by the particular structure of the data (loosely, “every person dies at most one time”), and certain characteristic difficulties arise, such as right-censoring. (If you are looking at data from an experiment and not all your subjects have died yet, they presumably die later, but you don’t know when.)

Handily, the tools one invents to solve this kind of problem end up being useful to solve other problems, such as point process inference.

So let’s say you have a random variable \(X\) with positive support, according to which the lifetimes of your people (components, machines, whatever) are distributed, and which possesses a pdf \(f_X(t)\) and cdf \(F_X(t)\). Below we drop the subscript and write \(f\) and \(F\).

We define several useful functions:

- The survival function (which is also the right-tail CDF): \(S(t):=1-F(t)\)
- The hazard function: \(\lambda(t):=f(t)/S(t)\)
- The cumulative hazard function: \(\Lambda(t) :=\int_0^t\lambda(s) \textrm{d} s\)
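
To make these definitions concrete, here is a minimal sketch that evaluates them numerically for a Weibull lifetime law. The Weibull choice, the parameter values `SHAPE` and `SCALE`, and the helper names are all assumptions made for illustration; the cumulative hazard is approximated with a simple trapezoid rule rather than any closed form.

```python
import numpy as np

# Hypothetical example parameters for a Weibull lifetime law (illustration only).
SHAPE, SCALE = 1.5, 2.0

def weibull_pdf(t):
    """Density f(t) of the Weibull law with the parameters above."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * np.exp(-((t / SCALE) ** SHAPE))

def weibull_cdf(t):
    """Distribution function F(t) of the same law."""
    return 1.0 - np.exp(-((t / SCALE) ** SHAPE))

t = np.linspace(0.0, 5.0, 501)

survival = 1.0 - weibull_cdf(t)      # S(t) := 1 - F(t)
hazard = weibull_pdf(t) / survival   # lambda(t) := f(t) / S(t)

# Lambda(t) := integral of lambda from 0 to t, approximated by the trapezoid rule.
steps = 0.5 * (hazard[1:] + hazard[:-1]) * np.diff(t)
cum_hazard = np.concatenate([[0.0], np.cumsum(steps)])
```

For this particular law the hazard grows like a power of \(t\), so the numerically integrated `cum_hazard` can be compared against \((t/\text{scale})^{\text{shape}}\) as a sanity check.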

Why? Because it happens to come out nicely if we do that, and these functions acquire intuitive interpretations once we squint at them a bit.
The survival function \(S(t)=\mathbb{P}(X>t)\) is the probability of an individual surviving beyond time \(t\).
The hazard function will turn out to be the instantaneous rate of death at time \(t\) *given that death has not yet occurred.*

Using the chain rule we can find the following useful relation:

\[S(t)=\exp[-\Lambda (t)]={\frac {f(t)}{\lambda (t)}}\]
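
For completeness, here is a sketch of the derivation behind that relation, assuming \(S\) is differentiable and \(S(0)=1\) (no probability mass at time zero). Since \(f(t)=F'(t)=-S'(t)\),

\[\lambda(t)=\frac{f(t)}{S(t)}=-\frac{S'(t)}{S(t)}=-\frac{\textrm{d}}{\textrm{d}t}\log S(t),\]

and integrating from \(0\) to \(t\) gives

\[\Lambda(t)=\int_0^t\lambda(s)\textrm{d}s=-\log S(t)+\log S(0)=-\log S(t).\]

Exponentiating yields \(S(t)=\exp[-\Lambda(t)]\); the second equality is just the definition \(\lambda(t)=f(t)/S(t)\) rearranged.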

The hazard function can be pretty much any non-negative function on the non-negative reals, provided its integral \(\Lambda(t)\) diverges as \(t\to\infty\) so that \(S(t)\to 0\) (or more generally, a Schwartz distribution, but let’s ignore that possibility for the moment).

### Life table method

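In this section we switch notation and write \(h\) for the hazard, \(H\) for the cumulative hazard, and \(\chi\) for the survival function (these were \(\lambda\), \(\Lambda\) and \(S\) above). Over intervals of time \([t,u]\) we define the cumulative hazard increment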

\[ H(t,u) :=\int_t^u h (s) \textrm{d} s = H(u)-H(t) \]

and the survival increment

\[ \chi(t,u) :=\frac{\chi(u)}{\chi(t)} \]

The following relations are useful:

\[ \chi(t)=\exp[-H (t)]={\frac {f(t)}{h (t)}}. \]

and

\[ \chi(t,u)=\frac{\exp[-H (u)]}{\exp[-H (t)]}=\exp[H (t)-H (u)]=\exp[-H (t,u)] \]

and so

\[-\log\chi(t,u)=H (t,u).\]

We estimate hazard via the *life table* method. Given a time interval
\([t_{i}, t_{i+1})\) and survival counts \(N(t_{i})\) and \(N(t_{i+1})\) at
the beginning and end of that interval respectively, and assuming no
immigration, the life table estimate of a survival increment is

\[\hat{\chi}(t_i, t_{i+1}):= \frac{N(t_{i+1})}{N(t_{i})}\]

Plugging this in, we obtain cumulative hazard increment estimates

\[\begin{aligned} \hat{H} (t_i, t_{i+1})&=-\log \hat{\chi}(t_i, t_{i+1})\\ &=\log \frac{ N(t_{i}) }{ N(t_{i+1}) } \end{aligned}\]
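
As a small illustration of these two estimates, the following sketch computes survival increments and cumulative hazard increments from a list of survivor counts. The function name `life_table_increments` and the counts are made up for the example; it assumes, as in the text, that nobody enters the population between observation times.

```python
import numpy as np

def life_table_increments(counts):
    """Life table estimates from survivor counts N(t_0), N(t_1), ...

    Returns the estimated survival increments N(t_{i+1}) / N(t_i) and the
    corresponding cumulative hazard increments log(N(t_i) / N(t_{i+1})).
    """
    counts = np.asarray(counts, dtype=float)
    chi_hat = counts[1:] / counts[:-1]   # survival increment per interval
    hazard_increments = -np.log(chi_hat) # cumulative hazard increment per interval
    return chi_hat, hazard_increments

# Hypothetical survivor counts at four successive observation times.
counts = [1000, 900, 720, 540]
chi_hat, hazard_increments = life_table_increments(counts)
```

Because the logarithms telescope, the increments sum to \(\log(N(t_0)/N(t_M))\), which is a quick check on any implementation.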

From this we construct further point estimates of \(H\) at the knots \(t\in\{0, t_1, t_2,\dots\}\), writing \(t_0:=0\), as

\[\hat{H} (t)=\sum_{i:\,t_{i+1}\leq t}\hat{H}(t_{i},t_{i+1})\]

By introducing assumptions on the functional form, we can estimate the entire hazard function. For example, we can take \(h (t)\) to be piecewise constant, so that

\[\begin{aligned} h (t)=\sum_i\mathbb{I}\{t_{i}<t<t_{i+1}\} h_i \end{aligned}\]

This corresponds to the assumption that \(H\) is piecewise linear and continuous; we are constructing a piecewise linear interpolant. Thus we construct such an interpolant \(\hat{H}\) for \(t\in[0,t_M]\) as a first-order polynomial spline with knots \(0,t_1,t_2,\dots, t_M\) and values \(\hat{H}(0), \hat{H}(t_1), \hat{H}(t_2), \dots,\hat{H}(t_M)\); for \(t\in(t_i,t_{i+1}]\) this gives \(\hat{H}(t)=\hat{H}(t_i)+\frac{t-t_i}{t_{i+1}-t_i}\hat{H}(t_i,t_{i+1}).\)
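
The interpolation step can be sketched as follows, using `numpy.interp` for the first-order spline. The function name `cumulative_hazard_interpolant`, the knots and the counts are hypothetical; the piecewise-constant hazard levels are taken to be each increment divided by its interval length, which is what a piecewise linear \(\hat{H}\) implies.

```python
import numpy as np

def cumulative_hazard_interpolant(knots, counts):
    """Piecewise linear estimate of H from survivor counts at the knots.

    knots  : increasing times 0 = t_0 < t_1 < ... < t_M
    counts : survivor counts N(t_0), ..., N(t_M)
    Returns a function t -> H_hat(t) and the constant hazard level on each interval.
    """
    knots = np.asarray(knots, dtype=float)
    counts = np.asarray(counts, dtype=float)
    increments = np.log(counts[:-1] / counts[1:])             # H_hat(t_i, t_{i+1})
    knot_values = np.concatenate([[0.0], np.cumsum(increments)])  # H_hat at each knot
    hazard_levels = increments / np.diff(knots)               # slope of H_hat per interval

    def H_hat(t):
        # Linear interpolation between knots; np.interp holds the end values
        # constant outside [t_0, t_M], so only use this inside that range.
        return np.interp(t, knots, knot_values)

    return H_hat, hazard_levels

# Hypothetical observation times and survivor counts.
H_hat, hazard_levels = cumulative_hazard_interpolant([0, 1, 2, 4], [1000, 900, 720, 540])
```

Evaluating `H_hat` at a knot returns the summed increments up to that knot, and halfway through the first interval it returns half of the first increment.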

### Nelson-Aalen estimates

a.k.a. Empirical Cumulative Hazard Function estimator.

The original Aalen (1978) paper on this is notoriously beautiful because of its clever construction of a life point process and associated martingale. It is clear and worth reading. Spoiler: despite the elegant derivation, the actual estimator is something a high-school student could discover by guessing.
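
The estimator itself is not written out here, so as a placeholder, this is a minimal sketch of the usual textbook form: at each distinct observed death time, add the number of deaths divided by the number of individuals still at risk just before that time. The function name `nelson_aalen`, its argument names, and the toy data are assumptions for illustration; individuals censored at a death time are counted as still at risk at that time.

```python
import numpy as np

def nelson_aalen(durations, observed):
    """Cumulative hazard estimate from right-censored lifetimes.

    durations : time at which each individual died or was censored
    observed  : True/1 if the death was observed, False/0 if censored
    Returns the distinct death times and the estimate just after each one.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])
    # Number still under observation just before each death time.
    at_risk = np.array([(durations >= t).sum() for t in event_times])
    # Number of observed deaths at each death time.
    deaths = np.array([((durations == t) & observed).sum() for t in event_times])
    return event_times, np.cumsum(deaths / at_risk)

# Hypothetical data: six individuals, two of them right-censored.
durations = [2, 3, 3, 5, 7, 8]
observed = [1, 1, 0, 1, 0, 1]
event_times, cum_hazard = nelson_aalen(durations, observed)
```

With no censoring and no ties this reduces to summing \(1/n, 1/(n-1), \dots\), which is the kind of thing one might guess directly.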

TBC.

## Other reliability stuff

Reliawiki has handy stuff, e.g. comprehensive docs on the Weibull law. It’s in support of some software package they are trying to sell, I think?

## References

Aalen, Odd. 1978. “Nonparametric Inference for a Family of Counting Processes.” *The Annals of Statistics* 6 (4): 701–26. https://doi.org/10.1214/aos/1176344247.

Aalen, Odd O., Ørnulf Borgan, and S. Gjessing. 2008. *Survival and Event History Analysis: A Process Point of View*. Statistics for Biology and Health. New York, NY: Springer.

Andersen, Per Kragh, Ørnulf Borgan, Richard D. Gill, and Niels Keiding. 1997. *Statistical Models Based on Counting Processes*. Corr. 2. print. Springer Series in Statistics. New York, NY: Springer.

Andersen, Per Kragh, and Niels Keiding. 2014. “Survival Analysis, Overview.” In *Wiley StatsRef: Statistics Reference Online*. American Cancer Society. https://doi.org/10.1002/9781118445112.stat06060.

Andersen, Per K., and Michael Vaeth. 2015. “Survival Analysis.” In *Wiley StatsRef: Statistics Reference Online*, 1–14. American Cancer Society. https://doi.org/10.1002/9781118445112.stat02177.pub2.

“Appendix 1: The Delta Method.” 2011. In *Applied Survival Analysis*, 355–58. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470258019.app1.

Cox, D. R. 1972. “Regression Models and Life-Tables.” *Journal of the Royal Statistical Society: Series B (Methodological)* 34 (2): 187–202. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.

Cox, D. R, and D. O Oakes. 2018. *Analysis of Survival Data.* https://ebookcentral.proquest.com/lib/qut/detail.action?docID=5477183.

Cutler, S. J., and F. Ederer. 1958. “Maximum Utilization of the Life Table Method in Analyzing Survival.” *Journal of Chronic Diseases* 8 (6): 699–712.

Deddens, James A., and Gary G. Koch. 2014. “Survival Analysis, Grouped Data in.” In *Wiley StatsRef: Statistics Reference Online*. American Cancer Society. https://doi.org/10.1002/9781118445112.stat02178.

Efron, Bradley. 1988. “Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve.” *Journal of the American Statistical Association* 83 (402): 414–25. https://doi.org/10.2307/2288857.

Fink, Scott A, and Robert S. Brown. 2006. “Survival Analysis.” *Gastroenterology & Hepatology* 2 (5): 380–83. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5338193/.

Hjort, Nils Lid. 1992. “On Inference in Parametric Survival Data Models.” *International Statistical Review / Revue Internationale de Statistique* 60 (3): 355–87. https://doi.org/10.2307/1403683.

Hjort, Nils Lid, Mike West, and Sue Leurgans. 1992. “Semiparametric Estimation of Parametric Hazard Rates.” In *Survival Analysis: State of the Art*, edited by John P. Klein and Prem K. Goel, 211–36. Nato Science 211. Springer Netherlands. http://link.springer.com/chapter/10.1007/978-94-015-7983-4_13.

Hosmer, David W., and Stanley Lemeshow. 1999. *Applied Survival Analysis: Regression Modeling of Time to Event Data*. Wiley Series in Probability and Statistics. New York: Wiley.

Hosmer, David W., Stanley Lemeshow, and Susanne May. 2008. “Descriptive Methods for Survival Data.” In *Applied Survival Analysis: Regression Modeling of Time-to-Event Data*. Wiley Series in Probability and Statistics. Hoboken, NJ, USA: John Wiley & Sons, Inc. https://doi.org/10.1002/9780470258019.

Klein, John P. 2014. “Survival Distributions and Their Characteristics.” In *Wiley StatsRef: Statistics Reference Online*. American Cancer Society. https://doi.org/10.1002/9781118445112.stat06062.

Kleinbaum, David G. 2010. *Survival Analysis: A Self-Learning Text*. Statistics for Biology and Health 1.0. Springer.

Laird, Nan, and Donald Olivier. 1981. “Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques.” *Journal of the American Statistical Association* 76 (374): 231–40. https://doi.org/10.1080/01621459.1981.10477634.

Lu, W., Y. Goldberg, and J. P. Fine. 2012. “On the Robustness of the Adaptive Lasso to Model Misspecification.” *Biometrika* 99 (3): 717–31. https://doi.org/10.1093/biomet/ass027.

Nelson, Wayne. 1969. “Hazard Plotting for Incomplete Failure Data.” *Journal of Quality Technology* 1 (1): 27–52. https://doi.org/10.1080/00224065.1969.11980344.

———. 2000. “Theory and Applications of Hazard Plotting for Censored Failure Data.” *Technometrics* 42 (1): 12–25. https://doi.org/10.1080/00401706.2000.10485975.

“Parametric Regression Models.” 2011. In *Applied Survival Analysis*, 244–85. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470258019.ch8.

Peterson, Arthur V. 1977. “Expressing the Kaplan-Meier Estimator as a Function of Empirical Subsurvival Functions.” *Journal of the American Statistical Association* 72 (360): 854–58. https://doi.org/10.2307/2286474.

Schoenberg, Frederic Paik. 2003. “Multidimensional Residual Analysis of Point Process Models for Earthquake Occurrences.” *Journal of the American Statistical Association* 98 (464): 789–95. https://doi.org/10.1198/016214503000000710.

Shaked, Moshe, and J. George Shanthikumar. 1988. “On the First-Passage Times of Pure Jump Processes.” *Journal of Applied Probability* 25 (3): 501–9. https://doi.org/10.2307/3213979.

Simon, Noah, Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2011. “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” *Journal of Statistical Software* 39 (5). http://www.jstatsoft.org/v39/i05/paper.

Sy, Judy P., and Jeremy M. G. Taylor. 2000. “Estimation in a Cox Proportional Hazards Cure Model.” *Biometrics* 56 (1): 227–36. https://doi.org/10.1111/j.0006-341X.2000.00227.x.

Taleb, Nassim Nicholas. 2020. “On the Statistical Differences Between Binary Forecasts and Real-World Payoffs.” *International Journal of Forecasting*, April. https://doi.org/10.1016/j.ijforecast.2019.12.004.

Tibshirani, Robert. 1997. “The Lasso Method for Variable Selection in the Cox Model.” *Statistics in Medicine* 16 (4): 385–95. https://doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.