---
title: "km.outcomes"
author: "Yuxin Qin, Lawrence Leemis, Heather Sasinowska"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{km.outcomes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `km.outcomes` function is part of the [conf](https://CRAN.R-project.org/package=conf) package. The Kaplan-Meier product-limit estimator (KMPLE) is used to estimate the survivor function for a data set of positive values in the presence of right censoring[^1]. The `km.outcomes` function generates a matrix with all possible combinations of observed failures and right-censored values and the resulting support values for the Kaplan-Meier product-limit estimator for a sample of size $n$. The function has only the sample size $n$ as its argument.

[^1]: Kaplan, E. L., and Meier, P. (1958), "Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, 53, 457–481.

## Installation Instructions

The `km.outcomes` function is accessible following installation of the `conf` package:

```
install.packages("conf")
library(conf)
```

## Details

The KMPLE is a nonparametric estimate of the survival function from a data set of lifetimes that includes right-censored observations and is used in a variety of application areas. For simplicity, we will refer to the object of interest generically as the item and the event of interest as the failure.

Let $n$ denote the number of items on test. For a given $n$, there are $2^{n+1} -1$ different possible outcomes (failure times or censoring times) for observing an experiment at a specific time of interest. For any combination of failure or censored times at a specific time, the KMPLE can be calculated.
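
The count $2^{n+1} - 1$ can be checked by summing the $2^l$ possible failure/censoring patterns over $l = 0, 1, \ldots, n$ observed events. The following base R sketch does this for $n = 4$. The helper name `count_outcomes` is hypothetical and is not part of the `conf` package.

```r
# Hypothetical helper (not part of conf): count the outcomes for n items
# by summing the 2^l failure/censoring patterns over l = 0, 1, ..., n events.
count_outcomes <- function(n) {
  sum(2^(0:n))
}

n <- 4
count_outcomes(n)   # 31
2^(n + 1) - 1       # 31
```

For $n = 4$ both expressions give 31, which should agree with the number of rows returned by `km.outcomes(4)`.
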
The KMPLE of the survival function $S(t)$ is given by
$$
\hat{S}(t) = \prod\limits_{i:t_i \leq t}\left( 1 - \frac{d_i}{n_i}\right),
$$
for $t \ge 0$, where $t_1, \, t_2, \, \ldots, \, t_k$ are the times when at least one failure is observed ($k$ is the number of distinct failure times in the data set, an integer between 1 and $n$), $d_1, \, d_2, \, \ldots, \, d_k$ are the numbers of failures observed at times $t_1, \, t_2, \, \ldots, \, t_k$, and $n_1, \, n_2, \, \ldots, \, n_k$ are the numbers of items at risk just prior to times $t_1, \, t_2, \, \ldots, \, t_k$. It is common practice to have the KMPLE "cut off" after the largest time recorded if it corresponds to a right-censored observation[^2]. The KMPLE drops to zero after the largest time recorded if it is a failure; the KMPLE is undefined, however, after the largest time recorded if it is a right-censored observation.

The support values are calculated for each number of observed events between time 0 and the observation time. This number is listed in column $l$ for each combination of failure times or censoring times up to that time. The columns labeled $d1, d2, \ldots, dn$ list a 0 if the event corresponds to a censored observation and a 1 if the event corresponds to a failure. The support values are listed numerically in the $S(t)$ column. To keep the support values as exact fractions, the numerators and denominators are stored separately in the output columns named $num$ and $den$.

[^2]: Kalbfleisch, J. D., and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data (2nd ed.), Hoboken, NJ: Wiley.

## Examples

To illustrate a simple case, consider the KMPLE for the experiment when there are $n = 4$ items on test.

### Specific Example

Let's consider an experiment where failures occur at times $t = 1$ and $t = 3$, and right censorings occur at times $t = 2$ and $t = 4$.
In this setting, the KMPLE is
\begin{equation*}
\hat{S}(t) =
\begin{cases}
1 & \qquad 0 \le t < 1 \\
\left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 3 \\
\left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{2}\right) = \frac{3}{8} & \qquad 3 \leq t < 4 \\
\text{NA} & \qquad t \geq 4,
\end{cases}
\end{equation*}
\noindent where NA indicates that the KMPLE is undefined[^3].

[^3]: Qin Y., Sasinowska H. D., Leemis L. M. (2023), “The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator,” The American Statistician, 77 (1), 102–110.

```{r, fig.width = 4, fig.height = 4, fig.show = 'hold'}
library(conf)
# display the outcomes and KMPLE for n = 4 items on test
n <- 4
km.outcomes(n)
```

If we observe the experiment at time $t_0 = 2.5$, then $\hat{S}(t_0) = 3/4$, which is represented by row 5, where $l = 2$ events have occurred: one failure ($d1 = 1$) and one censored item ($d2 = 0$). Notice that $d3$ and $d4$ are NA since they have not been observed yet.

If instead we choose $t_0 = 4.5$, then $\hat{S}(t_0) = \text{NA}$, which is represented by row 21, where $l = 4$ events have occurred: the first was a failure ($d1 = 1$), the second was a censored item ($d2 = 0$), the third was a failure ($d3 = 1$), and the fourth and last was a censored item ($d4 = 0$).

### General Example

Looking at the above output from the Specific Example, the first row corresponds to choosing a time value $t_0$ that satisfies $0 < t_0 < 1$, which is associated with an observation time prior to the occurrence of an observed failure or censoring time. That is, $l = 0$ events have occurred, and -1's are listed to represent these initialized values. All $n$ items are on test and $\hat{S}(t_0) = 1$.

For the second row, $l = 1$ event has occurred and that event is a censored item ($d1 = 0$). We have not observed any of the other items, so they are listed as NA's. The third row shows the case when only $l = 1$ event has occurred and that event is a failure; that is, $d1 = 1$.
Again, we have not observed any of the other items so they are listed as NA's.

## Package Notes

For more information on how the $\hat{S}(t)$ values are generated, please refer to the vignette titled *km.support* which is available via the link on the [conf](https://CRAN.R-project.org/package=conf) package webpage. In addition, the functions `km.pmf` and `km.surv`, which are also part of the [conf](https://CRAN.R-project.org/package=conf) package, have dependencies on `km.outcomes`.
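
As a rough illustration of how a single $\hat{S}(t)$ value arises from one pattern of failure and censoring indicators, the base R sketch below reproduces the values from the Specific Example (failures at $t = 1$ and $t = 3$, right censorings at $t = 2$ and $t = 4$). It is not the implementation used in the `conf` package, and it assumes there are no tied event times, so exactly one item leaves the risk set at each event.

```r
# Event indicators in time order: 1 = failure, 0 = right-censored.
# These match the Specific Example (events at t = 1, 2, 3, 4).
d <- c(1, 0, 1, 0)
n <- length(d)

# Number of items at risk just before each event, assuming no ties.
at_risk <- n - seq_along(d) + 1   # 4, 3, 2, 1

# Product-limit estimate after each event; a censored event
# contributes a factor of 1 and leaves the estimate unchanged.
s_hat <- cumprod(1 - d / at_risk)

# The estimate is undefined after the largest time if that time is censored.
if (d[n] == 0) s_hat[n] <- NA

s_hat   # 0.75, 0.75, 0.375, NA
```

The second entry, 0.75, is the value at $l = 2$ events, and the last entry is NA because the final event is a right-censored observation.
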