--- title: "km.support" author: "Yuxin Qin , Lawrence Leemis , Heather Sasinowska " date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{km.support} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction The `km.support` function is part of the [conf](https://CRAN.R-project.org/package=conf) package. The function calculates the the support values for Kaplan and Meier's product–limit estimator[^1]. The Kaplan-Meier product-limit estimator (KMPLE) is used to estimate the survivor function for a data set of positive values in the presence of right censoring, and the support values are all possible values of the KMPLE for a specific sample size. The `km.support` function finds the support values of the KMPLE for a particular sample size $n$ (the number of items on test) using an induction algorithm[^2]. The support values are returned as a list with two components: numerators and denominators. This allows the user to generate exact fractions. [^1]: Kaplan, E. L., and Meier, P. (1958), “Nonparametric Estimation from Incomplete Observations,” Journal of the American Statistical Association, 53, 457–481. [^2]: Qin Y., Sasinowska H. D., Leemis L. M. (2023), “The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator,” The American Statistician, 77 (1), 102–110. ## Installation Instructions The `km.support` function is accessible following installation of the `conf` package: ``` install.packages("conf") library(conf) ``` ## Details The KMPLE is a nonparametric estimate of the survival function from a data set of lifetimes that includes right-censored observations and is used in a variety of application areas. For simplicity, we will refer to the object of interest generically as the item and the event of interest as the failure. Let $n$ denote the number of items on test. The KMPLE of the survival function $S(t)$ is given by $$ \hat{S}(t) = \prod\limits_{i:t_i \leq t}\left( 1 - \frac{d_i}{n_i}\right), $$ for $t \ge 0$, where $t_1, \, t_2, \, \ldots, \, t_k$ are the times when at least one failure is observed ($k$ is an integer between 1 and $n$, which is the number of distinct failure times in the data set), $d_1, \, d_2, \, \ldots, \, d_k$ are the number of failures observed at times $t_1, \, t_2, \, \ldots, \, t_k$, and $n_1, \, n_2, \, \ldots, \, n_k$ are the number of items at risk just prior to times $t_1, \, t_2, \, \ldots, \, t_k$. It is common practice to have the KMPLE ''cut off'' after the largest time recorded if it corresponds to a right-censored observation[^3]. The KMPLE drops to zero after the largest time recorded if it is a failure; the KMPLE is undefined, however, after the largest time recorded if it is a right-censored observation. The support values in `km.support` are the calculated from $\hat{S}(t)$ at any $t \ge 0$ for all possible outcomes of an experiment with $n$ items on test. The function has only the sample size $n$ as its argument. [^3]: Kalbfleisch, J. D., and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data (2nd ed.), Hoboken, NJ: Wiley. ## Example To illustrate a simple case, consider the KMPLE for one particular experiment when there are $n = 4$ items on test, failures occur at times $t = 1$ and $t = 3$, and right censorings occur at times $t = 2$ and $t = 4$. In this setting, the KMPLE is \begin{equation*} \hat{S}(t) = \begin{cases} 1 & \qquad 0 \le t < 1 \\ \left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 3 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{2}\right) = \frac{3}{8} & \qquad 3 \leq t < 4 \\ \text{NA} & \qquad t \geq 4, \end{cases} \end{equation*} \noindent where NA indicates that the KMPLE is undefined. The KMPLE in this experiment has 3 of the 8 support values that are produced by `km.support`, as can be seen in the output below, and NA. The NA's will not be displayed in the output. ```{r, fig.width = 4, fig.height = 4, fig.show = 'hold'} library(conf) # display unsorted numerators and denominators of support values for n = 4 n = 4 s = km.support(n) s # display sorted support values for n = 4 as decimals sort(s$num / s$den) # display sorted support values for n = 4 as exact fractions i <- order(s$num / s$den) m <- length(s$num) f <- "" for (j in i[2:(m - 1)]) f <- paste(f, s$num[j], "/", s$den[j], ", ", sep = "") cat(paste("The ", m, " support values for n = ", n, " are: 0, ", f, "1.\n", sep = "")) ``` Consider the another KMPLE for a different outcome of the same experiment when there are $n = 4$ items on test. This time we observe 4 failures at times $t = 1,2,3,4$. In this setting, the KMPLE is \begin{equation*} \hat{S}(t) = \begin{cases} 1 & \qquad 0 \le t < 1 \\ \left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 2 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{3}\right) = \frac{1}{2} & \qquad 2 \leq t < 3 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{3}\right) \left(1 - \frac{1}{2}\right) = \frac{1}{4} & \qquad 3 \leq t < 4 \\ \text{0} & \qquad t \geq 4, \end{cases} \end{equation*} The KMPLE in this experiment has 5 (3 new ones: 1/2, 1/4, and 0) of the 8 support values that are produced by `km.support`, as can be seen in the output above. By looking at all possible combinations of outcomes, the remaining support values will be found. ## Package Notes The function `km.support` is also called from the functions `km.pmf` and `km.surv`, which are also part of the [conf](https://CRAN.R-project.org/package=conf) package.