Title: | Goodness-of-Fit Methods for Complete and Right-Censored Data |
---|---|
Description: | Graphical tools and goodness-of-fit tests for complete and right-censored data: 1. Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests, which utilize the empirical distribution function for complete data and are extended to handle right-censored data. 2. Generalized chi-squared-type test, which is based on the squared differences between observed and expected counts using random cells with right-censored data. 3. Graphical tools, such as probability and cumulative hazard plots, to help guide decisions about the most appropriate parametric model for the data. |
Authors: | Klaus Langohr [aut, cre], Mireia Besalú [aut], Matilde Francisco [aut], Arnau Garcia [aut], Guadalupe Gómez [aut] |
Maintainer: | Klaus Langohr <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.1 |
Built: | 2024-12-11 06:46:07 UTC |
Source: | CRAN |
This package provides both graphical tools and goodness-of-fit tests for analyzing complete and right-censored data. It includes:
Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests, which utilize the empirical distribution function for complete data and are extended to handle right-censored data.
Generalized chi-squared-type test, which is based on the squared differences between observed and expected counts using random cells with right-censored data.
Graphical tools, such as probability and cumulative hazard plots, to help guide decisions about the most appropriate parametric model for the data.
The GofCens package can be used to assess the goodness of fit for the following eight distributions. The list below displays the parameterizations of their survival functions.
Exponential Distribution [Exp]
Weibull Distribution [Wei()]
Gumbel Distribution [Gum()]
Log-Logistic Distribution [LLogis()]
Logistic Distribution [Logis()]
Log-Normal Distribution [LN()]
Normal Distribution [N()]
4-Param. Beta Distribution [Beta()]
The parameters of the theoretical distribution can be set manually using the params0 argument in each function.
In this case, the entries of the list correspond to the shape, shape2, location, and scale parameters of the chosen distribution.
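As a hedged illustration of the params0 argument, the sketch below specifies a fully known Weibull null distribution. The list layout is taken from the default shown in the function signatures; the object names x and null_params are hypothetical, and the call is only attempted if the GofCens package is installed.

```r
set.seed(123)
x <- rweibull(100, shape = 12, scale = 4)

# Fully specified null hypothesis: Weibull with shape 12 and scale 4.
# Entries that do not apply to the Weibull distribution are left as NULL.
null_params <- list(shape = 12, shape2 = NULL, location = NULL, scale = 4)

# Run the test only when the package is available.
if (requireNamespace("GofCens", quietly = TRUE)) {
  print(GofCens::ADcens(times = x, distr = "weibull",
                        params0 = null_params, BS = 99))
}
```

Leaving all four entries as NULL corresponds to the default, in which the parameters are estimated from the data.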
Package: | GofCens |
Type: | Package |
Version: | 1.2.1 |
Date: | 2024-11-9 |
License: | GPL (>= 2) |
Klaus Langohr, Mireia Besalú, Matilde Francisco, Arnau Garcia, Guadalupe Gómez
Maintainer: Klaus Langohr <[email protected]>
Function ADcens computes the Anderson-Darling test statistic and p-value for complete
and right-censored data against eight possible distributions using bootstrapping.
## Default S3 method:
ADcens(times, cens = rep(1, length(times)),
       distr = c("exponential", "gumbel", "weibull", "normal",
                 "lognormal", "logistic", "loglogistic", "beta"),
       betaLimits = c(0, 1), igumb = c(10, 10), BS = 999,
       params0 = list(shape = NULL, shape2 = NULL,
                      location = NULL, scale = NULL),
       tol = 1e-04, ...)

## S3 method for class 'formula'
ADcens(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are the exponential ( |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
BS |
Number of bootstrap samples. |
params0 |
List specifying the parameters of the theoretical distribution.
By default, parameters are set to |
tol |
Precision of survival times. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
... |
Additional arguments. |
The parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
To avoid long computation times due to bootstrapping, an alternative with complete data is the function ad.test of the goftest package.
The precision of the survival times is important mainly in the data generation step of the bootstrap samples.
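The following sketch shows how right-censored data can be passed to fitdistcens, which expects a data frame with columns left and right (right = NA marks a right-censored time). The simulated data and object names are hypothetical, and how the package calls fitdistcens internally is not shown in this documentation, so this only illustrates the estimation step.

```r
# Hypothetical toy data: Weibull event times with independent right-censoring.
set.seed(123)
t_event <- rweibull(200, shape = 2, scale = 10)
t_cens  <- runif(200, 0, 30)
times   <- pmin(t_event, t_cens)
cens    <- as.numeric(t_event <= t_cens)   # 1 = exact, 0 = right-censored

# fitdistcens() expects columns 'left' and 'right';
# right = NA encodes a right-censored observation.
censdata <- data.frame(left = times,
                       right = ifelse(cens == 1, times, NA))

# Fit only when fitdistrplus is installed.
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  fit <- fitdistrplus::fitdistcens(censdata, "weibull")
  print(fit$estimate)                      # maximum likelihood estimates
  print(fit$sd)                            # estimated standard errors
  print(c(AIC = fit$aic, BIC = fit$bic))   # information criteria
}
```

The printed quantities correspond in spirit to the Estimates, StdErrors, aic, and bic components described below.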
ADcens returns an object of class "ADcens".
An object of class "ADcens" is a list containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if |
Test |
Vector containing the value of the Anderson-Darling statistic ( |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
With large data sets, the execution time of the function can be long.
Choosing a smaller value of the parameter BS limits the number of random censored samples generated and reduces the execution time.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
G. Marsaglia and J. Marsaglia. Evaluating the Anderson-Darling Distribution. In: Journal of Statistical Software, Articles, 9 (2) (2004), 1-5.
Function ad.test (Package goftest) for complete data and
function gofcens for the statistics and p-values of the Kolmogorov-Smirnov,
Cramér-von Mises, and Anderson-Darling tests together for right-censored data.
# Complete data
set.seed(123)
ADcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 199)
print(ADcens(times = rweibull(100, 12, scale = 4), distr = "exponential",
             BS = 199), outp = "table", print.BIC = FALSE,
      print.infoBoot = TRUE)

## Not run:
# Censored data
set.seed(123)
colonsamp <- colon[sample(nrow(colon), 300), ]
ADcens(Surv(time, status) ~ 1, colonsamp, distr = "normal")

## End(Not run)
Function chisqcens computes the general chi-squared test statistic for
right-censored data introduced by Kim (1993) and the respective p-value
using bootstrapping.
## Default S3 method:
chisqcens(times, cens = rep(1, length(times)), M,
          distr = c("exponential", "gumbel", "weibull", "normal",
                    "lognormal", "logistic", "loglogistic", "beta"),
          betaLimits = c(0, 1), igumb = c(10, 10), BS = 999,
          params0 = list(shape = NULL, shape2 = NULL,
                         location = NULL, scale = NULL),
          tol = 1e-04, ...)

## S3 method for class 'formula'
chisqcens(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
M |
Number indicating the number of cells that will be considered. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are the exponential ( |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
BS |
Number of bootstrap samples. |
params0 |
List specifying the parameters of the theoretical distribution.
By default, parameters are set to |
tol |
Precision of survival times. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
... |
Additional arguments. |
The function implements the test introduced by Kim (1993) and returns the value of the test statistic.
The cell boundaries of the test are obtained via the quantiles, which
are based on the Kaplan-Meier estimate of the distribution function.
In the presence of right-censored data, it is possible that not all
quantiles are estimated, and in this case, the value of M provided by the user is reduced.
The parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
The precision of the survival times is important mainly in the data generation step of the bootstrap samples.
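The reduction of M described above can be illustrated with a small base-R sketch. It computes the Kaplan-Meier estimate of the distribution function by hand and uses equally spaced probabilities for the cell boundaries; the equal-probability choice and the helper names km_cdf and km_cells are assumptions made for this illustration, not the package's internal code.

```r
# Kaplan-Meier estimate of the distribution function at the distinct event times.
km_cdf <- function(times, cens) {
  event_times <- sort(unique(times[cens == 1]))
  n_risk  <- vapply(event_times, function(t) sum(times >= t), numeric(1))
  n_event <- vapply(event_times,
                    function(t) sum(times == t & cens == 1), numeric(1))
  data.frame(time = event_times, cdf = 1 - cumprod(1 - n_event / n_risk))
}

# Boundaries of M equal-probability cells (an assumption of this sketch).
# Quantiles that the Kaplan-Meier estimate never reaches are dropped,
# so fewer than M cells may remain.
km_cells <- function(times, cens, M) {
  km <- km_cdf(times, cens)
  probs <- seq_len(M - 1) / M
  bounds <- vapply(probs, function(p) {
    hit <- which(km$cdf >= p)
    if (length(hit) > 0) km$time[hit[1]] else NA_real_
  }, numeric(1))
  bounds <- bounds[!is.na(bounds)]
  list(boundaries = bounds, M_requested = M, M_used = length(bounds) + 1)
}

# Heavily censored toy data: only the four smallest times are events.
times <- 1:10
cens  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
res <- km_cells(times, cens, M = 4)
print(res)
```

In this toy example the estimated distribution function never exceeds 0.4, so only the first of the three requested quantiles exists and the number of cells drops from 4 to 2.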
chisqcens returns an object of class "chisqcens".
An object of class "chisqcens" is a list containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if |
Test |
Vector containing the value of the test statistic ( |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
CellNumber |
Vector with two values: the original cell number introduced by the user and the final cell number used. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
J. H. Kim. Chi-Square Goodness-of-Fit Tests for Randomly Censored Data. In: The Annals of Statistics, 21 (3) (1993), 1621-1639.
# Complete data
set.seed(123)
chisqcens(times = rgumbel(100, 12, scale = 4), M = 8, distr = "gumbel",
          BS = 99)
print(chisqcens(times = rlogis(100, 20, scale = 3), M = 8,
                distr = "loglogistic", BS = 105),
      print.AIC = FALSE, print.infoBoot = TRUE)

## Not run:
# Censored data
set.seed(123)
colonsamp <- colon[sample(nrow(colon), 300), ]
chisqcens(Surv(time, status) ~ 1, colonsamp, M = 6, distr = "normal")

## End(Not run)
Function cumhazPlot uses the cumulative hazard plot to check if a certain distribution
is an appropriate choice for the data.
## Default S3 method:
cumhazPlot(times, cens = rep(1, length(times)), distr = "all6", colour = 1,
           betaLimits = c(0, 1), igumb = c(10, 10), ggp = FALSE, m = NULL,
           prnt = TRUE, degs = 3, print.AIC = TRUE, print.BIC = TRUE, ...)

## S3 method for class 'formula'
cumhazPlot(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the names of the distributions to be studied.
The possible distributions are the exponential ( |
colour |
Colour of the points. Default colour: black. |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
ggp |
Logical indicating whether the ggplot2 package is used to draw the plots.
Default is |
m |
Optional layout for the plots to be displayed. |
prnt |
Logical to indicate if the maximum likelihood estimates of the
parameters of all distributions considered should be printed.
Default is |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is |
... |
Optional arguments for function |
The cumulative hazard plot is based on transforming the cumulative
hazard function in such a way that it becomes linear in t or log(t).
This transformation is specific for each distribution.
The function uses the data to compute the Nelson-Aalen estimator of the
cumulative hazard function and the maximum likelihood estimators of the
parameters of the theoretical distribution under study. If the distribution
fits the data, the plot is expected to be a straight line.
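To make the linearization idea concrete, here is a minimal base-R sketch for the Weibull case, assuming the standard parameterization with cumulative hazard (t / scale)^shape, so that the log cumulative hazard equals shape * log(t) - shape * log(scale) and is linear in log(t). The helper nelson_aalen and the simulated data are hypothetical and do not reproduce the package's plotting code.

```r
# Nelson-Aalen estimator of the cumulative hazard at the distinct event times.
nelson_aalen <- function(times, cens) {
  event_times <- sort(unique(times[cens == 1]))
  n_risk  <- vapply(event_times, function(t) sum(times >= t), numeric(1))
  n_event <- vapply(event_times,
                    function(t) sum(times == t & cens == 1), numeric(1))
  data.frame(time = event_times, cumhaz = cumsum(n_event / n_risk))
}

# Simulated right-censored Weibull data (shape 2, scale 10).
set.seed(123)
t_event <- rweibull(500, shape = 2, scale = 10)
t_cens  <- runif(500, 0, 25)
times   <- pmin(t_event, t_cens)
cens    <- as.numeric(t_event <= t_cens)

na_est <- nelson_aalen(times, cens)

# For Weibull data, log cumulative hazard against log time should be
# roughly a straight line whose slope approximates the shape parameter.
fit <- lm(log(cumhaz) ~ log(time), data = na_est)
print(coef(fit))

# Draw the plot only in interactive sessions.
if (interactive()) {
  plot(log(na_est$time), log(na_est$cumhaz),
       xlab = "log(time)", ylab = "log(cumulative hazard)")
  abline(fit)
}
```

Marked curvature in such a plot would suggest that the Weibull distribution is not an adequate model for the data.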
The parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
If prnt = TRUE, the following output is returned:
Distribution |
Distribution under study. |
Estimates |
A list with the maximum likelihood estimates of the parameters of all distributions considered. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
# Complete data and default distributions
set.seed(123)
x <- rlogis(1000, 50, 5)
cumhazPlot(x, lwd = 2)

# Censored data comparing three distributions
data(nba)
cumhazPlot(Surv(survtime, cens) ~ 1, nba, distr = c("expo", "normal", "gumbel"))
Function CvMcens computes the Cramér-von Mises statistic and p-value for complete
and right-censored data against eight possible distributions.
## Default S3 method:
CvMcens(times, cens = rep(1, length(times)),
        distr = c("exponential", "gumbel", "weibull", "normal",
                  "lognormal", "logistic", "loglogistic", "beta"),
        betaLimits = c(0, 1), igumb = c(10, 10), BS = 999,
        params0 = list(shape = NULL, shape2 = NULL,
                       location = NULL, scale = NULL),
        tol = 1e-04, ...)

## S3 method for class 'formula'
CvMcens(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are the exponential ( |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
BS |
Number of bootstrap samples. |
params0 |
List specifying the parameters of the theoretical distribution.
By default, parameters are set to |
tol |
Precision of survival times. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
... |
Additional arguments. |
Koziol and Green (1976) proposed a Cramér-von Mises statistic for randomly censored data. This function reproduces this test for given survival data and a theoretical distribution. In the presence of ties, different authors provide slightly different definitions of the product-limit estimator, which might lead to different values of the test statistic.
The parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
To avoid long computation times due to bootstrapping, an alternative with complete data is the function cvm.test of the goftest package.
The precision of the survival times is important mainly in the data generation step of the bootstrap samples.
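For orientation, the classical complete-data form of the Cramér-von Mises statistic can be written as follows, where F_0 denotes the fully specified null distribution function and x_(1) <= ... <= x_(n) the ordered observations (notation introduced here for illustration). This is only the textbook complete-data formula; the Koziol-Green version for censored data is built on the product-limit estimator and is not reproduced here.

```latex
W_n^2 \;=\; \frac{1}{12n} \;+\; \sum_{i=1}^{n}
  \left( F_0\bigl(x_{(i)}\bigr) - \frac{2i-1}{2n} \right)^{2}
```

Large values of the statistic indicate a poor agreement between the data and the null distribution.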
CvMcens returns an object of class "CvMcens".
An object of class "CvMcens" is a list containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if |
Test |
Vector containing the value of the Cramér-von Mises statistic ( |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
With large data sets, the execution time of the function can be long.
Choosing a smaller value of the parameter BS limits the number of random censored samples generated and reduces the execution time.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
J. A. Koziol and S. B. Green. A Cramér-von Mises statistic for randomly censored data. In: Biometrika, 63 (3) (1976), 465-474.
A. N. Pettitt and M. A. Stephens. Modified Cramér-von Mises statistics for censored data. In: Biometrika, 63 (2) (1976), 291-298.
Function cvm.test (Package goftest) for complete data and
gofcens for the statistics and p-values of the Kolmogorov-Smirnov,
Cramér-von Mises, and Anderson-Darling tests together for right-censored data.
# Complete data
set.seed(123)
CvMcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 199)
print(CvMcens(times = rweibull(100, 12, scale = 4), distr = "normal",
              BS = 99), degs = 4, print.AIC = FALSE, print.BIC = FALSE)

## Not run:
# Censored data
set.seed(123)
colonsamp <- colon[sample(nrow(colon), 300), ]
CvMcens(Surv(time, status) ~ 1, colonsamp, distr = "normal")

## End(Not run)
Function gofcens computes the Kolmogorov-Smirnov, Cramér-von Mises, and
Anderson-Darling statistics and p-values for complete and right-censored data against
eight possible distributions using bootstrapping.
## Default S3 method:
gofcens(times, cens = rep(1, length(times)),
        distr = c("exponential", "gumbel", "weibull", "normal",
                  "lognormal", "logistic", "loglogistic", "beta"),
        betaLimits = c(0, 1), igumb = c(10, 10), BS = 999,
        params0 = list(shape = NULL, shape2 = NULL,
                       location = NULL, scale = NULL),
        tol = 1e-04, ...)

## S3 method for class 'formula'
gofcens(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are the exponential ( |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
BS |
Number of bootstrap samples. |
params0 |
List specifying the parameters of the theoretical distribution.
By default, parameters are set to |
tol |
Precision of survival times. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
... |
Additional arguments. |
All p-values are calculated via bootstrapping methods. For the three hypothesis tests, the same data generated with the bootstrapping method are used.
The precision of the survival times is important mainly in the data generation step of the bootstrap samples.
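The idea of reusing the same bootstrap samples for all three tests can be sketched in base R for the simplest setting: complete data and an exponential null distribution. The helpers edf_stats and boot_pvalues, the parametric resampling scheme, and the p-value formula (count + 1) / (BS + 1) are assumptions of this sketch; the package's own algorithm, which also handles right-censoring, may differ.

```r
# Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics
# for complete data against an exponential distribution with a given rate.
edf_stats <- function(x, rate) {
  n <- length(x)
  u <- pexp(sort(x), rate)          # null CDF at the ordered observations
  i <- seq_len(n)
  c(KS  = max(pmax(i / n - u, u - (i - 1) / n)),
    CvM = 1 / (12 * n) + sum((u - (2 * i - 1) / (2 * n))^2),
    AD  = -n - mean((2 * i - 1) * (log(u) + log(1 - rev(u)))))
}

# Parametric bootstrap: every bootstrap sample yields all three statistics,
# so the three p-values are based on the same generated data.
boot_pvalues <- function(x, BS = 199) {
  rate_hat <- 1 / mean(x)                      # exponential MLE
  observed <- edf_stats(x, rate_hat)
  boot <- replicate(BS, {
    xb <- rexp(length(x), rate_hat)            # sample from the fitted null
    edf_stats(xb, 1 / mean(xb))                # re-estimate the rate each time
  })
  # boot is a 3 x BS matrix; compare each row with its observed statistic.
  pvals <- (rowSums(boot >= observed) + 1) / (BS + 1)
  list(statistics = observed, p.values = pvals)
}

set.seed(123)
res <- boot_pvalues(rexp(60, rate = 0.2), BS = 99)
print(res)
```

Sharing the bootstrap samples means that the computational cost is dominated by the data generation and parameter re-estimation, not by the number of statistics evaluated.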
When dealing with complete data, we recommend the use of functions
ks.test of the stats package and cvm.test and ad.test of the goftest package.
gofcens returns an object of class "gofcens".
An object of class "gofcens" is a list containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if |
Test |
Vector containing the values of the Kolmogorov-Smirnov ( |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
With large data sets, the execution time of the function can be long.
Choosing a smaller value of the parameter BS limits the number of random censored samples generated and reduces the execution time.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
J. A. Koziol and S. B. Green. A Cramér-von Mises statistic for randomly censored data. In: Biometrika, 63 (3) (1976), 465-474.
A. N. Pettitt and M. A. Stephens. Modified Cramér-von Mises statistics for censored data. In: Biometrika, 63 (2) (1976), 291-298.
ks.test (Package stats), cvm.test (Package goftest), and ad.test (Package goftest) for complete data, and KScens for the
Kolmogorov-Smirnov test for right-censored data, which returns the p-value.
## Not run:
# Complete data
set.seed(123)
gofcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 499)
print(gofcens(times = rweibull(100, 12, scale = 4), distr = "exponential"),
      outp = "table", print.infoBoot = TRUE)

# Censored data
data(colon)
set.seed(123)
colonsamp <- colon[sample(nrow(colon), 300), ]
gofcens(Surv(time, status) ~ 1, colonsamp, distr = "normal")

## End(Not run)
The kmPlot function generates a plot that combines a Kaplan-Meier
survival curve with a parametric survival curve in the same graph.
This is useful for comparing non-parametric survival estimates to a
fitted parametric survival model.
## Default S3 method:
kmPlot(times, cens = rep(1, length(times)), distr = "all6",
       colour = c("black", "blue", "cornflowerblue"),
       betaLimits = c(0, 1), igumb = c(10, 10), ggp = FALSE, m = NULL,
       prnt = TRUE, degs = 3, print.AIC = TRUE, print.BIC = TRUE, ...)

## S3 method for class 'formula'
kmPlot(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are
the Weibull ( |
colour |
Vector with three components indicating the colours of the displayed plots. The first element is for the survival curve, the second for the Kaplan-Meier curve, and the last one for the confidence intervals. |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
ggp |
Logical indicating whether the ggplot2 package is used to draw the plots.
Default is |
m |
Optional layout for the plots to be displayed. |
prnt |
Logical to indicate if the maximum likelihood estimates of the
parameters should be printed. Default is |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is |
... |
Optional arguments for function |
The parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
If prnt = TRUE, the following output is returned:
Distribution |
Distribution under study. |
Estimates |
A list with the maximum likelihood estimates of the parameters of all distributions considered. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
A. V. Peterson Jr. Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. In: Journal of the American Statistical Association, 72 (360a) (1977), 854-858.
# Plots for complete data and default distributions
set.seed(123)
x <- rexp(1000, 0.1)
kmPlot(x)

# Plots for censored data using ggplot2
kmPlot(Surv(time, status) ~ 1, colon, distr = "lognormal", ggp = TRUE)

# Plots for censored data from three distributions
data(nba)
kmPlot(Surv(survtime, cens) ~ 1, nba,
       distr = c("normal", "weibull", "lognormal"), prnt = FALSE)
Function KScens computes the Kolmogorov-Smirnov statistic and p-value for complete
and right-censored data against eight possible distributions using either bootstrapping or
a modified test.
## Default S3 method:
KScens(times, cens = rep(1, length(times)),
       distr = c("exponential", "gumbel", "weibull", "normal",
                 "lognormal", "logistic", "loglogistic", "beta"),
       betaLimits = c(0, 1), igumb = c(10, 10), BS = 999,
       params0 = list(shape = NULL, shape2 = NULL,
                      location = NULL, scale = NULL),
       tol = 1e-04, boot = TRUE, ...)

## S3 method for class 'formula'
KScens(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are the exponential ( |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
BS |
Number of bootstrap samples. |
params0 |
List specifying the parameters of the theoretical distribution.
By default, parameters are set to |
tol |
Precision of survival times. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or |
data |
Data frame for variables in |
boot |
Logical to indicate if the p-value is computed using bootstrapping or using
the modified Kolmogorov-Smirnov test (see details). Default is |
... |
Additional arguments. |
By default, the p-value is computed via bootstrapping methods.
The parameter estimation is accomplished with the fitdistcens function of the fitdistrplus package.
To avoid long computation times due to bootstrapping, an alternative with complete data is the function ks.test of the stats package.
The precision of the survival times is important mainly in the data generation step of the bootstrap samples.
If boot = FALSE, a modified test is used to compute the p-value.
Fleming et al. (1980) proposed a modified Kolmogorov-Smirnov test for use
with right-censored data. This function reproduces this test for
given survival data and a theoretical distribution. The approximation of
the p-value is acceptable when it is smaller than 0.8 and excellent when
it is smaller than 0.2. The output of the function follows the notation
of Fleming et al. (1980).
In the presence of ties, different authors provide slightly different
definitions of the product-limit estimator, with which other values of
the test statistic might be obtained.
KScens returns an object of class "KScens".
An object of class "KScens" is a list containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if |
Test |
Vector containing the value of the modified Kolmogorov-Smirnov statistic ( |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. If the modified test is used, a 0 is returned. |
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
T. R. Fleming et al. Modified Kolmogorov-Smirnov test procedure with application to arbitrarily right-censored data. In: Biometrics 36 (1980), 607-625.
Function ks.test (Package stats) for complete data and gofcens for the statistics and p-values of the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests together for right-censored data.
# Censored data with bootstrapping
KScens(Surv(time, status) ~ 1, colon, distr = "norm", BS = 99)

# Censored data using the modified test
KScens(Surv(time, status) ~ 1, colon, distr = "norm", boot = FALSE)

data(nba)
print(KScens(Surv(survtime, cens) ~ 1, nba, "logis", boot = FALSE), degs = 2)
KScens(Surv(survtime, cens) ~ 1, nba, "beta", betaLimits = c(0, 80),
       boot = FALSE)
Survival times of former NBA players after their NBA career.
data("nba")
A data frame with 3962 observations on the following 3 variables.
id
Player ID
survtime
Time (in years) from end of NBA career until either death or July 31, 2019.
cens
Death indicator (1, exact survival time; 0, right-censored survival time).
The survival times of former NBA players were analyzed by Martínez et al. (2022).
J. A. Martínez, K. Langohr, J. Felipo, L. Consuegra and M. Casals. Data set on mortality of national basketball association (NBA) players. In: Data in Brief, 45 (2022).
data(nba)
cumhazPlot(Surv(survtime, cens) ~ 1, nba)
Printing method for ADcens object.
## S3 method for class 'ADcens'
print(x, prnt = TRUE, outp = c("list", "table"), degs = 3,
      print.AIC = TRUE, print.BIC = TRUE, print.infoBoot = FALSE, ...)
x |
An object of class ADcens. |
prnt |
Logical to indicate if the value of the Anderson-Darling statistic and the p-value should be printed. Default is TRUE. |
outp |
Indicator of how the output will be displayed. The possible formats are list and table. |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is TRUE. |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is TRUE. |
print.infoBoot |
Logical to indicate if the number of bootstrap samples used should be printed. Default is FALSE. |
... |
Additional arguments. |
If prnt = TRUE, a list or table (if outp = "table") containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if params0 is provided). |
Test |
Vector containing the value of the Anderson-Darling statistic and the estimated p-value. |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
The list is also returned invisibly.
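Because the list is returned invisibly, it can be captured by assigning the result of print. The base R sketch below illustrates this mechanism with a toy class; the class demoFit, its print method, and its values are hypothetical and are not part of the package.

```r
# Hypothetical toy class, used only to illustrate invisible return values
print.demoFit <- function(x, degs = 3, ...) {
  out <- list(Distribution = x$Distribution,
              Estimates = round(x$Estimates, degs))
  print(out)      # show the formatted list
  invisible(out)  # return it without printing it a second time
}

fit <- structure(list(Distribution = "weibull",
                      Estimates = c(shape = 11.98765, scale = 4.01234)),
                 class = "demoFit")

# The output is printed once, and the list is stored for later use
res <- print(fit, degs = 2)
res$Estimates
```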
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
# List output
set.seed(123)
ADcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 149)

# Table output
set.seed(123)
print(ADcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 99),
      outp = "table")
Printing method for chisqcens object.
## S3 method for class 'chisqcens'
print(x, prnt = TRUE, outp = c("list", "table"), degs = 3,
      print.AIC = TRUE, print.BIC = TRUE, print.infoBoot = FALSE, ...)
x |
An object of class chisqcens. |
prnt |
Logical to indicate if the value of the chi-squared statistic and the p-value should be printed. Default is TRUE. |
outp |
Indicator of how the output will be displayed. The possible formats are list and table. |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is TRUE. |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is TRUE. |
print.infoBoot |
Logical to indicate if the number of bootstrap samples used should be printed. Default is FALSE. |
... |
Additional arguments. |
If prnt = TRUE, a list or table (if outp = "table") containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if params0 is provided). |
Test |
Vector containing the value of the test statistic and the estimated p-value. |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
CellNumber |
Vector with two values: the original cell number introduced by the user and the final cell number used. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
The list is also returned invisibly.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
# List output
set.seed(123)
chisqcens(times = rweibull(100, 12, scale = 4), M = 8, distr = "weibull",
          BS = 149)

# Table output
set.seed(123)
print(chisqcens(times = rweibull(100, 12, scale = 4), M = 8, distr = "weibull",
                BS = 99), outp = "table")
Printing method for CvMcens object.
## S3 method for class 'CvMcens'
print(x, prnt = TRUE, outp = c("list", "table"), degs = 3,
      print.AIC = TRUE, print.BIC = TRUE, print.infoBoot = FALSE, ...)
x |
An object of class CvMcens. |
prnt |
Logical to indicate if the value of the Cramér-von Mises statistic and the p-value should be printed. Default is TRUE. |
outp |
Indicator of how the output will be displayed. The possible formats are list and table. |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is TRUE. |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is TRUE. |
print.infoBoot |
Logical to indicate if the number of bootstrap samples used should be printed. Default is FALSE. |
... |
Additional arguments. |
If prnt = TRUE, a list or table (if outp = "table") containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if params0 is provided). |
Test |
Vector containing the value of the Cramér-von Mises statistic and the estimated p-value. |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
The list is also returned invisibly.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
# List output
set.seed(123)
CvMcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 149)

# Table output
set.seed(123)
print(CvMcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 99),
      outp = "table")
Printing method for gofcens object.
## S3 method for class 'gofcens'
print(x, prnt = TRUE, outp = c("list", "table"), degs = 3,
      print.AIC = TRUE, print.BIC = TRUE, print.infoBoot = FALSE, ...)
x |
An object of class gofcens. |
prnt |
Logical to indicate if the values of the Kolmogorov-Smirnov, Cramér-von Mises,
and Anderson-Darling test statistics along with the p-values should be printed.
Default is TRUE. |
outp |
Indicator of how the output will be displayed. The possible formats are list and table. |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is TRUE. |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is TRUE. |
print.infoBoot |
Logical to indicate if the number of bootstrap samples used should be printed. Default is FALSE. |
... |
Additional arguments. |
If prnt = TRUE, a list or table (if outp = "table") containing the following components:
Distribution |
Null distribution. |
Hypothesis |
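Parameters under the null hypothesis (if params0 is provided). |
Test |
Vector containing the values of the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling test statistics and the estimated p-values. |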
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. |
The list is also returned invisibly.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
## Not run:
# List output
set.seed(123)
gofcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 149)

# Table output
set.seed(123)
print(gofcens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 149),
      outp = "table")
## End(Not run)
Printing method for KScens object.
## S3 method for class 'KScens'
print(x, prnt = TRUE, outp = c("list", "table"), degs = 3,
      print.AIC = TRUE, print.BIC = TRUE, print.infoBoot = FALSE, ...)
x |
An object of class KScens. |
prnt |
Logical to indicate if the value of the Kolmogorov-Smirnov statistic and the p-value should be printed. Default is TRUE. |
outp |
Indicator of how the output will be displayed. The possible formats are list and table. |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is TRUE. |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is TRUE. |
print.infoBoot |
Logical to indicate if the number of bootstrap samples used should be printed. Default is FALSE. |
... |
Additional arguments. |
If prnt = TRUE, a list or table (if outp = "table") containing the following components:
Distribution |
Null distribution. |
Hypothesis |
Parameters under the null hypothesis (if params0 is provided). |
Test |
Vector containing the value of the modified Kolmogorov-Smirnov statistic and the estimated p-value. |
Estimates |
Vector with the maximum likelihood estimates of the parameters of the distribution under study. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
BS |
The number of bootstrap samples used. If the modified test is used, a 0 is returned. |
The list is also returned invisibly.
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
# List output
set.seed(123)
KScens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 99)

# Table output
set.seed(123)
print(KScens(times = rweibull(100, 12, scale = 4), distr = "weibull", BS = 99),
      outp = "table")
probPlot provides four types of probability plots (P-P plot, Q-Q plot, Stabilised Probability plot, and Empirically Rescaled plot) to check whether a certain distribution is an appropriate choice for the data.
## Default S3 method:
probPlot(times, cens = rep(1, length(times)),
         distr = c("exponential", "gumbel", "weibull", "normal",
                   "lognormal", "logistic", "loglogistic", "beta"),
         plots = c("PP", "QQ", "SP", "ER"),
         colour = c("green4", "deepskyblue4", "yellow3", "mediumvioletred"),
         mtitle = TRUE, ggp = FALSE, m = NULL, betaLimits = c(0, 1),
         igumb = c(10, 10), prnt = TRUE, degs = 3,
         params0 = list(shape = NULL, shape2 = NULL,
                        location = NULL, scale = NULL),
         print.AIC = TRUE, print.BIC = TRUE, ...)

## S3 method for class 'formula'
probPlot(formula, data, ...)
times |
Numeric vector of times until the event of interest. |
cens |
Status indicator (1, exact time; 0, right-censored time). If not provided, all times are assumed to be exact. |
distr |
A string specifying the name of the distribution to be studied.
The possible distributions are the exponential ("exponential"), the Gumbel ("gumbel"), the Weibull ("weibull"), the normal ("normal"), the lognormal ("lognormal"), the logistic ("logistic"), the loglogistic ("loglogistic"), and the beta ("beta") distribution. |
plots |
Vector stating the plots to be displayed. Possible choices are
the P-P plot ("PP"), the Q-Q plot ("QQ"), the Stabilised Probability plot ("SP"), and the Empirically Rescaled plot ("ER"). By default, all four plots are displayed. |
colour |
Vector indicating the colours of the displayed plots. The vector will be recycled if its length is smaller than the number of plots to be displayed. |
mtitle |
Logical indicating whether to add the title "Probability plots for a ..." to the plot. Default is TRUE. |
ggp |
Logical indicating whether to use the ggplot2 package to draw the plots.
Default is FALSE. |
m |
Optional layout for the plots to be displayed. |
betaLimits |
Two-component vector with the lower and upper bounds of the Beta distribution. This argument is only required if the beta distribution is considered. |
igumb |
Two-component vector with the initial values for the estimation of the Gumbel distribution parameters. |
prnt |
Logical to indicate if the maximum likelihood estimates of the
parameters should be printed. Default is TRUE. |
degs |
Integer indicating the number of decimal places of the numeric results of the output. |
params0 |
List specifying the parameters of the theoretical distribution.
By default, parameters are set to NULL and estimated by maximum likelihood. |
formula |
A formula with a numeric vector as response (which assumes no censoring) or a Surv object. |
data |
Data frame for variables in formula. |
print.AIC |
Logical to indicate if the AIC of the model should be printed. Default is TRUE. |
print.BIC |
Logical to indicate if the BIC of the model should be printed. Default is TRUE. |
... |
Optional arguments for function |
By default, function probPlot draws four plots: P-P plot, SP plot,
Q-Q plot, and ER plot. A description of each plot follows.
The Probability-Probability plot (P-P plot) depicts the empirical
distribution, $\widehat{F}(t)$, which is obtained with the Kaplan-Meier
estimator if data are right-censored, versus the theoretical cumulative
distribution function (cdf), $\widehat{F}_0(t)$. If the data come
from the chosen distribution, the points of the resulting graph are
expected to lie on the identity line.
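A minimal base R sketch of the P-P idea for complete (uncensored) data follows. The simulated sample, the plotting positions (i - 0.5)/n, and the choice of axes are assumptions made for illustration; with right-censored data the empirical cdf would be replaced by the Kaplan-Meier estimate.

```r
# Hypothetical complete lognormal sample
set.seed(123)
x <- rlnorm(500, meanlog = 3, sdlog = 2)

# Maximum likelihood estimates of the lognormal parameters
lx <- log(x)
meanlog_hat <- mean(lx)
sdlog_hat <- sqrt(mean((lx - meanlog_hat)^2))

# Empirical cdf at the sorted times; (i - 0.5)/n avoids the values 0 and 1
t_sorted <- sort(x)
emp_cdf <- (seq_along(t_sorted) - 0.5) / length(t_sorted)

# Fitted theoretical cdf at the same times
theo_cdf <- plnorm(t_sorted, meanlog = meanlog_hat, sdlog = sdlog_hat)

plot(theo_cdf, emp_cdf, xlab = "Theoretical cdf", ylab = "Empirical cdf")
abline(0, 1)  # identity line: points close to it indicate a good fit
```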
The Stabilised Probability plot (SP plot), proposed by Michael (1983),
is a transformation of the P-P plot that stabilises the variance of the
plotted points. If $F_0 = F$ and the parameters of $F_0$ are known,
$\widehat{F}_0(t)$ corresponds to the cdf of a uniform order statistic,
and the arcsin transformation stabilises its variance. If the data come
from distribution $F_0$, the SP plot will resemble the identity line.
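The arcsin transformation mentioned above can be sketched in a few lines of base R. The helper sp_transform is a hypothetical name; it applies the variance-stabilising map u -> (2/pi) * asin(sqrt(u)) of Michael (1983) to probabilities in [0, 1]. In an SP plot, both the empirical and the theoretical cdf values would be transformed this way before plotting.

```r
# Hypothetical helper: arcsin square-root transformation of probabilities
sp_transform <- function(u) (2 / pi) * asin(sqrt(u))

# The transformation maps [0, 1] onto [0, 1] and keeps the ordering
u <- seq(0, 1, by = 0.25)
sp_u <- sp_transform(u)
sp_u
```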
The Quantile-Quantile plot (Q-Q plot) is similar to the P-P plot,
but it represents the sample quantiles versus the theoretical ones,
that is, it plots $t$ versus $\widehat{F}_0^{-1}(\widehat{F}(t))$.
Hence, if $F_0$ fits the data well, the resulting plot will resemble
the identity line.
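The Q-Q construction can be sketched for complete data as follows. The Weibull parameters are assumed known here only to keep the example short, and the plotting positions (i - 0.5)/n are likewise an assumption.

```r
# Hypothetical complete Weibull sample with known parameters
set.seed(123)
x <- rweibull(300, shape = 2, scale = 10)

# Sample quantiles are the sorted observations
t_sorted <- sort(x)
emp_cdf <- (seq_along(t_sorted) - 0.5) / length(t_sorted)

# Theoretical quantiles: inverse cdf evaluated at the empirical cdf values
theo_q <- qweibull(emp_cdf, shape = 2, scale = 10)

plot(theo_q, t_sorted, xlab = "Theoretical quantiles",
     ylab = "Sample quantiles")
abline(0, 1)
```

The points in the upper tail are typically more spread out than those near the centre, which is the drawback that motivates the rescaled plot.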
A drawback of the Q-Q plot is that the plotted points are not evenly spread.
Waller and Turnbull (1992) proposed the Empirically Rescaled plot
(ER plot), which plots $\widehat{F}_u(t)$ against
$\widehat{F}_u(\widehat{F}_0^{-1}(\widehat{F}(t)))$, where
$\widehat{F}_u(t)$ is the empirical cdf of the points corresponding
to the uncensored observations. Again, if $\widehat{F}_0$ fits the
data well, the ER plot will resemble the identity line.
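The following base R sketch illustrates the rescaling idea with simulated right-censored data. The Kaplan-Meier estimate is computed by hand, the Weibull parameters are assumed known, and all variable names are hypothetical; this is not the package's implementation.

```r
# Hypothetical right-censored Weibull sample
set.seed(123)
n <- 200
event_time <- rweibull(n, shape = 2, scale = 10)
cens_time <- runif(n, 0, 30)
times <- pmin(event_time, cens_time)
cens <- as.numeric(event_time <= cens_time)  # 1 = exact, 0 = right-censored

# Kaplan-Meier estimate of the cdf at the distinct uncensored times
ut <- sort(unique(times[cens == 1]))
n_risk <- sapply(ut, function(u) sum(times >= u))
n_event <- sapply(ut, function(u) sum(times == u & cens == 1))
km_cdf <- 1 - cumprod(1 - n_event / n_risk)

# Empirical cdf of the uncensored observations, used to rescale both axes
F_u <- ecdf(times[cens == 1])

# Theoretical quantiles at the Kaplan-Meier cdf values, then rescaled
theo_q <- qweibull(km_cdf, shape = 2, scale = 10)
er_x <- F_u(theo_q)
er_y <- F_u(ut)

plot(er_x, er_y, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Rescaled theoretical quantiles",
     ylab = "Rescaled sample quantiles")
abline(0, 1)
```

Because both axes are passed through the same empirical cdf, the plotted points are spread evenly over the unit interval on the vertical axis.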
By default, all four probability plots are drawn and the maximum
likelihood estimates of the parameters of the chosen parametric model
are returned. The parameter estimation is accomplished with the
fitdistcens function of the fitdistrplus package.
If prnt = TRUE, the following output is returned:
Distribution |
Distribution under study. |
Parameters |
Parameters used to draw the plots (if params0 is provided). |
Estimates |
A list with the maximum likelihood estimates of the parameters of all distributions considered. |
StdErrors |
Vector containing the estimated standard errors. |
aic |
The Akaike information criterion. |
bic |
The so-called BIC or SBC (Schwarz Bayesian criterion). |
K. Langohr, M. Besalú, M. Francisco, A. Garcia, G. Gómez.
J. R. Michael. The Stabilized Probability Plot. In: Biometrika 70 (1) (1983), 11-17.
L.A. Waller and B.W. Turnbull. Probability Plotting with Censored Data. In: American Statistician 46 (1) (1992), 5-12.
# P-P, Q-Q, SP, and ER plots for complete data
set.seed(123)
x <- rlnorm(1000, 3, 2)
probPlot(x)
probPlot(x, distr = "lognormal")

# P-P, Q-Q, SP, and ER plots for censored data using ggplot2
probPlot(Surv(time, status) ~ 1, colon, "weibull", ggp = TRUE)

# P-P, Q-Q and SP plots for censored data and lognormal distribution
data(nba)
probPlot(Surv(survtime, cens) ~ 1, nba, "lognorm", plots = c("PP", "QQ", "SP"),
         ggp = TRUE, m = matrix(1:3, nrow = 1))