Title: | Semiparametric Regression Analysis of Interval-Censored Data |
---|---|
Description: | Currently using the proportional hazards (PH) model. More methods under other semiparametric regression models will be included in later versions. |
Authors: | Christopher S. McMahan and Lianming Wang |
Maintainer: | Lianming Wang <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2024-12-01 08:00:45 UTC |
Source: | CRAN |
This package allows for semiparametric regression analysis of interval-censored data under the proportional hazards (PH) model. Monotone splines are used to estimate the unknown nondecreasing cumulative baseline hazard function. Model fitting is carried out with an expectation-maximization (EM) algorithm.
Package: | ICsurv |
Type: | Package |
Version: | 1.0 |
Date: | 2013-10-18 |
Christopher S. McMahan and Lianming Wang
Maintainers: Christopher S. McMahan <[email protected]> and Lianming Wang <[email protected]>
This function is equivalent to the function "PH.ICsurv.EM" but runs much faster.
fast.PH.ICsurv.EM(d1, d2, d3, Li, Ri, Xp, n.int, order, g0, b0, tol, t.seq, equal = TRUE)
d1 |
vector indicating whether an observation is left-censored (1) or not (0). |
d2 |
vector indicating whether an observation is interval-censored (1) or not (0). |
d3 |
vector indicating whether an observation is right-censored (1) or not (0). |
Li |
the left endpoint of the observed interval; if an observation is left-censored, its corresponding entry should be 0. |
Ri |
the right endpoint of the observed interval; if an observation is right-censored, its corresponding entry should be Inf. |
Xp |
design matrix of predictor variables (in columns); it should be specified without an intercept term. |
n.int |
the number of interior knots to be used. |
order |
the order of the basis functions. |
g0 |
initial estimate of the spline coefficients; should be of length n.int+order. |
b0 |
initial estimate of regression coefficients; should be of length dim(Xp)[2]. |
tol |
the convergence criterion of the EM algorithm, see details for further description. |
t.seq |
an increasing sequence of points at which the cumulative baseline hazard function is evaluated. |
equal |
logical; if TRUE, knots are spaced evenly across the range of the endpoints of the observed intervals, and if FALSE, knots are placed at quantiles. |
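As a rough illustration of the coding described in the argument list, the sketch below derives d1, d2, and d3 from raw interval endpoints in base R. The toy vectors `L` and `R` are hypothetical (they are not part of the package), and the sketch assumes the convention stated above: left-censored observations carry a left endpoint of 0 and right-censored observations carry a right endpoint of Inf.

```r
# Hypothetical interval endpoints; not taken from any package data set.
L <- c(0, 2, 5, 3)
R <- c(4, 6, Inf, 8)

# Left-censored: left endpoint recorded as 0 with a finite right endpoint.
d1 <- as.integer(L == 0 & is.finite(R))
# Right-censored: right endpoint recorded as Inf.
d3 <- as.integer(is.infinite(R))
# Interval-censored: everything else.
d2 <- as.integer(d1 == 0 & d3 == 0)

# Each observation should fall into exactly one category.
stopifnot(all(d1 + d2 + d3 == 1))
print(cbind(L, R, d1, d2, d3))
```

The three indicators are mutually exclusive, so their row sums make a convenient sanity check before calling the fitting function.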
The above function fits the semiparametric proportional hazards (PH) model, proposed in Wang et al. (2014+), to interval-censored data via an EM algorithm. For a discussion of starting values, the number of interior knots, the order, and further details, see Wang et al. (2014+). The EM algorithm is declared converged when the maximum absolute difference between successive parameter estimates (regression and spline coefficients together) is less than tol. The Hessian matrix of the observed likelihood is returned in the output; the variance-covariance matrix of the estimated regression and spline coefficients can be obtained by inverting it. When the Hessian matrix is singular, the variance matrix of the regression parameters is obtained from the inverse of a blocked matrix, which only requires inverting lower-dimensional matrices. For additional robustness, the generalized inverse function "ginv" in the supporting package "MASS" is used in this case. A function in the supporting R package "matrixcalc" is used to check whether the Hessian matrix is singular.
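The stopping rule described above can be sketched in a few lines of base R. The iterates `b.old`, `b.new`, `g.old`, and `g.new` are hypothetical values chosen for illustration; this is only a reading of the stated criterion, not the package's internal code.

```r
# Hypothetical successive EM iterates (regression and spline coefficients).
b.old <- c(0.50, 1.20)
b.new <- c(0.5004, 1.2003)
g.old <- c(1.00, 0.80, 0.30)
g.new <- c(1.0002, 0.8006, 0.3001)
tol <- 0.001

# Largest absolute change across all parameters, regression and spline alike.
max.change <- max(abs(c(b.new - b.old, g.new - g.old)))

# The iteration stops once that change drops below tol.
converged <- max.change < tol
print(converged)
```

Because the criterion pools both coefficient vectors, a single slowly moving spline coefficient is enough to keep the algorithm iterating.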
b |
estimates of the regression coefficients. |
g |
estimates of the spline coefficients. |
hz |
estimated cumulative baseline hazard function evaluated at the points t.seq. |
Hessian |
the Hessian matrix. Its inverse is the variance-covariance matrix of b and g. |
var.b |
the variance-covariance matrix of b. |
flag |
indicator of whether the Hessian matrix is non-singular. When flag=0, the variance estimate may not be accurate. |
AIC |
the Akaike information criterion. |
BIC |
the Bayesian information/Schwarz criterion. |
ll |
the value of the maximized log-likelihood. |
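The returned hz and b can be combined into covariate-specific survival curves, since under the PH model S(t | x) = exp(-Lambda0(t) * exp(x'b)). The sketch below uses hypothetical stand-ins for the fitted values (`hz`, `b`) and a hypothetical covariate profile `x`; it does not call the package.

```r
# Hypothetical stand-ins for fit$hz (evaluated at t.seq) and fit$b.
t.seq <- 0:4
hz <- c(0, 0.05, 0.12, 0.20, 0.31)  # nondecreasing cumulative baseline hazard
b <- c(0.7, -0.2)

# Hypothetical covariate profile (one row of a design matrix, no intercept).
x <- c(1, 0)

# Survival function under the PH model: S(t | x) = exp(-hz * exp(x'b)).
surv <- exp(-hz * exp(sum(x * b)))
print(data.frame(t = t.seq, S = surv))
```

Setting `x` to a vector of zeros gives the baseline survival function exp(-hz).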
Wang, L., McMahan, C., and Hudgens, M. (2014+). A flexible and computationally efficient method for fitting the proportional hazards model to interval censored data. Submitted.
This function is a fast version of PH.Louis.ICsurv. It calculates the negative of the Hessian of the log of the observed data likelihood, obtained via Louis's method, evaluated at the last step of the EM algorithm described in Wang et al. (2014+). This is a support function for fast.PH.ICsurv.EM.
fast.PH.Louis.ICsurv(b, g, bLi, bRi, d1, d2, d3, Xp)
b |
estimates of the regression coefficients obtained at the convergence of the EM algorithm. |
g |
estimates of the spline coefficients obtained at the convergence of the EM algorithm. |
bLi |
an I-spline basis matrix of dimension c(length(knots)+order-2, length(x)), corresponding to the left end points of the observed intervals. |
bRi |
an I-spline basis matrix of dimension c(length(knots)+order-2, length(x)), corresponding to the right end points of the observed intervals. |
d1 |
vector indicating whether an observation is left-censored (1) or not (0). |
d2 |
vector indicating whether an observation is interval-censored (1) or not (0). |
d3 |
vector indicating whether an observation is right-censored (1) or not (0). |
Xp |
design matrix of predictor variables (in columns), should be specified without an intercept term. |
This function is used to obtain the Hessian matrix of the observed likelihood evaluated at the last step of the EM algorithm.
Hessian matrix.
Louis, T. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44, 226-233.
Wang, L., McMahan, C., and Hudgens, M. (2014+). A flexible and computationally efficient method for fitting the proportional hazards model to interval censored data. Submitted.
These data were collected as part of a multi-center prospective study aimed at assessing the HIV-1 infection rate among people with hemophilia. In particular, the individuals in the study were at risk of contracting HIV-1 from blood products made from donors' plasma. In this study, 544 patients were categorized into one of four groups based on their average annual dose of blood products. Specifically, patients were classified into the high, medium, low, or no dose group. The goal of the study was to compare the HIV-1 infection rate between these dose groups. The times at which patients contracted HIV-1 are not known exactly, but are known to have occurred before, between, or after certain observation times. For more details pertaining to this data set see Goedert et al. (1989) and Kroner et al. (1994).
data(Hemophilia)
A data frame with 544 observations on the following 8 variables.
d1
Censoring indicator, 1 if patient's infection time was left censored, 0 otherwise.
d2
Censoring indicator, 1 if patient's infection time was interval censored, 0 otherwise.
d3
Censoring indicator, 1 if patient's infection time was right censored, 0 otherwise.
L
Left endpoint of the observation interval.
R
Right endpoint of the observation interval.
Low
Binary variable indicating the patient was a member of the low dose group.
Medium
Binary variable indicating the patient was a member of the medium dose group.
High
Binary variable indicating the patient was a member of the high dose group.
Goedert, J., Kessler, C., Aledort, L., Biggar, R., et al. (1989). A prospective study of human immunodeficiency virus type-1 infection and the development of AIDS in subjects with hemophilia. New England Journal of Medicine 321, 1141-1148.
Kroner, B., Rosenberg, P., Aledort, L., Alvord, W., and Goedert, J. (1994). HIV-1 infection incidence among people with hemophilia in the United States and Western Europe, 1978-1990. Journal of Acquired Immune Deficiency Syndromes 7, 279-286.
Generates the I-spline basis matrix associated with integrated spline basis functions. Created by Cai and Wang in October, 2009.
Ispline(x, order, knots)
x |
the predictor variables. |
order |
the order of the basis functions. |
knots |
a sequence of increasing points specifying the placement of the knots. |
An I-spline basis matrix of dimension c(length(knots)+order-2, length(x)).
Wang, L., McMahan, C., and Hudgens, M. (2014+). A flexible and computationally efficient method for fitting the proportional hazards model to interval censored data. Submitted.
Ramsay, J. (1988). Monotone regression splines in action. Statistical Science 3, 425-441.
Fits the semiparametric proportional hazards model (PH), proposed in Wang et al. (2014+), to interval censored data via an EM algorithm.
PH.ICsurv.EM(d1, d2, d3, Li, Ri, Xp, n.int, order, g0, b0, tol, t.seq, equal = TRUE)
d1 |
vector indicating whether an observation is left-censored (1) or not (0). |
d2 |
vector indicating whether an observation is interval-censored (1) or not (0). |
d3 |
vector indicating whether an observation is right-censored (1) or not (0). |
Li |
the left endpoint of the observed interval; if an observation is left-censored, its corresponding entry should be 0. |
Ri |
the right endpoint of the observed interval; if an observation is right-censored, its corresponding entry should be Inf. |
Xp |
design matrix of predictor variables (in columns); it should be specified without an intercept term. |
n.int |
the number of interior knots to be used. |
order |
the order of the basis functions. |
g0 |
initial estimate of the spline coefficients; should be of length n.int+order. |
b0 |
initial estimate of regression coefficients; should be of length dim(Xp)[2]. |
tol |
the convergence criterion of the EM algorithm, see details for further description. |
t.seq |
an increasing sequence of points at which the cumulative baseline hazard function is evaluated. |
equal |
logical; if TRUE, knots are spaced evenly across the range of the endpoints of the observed intervals, and if FALSE, knots are placed at quantiles. |
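To make the roles of n.int, order, and equal more concrete, the sketch below builds both kinds of knot sequence in base R. The vector `endpoints` is hypothetical, and the exact placement rule (boundary knots at the minimum and maximum of the finite endpoints, n.int + 2 knots in total) is an assumption inferred from the documented lengths, not a copy of the package's internals.

```r
# Hypothetical finite interval endpoints pooled from Li and Ri.
endpoints <- c(1, 2, 2, 3, 5, 8, 13, 21, 34, 55)
n.int <- 3   # number of interior knots
order <- 3   # order of the basis functions

# Assumed total: interior knots plus the two boundary knots.
n.knots <- n.int + 2

# equal = TRUE: evenly spaced across the range of the endpoints.
knots.equal <- seq(min(endpoints), max(endpoints), length.out = n.knots)

# equal = FALSE: placed at quantiles of the endpoints.
knots.quant <- quantile(endpoints, probs = seq(0, 1, length.out = n.knots),
                        names = FALSE)

# Number of I-spline basis functions, length(knots) + order - 2,
# which matches the documented length of g0 (n.int + order).
n.basis <- length(knots.equal) + order - 2
stopifnot(n.basis == n.int + order)
print(rbind(equal = knots.equal, quantile = knots.quant))
```

Quantile-based knots concentrate where the observation times are dense, which can help when the endpoints are heavily skewed, as in this toy vector.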
The above function fits the semiparametric proportional hazards (PH) model, proposed in Wang et al. (2014+), to interval-censored data via an EM algorithm. For a discussion of starting values, the number of interior knots, the order, and further details, see Wang et al. (2014+). The EM algorithm is declared converged when the maximum absolute difference between successive parameter estimates (regression and spline coefficients together) is less than tol. The Hessian matrix of the observed likelihood is returned in the output; the variance-covariance matrix of the estimated regression and spline coefficients can be obtained by inverting it. When the Hessian matrix is singular, the variance matrix of the regression parameters is obtained from the inverse of a blocked matrix, which only requires inverting lower-dimensional matrices. For additional robustness, the generalized inverse function "ginv" in the supporting package "MASS" is used in this case. A function in the supporting R package "matrixcalc" is used to check whether the Hessian matrix is singular.
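The blocked-inverse idea mentioned above can be illustrated with the standard Schur complement identity. The matrix `H` and the index vectors `ib` and `ig` below are hypothetical; the sketch shows the general linear algebra on a well-conditioned toy matrix, and is not a transcription of how the package handles the singular case.

```r
# Hypothetical information matrix for 2 regression and 2 spline coefficients.
H <- matrix(c(5, 1, 1, 0,
              1, 4, 0, 1,
              1, 0, 3, 1,
              0, 1, 1, 2), nrow = 4, byrow = TRUE)
ib <- 1:2   # positions of the regression coefficients
ig <- 3:4   # positions of the spline coefficients

# Full inverse, then extract the block belonging to b.
var.b.full <- solve(H)[ib, ib]

# Blocked inverse: only lower-dimensional matrices are inverted.
schur <- H[ib, ib] - H[ib, ig] %*% solve(H[ig, ig]) %*% H[ig, ib]
var.b.block <- solve(schur)

# Both routes give the same variance matrix for b.
stopifnot(isTRUE(all.equal(var.b.full, var.b.block)))
print(var.b.block)
```

In a hard case, `solve(H[ig, ig])` could be swapped for a generalized inverse such as `MASS::ginv`, which is the function the details above name.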
b |
estimates of the regression coefficients. |
g |
estimates of the spline coefficients. |
hz |
estimated cumulative baseline hazard function evaluated at the points t.seq. |
Hessian |
the Hessian matrix. Its inverse is the variance-covariance matrix of b and g. |
var.b |
the variance-covariance matrix of b. |
flag |
indicator of whether the Hessian matrix is non-singular. When flag=0, the variance estimate may not be accurate. |
AIC |
the Akaike information criterion. |
BIC |
the Bayesian information/Schwarz criterion. |
ll |
the value of the maximized log-likelihood. |
Wang, L., McMahan, C., and Hudgens, M. (2014+). A flexible and computationally efficient method for fitting the proportional hazards model to interval censored data. Submitted.
data(Hemophilia)
d1<-Hemophilia[,1]
d2<-Hemophilia[,2]
d3<-Hemophilia[,3]
Li<-Hemophilia[,4]
Ri<-Hemophilia[,5]
Xp<-as.matrix(Hemophilia[,c(6,7,8)])
fit <- PH.ICsurv.EM(d1, d2, d3, Li, Ri, Xp, n.int=8, order=3, g0=rep(1,11), b0=rep(0,3), tol=0.001, t.seq=seq(0,57,1), equal = TRUE)
fit$b
# [1] 1.837590 3.018500 3.418981
fit$var.b
#              [,1]        [,2]        [,3]
# [1,]  0.008526765 -0.01090578  0.01199610
# [2,] -0.010905780  0.01265952 -0.01462116
# [3,]  0.011996095 -0.01462116  0.08624411
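The printed estimates in the example above can be turned into hazard ratios with Wald-type 95% confidence limits. The sketch below copies the values of fit$b and the diagonal of fit$var.b by hand so that it runs without the package; the normal approximation and the group labels (taken from the column order Low, Medium, High of the Hemophilia data) are assumptions of this illustration.

```r
# Values copied from the printed output of the example above.
b <- c(1.837590, 3.018500, 3.418981)
var.diag <- c(0.008526765, 0.01265952, 0.08624411)  # diagonal of fit$var.b

# Standard errors and the normal quantile for a 95% interval.
se <- sqrt(var.diag)
z <- qnorm(0.975)

# Hazard ratios relative to the no-dose group, with Wald 95% limits.
res <- data.frame(
  group = c("Low", "Medium", "High"),
  HR    = exp(b),
  lower = exp(b - z * se),
  upper = exp(b + z * se)
)
print(res)
```

With a fitted object at hand, `b` and `var.diag` would be replaced by `fit$b` and `diag(fit$var.b)`.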
Calculates the negative of the Hessian of the log of the observed data likelihood, obtained via Louis's method, evaluated at the last step of the EM algorithm described in Wang et al. (2014+). This is a support function for PH.ICsurv.EM.
PH.Louis.ICsurv(b, g, bLi, bRi, d1, d2, d3, Xp)
b |
estimates of the regression coefficients obtained at the convergence of the EM algorithm. |
g |
estimates of the spline coefficients obtained at the convergence of the EM algorithm. |
bLi |
an I-spline basis matrix of dimension c(length(knots)+order-2, length(x)), corresponding to the left end points of the observed intervals. |
bRi |
an I-spline basis matrix of dimension c(length(knots)+order-2, length(x)), corresponding to the right end points of the observed intervals. |
d1 |
vector indicating whether an observation is left-censored (1) or not (0). |
d2 |
vector indicating whether an observation is interval-censored (1) or not (0). |
d3 |
vector indicating whether an observation is right-censored (1) or not (0). |
Xp |
design matrix of predictor variables (in columns), should be specified without an intercept term. |
This function is used to obtain the Hessian matrix of the observed likelihood evaluated at the last step of the EM algorithm.
Hessian matrix.
Louis, T. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B 44, 226-233.
Wang, L., McMahan, C., and Hudgens, M. (2014+). A flexible and computationally efficient method for fitting the proportional hazards model to interval censored data. Submitted.