Package 'SurvImpute'

Title: Multiple Imputation for Missing Covariates in Time-to-Event Data
Description: Generates multiple imputed datasets from a substantive model compatible fully conditional specification model for time-to-event data. Our method assumes that the censoring process also depends on the covariates with missing values. Details will be available in an upcoming publication.
Authors: Qinghua Lian [aut, cre, cph], Soyoung Kim [aut], Michael J. Martens [aut], Kwang Woo Ahn [aut]
Maintainer: Qinghua Lian <[email protected]>
License: GPL-3
Version: 0.1.0
Built: 2025-01-31 18:16:17 UTC
Source: CRAN

Help Index


Simulate competing risks endpoints data where censoring depends on missing covariates.

Description

We are assuming three covariates, two continuous and one binary, where we assume the first continous and the binary covariates have missing values, and the second continuous covariates is fully observed.

Usage

generate.compt(n, beta1, beta2, phi, gamma, seed)

Arguments

n

Sample Size.

beta1

Regression Coefficients for the event of interest process (length of 3)

beta2

Regression Coefficients for the event of competing risk process (length of 3).

phi

Regression Coefficients for the censoring process (length of 3).

gamma

Regression Coefficients for the missingness (length of 3).

seed

A random seed for data generation.

Value

A data frame with missing values.

Examples

# Simulate a data set with approximately 45% censoring, 35% cause 1,
# and 20% causes 2, and half of the subjects with at least one missing covariates.
generate.compt(n = 100, beta1 = c(1,1,-1), beta2 = c(2, 2, -2),
phi= c(-1,-1,-0.5), gamma = c(1,1,1,-1), seed = 112358)

Simulate survival endpoints data where censoring depends on missing covariates.

Description

We are assuming three covariates, two continuous and one binary, where we assume the first continous and the binary covariates have missing values, and the second continuous covariates is fully observed.

Usage

generate.surv(n, beta, phi, gamma, seed)

Arguments

n

Sample Size.

beta

Regression Coefficients for the event process (length of 3).

phi

Regression Coefficients for the censoring process (length of 3).

gamma

Regression Coefficients for the missingness (length of 3).

seed

A random seed for data generation.

Value

A data frame with missing values. For the delta column, 0 = censored, and 1 = event.

Examples

# Simulate a data set with approximately 50% censoring
# and 50% of the subjects with at least one missing covariates.
generate.surv(n = 100, beta = c(1,1,-1), phi= c(1,-1,-0.5), gamma = c(3,2,-1), seed = 112358)

Multiple imputation method for missing covariates in competing risks endpoint.

Description

Generates multiple imputed datasets from a substantive model compatible fully conditional specification model. This method incorporates the cause-specific hazards regression model into the imputation stage, and assumes that the censoring process also depends on the covariates with missing values. Without loss of generality, we assumed there is only one competing event. Our method is an extension of Bartlett et. al. (2015) and Bartlett and Taylor (2016).

Usage

new.smcfcs.compt(data, smformula, method, m = 10, rjlimit = 5000)

Arguments

data

The input data with missing values. Note: all missing values should be coded as "NA", all binary covariates should be coded as 0/1, and all categorical covariates with more than two categories should have a class of "factor".

smformula

The substantive model formula. For competing risks data, this should be a list of two Cox models for both the event of interest and the competing event (see example).

method

A vector of strings for each covariate specifying the the type of regression model to impute them. The length of this vector should match the number of columns in the input dataset, and also match the position of each column (Including the outcomes). If a covariate is fully observed, the value should be blank (""). Other possible options are "norm"(linear regression for continuous covariates), "lm"(logistic regression for binary covariates), and "mlogit"(multinomial logistic regression for categorical covariates).

m

Number of complete datasets to generate, default is 10.

rjlimit

Maximum number of rejection sampling attempts, default is 5000. If there are subjects who did not get a success sampled value for the missing after reaching the limit, a warning message will be issued suggesting to increase the limit.

Value

A list containing the imputed datasets.

Author(s)

Qinghua Lian [email protected]

References

Bartlett JW, Taylor JM. (2016) Missing covariates in competing risks analysis. Biostatistics. 17(4), 751-63.

Examples

# Generate data with missing values
compt <- generate.compt(n = 500, beta1 = c(1,1,-1), beta2 = c(2, 2, -2),
phi= c(-1,-1,-0.5), gamma = c(1,1,1,-1), seed = 112358)
# Impute
imputed <- new.smcfcs.compt(data = compt,
smformula = c("Surv(time, delta==1)~X1 +X2+X3", "Surv(time, delta==2)~X1 +X2+X3"),
method = c("","", "norm","logreg",""),m = 10, rjlimit = 10000)
# Fit a Cox regression on each imputed dataset, then produce the final estimates using Rubin's rule.
require(mitools)
library(survival)
imputed.fit <- with(imputationList(imputed), expr = coxph(Surv(time, delta==1) ~ X1+X2+X3))
summary(MIcombine(imputed.fit))
#' @author Qinghua Lian \email{[email protected]}

Multiple imputation method for missing covariates in survival endpoint.

Description

Generates multiple imputed datasets from a substantive model compatible fully conditional specification model. This method incorporates the cause-specific hazards regression model into the imputation stage, and assumes that the censoring process also depends on the covariates with missing values. Without loss of generality, we assumed there is only one competing event. Our method is an extension of Bartlett et. al. (2015) and Bartlett and Taylor (2016).

Usage

new.smcfcs.surv(data, smformula, method, m = 10, rjlimit = 5000)

Arguments

data

The input data with missing values. Note: all missing values should be coded as "NA", all binary covariates should be coded as 0/1, and all categorical covariates with more than two categories should have a class of factor.

smformula

The substantive model formula. For survival data, this should be a formula of "Surv(time, delta) ~ Covariates list" type. (see example).

method

A vector of strings for each covariate specifying the the type of regression model to impute them. The length of this vector should match the number of columns in the input dataset, and also match the position of each column (Including the outcomes). If a covariate is fully observed, the value should be blank (""). Other possible options are "norm"(linear regression for continuous covariates), "lm"(logistic regression for binary covariates), and "mlogit"(multinomial logistic regression for categorical covariates).

m

Number of complete datasets to generate, default is 10.

rjlimit

Maximum number of rejection sampling attempts, default is 5000. If there are subjects who did not get a success sampled value for the missing after reaching the limit, a warning message will be issued suggesting to increase the limit.

Value

A list containing the imputed datasets.

References

Bartlett JW, Seaman SR, White IR, Carpenter JR. (2015). Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Statistical Methods in Medical Research. 24(4), 462-487.

Examples

# Generate data with missing values
cox <- generate.surv(n = 500, beta = c(1,1,-1), phi= c(1,-1,-0.5), gamma = c(3,2,-1), seed = 112358)
# Impute
imputed <- new.smcfcs.surv(data = cox, smformula = "Surv(time, delta)~X1 +X2+X3",
method = c("","", "norm","logreg",""), m = 10, rjlimit = 10000)
# Fit a Cox regression on each imputed dataset, then produce the final estimates using Rubin's rule.
require(mitools)
library(survival)
imputed.fit <- with(imputationList(imputed), expr = coxph(Surv(time, delta) ~ X1+X2+X3))
summary(MIcombine(imputed.fit))