Package 'simsl' reference manual

Title:	Single-Index Models with a Surface-Link
Description:	An implementation of a single-index regression for optimizing individualized dose rules from an observational study. To model interaction effects between baseline covariates and a treatment variable defined on a continuum, we employ two-dimensional penalized spline regression on an index-treatment domain, where the index is defined as a linear combination of the covariates (a single-index). An unspecified main effect for the covariates is allowed, which can also be modeled through a parametric model. A unique contribution of this work is in the parsimonious single-index parametrization specifically defined for the interaction effect term. We refer to Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1111/biom.13320> (for the case of a discrete treatment) and Park, Petkova, Tarpey, and Ogden (2021) "A single-index model with a surface-link for optimizing individualized dose rules" <arXiv:2006.00267v2> for detail of the method. The model can take a member of the exponential family as a response variable and can also take an ordinal categorical response. The main function of this package is simsl().
Authors:	Hyung Park, Eva Petkova, Thaddeus Tarpey, R. Todd Ogden
Maintainer:	Hyung Park <parkh15@nyu.edu>
License:	GPL-3
Version:	0.2.1
Built:	2025-03-08 06:05:41 UTC
Source:	CRAN

Air pollution dataset

Description

Daily air pollution and death rate data for Chicago

Format

A data frame with 7 columns and 5114 rows; each row refers to one day; the columns correspond to:

death: total deaths (per day).
pm10median: median particles in 2.5-10 per cubic m
pm25median: median particles < 2.5 mg per cubic m (more dangerous).
o3median: Ozone in parts per billion
so2median: Median Sulpher dioxide measurement
time: time in days
tmpd: temperature in fahrenheit

Details

The data are from Peng and Welty (2004) and are available from R (R Core Team, 2019) package gamair (Wood, 2019).

The daily death in the city of Chicago is recorded over a number of years (about 14 years). Each observation is a time series of daily mortality counts, indicating the number of deaths that occurred on each day.

Source

The chicago dataset is available from package gamair (Wood, 2019).

References

Peng, R.D. and Welty, L.J. (2004) The NMMAPSdata package. R News 4(2)

Wood, S.N. (2017) Generalized Additive Models: An Introduction with R

Wood, S.N. (2019) gamair: Data for ’GAMs: An introduction with R’. R package version 1.0.2

A subfunction used in estimation

Description

This function computes the 1st derivative of the surface-link function with respect to the argument associated with the pure interaction effect term of the smooth, using finite difference.

Usage

der.link(g.fit, eps = 10^(-4))
der.link(g.fit, eps = 10^(-4))

Arguments

`g.fit`	a `mgcv::gam` object
`eps`	a small finite difference used in numerical differentiation.

Single-index models with a surface-link (workhorse function)

Description

fit.simsl is the workhorse function for Single-index models with a surface-link (SIMSL).

Usage

fit.simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
  bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
  knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
  method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
  random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
  lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
  center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE)
fit.simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
  bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
  knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
  method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
  random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
  lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
  center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE)

Arguments

`y`	a n-by-1 vector of treatment outcomes; y is a member of the exponential family; any distribution supported by `mgcv::gam`; y can also be an ordinal categorial response with `R` categories taking a value from 1 to `R`.
`A`	a n-by-1 vector of treatment variable; each element is assumed to take a value on a continuum.
`X`	a n-by-p matrix of baseline covarates.
`Xm`	a n-by-q design matrix associated with an X main effect model; the defult is `NULL` and it is taken as a vector of zeros
`family`	specifies the distribution of y; e.g., "gaussian", "binomial", "poisson"; can be any family supported by `mgcv::gam`; can also be "ordinal", for an ordinal categorical response y.
`R`	the number of response categories for the case of family = "ordinal".
`bs`	basis type for the treatment (A) and single-index domains, respectively; the defult is "ps" (p-splines); any basis supported by `mgcv::gam` can be used, e.g., "cr" (cubic regression splines); see `mgcv::s` for detail.
`k`	basis dimension for the treatment (A) and single-index domains, respectively.
`m`	a length 2 list (e.g., m=list(c(2,3), c(2,2))), for the treatment (A) and single-index domains, respectively, where each element specifies the order of basis and penalty (note, for bs="ps", c(2,3) means a 2nd order P-spline basis (cubic spline) and a 3rd order difference penalty; the default "NA" sets c(2,2) for each domain); see `mgcv::s` for details.
`sp`	a vector of smoothing parameters; Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula (i.e., A, and then the single-index); negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible; see `mgcv::gam` for detail.
`knots`	a list containing user-specified knot values to be used for basis construction, for the treatment (A) and single-index domains, respectively.
`sep.A.effect`	If `TRUE`, the g term of SIMSL is further decomposed into: the A main effect + the A-by-X interaction effect; the default is `FALSE`.
`mc`	a length 2 vector indicating which marginals (i.e., A and the single-index, respectively) should have centering (i.e., the sum-to-zero) constraints applied; the default is `mc = c(TRUE, FALSE)` (see `mgcv::te` for detail of the constraint), which is sufficient for the so-called "orthogonality" constraint of the SIMSL.
`method`	the smoothing parameter estimation method; "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; any method supported by `mgcv::gam` can be used.
`beta.ini`	an initial value for `beta.coef`; a p-by-1 vector; the defult is `NULL`, in which case a linear model estimate is used.
`ind.to.be.positive`	for identifiability of the solution `beta.coef`, the user can restrict the jth (e.g., j=1) component of `beta.coef` to be positive; by default, we match the "overall" sign of `beta.coef` with that of the linear estimate (i.e., the initial estimate), by restricting the inner product between the two to be positive.
`random.effect`	if `TRUE`, as part of the main effects, the user can incorporate z-specific random intercepts.
`z`	a factor that specifies the random intercepts when `random.effect = TRUE`.
`gamma`	increase this beyond 1 to produce smoother models. `gamma` multiplies the effective degrees of freedom in the GCV or UBRE/AIC (see `mgcv::gam` for detail); the default is 1.
`pen.order`	0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty, used in a penalized least squares (LS) estimation of `beta.coef`.
`lambda`	a regularization parameter associated with the penalized LS for `beta.coef` update.
`max.iter`	an integer specifying the maximum number of iterations for `beta.coef` update.
`eps.iter`	a value specifying the convergence criterion of algorithm.
`trace.iter`	if `TRUE`, trace the estimation process and print the differences in `beta.coef`.
`center.X`	if `TRUE`, center X to have zero mean.
`scale.X`	if `TRUE`, scale X to have unit variance.
`uncons.final.fit`	if `TRUE`, once the convergence in the estimates of `beta.coef` is reached, include the main effect associated with the fitted single-index (beta.coef'X) to the final surface-link estimate.

Details

The function estimates a linear combination (a single-index) of covariates X, and captures a nonlinear interactive structure between the single-index and the treatment defined on a continuum via a smooth surface-link on the index-treatment domain.

SIMSL captures the effect of covariates via a single-index and their interaction with the treatment via a 2-dimensional smooth link function. Interaction effects are determined by shapes of the link function. The model allows comparing different individual treatment levels and constructing individual treatment rules, as functions of a biomarker signature (single-index), efficiently utilizing information on patient’s characteristics. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with pretreatment clinical information.

Value

a list of information of the fitted SIMSL including

`beta.coef`	the estimated single-index coefficients.
`g.fit`	a `mgcv:gam` object containing information about the estimated 2-dimensional link function as well as the X main effect model.
`beta.ini`	the initial value used in the estimation of `beta.coef`
`beta.path`	solution path of `beta.coef` over the iterations
`d.beta`	records the change in `beta.coef` over the solution path, `beta.path`
`X.scale`	sd of pretreatment covariates X
`X.center`	mean of pretreatment covariates X
`A.range`	range of the observed treatment variable A
`p`	number of baseline covariates X
`n`	number of subjects

Author(s)

Park, Petkova, Tarpey, Ogden

SIMSL prediction function

Description

This function makes predictions from an estimated SIMSL, given a (new) set of covariates. The function returns a set of predicted outcomes given the treatment values in a dense grid of treatment levels for each individual, and a recommended treatment level (assuming a larger value of the outcome is better).

Usage

pred.simsl(simsl.obj, newX = NULL, newA = NULL, newXm = NULL,
  single.index = NULL, L = 50, type = "link", maximize = TRUE)
pred.simsl(simsl.obj, newX = NULL, newA = NULL, newXm = NULL,
  single.index = NULL, L = 50, type = "link", maximize = TRUE)

Arguments

`simsl.obj`	a `simsl` object
`newX`	a (n-by-p) matrix of new values for the covariates X at which predictions are to be made.
`newA`	a (n-by-L) matrix of new values for the treatment A at which predictions are to be made.
`newXm`	a (n-by-q) matrix of new values for the covariates associated with the fitted main effect Xm at which predictions are to be made.
`single.index`	a length n vector specifying new values for the single-index at which predictions are to be made; the default is `NULL`.
`L`	when `newA=NULL`, a value specifying the length of the grid of A at which predictions are to be made.
`type`	the type of prediction required; the default "response" is on the scale of the response variable; the alternative "link" is on the scale of the linear predictors.
`maximize`	the default is `TRUE`, assuming a larger value of the outcome is better; if `FALSE`, a smaller value is assumed to be prefered.

Value

`pred.new`	a (n-by-L) matrix of predicted values; each column represents a treatment dose.
`trt.rule`	a (n-by-1) vector of suggested treatment assignments

Author(s)

Park, Petkova, Tarpey, Ogden

Single-index models with a surface-link (main function)

Description

simsl is the wrapper function for fitting a single-index model with a surface-link (SIMSL). The function estimates a linear combination (a single-index) of baseline covariates X, and models a nonlinear interactive structure between the single-index and a treatment variable defined on a continuum, by estimating a smooth link function on the index-treatment domain. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.

Usage

simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
  bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
  knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
  method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
  random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
  lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
  center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE,
  bootstrap = FALSE, nboot = 200, boot.conf = 0.95, seed = 1357)
simsl(y, A, X, Xm = NULL, family = "gaussian", R = NULL,
  bs = c("ps", "ps"), k = c(8, 8), m = list(NA, NA), sp = NULL,
  knots = NULL, sep.A.effect = FALSE, mc = c(TRUE, FALSE),
  method = "GCV.Cp", beta.ini = NULL, ind.to.be.positive = NULL,
  random.effect = FALSE, z = NULL, gamma = 1, pen.order = 0,
  lambda = 0, max.iter = 10, eps.iter = 0.01, trace.iter = TRUE,
  center.X = TRUE, scale.X = TRUE, uncons.final.fit = TRUE,
  bootstrap = FALSE, nboot = 200, boot.conf = 0.95, seed = 1357)

Arguments

`y`	a n-by-1 vector of treatment outcomes; y is a member of the exponential family; any distribution supported by `mgcv::gam`; y can also be an ordinal categorial response with `R` categories taking a value from 1 to `R`.
`A`	a n-by-1 vector of treatment variable; each element is assumed to take a value on a continuum.
`X`	a n-by-p matrix of baseline covarates.
`Xm`	a n-by-q design matrix associated with an X main effect model; the defult is `NULL` and it is taken as a vector of zeros
`family`	specifies the distribution of y; e.g., "gaussian", "binomial", "poisson"; can be any family supported by `mgcv::gam`; can also be "ordinal", for an ordinal categorical response y.
`R`	the number of response categories for the case of family = "ordinal".
`bs`	basis type for the treatment (A) and single-index domains, respectively; the defult is "ps" (p-splines); any basis supported by `mgcv::gam` can be used, e.g., "cr" (cubic regression splines); see `mgcv::s` for detail.
`k`	basis dimension for the treatment (A) and single-index domains, respectively.
`m`	a length 2 list (e.g., m=list(c(2,3), c(2,2))), for the treatment (A) and single-index domains, respectively, where each element specifies the order of basis and penalty (note, for bs="ps", c(2,3) means a 2nd order P-spline basis (cubic spline) and a 3rd order difference penalty; the default "NA" sets c(2,2) for each domain); see `mgcv::s` for details.
`sp`	a vector of smoothing parameters; Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula (i.e., A, and then the single-index); negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible; see `mgcv::gam` for detail.
`knots`	a list containing user-specified knot values to be used for basis construction, for the treatment (A) and single-index domains, respectively.
`sep.A.effect`	If `TRUE`, the g term of SIMSL is further decomposed into: the A main effect + the A-by-X interaction effect; the default is `FALSE`.
`mc`	a length 2 vector indicating which marginals (i.e., A and the single-index, respectively) should have centering (i.e., the sum-to-zero) constraints applied; the default is `mc = c(TRUE, FALSE)` (see `mgcv::te` for detail of the constraint), which is sufficient for the so-called "orthogonality" constraint of the SIMSL.
`method`	the smoothing parameter estimation method; "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; any method supported by `mgcv::gam` can be used.
`beta.ini`	an initial value for `beta.coef`; a p-by-1 vector; the defult is `NULL`, in which case a linear model estimate is used.
`ind.to.be.positive`	for identifiability of the solution `beta.coef`, the user can restrict the jth (e.g., j=1) component of `beta.coef` to be positive; by default, we match the "overall" sign of `beta.coef` with that of the linear estimate (i.e., the initial estimate), by restricting the inner product between the two to be positive.
`random.effect`	if `TRUE`, as part of the main effects, the user can incorporate z-specific random intercepts.
`z`	a factor that specifies the random intercepts when `random.effect = TRUE`.
`gamma`	increase this beyond 1 to produce smoother models. `gamma` multiplies the effective degrees of freedom in the GCV or UBRE/AIC (see `mgcv::gam` for detail); the default is 1.
`pen.order`	0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty, used in a penalized least squares (LS) estimation of `beta.coef`.
`lambda`	a regularization parameter associated with the penalized LS for `beta.coef` update.
`max.iter`	an integer specifying the maximum number of iterations for `beta.coef` update.
`eps.iter`	a value specifying the convergence criterion of algorithm.
`trace.iter`	if `TRUE`, trace the estimation process and print the differences in `beta.coef`.
`center.X`	if `TRUE`, center X to have zero mean.
`scale.X`	if `TRUE`, scale X to have unit variance.
`uncons.final.fit`	if `TRUE`, once the convergence in the estimates of `beta.coef` is reached, include the main effect associated with the fitted single-index (beta.coef'X) to the final surface-link estimate.
`bootstrap`	if `TRUE`, compute bootstrap confidence intervals for the single-index coefficients, `beta.coef`; the default is `FALSE`.
`nboot`	when `bootstrap=TRUE`, a value specifying the number of bootstrap replications.
`boot.conf`	a value specifying the confidence level of the bootstrap confidence intervals; the defult is `boot.conf = 0.95`.
`seed`	when `bootstrap=TRUE`, randomization seed used in bootstrap resampling.

Details

SIMSL captures the effect of covariates via a single-index and their interaction with the treatment via a 2-dimensional smooth link function. Interaction effects are determined by shapes of the link surface. The SIMSL allows comparing different individual treatment levels and constructing individual treatment rules, as functions of a biomarker signature (single-index), efficiently utilizing information on patient’s characteristics. The resulting simsl object can be used to estimate an optimal dose rule for a new patient with baseline clinical information.

Value

a list of information of the fitted SIMSL including

`beta.coef`	the estimated single-index coefficients.
`g.fit`	a `mgcv:gam` object containing information about the estimated 2-dimensional link function.
`beta.ini`	the initial value used in the estimation of `beta.coef`
`beta.path`	solution path of `beta.coef` over the iterations
`d.beta`	records the change in `beta.coef` over the solution path, `beta.path`
`X.scale`	sd of pretreatment covariates X
`X.center`	mean of pretreatment covariates X
`A.range`	range of the observed treatment variable A
`p`	number of baseline covariates X
`n`	number of subjects
`boot.ci`	`boot.conf`-level bootstrap CIs (LB, UB) associated with `beta.coef`
`boot.mat`	a (nboot x p) matrix of bootstrap estimates of `beta.coef`

Author(s)

Park, Petkova, Tarpey, Ogden

Examples


set.seed(1234)
n.test <- 500


## simulation 1
# generate training data
p <- 30
n <- 200
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
D_opt <- 1 + 0.5*X[,2] + 0.5*X[,1]
mean.fn <- function(X, D_opt, A){ 8 + 4*X[,1] - 2*X[,2] - 2*X[,3] - 25*((D_opt-A)^2) }
mu <-   mean.fn(X, D_opt, A)
y <- rnorm(length(mu),mu,1)
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=X)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- 1 + 0.5*X.test[,2] + 0.5*X.test[,1]
pred <- pred.simsl(simsl.obj, newX= X.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*X.test[,1] - 2*X.test[,2] - 2*X.test[,3] - 25*((f_opt.test- pred$trt.rule)^2))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.


## simulation 2
p <- 10
n <- 400
# generate training data
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
f_opt <- I(X[,1] > -0.5)*I(X[,1] < 0.5)*0.6 + 1.2*I(X[,1] > 0.5) +
 1.2*I(X[,1] < -0.5) + X[,4]^2 + 0.5*log(abs(X[,7])+1) - 0.6
mu <-   8 + 4*cos(2*pi*X[,2]) - 2*X[,4] - 8*X[,5]^3 - 15*abs(f_opt-A)
y  <- rnorm(length(mu),mu,1)
Xq <- cbind(X, X^2)  # include a quadratic term
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=Xq)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- I(X.test[,1] > -0.5)*I(X.test[,1] < 0.5)*0.6 + 1.2*I(X.test[,1] > 0.5) +
 1.2*I(X.test[,1] < -0.5) + X.test[,4]^2 + 0.5*log(abs(X.test[,7])+1) - 0.6
Xq.test <- cbind(X.test, X.test^2)
pred <- pred.simsl(simsl.obj, newX= Xq.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*cos(2*pi*X.test[,2]) - 2*X.test[,4] - 8*X.test[,5]^3 -
              15*abs(f_opt.test-pred$trt.rule))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.



 ### air pollution data application
 data(chicago); head(chicago)
 chicago <- chicago[,-3][complete.cases(chicago[,-3]), ]
 chicago <- chicago[-c(2856:2859), ]  # get rid of the gross outliers in y
 chicago <- chicago[-which.max(chicago$pm10median), ]  # get rid of the gross outliers in x

 # create lagged variables
 lagard <- function(x,n.lag=5) {
   n <- length(x); X <- matrix(NA,n,n.lag)
   for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
   X
 }
 chicago$pm10 <- lagard(chicago$pm10median)
 chicago <- chicago[complete.cases(chicago), ]
 # create season varaible
 chicago$time.day <- round(chicago$time %%  365)

 # fit SIMSL for modeling the season-by-pm10 interactions on their effects on outcomes
 simsl.obj <- simsl(y=chicago$death, A=chicago$time.day, X=chicago[,7], bs=c("cc","ps"),
                    ind.to.be.positive = 1, family="poisson", method = "REML",
                    bootstrap =FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef  # the estimated single-index coefficients
 summary(simsl.obj$g.fit)
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=-135, phi = 30,
               color="heat", se=2,ylab = "single-index", zlab = " ",
               main=expression(paste("Interaction surface g")))



 ### Warfarin data application
 data(warfarin)
 X <- warfarin$X
 A <- warfarin$A
 y <- -abs(warfarin$INR - 2.5)  # the target INR is 2.5
 X[,1:3] <- scale(X[,1:3]) # standardize continuous variables

 # Estimate the main effect, using an additive model
 mu.fit <- mgcv::gam(y-mean(y)  ~ X[, 4:13] +
                       s(X[,1], k=5, bs="ps")+
                       s(X[,2], k=5, bs="ps") +
                       s(X[,3], k=5, bs="ps"), method="REML")
 summary(mu.fit)
 mu.hat <- predict(mu.fit)
 # fit SIMSL
 simsl.obj <- simsl(y, A, X, Xm= mu.hat, scale.X = FALSE, center.X=FALSE, method="REML",
                    bootstrap = FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=52, phi = 18,
               color="heat", se=2, ylab = "single-index", zlab = "Y",
               main=expression(paste("Interaction surface g")))


set.seed(1234)
n.test <- 500


## simulation 1
# generate training data
p <- 30
n <- 200
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
D_opt <- 1 + 0.5*X[,2] + 0.5*X[,1]
mean.fn <- function(X, D_opt, A){ 8 + 4*X[,1] - 2*X[,2] - 2*X[,3] - 25*((D_opt-A)^2) }
mu <-   mean.fn(X, D_opt, A)
y <- rnorm(length(mu),mu,1)
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=X)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- 1 + 0.5*X.test[,2] + 0.5*X.test[,1]
pred <- pred.simsl(simsl.obj, newX= X.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*X.test[,1] - 2*X.test[,2] - 2*X.test[,3] - 25*((f_opt.test- pred$trt.rule)^2))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.


## simulation 2
p <- 10
n <- 400
# generate training data
X <- matrix(runif(n*p,-1,1),ncol=p)
A <- runif(n,0,2)
f_opt <- I(X[,1] > -0.5)*I(X[,1] < 0.5)*0.6 + 1.2*I(X[,1] > 0.5) +
 1.2*I(X[,1] < -0.5) + X[,4]^2 + 0.5*log(abs(X[,7])+1) - 0.6
mu <-   8 + 4*cos(2*pi*X[,2]) - 2*X[,4] - 8*X[,5]^3 - 15*abs(f_opt-A)
y  <- rnorm(length(mu),mu,1)
Xq <- cbind(X, X^2)  # include a quadratic term
# fit SIMSL
simsl.obj <- simsl(y=y, A=A, X=Xq)

# generate testing data
X.test <- matrix(runif(n.test*p,-1,1),ncol=p)
A.test <- runif(n.test,0,2)
f_opt.test <- I(X.test[,1] > -0.5)*I(X.test[,1] < 0.5)*0.6 + 1.2*I(X.test[,1] > 0.5) +
 1.2*I(X.test[,1] < -0.5) + X.test[,4]^2 + 0.5*log(abs(X.test[,7])+1) - 0.6
Xq.test <- cbind(X.test, X.test^2)
pred <- pred.simsl(simsl.obj, newX= Xq.test)  # make prediction based on the estimated SIMSL
value <- mean(8 + 4*cos(2*pi*X.test[,2]) - 2*X.test[,4] - 8*X.test[,5]^3 -
              15*abs(f_opt.test-pred$trt.rule))
value  # "value" of the estimated treatment rule; the "oracle" value is 8.



 ### air pollution data application
 data(chicago); head(chicago)
 chicago <- chicago[,-3][complete.cases(chicago[,-3]), ]
 chicago <- chicago[-c(2856:2859), ]  # get rid of the gross outliers in y
 chicago <- chicago[-which.max(chicago$pm10median), ]  # get rid of the gross outliers in x

 # create lagged variables
 lagard <- function(x,n.lag=5) {
   n <- length(x); X <- matrix(NA,n,n.lag)
   for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
   X
 }
 chicago$pm10 <- lagard(chicago$pm10median)
 chicago <- chicago[complete.cases(chicago), ]
 # create season varaible
 chicago$time.day <- round(chicago$time %%  365)

 # fit SIMSL for modeling the season-by-pm10 interactions on their effects on outcomes
 simsl.obj <- simsl(y=chicago$death, A=chicago$time.day, X=chicago[,7], bs=c("cc","ps"),
                    ind.to.be.positive = 1, family="poisson", method = "REML",
                    bootstrap =FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef  # the estimated single-index coefficients
 summary(simsl.obj$g.fit)
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=-135, phi = 30,
               color="heat", se=2,ylab = "single-index", zlab = " ",
               main=expression(paste("Interaction surface g")))



 ### Warfarin data application
 data(warfarin)
 X <- warfarin$X
 A <- warfarin$A
 y <- -abs(warfarin$INR - 2.5)  # the target INR is 2.5
 X[,1:3] <- scale(X[,1:3]) # standardize continuous variables

 # Estimate the main effect, using an additive model
 mu.fit <- mgcv::gam(y-mean(y)  ~ X[, 4:13] +
                       s(X[,1], k=5, bs="ps")+
                       s(X[,2], k=5, bs="ps") +
                       s(X[,3], k=5, bs="ps"), method="REML")
 summary(mu.fit)
 mu.hat <- predict(mu.fit)
 # fit SIMSL
 simsl.obj <- simsl(y, A, X, Xm= mu.hat, scale.X = FALSE, center.X=FALSE, method="REML",
                    bootstrap = FALSE) # bootstrap = TRUE
 simsl.obj$beta.coef
 #round(simsl.obj$boot.ci,3)
 mgcv::vis.gam(simsl.obj$g.fit, view=c("A","single.index"), theta=52, phi = 18,
               color="heat", se=2, ylab = "single-index", zlab = "Y",
               main=expression(paste("Interaction surface g")))

Warfarin dataset

Description

The dataset provided by International Warfarin Pharmacogenetics Consortium et al. (2009). Warfarin is an anticoagulant agent widely used as a medicine to treat blood clots and prevent forming new harmful blood clots.

Format

A list containing INR, A, X:

INR: a vector of treatment outcomes of the study (INR; International Normalized Ratio)
A: a vector of therapeutic warfarin dosages
X: a data frame consist of 13 patient characteristics

Details

The dataset onsists of 1780 subjects (after removing patients with missing data and data cleaning), including information on patient covariates (X), final therapeutic warfarin dosages (A), and patient outcomes (INR, International Normalized Ratio).

There are 13 covariates in the dataset: weight (X1), height (X2), age (X3), use of the cytochrome P450 enzyme inducers (X4; the enzyme inducers considered in this analysis includes phenytoin, carbamazepine, and rifampin), use of amiodarone (X5), gender (X6; 1 for male, 0 for female), African or black race (X7), Asian race (X8), the VKORC1 A/G genotype (X9), the VKORC1 A/A genotype (X10), the CYP2C9 1/2 genotype (X11), the CYP2C9 1/3 genotype (X12), and the other CYP2C9 genotypes (except the CYP2C9 1/1 genotype which is taken as the baseline genotype) (X13).

The details of these covariate information are given in International Warfarin Pharmacogenetics Consortium et al. (2009).

Source

The data can be downloaded from https://www.pharmgkb.org/downloads/.

References

International Warfarin Pharmacogenetics Consortium, Klein, T., Altman, R., Eriksson, N., Gage, B., Kimmel, S., Lee, M., Limdi, N., Page, D., Roden, D., Wagner, M., Caldwell, M., and Johnson, J. (2009). Estimation of the warfarin dose with clinical and pharmacogenetic data. The New England Journal of Medicine 360:753–674

Chen, G., Zeng, D., and Kosorok, M. R. (2016). Personalized dose finding using outcome wieghted learning. Journal of the American Medical Association 111:1509–1547.

Package 'simsl'

Help Index

Air pollution dataset

Description

Format

Details

Source

References

A subfunction used in estimation

Description

Usage

Arguments

See Also

Single-index models with a surface-link (workhorse function)

Description

Usage

Arguments

Details

Value

Author(s)

See Also

SIMSL prediction function

Description

Usage

Arguments

Value

Author(s)

See Also

Single-index models with a surface-link (main function)

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Warfarin dataset

Description

Format

Details

Source

References