Package 'Qindex'

Title: Continuous and Dichotomized Index Predictors Based on Distribution Quantiles
Description: Select optimal functional regression or dichotomized quantile predictors for survival/logistic/numeric outcome and perform optimistic bias correction for any optimally dichotomized numeric predictor(s), as in Yi et al. (2023) <doi:10.1016/j.labinv.2023.100158>.
Authors: Tingting Zhan [aut, cre, cph] , Misung Yi [aut, cph] , Inna Chervoneva [aut, cph]
Maintainer: Tingting Zhan <[email protected]>
License: GPL-2
Version: 0.1.7
Built: 2024-12-15 07:47:23 UTC
Source: CRAN


Continuous and Dichotomized Index Predictors Based on Distribution Quantiles

Description

Continuous and dichotomized index predictors based on distribution quantiles.

Author(s)

Maintainer: Tingting Zhan [email protected] (ORCID) [copyright holder]

Authors: Misung Yi [copyright holder], Inna Chervoneva [copyright holder]

References

Selection of optimal quantile protein biomarkers based on cell-level immunohistochemistry data. Misung Yi, Tingting Zhan, Amy P. Peck, Jeffrey A. Hooke, Albert J. Kovatich, Craig D. Shriver, Hai Hu, Yunguang Sun, Hallgeir Rui and Inna Chervoneva. BMC Bioinformatics, 2023. doi:10.1186/s12859-023-05408-8

Quantile index biomarkers based on single-cell expression data. Misung Yi, Tingting Zhan, Amy P. Peck, Jeffrey A. Hooke, Albert J. Kovatich, Craig D. Shriver, Hai Hu, Yunguang Sun, Hallgeir Rui and Inna Chervoneva. Laboratory Investigation, 2023. doi:10.1016/j.labinv.2023.100158

Examples

### Data Preparation

library(survival)
data(Ki67, package = 'Qindex.data')
Ki67c = within(Ki67[complete.cases(Ki67), , drop = FALSE], expr = {
  marker = log1p(Marker); Marker = NULL
  PFS = Surv(RECFREESURV_MO, RECURRENCE)
})
(npt = length(unique(Ki67c$PATIENT_ID))) # 592

### Step 1: Cluster-Specific Sample Quantiles

Ki67q = clusterQp(marker ~ . - tissueID - inner_x - inner_y | PATIENT_ID, data = Ki67c)
stopifnot(is.matrix(Ki67q$marker))
head(Ki67q$marker, n = c(4L, 6L))

set.seed(234); id = sort.int(sample.int(n = npt, size = 480L))
Ki67q_0 = Ki67q[id, , drop = FALSE] # training set
Ki67q_1 = Ki67q[-id, , drop = FALSE] # test set

### Step 2 (after Step 1)

## Step 2a: Linear Sign-Adjusted Quantile Indices
(fr = Qindex(PFS ~ marker, data = Ki67q_0))
stopifnot(all.equal.numeric(c(fr), predict(fr)))
integrandSurface(fr)
integrandSurface(fr, newdata = Ki67q_1)

## Step 2b: Non-Linear Sign-Adjusted Quantile Indices
(nlfr = Qindex(PFS ~ marker, data = Ki67q_0, nonlinear = TRUE))
stopifnot(all.equal.numeric(c(nlfr), predict(nlfr)))
integrandSurface(nlfr)
integrandSurface(nlfr, newdata = Ki67q_1)

## view linear and non-linear sign-adjusted quantile indices together
integrandSurface(fr, nlfr)


### Step 2c: Optimal Dichotomizing
set.seed(14837); (m1 = optimSplit_dichotom(
  PFS ~ marker, data = Ki67q_0, nsplit = 20L, top = 2L)) 
predict(m1)
predict(m1, boolean = FALSE)
predict(m1, newdata = Ki67q_1)


### Step 3 (after Step 1 & 2)

Ki67q_0a = within.data.frame(Ki67q_0, expr = {
  FR = std_IQR(fr) 
  nlFR = std_IQR(nlfr)
  optS = std_IQR(marker[,'0.27'])
})
Ki67q_1a = within.data.frame(Ki67q_1, expr = {
  FR = std_IQR(predict(fr, newdata = Ki67q_1))
  nlFR = std_IQR(predict(nlfr, newdata = Ki67q_1))
  optS = std_IQR(marker[,'0.27']) 
})
# `optS`: use the best quantile but discard the cutoff identified by [optimSplit_dichotom]
# all models below can also be used on training data `Ki67q_0a`
# naive use
summary(coxph(PFS ~ NodeSt + Tstage + FR, data = Ki67q_1a))
summary(coxph(PFS ~ NodeSt + Tstage + nlFR, data = Ki67q_1a))
summary(coxph(PFS ~ NodeSt + Tstage + optS, data = Ki67q_1a))
# set.seed if necessary
summary(BBC_dichotom(PFS ~ NodeSt + Tstage ~ FR, data = Ki67q_1a))
# `NodeSt`, `Tstage`: predictors to be used as-is
# `FR` to be dichotomized
# set.seed if necessary
summary(BBC_dichotom(PFS ~ NodeSt + Tstage ~ nlFR, data = Ki67q_1a))
# set.seed if necessary
summary(BBC_dichotom(PFS ~ NodeSt + Tstage ~ optS, data = Ki67q_1a)) # statistically rigorous 

# Option 1
summary(BBC_dichotom(PFS ~ NodeSt + Tstage ~ FR, data = Ki67q_1a))

# Option 2:
summary(tmp <- BBC_dichotom(PFS ~ NodeSt + Tstage ~ FR, data = Ki67q_0a))
#coxph(PFS ~ NodeSt + Tstage + I(FR > attr(tmp, 'apparent_cutoff')), data = Ki67q_1a)
coxph(PFS ~ NodeSt + Tstage + I(FR > matrixStats::colMedians(BBC_cutoff(tmp))), data = Ki67q_1a)


# Option 1 and 2 are also applicable to `nlFR` and `optS`

Bootstrap-based Optimism Correction for Dichotomization

Description

Multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.

Usage

BBC_dichotom(formula, data, ...)

optimism_dichotom(fom, X, data, R = 100L, ...)

coef_dichotom(fom, X., data)

Arguments

formula

formula, e.g., y~z~x or y~1~x. Response $y$ may be double, logical or Surv. Predictors $x$'s to be dichotomized may be one or more numeric vectors and/or one matrix. Additional predictors $z$'s, if any, may be of any type.

data

data.frame

...

additional parameters, currently not in use

fom

formula, e.g., y~z or y~1, for helper functions, with the response $y$ and additional predictors $z$'s, if any

X

numeric matrix of $k$ columns, numeric predictors $x_1,\cdots,x_k$ to be dichotomized

R

positive integer scalar, number of bootstrap replicates $R$, default 100L

X.

logical matrix $\tilde{X}$ of $k$ columns, dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$

Details

Function BBC_dichotom obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,

  1. Obtain the dichotomizing rules $\mathcal{D}$ of predictors $x_1,\cdots,x_k$ based on response $y$ (via m_rpartD). The multivariable regression (with additional predictors $z$, if any) with dichotomized predictors $\left(\tilde{x}_1,\cdots,\tilde{x}_k\right) = \mathcal{D}\left(x_1,\cdots,x_k\right)$ (via helper function coef_dichotom) is the apparent performance.

  2. Obtain the bootstrap-based optimism based on $R$ copies of bootstrap samples (via helper function optimism_dichotom). The median of bootstrap-based optimism over $R$ bootstrap copies is the optimism-correction of the dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$.

  3. Subtract the optimism-correction (in Step 2) from the apparent performance estimates (in Step 1), only for $\tilde{x}_1,\cdots,\tilde{x}_k$. The apparent performance estimates for additional predictors $z$'s, if any, are not modified. Neither the variance-covariance (vcov) estimates nor the other regression diagnostics, e.g., residuals, logLikelihood, etc., of the apparent performance are modified for now. This coefficient-only, partially-modified regression model is the optimism-corrected performance.
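The three steps above can be sketched in base R for a gaussian response and a single predictor to be dichotomized. This is only an illustration under stated assumptions: `find_cutoff()` and `coef_x()` are hypothetical helpers, `find_cutoff()` is a simple stand-in for the package's rpart-based rule (m_rpartD), and the simulated data are not from the package.

```r
set.seed(1)
n <- 200L
d <- data.frame(z = rnorm(n), x = rnorm(n))
d$y <- 0.5 * d$z + 0.8 * (d$x > 0.3) + rnorm(n)

# Hypothetical stand-in for the rpart-based rule: pick the cutoff that
# maximizes the absolute difference in mean response between the two groups
find_cutoff <- function(y, x) {
  cand <- quantile(x, probs = seq(0.2, 0.8, by = 0.05), names = FALSE)
  gap <- vapply(cand, function(k) abs(mean(y[x > k]) - mean(y[x <= k])),
                FUN.VALUE = 0)
  cand[which.max(gap)]
}

# Coefficient of the dichotomized predictor, adjusted for z
coef_x <- function(data, cutoff) {
  data$x. <- data$x > cutoff
  unname(coef(lm(y ~ z + x., data = data))["x.TRUE"])
}

# Step 1: apparent performance
cut0 <- find_cutoff(d$y, d$x)
apparent <- coef_x(d, cut0)

# Step 2: optimism over R bootstrap copies
R <- 100L
optimism <- replicate(R, {
  b <- d[sample.int(n, replace = TRUE), ]
  cut_b <- find_cutoff(b$y, b$x)
  coef_x(b, cut_b) - coef_x(d, cut_b)  # bootstrap minus test performance
})

# Step 3: subtract the median optimism from the apparent estimate
corrected <- apparent - median(optimism)
c(apparent = apparent, corrected = corrected)
```

Only the coefficient of the dichotomized predictor is corrected in this sketch; the coefficient of `z` is left as estimated, in line with Step 3.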

Value

Function BBC_dichotom returns a coxph, glm or lm regression model, with attributes,

attr(,'optimism')

the returned object from optimism_dichotom

attr(,'apparent_cutoff')

a double vector, cutoff thresholds for the $k$ predictors in the apparent model

Details on Helper Functions

Bootstrap-Based Optimism

Helper function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. Specifically,

  1. $R$ copies of bootstrap samples are generated. In the $j$-th bootstrap sample,

    1. obtain the dichotomizing rules $\mathcal{D}^{(j)}$ of predictors $x_1^{(j)},\cdots,x_k^{(j)}$ based on response $y^{(j)}$ (via m_rpartD);

    2. the multivariable regression (with additional predictors $z^{(j)}$, if any) coefficient estimates $\mathbf{\hat{\beta}}^{(j)} = \left(\hat{\beta}_1^{(j)},\cdots,\hat{\beta}_k^{(j)}\right)^t$ of the dichotomized predictors $\left(\tilde{x}_1^{(j)},\cdots,\tilde{x}_k^{(j)}\right) = \mathcal{D}^{(j)}\left(x_1^{(j)},\cdots,x_k^{(j)}\right)$ (via coef_dichotom) are the bootstrap performance estimate.

  2. Dichotomize $x_1,\cdots,x_k$ in the entire data using each of the bootstrap rules $\mathcal{D}^{(1)},\cdots,\mathcal{D}^{(R)}$. The multivariable regression (with additional predictors $z$, if any) coefficient estimates $\mathbf{\hat{\beta}}^{[j]} = \left(\hat{\beta}_1^{[j]},\cdots,\hat{\beta}_k^{[j]}\right)^t$ of the dichotomized predictors $\left(\tilde{x}_1^{[j]},\cdots,\tilde{x}_k^{[j]}\right) = \mathcal{D}^{(j)}\left(x_1,\cdots,x_k\right)$ (via coef_dichotom) are the test performance estimate.

  3. The difference between the bootstrap and test performance estimates, an $R\times k$ matrix of $\left(\mathbf{\hat{\beta}}^{(1)},\cdots,\mathbf{\hat{\beta}}^{(R)}\right)$ minus another $R\times k$ matrix of $\left(\mathbf{\hat{\beta}}^{[1]},\cdots,\mathbf{\hat{\beta}}^{[R]}\right)$, is the bootstrap-based optimism.
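Written out row by row, the optimism matrix of Step 3 takes the following form. The symbol $\mathbf{\hat{o}}^{(j)}$ is introduced here only for illustration and is not the package's notation.

```latex
\mathbf{\hat{o}}^{(j)}
  = \mathbf{\hat{\beta}}^{(j)} - \mathbf{\hat{\beta}}^{[j]}
  = \left(\hat{\beta}_1^{(j)} - \hat{\beta}_1^{[j]},\;\cdots,\;
          \hat{\beta}_k^{(j)} - \hat{\beta}_k^{[j]}\right)^t,
  \quad j = 1,\cdots,R
```

Each $\mathbf{\hat{o}}^{(j)}$ is one row of the returned $R\times k$ matrix. Function BBC_dichotom then takes the median of each column over the $R$ rows as the optimism-correction, as described in its section Details.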

Multivariable Regression Coefficient Estimates of Dichotomized Predictors $\tilde{x}$'s

Helper function coef_dichotom fits a multivariable Cox proportional hazards (coxph) model for Surv response, logistic (glm) regression model for logical response, or linear (lm) regression model for gaussian response, with the dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$ as well as the additional predictors $z$'s.

It is almost inevitable to have duplicates among the dichotomized predictors $\tilde{x}_1,\cdots,\tilde{x}_k$. In such a case, the multivariable model is fitted using the unique $\tilde{x}$'s.
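The handling of duplicated dichotomized predictors can be illustrated with a small base-R sketch. It assumes a gaussian response and simulated data; the object names (`X.`, `key`, `U`, `beta`) are hypothetical and the code is not taken from coef_dichotom itself.

```r
set.seed(2)
n <- 100L
x1 <- rnorm(n)
y <- 1 + 0.7 * (x1 > 0) + rnorm(n)

# Three dichotomized predictors; columns `a` and `c` are identical
X. <- cbind(a = x1 > 0, b = x1 > 0.5, c = x1 > 0)

# Identify duplicated columns by pasting each column into one string
key <- apply(X., MARGIN = 2L, FUN = paste, collapse = "")
ukey <- unique(key)
U <- 1 * X.[, match(ukey, key), drop = FALSE]  # unique columns, as 0/1

# Fit the model on the unique columns only
cf <- coef(lm(y ~ U))[-1L]  # drop the intercept

# Copy each unique coefficient back to its duplicates
beta <- setNames(unname(cf)[match(key, ukey)], colnames(X.))
beta
```

Fitting on the unique columns avoids a rank-deficient design matrix, and the duplicated predictors `a` and `c` end up with the same coefficient.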

Returns of Helper Functions

Of helper function optimism_dichotom

Helper function optimism_dichotom returns an $R\times k$ double matrix of bootstrap-based optimism, with attributes

attr(,'cutoff')

an $R\times k$ double matrix, the $R$ copies of bootstrap cutoff thresholds for the $k$ predictors. See attribute 'cutoff' of function m_rpartD

Of helper function coef_dichotom

Helper function coef_dichotom returns a double vector of the regression coefficients of dichotomized predictors $\tilde{x}$'s, with attributes

attr(,'model')

the coxph, glm or lm regression model

In the case of duplicated $\tilde{x}$'s, the regression coefficients of the unique $\tilde{x}$'s are duplicated for those duplicates in $\tilde{x}$'s.

References

For helper function optimism_dichotom

Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8

Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark. (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

Examples

library(survival)
data(flchain, package = 'survival') # see more details from ?survival::flchain
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus)
}))
dim(flchain3 <- subset(flchain2, futime > 0)) # required by ?rpart::rpart
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))

m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus ~ kappa + lambda, 
 data = flchain_Circulatory, R = 1e2L)
summary(m1)
matrixStats::colMedians(BBC_cutoff(m1)) # median bootstrap cutoff
attr(m1, 'apparent_cutoff')

Cluster-Specific Sample Quantiles

Description

Sample quantiles in each cluster of observations.

Usage

clusterQp(
  formula,
  data,
  f_sum_ = mean.default,
  probs = seq.int(from = 0.01, to = 0.99, by = 0.01),
  ...
)

Arguments

formula

formula, including response $y$, cluster(s) $c$'s, cluster-specific covariate(s) $x$'s to be retained, and cluster-specific covariate(s) $z$'s to be removed from data, e.g.,

y ~ 1 | c1

cluster $c_1$, without cluster-specific covariate

y ~ 1 | c1/c2

cluster $c_1$, and cluster $c_2$ nested in $c_1$, without cluster-specific covariate

y ~ x1 + x2 | c1

cluster $c_1$, and cluster-specific covariates $x_1$ and $x_2$

y ~ . | c1

cluster $c_1$, and all (supposedly cluster-specific) covariates from data

y ~ . - z1 - z2 | c1

cluster $c_1$, and all (supposedly cluster-specific) covariates, except for $z_1$ and $z_2$, from data

data

data.frame

f_sum_

function to summarize the sample quantiles from lower-level cluster $c_2$ (if present), such as mean.default (default), median.default, max, min, etc.

probs

double vector, probabilities $\mathbf{p} = (p_1,\cdots,p_N)'$ shared across all clusters, at which the cluster-specific sample quantiles of response $y$ are calculated. Default seq(.01, .99, by = .01)

...

additional parameters of function quantile

Value

Function clusterQp returns an aggregated data.frame, in which

  • the highest cluster $c_1$ and cluster-specific covariate(s) $x$'s are retained.

    • If the input formula takes form of y ~ . | c1 or y ~ . - z1 | c1, then all covariates (except for $z_1$) are considered cluster-specific;

    • Sample quantiles from lower-level clusters (e.g., $c_2$) are point-wise summarized using function f_sum_.

  • response $y$ is removed; instead, a double matrix of $N$ columns stores the cluster-specific sample quantiles. This matrix

    • is named after the parsed expression of response $y$ in formula;

    • colnames are the probabilities $\mathbf{p}$, for the ease of subsequent programming.
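The shape of the quantile matrix described above can be reproduced with base R alone. The sketch below assumes one cluster level and simulated data; the names `cells`, `patient` and `Q` are hypothetical, and the code does not call clusterQp.

```r
set.seed(3)
cells <- data.frame(
  patient = rep(c("A", "B", "C"), times = c(40L, 60L, 50L)),
  marker = rexp(150L)
)
probs <- seq.int(from = 0.01, to = 0.99, by = 0.01)

# One row of sample quantiles per cluster (patient)
Q <- t(vapply(split(cells$marker, cells$patient), quantile,
              FUN.VALUE = numeric(length(probs)),
              probs = probs, names = FALSE))
colnames(Q) <- probs  # probabilities as column names

dim(Q)
Q[, c("0.25", "0.5", "0.75")]
```

Storing the probabilities as column names is what makes a later selection such as `marker[, '0.27']` (see the package-level examples) possible.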

Examples

# see ?`Qindex-package` for examples

Integrand Surface(s) of Sign-Adjusted Quantile Indices Qindex

Description

An interactive htmlwidget of the perspective plot for Qindex model(s), created with package plotly.

Usage

integrandSurface(
  ...,
  newdata = data,
  proj_Q_p = TRUE,
  proj_S_p = TRUE,
  proj_beta = TRUE,
  n = 501L,
  newid = seq_len(min(50L, .row_names_info(newdata, type = 2L))),
  qlim = range(X, newX),
  axis_col = c("dodgerblue", "deeppink", "darkolivegreen"),
  beta_col = "purple",
  surface_col = c("white", "lightgreen")
)

Arguments

...

one or more Qindex models based on a same training set.

newdata

data.frame, with at least the response $y^{\text{new}}$ and the double matrix of functional predictor values $X^{\text{new}}$ of the test set. The predictor values $X^{\text{new}}$ are tabulated on the same $p$-grid as the training functional predictor values $X$. If missing, the training set will be used.

proj_Q_p

logical scalar, whether to show the projection of $\hat{S}\big(p, Q_i(p)\big)$ (see sections Integrand Surface and Value) to the $(p,q)$-plane, default TRUE

proj_S_p

logical scalar, whether to show the projection of $\hat{S}\big(p, Q_i(p)\big)$ to the $(p,s)$-plane, default TRUE

proj_beta

logical scalar, whether to show $\hat{\beta}(p)$ on the $(p,s)$-plane when applicable, default TRUE

n

integer scalar, fineness of visualization, default 501L. See parameter n.grid of function vis.gam.

newid

integer scalar or vector, row indices of newdata to be visualized. Default is the first 50 rows of newdata (or all rows, if fewer), as given in section Usage. Use newid = NULL to disable visualization of newdata.

qlim

length-2 double vector, range on $q$-axis. Default is the range of $X$ and $X^{\text{new}}$ combined.

axis_col

length-3 character vector, colors of the $(p,q,s)$ axes

beta_col

character scalar, color of $\hat{\beta}(p)$

surface_col

length-2 character vector, color of the integrand surface(s), for lowest and highest surface values

Value

Function integrandSurface returns an htmlwidget created by R package plotly, showing the perspective plot of the estimated sign-adjusted integrand surface $\hat{S}(p,q)$.

If a set of training/test subjects is selected (via parameter newid), then

  • the estimated sign-adjusted line integrand curve $\hat{S}\big(p, Q_i(p)\big)$ of subject $i$ is displayed on the surface $\hat{S}(p,q)$;

  • the quantile curve $Q_i(p)$ is projected on the $(p,q)$-plane of the 3-dimensional $(p,q,s)$ cube, if proj_Q_p=TRUE (default);

  • the user-specified $\tilde{p}$ is marked on the $(p,q)$-plane of the 3D cube, if proj_Q_p=TRUE (default);

  • $\hat{S}\big(p, Q_i(p)\big)$ is projected on the $(p,s)$-plane of the 3-dimensional $(p,q,s)$ cube, if one and only one Qindex model is provided in input argument ... and proj_S_p=TRUE (default);

  • the estimated linear functional coefficient $\hat{\beta}(p)$ is shown on the $(p,s)$-plane of the 3D cube, if one and only one linear Qindex model is provided in input argument ... and proj_beta=TRUE (default).

Integrand Surface

The quantile index (QI),

$$\text{QI}=\int_0^1\beta(p)\cdot Q(p)\,dp$$

with a linear functional coefficient $\beta(p)$ can be estimated by fitting a functional generalized linear model (FGLM, James, 2002) to exponential-family outcomes, or by fitting a linear functional Cox model (LFCM, Gellar et al., 2015) to survival outcomes. The more flexible non-linear quantile index (nlQI)

$$\text{nlQI}=\int_0^1 F\big(p, Q(p)\big)\,dp$$

with a bivariate twice differentiable function $F(\cdot,\cdot)$ can be estimated by fitting a functional generalized additive model (FGAM, McLean et al., 2014) to exponential-family outcomes, or by fitting an additive functional Cox model (AFCM, Cui et al., 2021) to survival outcomes.

The estimated integrand surface of quantile indices and non-linear quantile indices, defined on $p\in[0,1]$ and $q\in\text{range}\big(Q_i(p)\big)$ for all training subjects $i=1,\cdots,n$, is

$$\hat{S}_0(p,q) = \begin{cases} \hat{\beta}(p)\cdot q & \text{for QI}\\ \hat{F}(p,q) & \text{for nlQI} \end{cases}$$

Sign-Adjustment

Ideally, we would wish that, in the training set, the estimated linear and/or non-linear quantile indices

$$\widehat{\text{QI}}_i = \int_0^1 \hat{S}_0\big(p, Q_i(p)\big)\,dp$$

be positively correlated with a more intuitive quantity, e.g., quantiles $Q_i(\tilde{p})$ at a user-specified $\tilde{p}$, for the interpretation of downstream analysis. Therefore, we define the sign-adjustment term

$$\hat{c} = \text{sign}\left(\text{corr}\left(Q_i(\tilde{p}), \widehat{\text{QI}}_i\right)\right),\quad i =1,\cdots,n$$

as the sign of the correlation between the estimated quantile index $\widehat{\text{QI}}_i$ and the quantile $Q_i(\tilde{p})$, for training subjects $i=1,\cdots,n$.

The estimated sign-adjusted integrand surface is $\hat{S}(p,q) = \hat{c} \cdot \hat{S}_0(p,q)$.

The estimated sign-adjusted quantile indices $\int_0^1 \hat{S}\big(p, Q_i(p)\big)\,dp$ are positively correlated with subject-specific sample medians (default $\tilde{p} = .5$) in the training set.
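The sign-adjustment can be traced numerically with a short base-R sketch for the linear case. Everything here is assumed for illustration: the quantile curves are simulated, `beta_hat` is a made-up coefficient function rather than a fitted one, and the integral over $p$ is approximated by a Riemann sum on the 0.01-spaced grid.

```r
set.seed(4)
p <- seq.int(from = 0.01, to = 0.99, by = 0.01)

# Simulated quantile curves Q_i(p), one row per training subject
Q <- t(replicate(30L, quantile(rexp(80L, rate = runif(1L, min = 0.5, max = 2)),
                               probs = p, names = FALSE)))
beta_hat <- -2 * p  # made-up estimate of beta(p), for illustration only

# Riemann-sum approximation of the integral of beta_hat(p) * Q_i(p) over p
QI0 <- as.vector(Q %*% beta_hat) * 0.01

j <- which.min(abs(p - 0.5))     # grid point closest to p-tilde = .5
c_hat <- sign(cor(Q[, j], QI0))  # sign-adjustment term
QI <- c_hat * QI0                # sign-adjusted quantile indices

c(c_hat = c_hat, cor_after = cor(Q[, j], QI))
```

Whatever the sign of the unadjusted correlation, the adjusted indices `QI` are positively correlated with the subject-specific medians in this training sample.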

Note

The maintainer is not aware of any functionality of projection of arbitrary curves in package plotly. Currently, the projection to the $(p,q)$-plane is hard coded on the $(p,q,s=\text{min}(s))$-plane.

References

James, G. M. (2002). Generalized Linear Models with Functional Predictors, doi:10.1111/1467-9868.00342

Gellar, J. E., et al. (2015). Cox regression models with functional covariates for survival data, doi:10.1177/1471082X14565526

McLean, M. W., et al. (2014). Functional Generalized Additive Models, doi:10.1080/10618600.2012.729985

Cui, E., et al. (2021). Additive Functional Cox Model, doi:10.1080/10618600.2020.1853550

Examples

# see ?`Qindex-package`

Optimal Dichotomizing Predictors via Repeated Sample Splits

Description

To identify the optimal dichotomizing predictors using repeated sample splits.

Usage

optimSplit_dichotom(
  formula,
  data,
  include = quote(p1 > 0.15 & p1 < 0.85),
  top = 1L,
  nsplit,
  ...
)

split_dichotom(y, x, id, ...)

splits_dichotom(y, x, ids = rSplit(y, ...), ...)

## S3 method for class 'splits_dichotom'
quantile(x, probs = 0.5, ...)

Arguments

formula, y, x

formula, e.g., y~X or y~x1+x2. Response $y$ may be double, logical or Surv. Candidate numeric predictors $x$'s may be specified as the columns of one matrix column, e.g., y~X; or as several vector columns, e.g., y~x1+x2. In helper functions, x is a numeric vector.

data

data.frame

include

(optional) language, inclusion criteria. Default (p1>.15 & p1<.85) specifies a user-desired range of $p_1$ for the candidate dichotomizing predictors. See explanation of $p_1$ in section Returns of Helper Functions.

top

positive integer scalar, number of optimal dichotomizing predictors, default 1L

nsplit, ...

additional parameters for function rSplit

id

logical vector for helper function split_dichotom, indices of training (TRUE) and test (FALSE) subjects

ids

(optional) list of logical vectors for helper function splits_dichotom, multiple copies of indices of repeated training-test sample splits.

probs

double scalar for helper function quantile.splits_dichotom, see quantile

Details

Function optimSplit_dichotom identifies the optimal dichotomizing predictors via repeated sample splits. Specifically,

  1. Generate multiple, i.e., repeated, training-test sample splits (via rSplit)

  2. For each candidate predictor $x_i$, find the median-split-dichotomized regression model based on the repeated sample splits, see details in section Details on Helper Functions

  3. Limit the selection of the candidate predictors $x$'s to a user-desired range of $p_1$ of the split-dichotomized regression models, see explanations of $p_1$ in section Returns of Helper Functions

  4. Rank the candidate predictors $x$'s in decreasing order of the absolute values of the regression coefficient estimates of the median-split-dichotomized regression models. The top-ranked candidates are the optimal dichotomizing predictors.

Value

Function optimSplit_dichotom returns an object of class 'optimSplit_dichotom', which is a list of dichotomizing functions, with the input formula and data as additional attributes.

Details on Helper Functions

Split-Dichotomized Regression Model

Helper function split_dichotom performs a univariable regression model on the test set with a dichotomized predictor, using a dichotomizing rule determined by a recursive partitioning of the training set. Specifically, given a training-test sample split,

  1. find the dichotomizing rule $\mathcal{D}$ of the predictor $x_0$ given the response $y_0$ in the training set (via rpartD);

  2. fit a univariable regression model of the response $y_1$ with the dichotomized predictor $\mathcal{D}(x_1)$ in the test set.
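The two steps above can be sketched in base R for a gaussian response. The cutoff search below is a hypothetical stand-in for the rpart-based rule (rpartD), and the data, the split `id` and all object names are simulated or made up for illustration.

```r
set.seed(5)
n <- 300L
x <- rnorm(n)
y <- 0.9 * (x > 0.2) + rnorm(n)
id <- sample(rep(c(TRUE, FALSE), times = c(200L, 100L)))  # TRUE = training

# Step 1: dichotomizing rule from the training set (x0, y0)
# Hypothetical stand-in for rpartD: maximize the gap in mean response
x0 <- x[id]; y0 <- y[id]
cand <- quantile(x0, probs = seq(0.2, 0.8, by = 0.05), names = FALSE)
gap <- vapply(cand, function(k) abs(mean(y0[x0 > k]) - mean(y0[x0 <= k])),
              FUN.VALUE = 0)
cutoff <- cand[which.max(gap)]
rule <- function(x) x > cutoff

# Step 2: univariable model in the test set (x1, y1)
y1 <- y[!id]
x1. <- rule(x[!id])
fit <- lm(y1 ~ x1.)

p1 <- mean(x1.)  # proportion of test subjects with rule(x1) == TRUE
c(cutoff = cutoff, p1 = p1, coef = unname(coef(fit)[2L]))
```

Because the cutoff is chosen on the training subjects only, the coefficient estimated on the test subjects is not inflated by the cutoff search.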

Currently the Cox proportional hazards (coxph) regression for Surv response, logistic (glm) regression for logical response and linear (lm) regression for gaussian response are supported.

Split-Dichotomized Regression Models based on Repeated Training-Test Sample Splits

Helper function splits_dichotom fits multiple split-dichotomized regression models split_dichotom on the response $y$ and predictor $x$, based on each copy of the repeated training-test sample splits.

Quantile of Split-Dichotomized Regression Models

Helper function quantile.splits_dichotom is a method dispatch of the S3 generic function quantile on splits_dichotom object. Specifically,

  1. collect the univariable regression coefficient estimate from each one of the split-dichotomized regression models;

  2. find the nearest-even (i.e., type = 3) quantile of the coefficients from Step 1. By default, we use the median (i.e., probs = .5);

  3. the split-dichotomized regression model corresponding to the selected coefficient quantile in Step 2, is returned.
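The reason for the nearest-even quantile can be seen in a small base-R sketch: with type = 3, function quantile returns one of the observed values, so the model that produced it can be identified. The coefficient vector `cf` below is simulated and stands in for the estimates collected in Step 1.

```r
set.seed(6)
# Hypothetical coefficient estimates from 20 split-dichotomized models
cf <- rnorm(20L, mean = 0.6, sd = 0.3)

med <- quantile(cf, probs = 0.5, type = 3L, names = FALSE)
pick <- which(cf == med)[1L]  # index of the model to be returned

c(median_type3 = med, model = pick, median_default = median(cf))
```

With an even number of splits, the default median averages the two middle values and generally matches none of the fitted models, whereas the type = 3 quantile always does.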

Returns of Helper Functions

Helper function split_dichotom returns a split-dichotomized regression model, which is either a Cox proportional hazards (coxph), a logistic (glm), or a linear (lm) regression model, with additional attributes

attr(,'rule')

function, dichotomizing rule $\mathcal{D}$ based on the training set

attr(,'text')

character scalar, human-friendly description of $\mathcal{D}$

attr(,'p1')

double scalar, $p_1 = \text{Pr}(\mathcal{D}(x_1)=1)$

attr(,'coef')

double scalar, univariable regression coefficient estimate of $y_1\sim\mathcal{D}(x_1)$

Helper function splits_dichotom returns a list of split-dichotomized regression models (split_dichotom).

Helper function quantile.splits_dichotom returns a split-dichotomized regression model (split_dichotom).

Examples

# see ?`Qindex-package`

Regression Models with Optimal Dichotomizing Predictors

Description

Regression models with optimal dichotomizing predictor(s), used either as boolean or continuous predictor(s).

Usage

## S3 method for class 'optimSplit_dichotom'
predict(
  object,
  formula = attr(object, which = "formula", exact = TRUE),
  newdata = attr(object, which = "data", exact = TRUE),
  boolean = TRUE,
  ...
)

Arguments

object

an optimSplit_dichotom object

formula

(optional) formula to specify the response in test data. If missing, the model formula of training data is used

newdata

(optional) test data.frame, candidate numeric predictors $x$'s must have the same name and dimension as the training data. If missing, the training data is used

boolean

logical scalar, whether to use the dichotomized predictor (default, TRUE), or the continuous predictor (FALSE)

...

additional parameters, currently not in use

Value

Function predict.optimSplit_dichotom returns a list of regression models, coxph model for Surv response, glm for logical response, and lm model for numeric response.

Examples

# see ?`Qindex-package`

Predicted Sign-Adjusted Quantile Indices

Description

To predict sign-adjusted quantile indices of a test set.

Usage

## S3 method for class 'Qindex'
predict(object, newdata = object@gam$data, ...)

Arguments

object

a Qindex object based on the training set.

newdata

test data.frame, with at least the response $y^{\text{new}}$ and the double matrix of functional predictor values $X^{\text{new}}$ of the test set, tabulated on the same $p$-grid as the training set $X$. If missing, the training set object@gam$data will be used.

...

additional parameters, currently not in use.

Details

Function predict.Qindex computes the predicted sign-adjusted quantile indices on the test set, which is the product of function predict.gam return and the correlation sign based on training set (object@sign, see Step 3 of section Details of function Qindex). Multiplication by object@sign is required to ensure that the predicted sign-adjusted quantile indices are positively associated with the training functional predictor values at the selected tabulating grid.

Value

Function predict.Qindex returns a double vector, which is the predicted sign-adjusted quantile indices on the test set.


Sign-Adjusted Quantile Indices

Description

Sign-adjusted quantile indices based on linear and/or nonlinear functional predictors.

Usage

Qindex(formula, data, sign_prob = 0.5, ...)

Qindex_prefit_(formula, data, family, nonlinear = FALSE, ...)

Arguments

formula

formula, e.g., y~X. Response $y$ may be double, logical or Surv. Functional predictor $X$ is a tabulated double matrix; the rows of $X$ correspond to the subjects, while the columns of $X$ correspond to a common tabulating grid shared by all subjects. The numeric values of the grid are in the colnames of $X$

data

data.frame, must be a returned object from function clusterQp

sign_prob

double scalar between 0 and 1, user-specified probability $\tilde{p}$ for the nearest-even quantile in the grid, which is used to determine the sign-adjustment. Default is .5, i.e., the nearest-even median of the grid

...

additional parameters for functions s and ti, most importantly k

family

family object, see function gam. Default values are

  • mgcv::cox.ph() for Surv response $y$;

  • binomial(link = 'logit') for logical response $y$;

  • gaussian(link = 'identity') for double response $y$

nonlinear

logical scalar, whether to use nonlinear or linear functional model. Default FALSE

Value

Function Qindex returns a Qindex object, which is an instance of an S4 class. See section Slots for details.

Slots

.Data

double vector, sign-adjusted quantile indices, see section Details of function integrandSurface

formula

see section Arguments, parameter formula

gam

a gam object

gpf

a 'gam.prefit' object, which is the returned object from function gam with argument fit = FALSE

p.value

numeric scalar, $p$-value for the test of significance of the functional predictor, based on slot @gam

sign

double scalar of either 1 or -1, sign-adjustment, see section Details of function integrandSurface

sign_prob

double scalar, see section Arguments, parameter sign_prob

Examples

# see ?`Qindex-package`