Package 'SubgrpID'

Title:	Patient Subgroup Identification for Clinical Drug Development
Description:	Implementation of Sequential BATTing (bootstrapping and aggregating of thresholds from trees) for developing threshold-based multivariate (prognostic/predictive) biomarker signatures. Variable selection is automatically built-in. Final signatures are returned with interaction plots for predictive signatures. Cross-validation performance evaluation and testing dataset results are also output. Detailed algorithms are described in Huang et al (2017) <doi:10.1002/sim.7236>.
Authors:	Xin Huang [aut, cre, cph], Yan Sun [aut], Saptarshi Chatterjee [aut], Paul Trow [aut]
Maintainer:	Xin Huang <[email protected]>
License:	GPL (>= 2)
Version:	0.12
Built:	2024-10-31 06:26:16 UTC
Source:	CRAN

balanced.folds

Description

Create balanced folds for cross-validation.

Usage

balanced.folds(y, nfolds = min(min(table(y)), 10))

Arguments

`y`	the response vector
`nfolds`	number of folds

Details

Create balanced folds for cross-validation.

Value

This function returns balanced folds
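
The idea of balanced folds can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `balanced_folds_sketch` is hypothetical, and the round-robin dealing of each class across folds is one simple way to keep class proportions similar in every fold.

```r
# Hypothetical helper (not part of SubgrpID): assign each observation to a
# fold so that every class of y is spread as evenly as possible across folds.
balanced_folds_sketch <- function(y, nfolds = min(min(table(y)), 10)) {
  fold_id <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # shuffle the members of this class, then deal them out round-robin
    idx <- idx[sample.int(length(idx))]
    fold_id[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  # return a list of observation indices, one element per fold
  split(seq_along(y), fold_id)
}

set.seed(1)
y <- rep(c(0, 1), times = c(30, 10))
folds <- balanced_folds_sketch(y, nfolds = 5)
# number of events (y == 1) in each fold
sapply(folds, function(i) sum(y[i] == 1))
```

With 10 events and 5 folds, every fold receives the same number of events, which is the property that plain random folds do not guarantee.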

batting.pred

Description

Main predictive BATTing function

Usage

batting.pred(
  dataset,
  ids,
  yvar,
  censorvar,
  trtvar,
  type,
  class.wt,
  xvar,
  n.boot,
  des.res,
  min.sigp.prcnt
)

Arguments

`dataset`	input dataset in data frame
`ids`	training indices
`yvar`	response variable name
`censorvar`	censoring variable name (1: event; 0: censored).
`trtvar`	treatment variable name
`type`	"c" continuous; "s" survival; "b" binary
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`xvar`	name of predictor for which cutpoint needs to be obtained
`n.boot`	number of bootstraps for BATTing step.
`des.res`	the desired response. "larger": prefer larger response. "smaller": prefer smaller response.
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.

Details

Main predictive BATTing function

Value

a signature rule consisting of variable name, direction, optimal cutpoint and the corresponding p-value.

batting.prog

Description

Main prognostic BATTing function

Usage

batting.prog(
  dataset,
  ids,
  yvar,
  censorvar,
  type,
  class.wt,
  xvar,
  n.boot,
  des.res,
  min.sigp.prcnt
)

Arguments

`dataset`	input dataset in data frame
`ids`	training indices
`yvar`	response variable name
`censorvar`	censoring variable name (1: event; 0: censored).
`type`	"c" continuous; "s" survival; "b" binary
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`xvar`	name of predictor for which cutpoint needs to be obtained
`n.boot`	number of bootstraps for BATTing step.
`des.res`	the desired response. "larger": prefer larger response. "smaller": prefer smaller response.
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.

Details

Main prognostic BATTing function

Value

a signature rule consisting of variable name, direction, optimal cutpoint and the corresponding p-value.
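
The BATTing step (bootstrapping and aggregating of thresholds from trees) can be illustrated for a single predictor and a continuous response. The sketch below is not the package's code: the helper names `stump_cutoff` and `batting_sketch` are hypothetical, the threshold on each bootstrap sample comes from an exhaustive single-split search (what a depth-one regression tree would choose), and aggregating the bootstrap thresholds by their median is an assumption of this example.

```r
# Hypothetical helpers (not part of SubgrpID).
# Best single split of x for a continuous y: the cutoff that minimises the
# within-group sum of squares, as a depth-one regression tree would.
stump_cutoff <- function(x, y) {
  xs <- sort(unique(x))
  cands <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between values
  sse <- sapply(cands, function(cut) {
    left <- y[x <= cut]
    right <- y[x > cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cands[which.min(sse)]
}

# BATTing: bootstrap the data, find a threshold on each resample, aggregate.
batting_sketch <- function(x, y, n.boot = 50) {
  cuts <- replicate(n.boot, {
    b <- sample.int(length(x), replace = TRUE)
    stump_cutoff(x[b], y[b])
  })
  median(cuts)   # aggregation by median is an assumption of this sketch
}

set.seed(2017)
x <- runif(300)
y <- 1.5 * (x > 0.6) + rnorm(300, sd = 0.5)   # simulated threshold at 0.6
cut.hat <- batting_sketch(x, y, n.boot = 50)
cut.hat
```

Aggregating over bootstrap samples is intended to make the threshold less sensitive to individual observations than a single tree fit to the full data.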

binary.stats

Description

A function for binary statistics

Usage

binary.stats(pred.class, y.vec)

Arguments

`pred.class`	predicted output for each subject
`y.vec`	response vector

Details

A function for binary statistics

Value

a data frame with sensitivity, specificity, NPV, PPV and accuracy
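
The statistics listed above follow from the 2x2 table of predicted class against observed response. A minimal sketch, assuming a logical prediction and a response coded 1 for the positive class (the helper name `binary_stats_sketch` is hypothetical and this is not the package's code):

```r
# Hypothetical helper (not part of SubgrpID): standard confusion-matrix
# summaries for a logical prediction and a 0/1 response (1 = positive).
binary_stats_sketch <- function(pred.class, y.vec) {
  pred <- as.logical(pred.class)
  obs  <- y.vec == 1
  tp <- sum(pred & obs)     # true positives
  fp <- sum(pred & !obs)    # false positives
  fn <- sum(!pred & obs)    # false negatives
  tn <- sum(!pred & !obs)   # true negatives
  data.frame(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    PPV         = tp / (tp + fp),
    NPV         = tn / (tn + fn),
    accuracy    = (tp + tn) / length(obs)
  )
}

pred <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
y    <- c(1,    1,    0,    1,     0,     0,     0,     0)
stats <- binary_stats_sketch(pred, y)
stats
```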

cv.folds

Description

Cross-validation folds.

Usage

cv.folds(n, folds = 10)

Arguments

`n`	number of observations.
`folds`	number of folds.

Details

Cross-validation folds.

Value

a list containing the observation numbers for each fold.
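
A random partition of this kind can be written in one line of base R. The helper name `cv_folds_sketch` is hypothetical; the sketch only illustrates the shape of the returned list and is not the package's code.

```r
# Hypothetical helper (not part of SubgrpID): shuffle 1..n and split the
# shuffled indices into `folds` groups of near-equal size.
cv_folds_sketch <- function(n, folds = 10) {
  split(sample.int(n), rep_len(seq_len(folds), n))
}

set.seed(42)
fold.list <- cv_folds_sketch(23, folds = 5)
lengths(fold.list)   # fold sizes differ by at most one
```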

cv.pval

Description

p-value calculation for each iteration of cross validation.

Usage

cv.pval(yvar, censorvar = NULL, trtvar = NULL, data, type = "s")

Arguments

`yvar`	response variable name.
`censorvar`	censor-variable name.
`trtvar`	treatment variable name. For prognostic case trtvar=NULL.
`data`	dataset containing response and predicted output.
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "s".

Details

p-value calculation for each iteration of cross validation.

Value

p-value based on response and prediction vector for each iteration.
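
For a continuous response, the kind of p-value described here can be sketched with a linear model: the main effect of the predicted group in the prognostic case, and the treatment-by-group interaction in the predictive case. The helper name `cv_pval_sketch`, the column name `pred.class`, the numeric 0/1 treatment coding and the use of `lm` are all assumptions of this example, not the package's implementation.

```r
# Hypothetical helper (not part of SubgrpID); continuous response only.
# `data` must contain the response, a logical column `pred.class` marking the
# signature-positive group and, for the predictive case, a numeric 0/1
# treatment column.
cv_pval_sketch <- function(yvar, trtvar = NULL, data) {
  if (is.null(trtvar)) {
    # prognostic case: main effect of the predicted group
    fit  <- lm(reformulate("pred.class", response = yvar), data = data)
    term <- "pred.classTRUE"
  } else {
    # predictive case: treatment-by-group interaction
    rhs  <- paste(trtvar, "* pred.class")
    fit  <- lm(reformulate(rhs, response = yvar), data = data)
    term <- paste0(trtvar, ":pred.classTRUE")
  }
  coef(summary(fit))[term, "Pr(>|t|)"]
}

set.seed(10)
n <- 200
d <- data.frame(trt = rep(0:1, each = n / 2),
                pred.class = rep(c(TRUE, FALSE), times = n / 2))
# simulated treatment benefit only in the signature-positive group
d$y <- 2 * d$trt * d$pred.class + rnorm(n)
p.pred <- cv_pval_sketch("y", trtvar = "trt", data = d)
p.prog <- cv_pval_sketch("y", data = d)
```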

cv.seqlr.batting

Description

Cross Validation for Sequential BATTing

Usage

cv.seqlr.batting(
  y,
  x,
  censor.vec = NULL,
  trt.vec = NULL,
  trtref = NULL,
  type = "c",
  n.boot = 50,
  des.res = "larger",
  class.wt = c(1, 1),
  min.sigp.prcnt = 0.2,
  pre.filter = NULL,
  filter.method = NULL,
  k.fold = 5,
  cv.iter = 50,
  max.iter = 500
)

Arguments

`y`	data frame containing the response
`x`	data frame containing the predictors
`censor.vec`	vector giving the censor status (only for time-to-event data; censor = 0, event = 1); default = NULL.
`trt.vec`	vector containing values of the treatment variable (for a predictive signature). Set trt.vec to NULL for a prognostic signature.
`trtref`	code for treatment arm.
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c".
`n.boot`	number of bootstraps in BATTing step.
`des.res`	the desired response. "larger": prefer larger response. "smaller": prefer smaller response
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.
`pre.filter`	NULL: no prefiltering conducted; "opt": optimized number of predictors selected; an integer: min(opt, integer) of predictors selected.
`filter.method`	NULL: no prefiltering; "univariate": univariate filtering; "glmnet": glmnet filtering; "unicart": univariate rpart filtering for the prognostic case.
`k.fold`	number of folds for CV.
`cv.iter`	algorithm terminates after cv.iter successful iterations of cross-validation.
`max.iter`	total number of iterations allowed (including unsuccessful ones).

Details

Cross Validation for Sequential BATTing

Value

a list with the following entries:

stats.summary: Summary of performance statistics.
pred.classes: Data frame containing the predictive classes (TRUE/FALSE) for each iteration.
folds: Data frame containing the fold indices (index of the fold for each row) for each iteration.
sig.list: List of length cv.iter * k.fold containing the signature generated at each of the k folds, for all iterations.
error.log: List of any error messages that are returned at an iteration.
interplot: Treatment*subgroup interaction plot for predictive case

data.gen

Description

Function for simulated data generation

Usage

data.gen(
  n,
  k,
  prevalence = sqrt(0.5),
  prog.eff = 1,
  sig2,
  y.sig2,
  rho,
  rhos.bt.real,
  a.constent
)

Arguments

`n`	Total sample size
`k`	Number of markers
`prevalence`	prevalence of predictive biomarkers with values above the cutoff
`prog.eff`	effect size $\beta$ for the prognostic biomarker
`sig2`	standard deviation of each marker
`y.sig2`	standard deviation of the error term in the linear component
`rho`	rho*sig2 is the off-diagonal entry of the covariance matrix, i.e. the covariance between each pair of distinct markers
`rhos.bt.real`	correlation between each prognostic and predictive marker
`a.constent`	a constant set so that there is no overall treatment effect

Details

Function for simulated data generation

Value

A list of simulated clinical trial data with heterogeneous prognostic and predictive biomarkers

Examples

n <- 500
k <- 10
prevalence <- sqrt(0.5)
rho<-0.2
sig2 <- 2
rhos.bt.real <- c(0, rep(0.1, (k-3)))*sig2
y.sig2 <- 1
prog.eff <- 0.5
effect.size <- 1
a.constent <- effect.size/(2*(1-prevalence))
ObsData <- data.gen(n=n, k=k, prevalence=prevalence, prog.eff=prog.eff,
                    sig2=sig2, y.sig2=y.sig2, rho=rho,
                    rhos.bt.real=rhos.bt.real, a.constent=a.constent)

evaluate.cv.results

Description

Take the raw output of kfold.cv and calculate performance statistics for each iteration of the cross-validation.

Usage

evaluate.cv.results(cv.data, y, censor.vec, trt.vec, type)

Arguments

`cv.data`	output of prediction function from kfold.cv
`y`	data frame of the response variable from CV data.
`censor.vec`	data frame indicating censoring for survival data. For binary or continuous data, set censor.vec <- NULL.
`trt.vec`	data frame indicating whether or not the patient was treated. For the prognostic case, set trt.vec <- NULL.
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c"

Details

Cross-validation Performance Evaluation

Value

a list containing raw statistics and fold information

evaluate.results

Description

Get statistics for a single set of predictions.

Usage

evaluate.results(
  y,
  predict.data,
  censor.vec = NULL,
  trt.vec = NULL,
  trtref = NULL,
  type
)

Arguments

`y`	data frame of the response variable.
`predict.data`	output of prediction function from kfold.cv.
`censor.vec`	data frame indicating censoring for survival data. For binary or continuous data, set censor.vec <- NULL.
`trt.vec`	data frame indicating whether or not the patient was treated. For the prognostic case, set trt.vec <- NULL.
`trtref`	treatment reference.
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c".

Details

Get statistics for a single set of predictions.

Value

a list containing p-value and group statistics.

filter

Description

Filter function for prognostic and predictive biomarker signature development for exploratory subgroup identification in randomized clinical trials

Usage

filter(
  data,
  type = "c",
  yvar,
  xvars,
  censorvar = NULL,
  trtvar = NULL,
  trtref = 1,
  n.boot = 50,
  cv.iter = 20,
  pre.filter = length(xvars),
  filter.method = NULL
)

Arguments

`data`	input data frame
`type`	type of response variable: "c" continuous; "s" survival; "b" binary
`yvar`	variable (column) name for response variable
`xvars`	vector of variable names for predictors (covariates)
`censorvar`	variable name for censoring (1: event; 0: censor), default = NULL
`trtvar`	variable name for treatment variable, default = NULL (prognostic signature)
`trtref`	coding (in the column of trtvar) for treatment arm, default = 1 (no use for prognostic signature)
`n.boot`	number of bootstrap for the BATTing procedure
`cv.iter`	Algorithm terminates after cv.iter successful iterations of cross-validation, or after max.iter total iterations, whichever occurs first
`pre.filter`	NULL, no prefiltering conducted; "opt", optimized number of predictors selected; an integer: min(opt, integer) of predictors selected. Default: length(xvars)
`filter.method`	NULL (default), no prefiltering; "univariate", univariate filtering; "glmnet", glmnet filtering

Details

Filter function for predictive/prognostic biomarker candidates for signature development

The function contains two algorithms for filtering high-dimensional multivariate (prognostic/predictive) biomarker candidates: univariate filtering (using p-values of the group difference for the prognostic case and p-values of the interaction term for the predictive case) and the LASSO/Elastic Net method (Tian L. et al 2012).

Value

var: a vector of the names of the variables retained by the filter

References

Tian L, Alizadeh A, Gentles A, Tibshirani R (2012) A Simple Method for Detecting Interactions between a Treatment and a Large Number of Covariates. J Am Stat Assoc. 2014 Oct; 109(508): 1517-1532.

Examples

# no run

filter.glmnet

Description

Filtering using MC glmnet

Usage

filter.glmnet(
  data,
  type,
  yvar,
  xvars,
  censorvar,
  trtvar,
  trtref,
  n.boot = 50,
  cv.iter = 20,
  pre.filter = length(xvars)
)

Arguments

`data`	input data frame
`type`	"c" continuous; "s" survival; "b" binary
`yvar`	response variable name
`xvars`	covariate variable names
`censorvar`	censoring variable name (1: event; 0: censored).
`trtvar`	treatment variable name
`trtref`	code for treatment arm
`n.boot`	number of bootstrap for filtering
`cv.iter`	number of iterations required for MC glmnet filtering
`pre.filter`	NULL, no prefiltering conducted; "opt", optimized number of predictors selected; an integer: min(opt, integer) of predictors selected

Details

Filtering using MC glmnet

Value

variables selected after glmnet filtering

filter.unicart

Description

rpart filtering

Usage

filter.unicart(
  data,
  type,
  yvar,
  xvars,
  censorvar,
  trtvar,
  trtref = 1,
  pre.filter = length(xvars)
)

Arguments

`data`	input data frame
`type`	"c" continuous; "s" survival; "b" binary
`yvar`	response variable name
`xvars`	covariate variable names
`censorvar`	censoring variable name (1: event; 0: censored).
`trtvar`	treatment variable name
`trtref`	code for treatment arm
`pre.filter`	NULL, no prefiltering conducted; "opt", optimized number of predictors selected; an integer: min(opt, integer) of predictors selected

Details

rpart filtering (only for prognostic case)

Value

selected covariates after rpart filtering

filter.univariate

Description

Univariate Filtering

Usage

filter.univariate(
  data,
  type,
  yvar,
  xvars,
  censorvar,
  trtvar,
  trtref = 1,
  pre.filter = length(xvars)
)

Arguments

`data`	input data frame
`type`	"c" continuous; "s" survival; "b" binary
`yvar`	response variable name
`xvars`	covariate variable names
`censorvar`	censoring variable name (1: event; 0: censored).
`trtvar`	treatment variable name
`trtref`	code for treatment arm
`pre.filter`	NULL, no prefiltering conducted; "opt", optimized number of predictors selected; an integer: min(opt, integer) of predictors selected

Details

Univariate Filtering

Value

covariate names after univariate filtering.

find.pred.stats

Description

Find predictive stats from response and prediction vector

Usage

find.pred.stats(data, yvar, trtvar, type, censorvar)

Arguments

`data`	data frame with response and prediction vector
`yvar`	response variable name
`trtvar`	treatment variable name
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c".
`censorvar`	censoring variable name

Details

Find predictive stats from response and prediction vector

Value

a data frame of predictive statistics

find.prog.stats

Description

Find prognostic stats from response and prediction vector

Usage

find.prog.stats(data, yvar, type, censorvar)

Arguments

`data`	data frame with response and prediction vector
`yvar`	response variable name
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c".
`censorvar`	censoring variable name

Details

Find prognostic stats from response and prediction vector

Value

a data frame of prognostic statistics

get.var.counts.seq

Description

Get signature variables from output of seqlr.batting.

Usage

get.var.counts.seq(sig.list, xvars)

Arguments

`sig.list`	signature list returned by seqlr.batting.
`xvars`	predictor variable names

Value

the variables included in signature rules returned by seqlr.batting

interaction.plot

Description

A function for interaction plot

Usage

interaction.plot(
  data.eval,
  type,
  main = "Interaction Plot",
  trt.lab = c("Trt.", "Ctrl.")
)

Arguments

`data.eval`	output of evaluate.results or summarize.cv.stats
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c".
`main`	title of the plot
`trt.lab`	treatment label

Details

A function for interaction plot

Value

A ggplot object.

kfold.cv

Description

Perform k-fold cross-validation of a model.

Usage

kfold.cv(
  data,
  model.Rfunc,
  model.Rfunc.args,
  predict.Rfunc,
  predict.Rfunc.args,
  k.fold = 5,
  cv.iter = 50,
  strata,
  max.iter = 500
)

Arguments

`data`	the CV data
`model.Rfunc`	Name of the model function.
`model.Rfunc.args`	List of input arguments to model.Rfunc.
`predict.Rfunc`	Name of the prediction function, which takes the prediction rule returned by model.Rfunc along with any input data (not necessarily the input data to kfold.cv) and returns a TRUE/FALSE prediction vector specifying the positive and negative classes for the data.
`predict.Rfunc.args`	List containing input arguments to predict.Rfunc, except for data and predict.rule.
`k.fold`	Number of folds of the cross-validation.
`cv.iter`	Number of iterations of the cross-validation. If model.Rfunc returns an error at any of the k.fold calls, the current iteration is aborted. Iterations are repeated until cv.iter successful iterations have occurred.
`strata`	Stratification vector whose length equals the number of rows of data, usually corresponding to the vector of events.
`max.iter`	Function stops after max.iter iterations even if cv.iter successful iterations have not occurred.

Details

Perform k-fold cross-validation of a model.

Value

List of length 2 with the following fields:

cv.data - List of length cv.iter. Entry i contains the output of predict.Rfunc at the ith iteration.

sig.list - list of length cv.iter * k.fold, whose entries are the prediction.rules (signatures) returned by model.Rfunc at each k.fold iteration.
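
The interplay of cv.iter and max.iter described in the arguments can be sketched as a loop that repeats whole cross-validation iterations until enough of them succeed. This is an illustration only: the helper name `kfold_cv_sketch` and its `fit_fun`/`predict_fun` arguments are hypothetical, stratification by `strata` is omitted, and a model function that returns NULL is treated as a failure for simplicity.

```r
# Hypothetical helper (not part of SubgrpID). `fit_fun(train)` returns a rule;
# `predict_fun(rule, test)` returns a logical vector for the rows of `test`.
kfold_cv_sketch <- function(data, fit_fun, predict_fun,
                            k.fold = 5, cv.iter = 50, max.iter = 500) {
  cv.data <- list()
  n.tried <- 0
  while (length(cv.data) < cv.iter && n.tried < max.iter) {
    n.tried <- n.tried + 1
    fold.id <- sample(rep_len(seq_len(k.fold), nrow(data)))
    pred <- rep(NA, nrow(data))
    ok <- TRUE
    for (k in seq_len(k.fold)) {
      test <- fold.id == k
      rule <- tryCatch(fit_fun(data[!test, , drop = FALSE]),
                       error = function(e) NULL)
      if (is.null(rule)) {   # model failed on this fold: abort the iteration
        ok <- FALSE
        break
      }
      pred[test] <- predict_fun(rule, data[test, , drop = FALSE])
    }
    if (ok) cv.data[[length(cv.data) + 1]] <- pred
  }
  cv.data
}

# Toy model: the "rule" is the training median of x; predict x above it.
set.seed(3)
toy <- data.frame(x = rnorm(60))
res <- kfold_cv_sketch(toy,
                       fit_fun = function(d) median(d$x),
                       predict_fun = function(rule, d) d$x > rule,
                       k.fold = 5, cv.iter = 4)
length(res)

# A model that always fails stops after max.iter attempts with no results.
res.fail <- kfold_cv_sketch(toy,
                            fit_fun = function(d) stop("no fit"),
                            predict_fun = function(rule, d) d$x > rule,
                            cv.iter = 4, max.iter = 7)
length(res.fail)
```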

make.arg.list

Description

Creates a list of variables corresponding to the arguments of the function func.name and assigns values.

Usage

make.arg.list(func.name)

Arguments

`func.name`	function name

Details

Creates a list of variables corresponding to the arguments of the function func.name and assigns values.

Value

list of variables corresponding to the arguments of the function

permute.rows

Description

Randomly permute the rows of a matrix.

Usage

permute.rows(A)

Arguments

`A`	a matrix whose rows are to be permuted.

Details

Randomly permute the rows of a matrix.

Value

the matrix with permuted rows.
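
In base R a row permutation amounts to indexing with a shuffled row index. The lines below are a plain illustration of that idiom, not the package's code; the same indexing pattern applies to the entries of a vector.

```r
# Base-R illustration (not the package's code): reorder rows by a random
# permutation of the row indices; drop = FALSE keeps the result a matrix.
A <- matrix(1:12, nrow = 4)
set.seed(5)
A.perm <- A[sample.int(nrow(A)), , drop = FALSE]

# the same idea for the entries of a vector
v <- c("a", "b", "c", "d")
v.perm <- v[sample.int(length(v))]
```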

permute.vector

Description

Randomly permute the entries of a vector.

Usage

permute.vector(x)

Arguments

`x`	the vector whose entries are to be permuted

Details

Randomly permute the entries of a vector.

Value

the permuted vector

pred.seqlr

Description

Assign positive and negative groups based on predict.rule, the output of seqlr.batting.

Usage

pred.seqlr(x, predict.rule)

Arguments

`x`	input predictors matrix
`predict.rule`	Prediction rule returned by seqlr.batting.

Details

Prediction function for Sequential BATTing

Value

a logical vector indicating the prediction for each row of data.

pred.seqlr.cv

Description

Assign positive and negative groups for cross-validation data given prediction rule in predict.rule.

Usage

pred.seqlr.cv(data, predict.rule, args)

Arguments

`data`	input data frame
`predict.rule`	Prediction rule returned by seqlr.batting.
`args`	Prediction rule arguments

Details

Prediction function for CV Sequential BATTing

Value

a logical vector indicating the prediction for each row of data.

query.data

Description

internal function used in seqlr.batting

Usage

query.data(data, rule)

Arguments

`data`	the given dataset
`rule`	rule is a vector of the form [x-variable, direction, cutoff, p-value]

Details

internal function used in seqlr.batting

Value

a logical variable indicating whether rules are satisfied or not.
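
Applying a rule of the form [x-variable, direction, cutoff, p-value] to a data set can be sketched as below. The helper names are hypothetical, the direction strings ">" and "<" are an assumption about the encoding, and combining several rules with a logical AND is an assumption about how a multi-rule signature is evaluated; none of this is taken from the package's source.

```r
# Hypothetical helper (not part of SubgrpID). A rule is a character vector
# c(variable, direction, cutoff, p.value), with direction ">" or "<".
apply_rule_sketch <- function(data, rule) {
  x <- data[[rule[1]]]
  cutoff <- as.numeric(rule[3])
  if (rule[2] == ">") x > cutoff else x < cutoff
}

# A signature is taken here to be a sequence of rules; a subject is
# signature positive only if every rule holds.
apply_signature_sketch <- function(data, rules) {
  hits <- lapply(rules, function(r) apply_rule_sketch(data, r))
  Reduce(`&`, hits)
}

d <- data.frame(x1 = c(0.2, 0.9, 0.7, 0.4), x2 = c(5, 1, 2, 8))
rules <- list(c("x1", ">", "0.5", "0.01"), c("x2", "<", "4", "0.03"))
sig.pos <- apply_signature_sketch(d, rules)
sig.pos   # FALSE TRUE TRUE FALSE
```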

resample

Description

Creates a permutation of given size.

Usage

resample(x, size, ...)

Arguments

`x`	the x vector.
`size`	resampling size.
`...`	optional argument.

Details

Creates a permutation of given size.

Value

A resample of x is returned.
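
A helper of this kind is commonly written to avoid a known surprise in base::sample, which treats a single number x as the range 1:x. The sketch below follows the idiom shown in the help page for base::sample; the name `resample_sketch` is hypothetical and this is not necessarily how the package implements resample.

```r
# Hypothetical helper, following the idiom in the help page for base::sample;
# not necessarily how SubgrpID implements resample().
resample_sketch <- function(x, size = length(x), ...) {
  x[sample.int(length(x), size = size, ...)]
}

x <- 10
length(sample(x))     # 10: sample() treats the scalar as 1:10
resample_sketch(x)    # 10: the single element is returned unchanged

set.seed(7)
two <- resample_sketch(c(3, 8, 12), size = 2)
```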

seqlr.batting

Description

Perform sequential BATTing method.

Usage

seqlr.batting(
  y,
  x,
  censor.vec = NULL,
  trt.vec = NULL,
  trtref = NULL,
  type = "c",
  n.boot = 50,
  des.res = "larger",
  class.wt = c(1, 1),
  min.sigp.prcnt = 0.2,
  pre.filter = NULL,
  filter.method = NULL
)

Arguments

`y`	data frame containing the response.
`x`	data frame containing the predictors.
`censor.vec`	vector containing the censor status (only for time-to-event data; censor = 0, event = 1); default = NULL.
`trt.vec`	vector containing values of the treatment variable (for a predictive signature). Set trt.vec to NULL for a prognostic signature.
`trtref`	code for treatment arm.
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c".
`n.boot`	number of bootstraps in BATTing step.
`des.res`	the desired response. "larger": prefer larger response. "smaller": prefer smaller response
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.
`pre.filter`	NULL: no prefiltering conducted; "opt": optimized number of predictors selected; an integer: min(opt, integer) of predictors selected
`filter.method`	NULL: no prefiltering; "univariate": univariate filtering; "glmnet": glmnet filtering; "unicart": univariate rpart filtering for the prognostic case.

Details

Perform sequential BATTing method.

Value

a list of signature rules consisting of variable names, directions, thresholds and the log-likelihood at each step at which the signatures are applied.

seqlr.batting.wrapper

Description

Wrapper function for seqlr.batting, to be passed to kfold.cv.

Usage

seqlr.batting.wrapper(data, args)

Arguments

`data`	data frame equal to cbind(y, x, trt, censor), where y and x are inputs to seqlr.batting.
`args`	list containing all other input arguments to seqlr.batting except for x and y. Also contains xvars=names(x) and yvar=names(y).

Details

Wrapper function for seqlr.batting, to be passed to kfold.cv.

Value

prediction rule returned by seqlr.batting.

seqlr.find.cutoff.pred

Description

Find cutoff for predictive case.

Usage

seqlr.find.cutoff.pred(
  data,
  yvar,
  censorvar,
  xvar,
  trtvar,
  type,
  class.wt,
  dir,
  nsubj,
  min.sigp.prcnt
)

Arguments

`data`	input data frame.
`yvar`	response variable name.
`censorvar`	censoring variable name.
`xvar`	name of predictor for which cutpoint needs to be obtained.
`trtvar`	treatment variable name.
`type`	"c" continuous; "s" survival; "b" binary.
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`dir`	direction of cut.
`nsubj`	number of subjects.
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.

Details

Find cutoff for predictive case.

Value

the optimal score (p-value of subgroup*treatment interaction) for a predictor variable.

seqlr.find.cutoff.prog

Description

Find cutoff for prognostic case.

Usage

seqlr.find.cutoff.prog(
  data,
  yvar,
  censorvar,
  xvar,
  type,
  class.wt,
  dir,
  nsubj,
  min.sigp.prcnt
)

Arguments

`data`	input data frame.
`yvar`	response variable name.
`censorvar`	censoring variable name.
`xvar`	name of predictor for which cutpoint needs to be obtained.
`type`	"c" continuous; "s" survival; "b" binary.
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`dir`	direction of cut.
`nsubj`	number of subjects.
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.

Details

Find cutoff for prognostic case.

Value

the optimal score (p-value of main effect) for a predictor variable.
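
The cutoff search for the prognostic case can be sketched for a continuous response as a scan over candidate cutoffs, scoring each by the p-value of the group main effect and skipping cutoffs whose signature-positive group falls below min.sigp.prcnt. The helper names, the use of `lm`, and the choice of observed values as candidate cutoffs are assumptions of this example, not the package's implementation.

```r
# Hypothetical helpers (not part of SubgrpID); continuous response only.
score_prog_sketch <- function(x, y, cutoff, dir = ">", min.sigp.prcnt = 0.2) {
  pos <- if (dir == ">") x > cutoff else x < cutoff
  # reject cutoffs whose signature-positive group is too small, or that
  # leave no comparison group
  if (mean(pos) < min.sigp.prcnt || all(pos)) return(NA_real_)
  fit <- lm(y ~ pos)
  coef(summary(fit))["posTRUE", "Pr(>|t|)"]
}

find_cutoff_prog_sketch <- function(x, y, dir = ">", min.sigp.prcnt = 0.2) {
  cands <- sort(unique(x))
  pvals <- sapply(cands, function(cut)
    score_prog_sketch(x, y, cut, dir, min.sigp.prcnt))
  best <- which.min(pvals)   # which.min() skips the NA scores
  c(cutoff = cands[best], p.value = pvals[best])
}

set.seed(11)
x <- runif(200)
y <- 1.2 * (x > 0.5) + rnorm(200)   # simulated threshold at 0.5
best <- find_cutoff_prog_sketch(x, y)
best
```

Because the cutoff is chosen to minimise the p-value, the reported p-value is optimistic; it serves as a score for ranking cutoffs rather than as a valid test.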

seqlr.score.pred

Description

Compute score of cutoff for predictive case

Usage

seqlr.score.pred(
  data,
  yvar,
  censorvar,
  xvar,
  trtvar,
  cutoff,
  type,
  class.wt,
  dir,
  nsubj,
  min.sigp.prcnt
)

Arguments

`data`	input data frame.
`yvar`	response variable name.
`censorvar`	censoring variable name.
`xvar`	name of predictor for which cutpoint needs to be obtained.
`trtvar`	treatment variable name.
`cutoff`	a specific cutpoint for which the score needs to be computed.
`type`	"c" continuous; "s" survival; "b" binary.
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`dir`	direction of cut.
`nsubj`	number of subjects.
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.

Details

Compute score of cutoff for predictive case

Value

score (p-value of treatment*subgroup interaction) for the given cutoff.

seqlr.score.prog

Description

Compute score of cutoff for prognostic case

Usage

seqlr.score.prog(
  data,
  yvar,
  censorvar,
  xvar,
  cutoff,
  type,
  class.wt,
  dir,
  nsubj,
  min.sigp.prcnt
)

Arguments

`data`	input data frame.
`yvar`	response variable name.
`censorvar`	censoring variable name.
`xvar`	name of predictor for which cutpoint needs to be obtained.
`cutoff`	a specific cutpoint for which the score needs to be computed.
`type`	"c" continuous; "s" survival; "b" binary.
`class.wt`	vector of length 2 used to weight the accuracy score; useful when there is class imbalance in binary data. Defaults to c(1,1).
`dir`	direction of cut.
`nsubj`	number of subjects.
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff.

Details

Compute score of cutoff for prognostic case

Value

score (p-value of main effect) for the given cutoff.

SubgrpID

Description

Exploratory Subgroup Identification main function

Usage

SubgrpID(
  data.train,
  data.test = NULL,
  yvar,
  censorvar = NULL,
  trtvar = NULL,
  trtref = NULL,
  xvars,
  type = "c",
  n.boot = 25,
  des.res = "larger",
  min.sigp.prcnt = 0.2,
  pre.filter = NULL,
  filter.method = NULL,
  k.fold = 5,
  cv.iter = 20,
  max.iter = 500,
  mc.iter = 20,
  method = c("Seq.BT"),
  do.cv = FALSE,
  out.file = NULL,
  file.path = "",
  plots = FALSE
)

Arguments

`data.train`	data frame for training dataset
`data.test`	data frame for testing dataset, default = NULL
`yvar`	variable (column) name for response variable
`censorvar`	variable name for censoring (1: event; 0: censor), default = NULL
`trtvar`	variable name for treatment variable, default = NULL (prognostic signature)
`trtref`	coding (in the column of trtvar) for treatment arm
`xvars`	vector of variable names for predictors (covariates)
`type`	type of response variable: "c" continuous; "s" survival; "b" binary
`n.boot`	number of bootstrap for batting procedure, or the variable selection procedure for PRIM; for PRIM, when n.boot=0, bootstrapping for variable selection is not conducted
`des.res`	the desired response. "larger": prefer larger response. "smaller": prefer smaller response
`min.sigp.prcnt`	desired proportion of signature positive group size for a given cutoff
`pre.filter`	NULL (default), no prefiltering conducted; "opt", optimized number of predictors selected; an integer: min(opt, integer) of predictors selected
`filter.method`	NULL (default), no prefiltering; "univariate", univariate filtering; "glmnet", glmnet filtering; "unicart", univariate rpart filtering for prognostic case
`k.fold`	number of cross-validation folds
`cv.iter`	Algorithm terminates after cv.iter successful iterations of cross-validation, or after max.iter total iterations, whichever occurs first
`max.iter`	maximum total number of cross-validation iterations allowed (including unsuccessful ones)
`mc.iter`	number of iterations for the Monte Carlo procedure to get a stable "best number of predictors"
`method`	current version only supports sequential-BATTing ("Seq.BT") for subgroup identification
`do.cv`	whether to perform cross validation for performance evaluation. TRUE or FALSE (Default)
`out.file`	Name of output result files excluding method name. If NULL no output file would be saved
`file.path`	default: current working directory. When specifying a dir, use "/" at the end. e.g. "TEMP/"
`plots`	default: FALSE. whether to save plots

Details

Function for SubgrpID

Value

A list with SubgrpID output

res: list of all results from the algorithm
train.stat: list of subgroup statistics on training dataset
test.stat: list of subgroup statistics on testing dataset
cv.res: list of all results from cross-validation on training dataset
train.plot: interaction plot for training dataset
test.plot: interaction plot for testing dataset

Examples

# no run
n <- 40
k <- 5
prevalence <- sqrt(0.5)
rho<-0.2
sig2 <- 2
rhos.bt.real <- c(0, rep(0.1, (k-3)))*sig2
y.sig2 <- 1
yvar="y.binary"
xvars=paste("x", c(1:k), sep="")
trtvar="treatment"
prog.eff <- 0.5
effect.size <- 1
a.constent <- effect.size/(2*(1-prevalence))
set.seed(888)
ObsData <- data.gen(n=n, k=k, prevalence=prevalence, prog.eff=prog.eff,
                    sig2=sig2, y.sig2=y.sig2, rho=rho,
                    rhos.bt.real=rhos.bt.real, a.constent=a.constent)
TestData <- data.gen(n=n, k=k, prevalence=prevalence, prog.eff=prog.eff,
                     sig2=sig2, y.sig2=y.sig2, rho=rho,
                     rhos.bt.real=rhos.bt.real, a.constent=a.constent)
subgrp <- SubgrpID(data.train=ObsData$data,
                   data.test=TestData$data,
                   yvar=yvar,
                   trtvar=trtvar,
                   trtref="1",
                   xvars=xvars,
                   type="b",
                   n.boot=5, # suggest n.boot > 25, depends on sample size
                   des.res = "larger",
 #                 do.cv = TRUE,
 #                 cv.iter = 2, # uncomment to run CV
                   method="Seq.BT")
subgrp$res
subgrp$train.stat
subgrp$test.stat
subgrp$train.plot
subgrp$test.plot
#subgrp$cv.res$stats.summary #CV estimates of all results

summarize.cv.stats

Description

Calculate summary statistics from raw statistics returned by evaluate.cv.results.

Usage

summarize.cv.stats(raw.stats, trtvar, type)

Arguments

`raw.stats`	raw statistics from evaluate.cv.results
`trtvar`	treatment variable name
`type`	data type: "c" continuous, "b" binary, "s" time to event; default = "c"

Details

Calculate summary statistics from raw statistics returned by evaluate.cv.results.

Value

a list containing p-values, summary statistics and group statistics.
