Package 'DynForest'

Title: Random Forest with Multivariate Longitudinal Predictors
Description: Based on the random forest principle, 'DynForest' can include multiple longitudinal predictors to provide individual predictions. Longitudinal predictors are modeled through the random forest. The methodology is fully described for a survival outcome in: Devaux, Helmer, Genuer & Proust-Lima (2023) <doi:10.1177/09622802231206477>.
Authors: Anthony Devaux [aut, cre] , Robin Genuer [aut] , Cécile Proust-Lima [aut] , Louis Capitaine [aut]
Maintainer: Anthony Devaux
License: LGPL (>= 3)
Version: 1.2.0
Built: 2024-10-24 04:23:58 UTC
Source: CRAN

Help Index


Compute the grouped importance of variables (gVIMP) statistic

Description

Compute the grouped importance of variables (gVIMP) statistic

Usage

compute_gvimp(
  dynforest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  group = NULL,
  ncores = NULL,
  seed = 1234
)

Arguments

dynforest_obj

A dynforest object, as returned by dynforest()

IBS.min

(Only with survival outcome) Minimal time to compute the Integrated Brier Score. Default value is set to 0.

IBS.max

(Only with survival outcome) Maximal time to compute the Integrated Brier Score. Default value is set to the maximal time-to-event found.

group

A named list of groups, each element containing the names of the predictors assigned to that group

ncores

Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1.

seed

Seed to replicate results

Value

compute_gvimp() function returns a list with the following elements:

Inputs A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors
group The list of groups defined in the group argument
gVIMP A numeric vector containing the gVIMP for each group defined in group argument
tree_oob_err A numeric vector containing the OOB error for each tree needed to compute the VIMP statistic
IBS.range A vector containing the IBS min and max
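The grouped statistic can be illustrated outside DynForest with a minimal base-R sketch: all predictors of a group are permuted jointly (one row permutation shared by the whole group, so within-group correlations are preserved) and the increase in prediction error is recorded. Assumptions made for brevity: an lm() fit stands in for the random forest, the mean squared error on the training data stands in for the OOB error (the IBS for survival outcomes), and group_perm_importance() is a hypothetical helper, not a DynForest function.

```r
# Hypothetical helper, not part of DynForest: grouped permutation importance.
# A linear model stands in for the forest and MSE stands in for the OOB error.
group_perm_importance <- function(fit, data, outcome, groups, seed = 1234) {
  set.seed(seed)
  mse <- function(d) mean((d[[outcome]] - predict(fit, newdata = d))^2)
  base_err <- mse(data)
  sapply(groups, function(vars) {
    perm <- data
    idx <- sample(nrow(data))                  # one permutation for the whole group
    perm[vars] <- data[idx, vars, drop = FALSE]
    mse(perm) - base_err                       # error increase once the group is broken
  })
}

# Toy data: only x1 and x2 carry signal
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 2 * d$x1 + d$x2 + rnorm(n, sd = 0.5)
fit <- lm(y ~ x1 + x2 + x3 + x4, data = d)

gimp <- group_perm_importance(fit, d, "y",
                              groups = list(group1 = c("x1", "x2"),
                                            group2 = c("x3", "x4")))
print(gimp)
```

Permuting group1 sharply increases the error because it carries the signal, while permuting group2 leaves it almost unchanged.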

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gvimp(dynforest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2, seed = 1234)

Compute the Out-Of-Bag error (OOB error)

Description

Compute the Out-Of-Bag error (OOB error)

Usage

compute_ooberror(dynforest_obj, IBS.min = 0, IBS.max = NULL, ncores = NULL)

Arguments

dynforest_obj

A dynforest object, as returned by dynforest()

IBS.min

(Only with survival outcome) Minimal time to compute the Integrated Brier Score. Default value is set to 0.

IBS.max

(Only with survival outcome) Maximal time to compute the Integrated Brier Score. Default value is set to the maximal time-to-event found.
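To make the role of IBS.min and IBS.max concrete, here is a minimal base-R sketch of an integrated Brier score restricted to [IBS.min, IBS.max]: the Brier score is evaluated on a time grid, integrated with the trapezoidal rule and divided by the interval width. It assumes no censoring, so the inverse-probability-of-censoring weights used with real survival data are omitted; brier_at() and ibs() are hypothetical helpers, not DynForest functions.

```r
# Hypothetical helpers, not DynForest functions. Simplifying assumption:
# no censoring, so no inverse-probability-of-censoring weights are needed.
# pred_fun(t) returns the predicted probability of event by time t, either one
# value per subject or a single value shared by all subjects.
brier_at <- function(t, times, pred_fun) {
  observed <- as.numeric(times <= t)     # 1 if the event has occurred by t
  mean((observed - pred_fun(t))^2)
}

ibs <- function(times, pred_fun, IBS.min = 0, IBS.max = max(times),
                n_grid = 200) {
  grid <- seq(IBS.min, IBS.max, length.out = n_grid)
  bs <- vapply(grid, brier_at, numeric(1), times = times, pred_fun = pred_fun)
  # Trapezoidal rule, then divide by the width of [IBS.min, IBS.max]
  area <- sum(diff(grid) * (head(bs, -1) + tail(bs, -1)) / 2)
  area / (IBS.max - IBS.min)
}

# Toy event times drawn from an exponential distribution with rate 0.2
set.seed(42)
times <- rexp(500, rate = 0.2)

true_model <- function(t) 1 - exp(-0.2 * t)   # correct event probability
null_model <- function(t) 0.5                  # uninformative prediction

ibs_true <- ibs(times, true_model, IBS.min = 0, IBS.max = 10)
ibs_null <- ibs(times, null_model, IBS.min = 0, IBS.max = 10)
c(true = ibs_true, null = ibs_null)
```

Lower is better. The uninformative prediction scores exactly 0.25, since (o - 0.5)^2 = 0.25 for every subject and time, whereas the correct model scores lower.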

ncores

Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1.

Value

compute_ooberror() function returns a list with the following elements:

data A list containing the data used to grow the trees
rf A table with one column per tree, providing several characteristics of the tree-building process
type Outcome type
times A numeric vector containing the time-to-event for all subjects
cause The cause of interest
causes A numeric vector containing the causes indicator
Inputs A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors
Longitudinal.model A list of longitudinal markers containing the formula used for modeling in the random forest
param A list containing the hyperparameters
oob.err A numeric vector containing the OOB error for each subject
oob.pred Outcome prediction for all subjects
IBS.range A vector containing the IBS min and max
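oob.err and oob.pred rely on the bootstrap: each tree is grown on a bootstrap sample, so each subject is out-of-bag (OOB) for roughly a third of the trees, and its OOB prediction only uses those trees. The base-R sketch below shows this bookkeeping for a continuous outcome; replacing each tree by the mean of its bootstrap sample is a simplifying assumption, and none of the code is taken from DynForest.

```r
# Minimal OOB sketch, not DynForest code. Assumption: each "tree" is replaced
# by the mean of its bootstrap sample so the example can focus on which
# subjects are out-of-bag (OOB) for which tree. Continuous outcome only.
set.seed(2024)
n <- 30
ntree <- 200
y <- rnorm(n, mean = 10, sd = 2)

pred_sum <- numeric(n)   # running sum of OOB predictions per subject
pred_cnt <- integer(n)   # number of trees for which the subject is OOB

for (b in seq_len(ntree)) {
  boot <- sample(n, replace = TRUE)        # bootstrap sample used to grow tree b
  tree_pred <- mean(y[boot])               # stand-in for tree b's prediction
  oob <- setdiff(seq_len(n), boot)         # subjects not drawn for tree b
  pred_sum[oob] <- pred_sum[oob] + tree_pred
  pred_cnt[oob] <- pred_cnt[oob] + 1L
}

oob_pred <- pred_sum / pred_cnt            # NaN for a subject that is never OOB
oob_err  <- (y - oob_pred)^2               # per-subject OOB squared error
oob_frac <- mean(pred_cnt) / ntree         # near (1 - 1/n)^n, about 0.36 here
c(oob_fraction = oob_frac, mean_oob_error = mean(oob_err))
```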

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute OOB error
res_dyn_OOB <- compute_ooberror(dynforest_obj = res_dyn, ncores = 2)

Extract characteristics from the tree-building process

Description

Extract characteristics from the tree-building process

Usage

compute_vardepth(dynforest_obj)

Arguments

dynforest_obj

A dynforest object, as returned by dynforest()

Value

compute_vardepth() function returns a list with the following elements:

min_depth A table giving, for each feature (in rows), its average minimal depth and its rank
var_node_depth A table giving, for each tree (in columns), the minimal depth of each feature (in rows). NA indicates that the feature was not used in the corresponding tree
var_count A table giving, for each tree (in columns), the number of times each feature (in rows) is used. A value of 0 indicates that the feature was not used in the corresponding tree
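The min_depth summary can be reproduced from per-tree split records with a short base-R sketch. The input (a data frame with tree, var and depth columns, one row per split node) is a hypothetical stand-in for DynForest's internal tree tables, and its values are invented. A variable never used in a tree gets NA, as in var_node_depth; averaging only over the trees where the variable is used is an assumption, and a variable used in no tree ends up with NaN and the last rank.

```r
# Hypothetical split records (toy values, not DynForest output):
# one row per split node, with the tree, the variable used and the node depth
splits <- data.frame(
  tree  = c(1, 1, 1, 2, 2, 3, 3, 3),
  var   = c("serBilir", "SGOT", "serBilir", "albumin", "serBilir",
            "SGOT", "albumin", "serBilir"),
  depth = c(1, 2, 3, 1, 2, 1, 2, 2)
)
vars  <- c("serBilir", "SGOT", "albumin", "alkaline")
trees <- sort(unique(splits$tree))

# Minimal depth of each variable in each tree (NA if unused), like var_node_depth
var_node_depth <- sapply(trees, function(b) {
  s <- splits[splits$tree == b, ]
  vapply(vars, function(v) {
    d <- s$depth[s$var == v]
    if (length(d) == 0) NA_real_ else min(d)
  }, numeric(1))
})
colnames(var_node_depth) <- paste0("tree", trees)

# Number of splits on each variable in each tree, like var_count
var_count <- sapply(trees, function(b) {
  table(factor(splits$var[splits$tree == b], levels = vars))
})

# Average minimal depth over the trees using the variable, then rank it
avg_depth <- rowMeans(var_node_depth, na.rm = TRUE)
min_depth <- data.frame(mean_min_depth = avg_depth, rank = rank(avg_depth))
min_depth
```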

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Run compute_vardepth function
res_varDepth <- compute_vardepth(res_dyn)

Compute the importance of variables (VIMP) statistic

Description

Compute the importance of variables (VIMP) statistic

Usage

compute_vimp(
  dynforest_obj,
  IBS.min = 0,
  IBS.max = NULL,
  ncores = NULL,
  seed = 1234
)

Arguments

dynforest_obj

A dynforest object, as returned by dynforest()

IBS.min

(Only with survival outcome) Minimal time to compute the Integrated Brier Score. Default value is set to 0.

IBS.max

(Only with survival outcome) Maximal time to compute the Integrated Brier Score. Default value is set to the maximal time-to-event found.

ncores

Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1.

seed

Seed to replicate results

Value

compute_vimp() function returns a list with the following elements:

Inputs A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors
Importance A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains a numeric vector with the VIMP statistic of each predictor listed in Inputs
tree_oob_err A numeric vector containing the OOB error for each tree needed to compute the VIMP statistic
IBS.range A vector containing the IBS min and max
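The tree_oob_err element points to how the VIMP is aggregated. A common definition, assumed here rather than confirmed from the DynForest source, averages over trees the increase in OOB error once a predictor's values are permuted. The base-R sketch below shows only this aggregation step, on invented error values.

```r
# Toy per-tree OOB errors (illustrative numbers, not DynForest output)
tree_oob_err <- c(0.10, 0.12, 0.09, 0.11)      # OOB error of each tree
perm_oob_err <- rbind(                          # same trees, one predictor permuted
  serBilir = c(0.16, 0.19, 0.13, 0.18),
  age      = c(0.10, 0.13, 0.09, 0.10)
)

# VIMP of each predictor: mean over trees of (permuted error - original error)
vimp <- rowMeans(sweep(perm_oob_err, 2, tree_oob_err))
round(vimp, 4)
```

Here serBilir obtains a VIMP of 0.06 while age stays at 0: permuting age does not degrade the trees on average.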

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute VIMP statistic
res_dyn_VIMP <- compute_vimp(dynforest_obj = res_dyn, ncores = 2, seed = 1234)

data_simu1 dataset

Description

Simulated dataset 1 with continuous outcome

Format

Longitudinal dataset with 1200 rows and 13 columns for 200 subjects

id

Subject identifier

time

Time measurement

cont_covar1

Continuous time-fixed predictor 1

cont_covar2

Continuous time-fixed predictor 2

bin_covar1

Binary time-fixed predictor 1

bin_covar2

Binary time-fixed predictor 2

marker1

Continuous time-dependent predictor 1

marker2

Continuous time-dependent predictor 2

marker3

Continuous time-dependent predictor 3

marker4

Continuous time-dependent predictor 4

marker5

Continuous time-dependent predictor 5

marker6

Continuous time-dependent predictor 6

Y_res

Continuous outcome

Examples

data(data_simu1)
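data_simu1 is in long format: time-fixed predictors and the outcome are repeated on each of a subject's rows (1200 rows for 200 subjects, an average of 6 rows per subject). The minimal sketch below shows how such a dataset might be split into the timeData, fixedData and Y inputs of dynforest(), using a toy data frame with a subset of data_simu1's columns rather than the real data; the type = "numeric" list mirrors the survival example and is an assumption for this dataset.

```r
# Toy long-format data mimicking data_simu1's layout (3 subjects x 2 visits);
# values are random and only illustrate the structure
set.seed(1)
n_subj  <- 3
n_visit <- 2
long <- data.frame(
  id          = rep(seq_len(n_subj), each = n_visit),
  time        = rep(c(0, 1), times = n_subj),
  cont_covar1 = rep(rnorm(n_subj), each = n_visit),        # time-fixed
  bin_covar1  = factor(rep(c(0, 1, 0), each = n_visit)),   # time-fixed
  marker1     = rnorm(n_subj * n_visit),                   # time-dependent
  Y_res       = rep(rnorm(n_subj), each = n_visit)         # one outcome per subject
)

# Time-dependent part: one row per measurement
timeData <- long[, c("id", "time", "marker1")]
# Time-fixed part and outcome: one row per subject
fixedData <- unique(long[, c("id", "cont_covar1", "bin_covar1")])
Y <- list(type = "numeric", Y = unique(long[, c("id", "Y_res")]))
```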

data_simu2 dataset

Description

Simulated dataset 2 with continuous outcome

Format

Longitudinal dataset with 1200 rows and 13 columns for 200 subjects

id

Subject identifier

time

Time measurement

cont_covar1

Continuous time-fixed predictor 1

cont_covar2

Continuous time-fixed predictor 2

bin_covar1

Binary time-fixed predictor 1

bin_covar2

Binary time-fixed predictor 2

marker1

Continuous time-dependent predictor 1

marker2

Continuous time-dependent predictor 2

marker3

Continuous time-dependent predictor 3

marker4

Continuous time-dependent predictor 4

marker5

Continuous time-dependent predictor 5

marker6

Continuous time-dependent predictor 6

Y_res

Continuous outcome

Examples

data(data_simu2)

Random forest with multivariate longitudinal endogenous covariates

Description

Build a random forest using multivariate longitudinal endogenous covariates

Usage

dynforest(
  timeData = NULL,
  fixedData = NULL,
  idVar = NULL,
  timeVar = NULL,
  timeVarModel = NULL,
  Y = NULL,
  ntree = 200,
  mtry = NULL,
  nodesize = 1,
  minsplit = 2,
  cause = 1,
  nsplit_option = "quantile",
  ncores = NULL,
  seed = 1234,
  verbose = TRUE
)

Arguments

timeData

A data.frame containing the id and time-measurement variables and the time-dependent predictors.

fixedData

A data.frame containing the id variable and the time-fixed predictors. Categorical variables should be characterized as factor.

idVar

A character string giving the name of the variable that identifies the subjects

timeVar

A character string giving the name of the time variable

timeVarModel

A list with one element per time-dependent predictor, each being a list with two formulas, fixed and random, giving the fixed and random parts of the mixed model
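Each formula in timeVarModel is an ordinary R formula, so its variables can be inspected with all.vars(). The sketch below is a hypothetical pre-flight check, not a DynForest function: it verifies that each marker has fixed and random formulas, that the fixed formula's response is the marker (an assumed convention, followed by all the examples in this manual), and that every variable used exists in timeData.

```r
# Hypothetical helper, not part of DynForest: sanity checks on timeVarModel
check_timeVarModel <- function(timeVarModel, timeData) {
  for (marker in names(timeVarModel)) {
    spec <- timeVarModel[[marker]]
    if (!all(c("fixed", "random") %in% names(spec)))
      stop("'", marker, "' needs both 'fixed' and 'random' formulas")
    # Assumed convention: the response of the fixed formula is the marker itself
    if (!identical(all.vars(spec$fixed)[1], marker))
      stop("response of the fixed formula should be '", marker, "'")
    needed  <- unique(c(all.vars(spec$fixed), all.vars(spec$random)))
    missing <- setdiff(needed, names(timeData))
    if (length(missing) > 0)
      stop("variables missing from timeData: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy timeData with the column names used in the pbc2 examples
timeData <- data.frame(id = c(1, 1, 2), time = c(0, 1, 0),
                       serBilir = c(0.1, 0.3, -0.2), SGOT = c(4.5, 4.7, 4.2))
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time, random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)))

check_timeVarModel(timeVarModel, timeData)        # passes silently
bad <- list(albumin = list(fixed = albumin ~ time, random = ~ time))
tryCatch(check_timeVarModel(bad, timeData), error = conditionMessage)
```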

Y

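A list describing the outcome, which should contain: type, the nature of the outcome, which can be "surv", "numeric" or "factor"; and Y, a data.frame containing the subject identifier and the outcome (for a survival outcome, the time-to-event and the event indicator).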

ntree

Number of trees to grow. Default value set to 200.

mtry

Number of candidate variables randomly drawn at each node of the trees. This parameter should be tuned by minimizing the OOB error. Default is defined as the square root of the number of predictors.

nodesize

Minimal number of subjects required in both child nodes to split. Cannot be smaller than 1.

minsplit

(Only with survival outcome) Minimal number of events required to split the node. Cannot be smaller than 2.

cause

(Only with competing events) Number indicating the event of interest.

nsplit_option

A character string indicating how the candidate values are chosen to build the two groups of the splitting rule (only for continuous predictors). Values are chosen using deciles (nsplit_option="quantile") or randomly (nsplit_option="sample"). Default value is "quantile".
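The difference between the two nsplit_option values can be sketched in base R. With "quantile" the candidate thresholds for a continuous predictor are its deciles; with "sample" they are drawn at random among the observed values. The number of random draws (9, matching the number of deciles) and the rounding of the default mtry (floor(sqrt(p))) are assumptions, and candidate_thresholds() is a hypothetical helper.

```r
# Hypothetical helper, not DynForest code: candidate thresholds for one
# continuous predictor at a node, under the two nsplit_option settings
candidate_thresholds <- function(x, nsplit_option = c("quantile", "sample"),
                                 n_random = 9) {
  nsplit_option <- match.arg(nsplit_option)
  ux <- unique(x)
  if (nsplit_option == "quantile") {
    # Deciles of the predictor: 10%, 20%, ..., 90% quantiles
    unique(quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE))
  } else {
    # Values drawn at random among the observed ones (n_random is an assumption)
    sort(ux[sample.int(length(ux), size = min(n_random, length(ux)))])
  }
}

set.seed(1234)
predictors <- c("serBilir", "SGOT", "albumin", "alkaline", "age", "drug", "sex")
# mtry default assumed to be floor(sqrt(p)); the package may round differently
mtry <- max(1, floor(sqrt(length(predictors))))
candidates <- sample(predictors, mtry)   # predictors tried at this node

x <- 1:100
q_thr <- candidate_thresholds(x, "quantile")
s_thr <- candidate_thresholds(x, "sample")
q_thr
```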

ncores

Number of cores used to grow trees in parallel. Default value is the number of cores of the computer minus 1.

seed

Seed to replicate results

verbose

A logical controlling whether the function's progress is displayed. Default is TRUE

Details

The function currently supports survival (competing or single event), continuous or categorical outcome.

FUTURE IMPLEMENTATIONS:

  • Continuous longitudinal outcome

  • Functional data analysis

Value

dynforest function returns a list with the following elements:

data A list containing the data used to grow the trees
rf A table with one column per tree, providing several characteristics of the tree-building process
type Outcome type
times A numeric vector containing the time-to-event for all subjects
cause The cause of interest
causes A numeric vector containing the causes indicator
Inputs A list of 3 elements: Longitudinal, Numeric and Factor. Each element contains the names of the predictors
Longitudinal.model A list of longitudinal markers containing the formula used for modeling in the random forest
param A list containing the hyperparameters
comput.time Computation time

Author(s)

Anthony Devaux

References

  • Devaux A., Helmer C., Genuer R., Proust-Lima C. (2023). Random survival forests with multivariate longitudinal endogenous covariates. SMMR doi:10.1177/09622802231206477

  • Devaux A., Proust-Lima C., Genuer R. (2023). Random Forests for time-fixed and time-dependent predictors: The DynForest R package. arXiv doi:10.48550/arXiv.2302.02670

See Also

summary.dynforest() compute_ooberror() compute_vimp() compute_gvimp() predict.dynforest() plot.dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

Extract split information for a user-specified tree

Description

Extract split information for a user-specified tree

Usage

get_tree(dynforest_obj, tree)

Arguments

dynforest_obj

A dynforest object, as returned by dynforest()

tree

Integer indicating the tree identifier

Value

A table sorted by the node/leaf identifier with each row representing a node/leaf. Each column provides information about the splits:

type The nature of the predictor (Longitudinal for longitudinal predictor, Numeric for continuous predictor or Factor for categorical predictor) if the node was split, Leaf otherwise
var_split The predictor used for the split defined by its order in timeData and fixedData
feature The feature used for the split defined by its position in random statistic
threshold The threshold used for the split (only with Longitudinal and Numeric). No information is returned for Factor
N The number of subjects in the node/leaf
Nevent The number of events of interest in the node/leaf (only with survival outcome)
depth The depth level of the node/leaf
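The N and Nevent columns can be cross-checked: since every subject reaching the root ends up in exactly one leaf, leaf sizes add up to the root size. The sketch below works on a toy table shaped like the get_tree() output; its values are invented and the node column name is hypothetical.

```r
# Toy table shaped like get_tree() output (values invented for illustration;
# the 'node' column name is hypothetical)
tree_tab <- data.frame(
  node   = 1:5,
  type   = c("Longitudinal", "Leaf", "Numeric", "Leaf", "Leaf"),
  N      = c(100, 40, 60, 35, 25),
  Nevent = c(30, 5, 25, 20, 5),
  depth  = c(1, 2, 2, 3, 3)
)

leaves <- tree_tab[tree_tab$type == "Leaf", ]
# Leaves partition the subjects of the root node, so their sizes add up
root <- tree_tab[tree_tab$depth == min(tree_tab$depth), ]
stopifnot(sum(leaves$N) == root$N, sum(leaves$Nevent) == root$Nevent)
# Proportion of events of interest in each leaf
leaves$event_prop <- leaves$Nevent / leaves$N
leaves[, c("node", "N", "Nevent", "event_prop")]
```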

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Extract split information from tree 4
res_tree4 <- get_tree(dynforest_obj = res_dyn, tree = 4)

Extract node identifiers for a given tree

Description

Extract node identifiers for a given tree

Usage

get_treenodes(dynforest_obj, tree = NULL)

Arguments

dynforest_obj

A dynforest object, as returned by dynforest()

tree

Integer indicating the tree identifier

Value

The node identifiers of the given tree

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Extract nodes identifiers for a given tree
get_treenodes(dynforest_obj = res_dyn, tree = 1)

pbc2 dataset

Description

pbc2 data from the Mayo Clinic

Format

Longitudinal dataset with 1945 rows and 19 columns for 312 patients

id

Patient identifier

time

Time measurement

ascites

Presence of ascites (Yes/No)

hepatomegaly

Presence of hepatomegaly (Yes/No)

spiders

Blood vessel malformations in the skin (Yes/No)

edema

Edema levels (No edema/edema no diuretics/edema despite diuretics)

serBilir

Level of serum bilirubin

serChol

Level of serum cholesterol

albumin

Level of albumin

alkaline

Level of alkaline phosphatase

SGOT

Level of aspartate aminotransferase

platelets

Platelet count

prothrombin

Prothrombin time

histologic

Histologic stage of disease

drug

Drug treatment (D-penicillamine/Placebo)

age

Age at enrollment

sex

Sex of patient

years

Time-to-event in years

event

Event indicator: 0 (alive), 1 (transplanted) and 2 (dead)

Source

The pbc2 dataset from the joineRML package

Examples

data(pbc2)

Plot function in dynforest

Description

This function displays a plot of the CIF for a given node and tree (for class dynforest), the most predictive variables according to the minimal depth (for class dynforestvardepth), the variable importance (for class dynforestvimp), the grouped variable importance (for class dynforestgvimp) or the predicted CIF for given subjects (for class dynforestpred).
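A minimal base-R sketch of how a CIF can be estimated nonparametrically from the subjects in a node with competing events, using the Aalen-Johansen estimator. That DynForest uses exactly this estimator inside its nodes is an assumption; the event coding follows pbc2$event (0 = censored, other values = cause), and aj_cif() is a hypothetical helper.

```r
# Minimal Aalen-Johansen estimator of the cumulative incidence function (CIF)
# for one cause in the presence of competing events (hypothetical helper).
# event: 0 = censored, other values = cause of the event (as in pbc2$event)
aj_cif <- function(time, event, cause) {
  event_times <- sort(unique(time[event != 0]))
  surv <- 1                 # overall event-free survival just before each time
  cif  <- 0
  out  <- numeric(length(event_times))
  for (k in seq_along(event_times)) {
    tk      <- event_times[k]
    at_risk <- sum(time >= tk)
    d_any   <- sum(time == tk & event != 0)       # events of any cause at tk
    d_cause <- sum(time == tk & event == cause)   # events of the cause of interest
    cif  <- cif + surv * d_cause / at_risk
    surv <- surv * (1 - d_any / at_risk)
    out[k] <- cif
  }
  data.frame(time = event_times, cif = out)
}

# Toy node with 4 subjects: death (2), transplant (1), death (2), censored (0)
res <- aj_cif(time = c(1, 2, 3, 4), event = c(2, 1, 2, 0), cause = 2)
res
```

For this toy node, the CIF of death (cause 2) is 0.25 at t = 1, stays at 0.25 at t = 2 because the transplant is a competing event, and reaches 0.5 at t = 3.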

Usage

## S3 method for class 'dynforest'
plot(x, tree = NULL, nodes = NULL, id = NULL, max_tree = NULL, ...)

## S3 method for class 'dynforestvardepth'
plot(x, plot_level = c("predictor", "feature"), ...)

## S3 method for class 'dynforestvimp'
plot(x, PCT = FALSE, ordering = TRUE, ...)

## S3 method for class 'dynforestgvimp'
plot(x, PCT = FALSE, ...)

## S3 method for class 'dynforestpred'
plot(x, id = NULL, ...)

Arguments

x

Object inheriting from classes dynforest, dynforestvardepth, dynforestvimp, dynforestgvimp or dynforestpred, to respectively plot the CIF, the minimal depth, the variable importance, the grouped variable importance or the predicted CIF.

tree

For dynforest class, integer indicating the tree identifier

nodes

For dynforest class, identifiers for the selected nodes

id

For dynforest and dynforestpred classes, identifier for a given subject

max_tree

For dynforest class, integer indicating the number of trees to display when using the id argument

...

Optional parameters to be passed to the low level function

plot_level

For dynforestvardepth class, compute the statistic at predictor (plot_level="predictor") or feature (plot_level="feature") level

PCT

For dynforestvimp or dynforestgvimp class, display VIMP statistic in percentage. Default value is FALSE.

ordering

For dynforestvimp class, order predictors according to VIMP value. Default value is TRUE.

Value

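plot() function displays:

With dynforest the CIF for the selected nodes of a given tree
With dynforestvardepth the minimal depth for each predictor/feature
With dynforestvimp the VIMP for each predictor
With dynforestgvimp the grouped-VIMP for each given group
With dynforestpred the predicted CIF for the selected subjects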

See Also

dynforest() compute_ooberror() compute_vimp() compute_gvimp() compute_vardepth()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Plot estimated CIF at nodes 17 and 32
plot(x = res_dyn, tree = 1, nodes = c(17,32))

# Run compute_vardepth function
res_varDepth <- compute_vardepth(res_dyn)

# Plot minimal depth
plot(x = res_varDepth, plot_level = "feature")

# Compute VIMP statistic
res_dyn_VIMP <- compute_vimp(dynforest_obj = res_dyn, ncores = 2)

# Plot VIMP
plot(x = res_dyn_VIMP, PCT = TRUE)

# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gvimp(dynforest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2)

# Plot gVIMP
plot(x = res_dyn_gVIMP, PCT = TRUE)

# Sample 5 subjects to predict the event
set.seed(123)
id_pred <- sample(id, 5)

# Create predictors objects
pbc2_pred <- pbc2[which(pbc2$id%in%id_pred),]
timeData_pred <- pbc2_pred[,c("id", "time", "serBilir", "SGOT", "albumin", "alkaline")]
fixedData_pred <- unique(pbc2_pred[,c("id","age","drug","sex")])

# Predict the CIF function for the new subjects with landmark time at 4 years
pred_dyn <- predict(object = res_dyn,
                    timeData = timeData_pred, fixedData = fixedData_pred,
                    idVar = "id", timeVar = "time",
                    t0 = 4)

# Plot predicted CIF for subjects 26 and 110
plot(x = pred_dyn, id = c(26, 110))

Prediction using dynamic random forests

Description

Prediction using dynamic random forests

Usage

## S3 method for class 'dynforest'
predict(
  object,
  timeData = NULL,
  fixedData = NULL,
  idVar,
  timeVar,
  t0 = NULL,
  ...
)

Arguments

object

A dynforest object containing the dynamic random forest grown on the training data

timeData

A data.frame containing the id and time-measurement variables and the time-dependent predictors.

fixedData

A data.frame containing the id variable and the time-fixed predictors. Non-continuous variables should be characterized as factor.

idVar

A character string giving the name of the variable that identifies the subjects

timeVar

A character string giving the name of the time variable

t0

Landmark time

...

Optional parameters to be passed to the low level function

Value

Returns the outcome of interest for the new subjects: a matrix of probabilities of the event of interest in survival mode, the average value in regression mode and the most likely value in classification mode
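predict() takes a landmark time t0. A common approach to landmark prediction, assumed here rather than confirmed from the DynForest source, uses only the longitudinal measurements collected up to t0 and reports probabilities conditional on being event-free at t0. The base-R sketch below illustrates both steps with hypothetical helpers and invented values.

```r
# Hypothetical helpers illustrating landmark prediction at time t0
# (assumptions for illustration, not DynForest internals).

# 1) Keep only the longitudinal measurements available at the landmark time
landmark_data <- function(timeData, timeVar, t0) {
  timeData[timeData[[timeVar]] <= t0, , drop = FALSE]
}

# 2) CIF of cause k at horizon t, conditional on being event-free at t0:
#    (F_k(t) - F_k(t0)) / S(t0), with S(t0) = 1 - sum over causes of F_j(t0)
conditional_cif <- function(cif_k_t, cif_k_t0, cif_all_t0) {
  surv_t0 <- 1 - sum(cif_all_t0)
  (cif_k_t - cif_k_t0) / surv_t0
}

timeData <- data.frame(id = c(1, 1, 1, 2, 2),
                       time = c(0, 2, 5, 0, 3),
                       serBilir = c(0.2, 0.4, 0.9, -0.1, 0.0))
lm_data <- landmark_data(timeData, "time", t0 = 4)   # drops the visit at time 5

# Cause 2 has CIF 0.20 at t0 and 0.41 at t; cause 1 has CIF 0.10 at t0
p <- conditional_cif(cif_k_t = 0.41, cif_k_t0 = 0.20, cif_all_t0 = c(0.10, 0.20))
p
```

Here the conditional probability is (0.41 - 0.20) / (1 - 0.30) = 0.3.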

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Sample 5 subjects to predict the event
set.seed(123)
id_pred <- sample(id, 5)

# Create predictors objects
pbc2_pred <- pbc2[which(pbc2$id%in%id_pred),]
timeData_pred <- pbc2_pred[,c("id", "time", "serBilir", "SGOT", "albumin", "alkaline")]
fixedData_pred <- unique(pbc2_pred[,c("id","age","drug","sex")])

# Predict the CIF for the new subjects with landmark time at 4 years
pred_dyn <- predict(object = res_dyn,
                    timeData = timeData_pred, fixedData = fixedData_pred,
                    idVar = "id", timeVar = "time",
                    t0 = 4)

Print function

Description

This function displays a brief summary of the trees (class dynforest), a data frame of variable importance (class dynforestvimp) or the grouped variable importance (class dynforestgvimp). Methods are also provided for the objects of classes dynforestvardepth, dynforestoob and dynforestpred.

Usage

## S3 method for class 'dynforest'
print(x, ...)

## S3 method for class 'dynforestvimp'
print(x, ...)

## S3 method for class 'dynforestgvimp'
print(x, ...)

## S3 method for class 'dynforestvardepth'
print(x, ...)

## S3 method for class 'dynforestoob'
print(x, ...)

## S3 method for class 'dynforestpred'
print(x, ...)

Arguments

x

Object inheriting from class dynforest, dynforestvimp, dynforestgvimp, dynforestvardepth, dynforestoob or dynforestpred.

...

Optional parameters passed to the low-level function.

See Also

dynforest() compute_ooberror() compute_vimp() compute_gvimp() compute_vardepth() predict.dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Print function
print(res_dyn)

# Compute VIMP statistic
res_dyn_VIMP <- compute_vimp(dynforest_obj = res_dyn, ncores = 2, seed = 1234)

# Print function
print(res_dyn_VIMP)

# Compute gVIMP statistic
res_dyn_gVIMP <- compute_gvimp(dynforest_obj = res_dyn,
                               group = list(group1 = c("serBilir","SGOT"),
                                            group2 = c("albumin","alkaline")),
                               ncores = 2, seed = 1234)

# Print function
print(res_dyn_gVIMP)

# Run var_depth function
res_varDepth <- compute_vardepth(res_dyn)

# Print function
print(res_varDepth)
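Before you call compute_gvimp(), it can help to check that every name listed in group is one of the predictors used to grow the forest. The following is a minimal base-R sketch. check_groups() is a hypothetical helper and is not part of DynForest. The predictor names are those of the example above.

```r
# Hypothetical helper: stop if a group names an unknown predictor
check_groups <- function(group, predictors) {
  unknown <- setdiff(unlist(group, use.names = FALSE), predictors)
  if (length(unknown) > 0) {
    stop("Unknown predictor(s) in group: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

predictors <- c("serBilir", "SGOT", "albumin", "alkaline", "age", "drug", "sex")
group <- list(group1 = c("serBilir", "SGOT"),
              group2 = c("albumin", "alkaline"))

check_groups(group, predictors)  # returns invisibly without error
```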

Display the summary of dynforest

Description

Display a summary of a dynforest or dynforestoob object.

Usage

## S3 method for class 'dynforest'
summary(object, ...)

## S3 method for class 'dynforestoob'
summary(object, ...)

Arguments

object

A dynforest or dynforestoob object.

...

Optional parameters passed to the low-level function.

Value

Returns summary information about the random forest.

See Also

dynforest()

Examples

data(pbc2)

# Get Gaussian distribution for longitudinal predictors
pbc2$serBilir <- log(pbc2$serBilir)
pbc2$SGOT <- log(pbc2$SGOT)
pbc2$albumin <- log(pbc2$albumin)
pbc2$alkaline <- log(pbc2$alkaline)

# Sample 100 subjects
set.seed(1234)
id <- unique(pbc2$id)
id_sample <- sample(id, 100)
id_row <- which(pbc2$id%in%id_sample)

pbc2_train <- pbc2[id_row,]

timeData_train <- pbc2_train[,c("id","time",
                                "serBilir","SGOT",
                                "albumin","alkaline")]

# Create object with longitudinal association for each predictor
timeVarModel <- list(serBilir = list(fixed = serBilir ~ time,
                                     random = ~ time),
                     SGOT = list(fixed = SGOT ~ time + I(time^2),
                                 random = ~ time + I(time^2)),
                     albumin = list(fixed = albumin ~ time,
                                    random = ~ time),
                     alkaline = list(fixed = alkaline ~ time,
                                     random = ~ time))

# Build fixed data
fixedData_train <- unique(pbc2_train[,c("id","age","drug","sex")])

# Build outcome data
Y <- list(type = "surv",
          Y = unique(pbc2_train[,c("id","years","event")]))

# Run dynforest function
res_dyn <- dynforest(timeData = timeData_train, fixedData = fixedData_train,
                     timeVar = "time", idVar = "id",
                     timeVarModel = timeVarModel, Y = Y,
                     ntree = 50, nodesize = 5, minsplit = 5,
                     cause = 2, ncores = 2, seed = 1234)

# Compute OOB error
res_dyn_OOB <- compute_ooberror(dynforest_obj = res_dyn, ncores = 2)

# dynforest summary
summary(object = res_dyn_OOB)