Title: | Clustering Analysis Using Survival Tree and Forest Algorithms |
---|---|
Description: | An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rates. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, hierarchical clustering algorithms can be applied to define clusters. Details about this method are described in <https://github.com/luyouepiusf/SurvivalClusteringTree>. |
Authors: | Lu You [aut, cre] (Created the package. Maintains the package.), Lauric Ferrat [aut] (Added functionality. Revised the package. Wrote the vignette.), Hemang Parikh [aut] (Checked and revised the package.), Yanan Huo [aut] (Revised plotting functions of the package.), Yuting Yang [aut] (Added some data frame features.), Jeffrey Krischer [ctb] (Supervised the medical research. Coauthor of the medical manuscript.), Maria Redondo [ctb] (Principal investigator of the medical research. Coauthor of the medical manuscript.), Richard Oram [ctb] (Coauthor of the medical manuscript.), Andrea Steck [ctb] (Coauthor of the medical manuscript.) |
Maintainer: | Lu You <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.1 |
Built: | 2024-12-21 06:30:22 UTC |
Source: | CRAN |
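The last step of the method, turning sample distances into clusters, can be sketched with base R alone. The example below uses a small hypothetical distance matrix `dmat` in place of a fitted result (for instance the `mean_distance` component returned by `predict_distance_forest`); average linkage and `k = 2` are illustrative choices, not package defaults.

```r
# Hypothetical 4 x 4 distance matrix standing in for a forest-based distance
dmat <- matrix(c(0.00, 0.10, 0.90, 0.80,
                 0.10, 0.00, 0.85, 0.90,
                 0.90, 0.85, 0.00, 0.20,
                 0.80, 0.90, 0.20, 0.00),
               nrow = 4, byrow = TRUE)

# Convert to a "dist" object and run agglomerative (average-linkage) clustering
hc <- hclust(as.dist(dmat), method = "average")

# Cut the dendrogram into two clusters
clusters <- cutree(hc, k = 2)
print(clusters)
```

Samples 1 and 2 end up together, as do samples 3 and 4, because their pairwise distances are small; with real data the number of clusters `k` (or a cut height `h`) is chosen by the analyst.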
Index of help topics:
Topic | Title |
---|---|
SurvivalClusteringTree-package | Clustering Analysis Using Survival Tree and Forest Algorithms |
plot_survival_tree | Visualize the Fitted Survival Tree |
predict_distance_forest | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe) |
predict_distance_forest_matrix | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) |
predict_distance_tree | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
predict_distance_tree_matrix | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) |
predict_weights | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
predict_weights_matrix | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices) |
survival_forest | Build a Survival Forest (Data Supplied as a Dataframe) |
survival_forest_matrix | Build a Survival Forest (Data Supplied as Matrices) |
survival_tree | Build a Survival Tree (Data Supplied as a Dataframe) |
survival_tree_matrix | Build a Survival Tree (Data Supplied as Matrices) |
Further information is available in the following vignettes:
user-guide: User Guide to SurvivalClusteringTree
Visualize the Fitted Survival Tree
plot_survival_tree(survival_tree, cex = 0.75)
Argument | Description |
---|---|
survival_tree | a fitted survival tree object. |
cex | numeric character expansion factor. |
No return value; the function is called for its side effect of drawing the fitted tree.
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
plot_survival_tree(a_survival_tree)
The function predict_distance_forest predicts distances between samples based on a survival forest fit.
predict_distance_forest( survival_forest, numeric_predictor, factor_predictor, data, missing = "omit" )
Argument | Description |
---|---|
survival_forest | a fitted survival forest. |
numeric_predictor | a formula specifying the numeric predictors. As in |
factor_predictor | a formula specifying the factor predictors. As in |
data | the dataframe (test data) that stores the outcome and predictor variables. Variables in the global environment will be used if |
missing | a character value that specifies the handling of missing data. If |
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe)
A list with the following components:
Component | Description |
---|---|
mean_distance | the mean distance matrix. |
sum_distance | the matrix of summed distances between samples. |
sum_non_na | the matrix giving, for each entry, the number of non-NA distances that were averaged. |
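Judging from these descriptions, `mean_distance` should equal `sum_distance` divided elementwise by `sum_non_na`, with distances that are NA in a given tree skipped. The base R sketch below reproduces that arithmetic on two small hypothetical per-tree distance matrices (`d1`, `d2` are made up for illustration); it is an assumption about the bookkeeping, not the package's internal code.

```r
# Hypothetical per-tree distance matrices for 3 samples;
# NA marks a pair with no distance in that tree
d1 <- matrix(c(0,  1,  NA,
               1,  0,  NA,
               NA, NA, 0), nrow = 3, byrow = TRUE)
d2 <- matrix(c(0, 3, 2,
               3, 0, 4,
               2, 4, 0), nrow = 3, byrow = TRUE)
per_tree <- list(d1, d2)

# Sum the available distances, treating NA as contributing nothing
sum_distance <- Reduce(`+`, lapply(per_tree, function(d) ifelse(is.na(d), 0, d)))

# Count how many trees supplied a non-NA distance for each pair
sum_non_na <- Reduce(`+`, lapply(per_tree, function(d) !is.na(d)))

# Average over the available trees only
mean_distance <- sum_distance / sum_non_na
print(mean_distance)
```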
library(survival)
library(SurvivalClusteringTree)
a_survival_forest <- survival_forest(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung, nboot = 20)
a_distance <- predict_distance_forest(
  a_survival_forest,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
The function predict_distance_forest_matrix predicts distances between samples based on a survival forest fit.
predict_distance_forest_matrix( survival_forest, matrix_numeric, matrix_factor, missing = "omit" )
Argument | Description |
---|---|
survival_forest | a fitted survival forest. |
matrix_numeric | numeric predictors, a numeric matrix. |
matrix_factor | factor predictors, a character matrix. |
missing | a character value that specifies the handling of missing data. If |
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) (Works for raw matrices)
A list with the following components:
Component | Description |
---|---|
mean_distance | the mean distance matrix. |
sum_distance | the matrix of summed distances between samples. |
sum_non_na | the matrix giving, for each entry, the number of non-NA distances that were averaged. |
library(survival)
library(SurvivalClusteringTree)
a_survival_forest <- survival_forest_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]),
  nboot = 20)
a_distance <- predict_distance_forest_matrix(
  a_survival_forest,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
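The matrix examples select columns of the `lung` data by position (4 and 6 to 9 for the numeric predictors, 5 for `sex`). Selecting by name is an optional variation that makes the correspondence with the formula interface explicit and does not depend on column order. The sketch only builds the inputs and checks they match the positional selection; note that `data.matrix()` returns numeric codes, and the package's own examples pass that output as `matrix_factor` even though the argument is documented as a character matrix.

```r
library(survival)

# Columns used by the examples, selected by name instead of position
numeric_cols <- c("age", "ph.ecog", "ph.karno", "pat.karno", "meal.cal")
factor_cols  <- "sex"

matrix_numeric <- data.matrix(lung[, numeric_cols, drop = FALSE])
matrix_factor  <- data.matrix(lung[, factor_cols, drop = FALSE])

# Same matrices as the positional selection lung[, c(4, 6:9)] and lung[, 5]
stopifnot(identical(matrix_numeric, data.matrix(lung[, c(4, 6:9), drop = FALSE])))
stopifnot(identical(matrix_factor, data.matrix(lung[, 5, drop = FALSE])))
```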
The function predict_distance_tree predicts distances between samples based on a survival tree fit.
predict_distance_tree( survival_tree, numeric_predictor, factor_predictor, data, missing = "omit" )
Argument | Description |
---|---|
survival_tree | a fitted survival tree. |
numeric_predictor | a formula specifying the numeric predictors. As in |
factor_predictor | a formula specifying the factor predictors. As in |
data | the dataframe (test data) that stores the outcome and predictor variables. Variables in the global environment will be used if |
missing | a character value that specifies the handling of missing data. If |
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe)
A list with the following components:
Component | Description |
---|---|
node_distance | the distance matrix between nodes. |
ind_distance | the distance matrix between samples. |
ind_weights | the weights of samples in each node. |
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
a_distance <- predict_distance_tree(
  a_survival_tree,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
The function predict_distance_tree_matrix predicts distances between samples based on a survival tree fit.
predict_distance_tree_matrix( survival_tree, matrix_numeric, matrix_factor, missing = "omit" )
Argument | Description |
---|---|
survival_tree | a fitted survival tree. |
matrix_numeric | numeric predictors, a numeric matrix. |
matrix_factor | factor predictors, a character matrix. |
missing | a character value that specifies the handling of missing data. If |
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) (Works for raw matrices)
A list with the following components:
Component | Description |
---|---|
node_distance | the distance matrix between nodes. |
ind_distance | the distance matrix between samples. |
ind_weights | the weights of samples in each node. |
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
a_distance <- predict_distance_tree_matrix(
  a_survival_tree,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
The function predict_weights predicts weights of samples in terminal nodes based on a survival tree fit.
predict_weights( survival_tree, numeric_predictor, factor_predictor, data, missing = "omit" )
Argument | Description |
---|---|
survival_tree | a fitted survival tree. |
numeric_predictor | a formula specifying the numeric predictors. As in |
factor_predictor | a formula specifying the factor predictors. As in |
data | the dataframe (test data) that stores the outcome and predictor variables. Variables in the global environment will be used if |
missing | a character value that specifies the handling of missing data. If |
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe)
A weight matrix representing the weights of samples in each node.
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
a_weight <- predict_weights(
  a_survival_tree,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
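If the weight matrix has one row per sample and one column per terminal node (an assumption; the orientation is not stated above), each sample's most heavily weighted terminal node can be read off with `max.col()`. The sketch uses a small hypothetical matrix `w`, including one sample whose weight is split across two nodes.

```r
# Hypothetical weights (assumed layout): 4 samples (rows) x 3 terminal nodes (columns)
w <- matrix(c(1.0, 0.0, 0.0,
              0.0, 1.0, 0.0,
              0.7, 0.3, 0.0,
              0.0, 0.0, 1.0), nrow = 4, byrow = TRUE)

# Index of the terminal node carrying the largest weight for each sample
node <- max.col(w, ties.method = "first")
print(node)

# Number of samples whose largest weight falls in each node
print(table(factor(node, levels = seq_len(ncol(w)))))
```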
The function predict_weights_matrix predicts weights of samples in terminal nodes based on a survival tree fit.
predict_weights_matrix( survival_tree, matrix_numeric, matrix_factor, missing = "majority" )
Argument | Description |
---|---|
survival_tree | a fitted survival tree. |
matrix_numeric | numeric predictors, a numeric matrix. |
matrix_factor | factor predictors, a character matrix. |
missing | a character value that specifies the handling of missing data. If |
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices)
A weight matrix representing the weights of samples in each node.
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
a_weight <- predict_weights_matrix(
  a_survival_tree,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
The function survival_forest builds a survival forest given the survival outcomes and the numeric and factor predictors.
survival_forest( survival_outcome, numeric_predictor, factor_predictor, weights = NULL, data, significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate", cut_type = 0, nboot = 100, seed = 0 )
Argument | Description |
---|---|
survival_outcome | a |
numeric_predictor | a formula specifying the numeric predictors. As in |
factor_predictor | a formula specifying the factor predictors. As in |
weights | sample weights, a numeric vector. |
data | the dataframe that stores the outcome and predictor variables. Variables in the global environment will be used if |
significance | significance threshold, a numeric value. The splitting algorithm stops when no split gives a p-value smaller than |
min_weights | minimum weight threshold, a numeric value. The weights in a node are greater than |
missing | a character value that specifies the handling of missing data. If |
test_type | a character value that specifies the type of statistical test. If |
cut_type | an integer value that specifies how to cut between two numeric values. If |
nboot | an integer value that specifies the number of bootstrap replications. |
seed | an integer value that specifies the seed. |
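`nboot` and `seed` control the bootstrap replication of the forest. The base R sketch below shows the general idea of drawing `nboot` bootstrap samples of row indices reproducibly, using R's `set.seed()` and `sample.int()`; the helper `draw_bootstrap` is hypothetical, and how `survival_forest` actually draws its resamples and uses `seed` internally is not documented here.

```r
n <- 10      # number of samples (hypothetical)
nboot <- 3   # number of bootstrap replications

# Hypothetical helper: nboot resamples of row indices, drawn with replacement
draw_bootstrap <- function(n, nboot, seed) {
  set.seed(seed)
  lapply(seq_len(nboot), function(b) sample.int(n, n, replace = TRUE))
}

boot_a <- draw_bootstrap(n, nboot, seed = 0)
boot_b <- draw_bootstrap(n, nboot, seed = 0)

# The same seed reproduces the same resamples
stopifnot(identical(boot_a, boot_b))
```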
Build a Survival Forest (Data Supplied as a Dataframe)
A list describing the fitted survival forest.
library(survival)
library(SurvivalClusteringTree)
a_survival_forest <- survival_forest(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung, nboot = 20)
The function survival_forest_matrix builds a survival forest given the survival outcomes and the numeric and factor predictors.
survival_forest_matrix( time, event, matrix_numeric, matrix_factor, weights = rep(1, length(time)), significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate", cut_type = 0, nboot = 100, seed = 0 )
Argument | Description |
---|---|
time | survival times, a numeric vector. |
event | survival events, a logical vector. |
matrix_numeric | numeric predictors, a numeric matrix. |
matrix_factor | factor predictors, a character matrix. |
weights | sample weights, a numeric vector. |
significance | significance threshold, a numeric value. The splitting algorithm stops when no split gives a p-value smaller than |
min_weights | minimum weight threshold, a numeric value. The weights in a node are greater than |
missing | a character value that specifies the handling of missing data. If |
test_type | a character value that specifies the type of statistical test. If |
cut_type | an integer value that specifies how to cut between two numeric values. If |
nboot | an integer value that specifies the number of bootstrap replications. |
seed | an integer value that specifies the seed. |
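In the `lung` data from the survival package, `status` is coded 1 for censored and 2 for dead, which is why the examples pass `event = lung$status == 2` to obtain the logical vector this argument expects. The sketch checks that this matches the event indicator that `Surv(time, status == 2)` builds in the formula interface.

```r
library(survival)

event <- lung$status == 2      # TRUE = death observed, FALSE = censored
stopifnot(is.logical(event), !anyNA(event))

# Event column of the Surv object used by the formula interface (stored as 0/1)
surv_event <- as.logical(unclass(Surv(lung$time, lung$status == 2))[, "status"])
stopifnot(identical(event, surv_event))

print(table(event))
```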
Build a Survival Forest (Data Supplied as Matrices)
A list describing the fitted survival forest.
library(survival)
library(SurvivalClusteringTree)
a_survival_forest <- survival_forest_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]),
  nboot = 20)
The function survival_tree builds a survival tree given the survival outcomes and the numeric and factor predictors.
survival_tree( survival_outcome, numeric_predictor, factor_predictor, weights = NULL, data, significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate", cut_type = 0 )
Argument | Description |
---|---|
survival_outcome | a |
numeric_predictor | a formula specifying the numeric predictors. As in |
factor_predictor | a formula specifying the factor predictors. As in |
weights | sample weights, a numeric vector. |
data | the dataframe that stores the outcome and predictor variables. Variables in the global environment will be used if |
significance | significance threshold, a numeric value. The splitting algorithm stops when no split gives a p-value smaller than |
min_weights | minimum weight threshold, a numeric value. The weights in a node are greater than |
missing | a character value that specifies the handling of missing data. If |
test_type | a character value that specifies the type of statistical test. If |
cut_type | an integer value that specifies how to cut between two numeric values. If |
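The `significance` argument acts as a stopping rule: a node is split only if some candidate split yields a p-value below the threshold. As an illustration only, the sketch tests one candidate split of the `lung` data (by `sex`) with the log-rank test from `survival::survdiff()` and compares its p-value with the default threshold 0.05. The package selects its own test through `test_type`, which may differ from the log-rank test used here.

```r
library(survival)

significance <- 0.05

# Log-rank test for the candidate split "sex" in the lung data
fit <- survdiff(Surv(time, status == 2) ~ sex, data = lung)

# Chi-square p-value with (number of groups - 1) degrees of freedom
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)

# Under this illustrative rule the split is accepted if p < significance
split_passes <- p_value < significance
cat(sprintf("p = %.4f, passes = %s\n", p_value, split_passes))
```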
Build a Survival Tree (Data Supplied as a Dataframe)
A list describing the fitted survival tree.
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
The function survival_tree_matrix builds a survival tree given the survival outcomes and the numeric and factor predictors.
survival_tree_matrix( time, event, matrix_numeric, matrix_factor, weights = rep(1, length(time)), significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate", cut_type = 0 )
Argument | Description |
---|---|
time | survival times, a numeric vector. |
event | survival events, a logical vector. |
matrix_numeric | numeric predictors, a numeric matrix. |
matrix_factor | factor predictors, a character matrix. |
weights | sample weights, a numeric vector. |
significance | significance threshold, a numeric value. The splitting algorithm stops when no split gives a p-value smaller than |
min_weights | minimum weight threshold, a numeric value. The weights in a node are greater than |
missing | a character value that specifies the handling of missing data. If |
test_type | a character value that specifies the type of statistical test. If |
cut_type | an integer value that specifies how to cut between two numeric values. If |
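With the default `weights = rep(1, length(time))`, every sample counts once, so the total weight in a node equals its number of samples and `min_weights = 50` behaves like a minimum node size. The sketch checks a hypothetical candidate split against that threshold, assuming the threshold is applied to both child nodes; because the argument description above is truncated, the strict `>` comparison (rather than `>=`) is also an assumption.

```r
min_weights <- 50
time <- seq_len(120)                 # hypothetical survival times for 120 samples
weights <- rep(1, length(time))      # default: each sample has weight 1

# Hypothetical candidate split: first 80 samples go left, the rest go right
goes_left <- seq_along(time) <= 80

left_ok  <- sum(weights[goes_left])  > min_weights   # 80 > 50
right_ok <- sum(weights[!goes_left]) > min_weights   # 40 > 50
split_allowed <- left_ok && right_ok
print(split_allowed)
```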
Build a Survival Tree (Data Supplied as Matrices)
A list describing the fitted survival tree.