| Title: | Clustering Analysis Using Survival Tree and Forest Algorithms |
|---|---|
| Description: | An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rate. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, we can apply hierarchical clustering algorithms to define clusters. Details about this method are described in <https://github.com/luyouepiusf/SurvivalClusteringTree>. |
| Authors: | Lu You [aut, cre] (Created the package. Maintains the package.), Lauric Ferrat [aut] (Added functionality. Revised the package. Wrote the vignette.), Hemang Parikh [aut] (Checked and revised the package.), Yanan Huo [aut] (Revised plotting functions of the package.), Yuting Yang [aut] (Added some data frame features.), Jeffrey Krischer [ctb] (Supervised the medical research. Coauthor of the medical manuscript.), Maria Redondo [ctb] (Principal investigator of the medical research. Coauthor of the medical manuscript.), Richard Oram [ctb] (Coauthor of the medical manuscript.), Andrea Steck [ctb] (Coauthor of the medical manuscript.) |
| Maintainer: | Lu You <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.1.3 |
| Built: | 2026-05-26 07:13:13 UTC |
| Source: | https://github.com/cran/SurvivalClusteringTree |
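The three steps in the package description (fit a forest, derive distances between samples, cluster them hierarchically) can be chained as below. This is a minimal sketch that uses the functions documented in this manual together with base R's `hclust()` and `cutree()`. The complete-case subset, `nboot = 20`, Ward linkage (`method = "ward.D2"`) and `k = 3` clusters are our illustrative choices, not recommendations from the package authors.

```r
library(survival)                # Surv(), survfit() and the lung data
library(SurvivalClusteringTree)  # survival_forest(), predict_distance_forest()

# Assumption: keep complete cases so that every sample receives distances
vars <- c("time", "status", "age", "sex", "ph.ecog", "ph.karno", "pat.karno", "meal.cal")
lung_cc <- lung[complete.cases(lung[, vars]), vars]

# Step 1: fit the survival forest (nboot = 20 only keeps the example fast)
fit <- survival_forest(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung_cc, nboot = 20)

# Step 2: forest-based distances between the samples
dist_fit <- predict_distance_forest(
  fit,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung_cc)
D <- dist_fit$mean_distance

# Step 3: hierarchical clustering on the distance matrix, cut into 3 groups
hc <- hclust(as.dist(D), method = "ward.D2")
cluster <- cutree(hc, k = 3)

# Kaplan-Meier curves by cluster, to inspect survival differences
lung_cc$cluster <- cluster
km <- survfit(Surv(time, status == 2) ~ cluster, data = lung_cc)
```

The resulting `km` object can be passed to `plot()` to draw one survival curve per cluster.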
Index of help topics:

| Help topic | Description |
|---|---|
| plot_survival_tree | Visualize the Fitted Survival Tree |
| predict_distance_forest | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe) |
| predict_distance_forest_matrix | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) |
| predict_distance_tree | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
| predict_distance_tree_matrix | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) |
| predict_weights | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
| predict_weights_matrix | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices) |
| survival_forest | Build a Survival Forest (Data Supplied as a Dataframe) |
| survival_forest_matrix | Build a Survival Forest (Data Supplied as Matrices) |
| survival_tree | Build a Survival Tree (Data Supplied as a Dataframe) |
| survival_tree_matrix | Build a Survival Tree (Data Supplied as Matrices) |
| SurvivalClusteringTree-package | Clustering Analysis Using Survival Tree and Forest Algorithms |
Further information is available in the following vignette: user-guide (User Guide to SurvivalClusteringTree).
Visualize the Fitted Survival Tree
```r
plot_survival_tree(survival_tree, cex = 0.75)
```
survival_tree |
a fitted survival tree object. |
cex |
numeric character expansion factor. |
No return value; called for its side effect of drawing the fitted tree.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
plot_survival_tree(a_survival_tree)
```
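Because `plot_survival_tree()` draws on the current graphics device, the plot can be written to a file by opening a device first. A minimal sketch using base R's `pdf()` and `dev.off()`; the output path, page size and `cex = 0.6` are hypothetical choices for illustration.

```r
library(survival)
library(SurvivalClusteringTree)

a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)

out_file <- file.path(tempdir(), "survival_tree.pdf")  # hypothetical output path
pdf(out_file, width = 10, height = 7)            # open a PDF graphics device
plot_survival_tree(a_survival_tree, cex = 0.6)   # smaller text for a crowded tree
invisible(dev.off())                             # close the device, writing the file
```

Lowering `cex` shrinks the node labels, which tends to help when the tree has many terminal nodes.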
The function predict_distance_forest predicts distances between samples based on a survival forest fit.
```r
predict_distance_forest(survival_forest, numeric_predictor, factor_predictor, data, missing = "omit")
```
survival_forest |
a fitted survival forest |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe)
A list.
mean_distance is the mean distance matrix.
sum_distance is the matrix that sums the distances between samples.
sum_non_na is the matrix of the number of non-NA distances being averaged.
```r
library(survival)
library(SurvivalClusteringTree)
# Fit a forest of 20 bootstrapped survival trees on the lung data
a_survival_forest <- survival_forest(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung, nboot = 20)
# Distances between the samples in lung, averaged over the trees
a_distance <- predict_distance_forest(
  a_survival_forest,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
```
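The three returned components imply that mean_distance averages only the available (non-NA) per-tree distances, so elementwise mean_distance = sum_distance / sum_non_na. The base-R toy below, with three made-up 3 x 3 per-tree matrices, illustrates that bookkeeping. It is our reading of the documented return values, not the package's source code.

```r
# Three made-up per-tree distance matrices for 3 samples (NA = not available)
per_tree <- list(
  matrix(c(0, 1, 2,
           1, 0, 1,
           2, 1, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, NA, 1,
           NA, 0, 3,
           1, 3, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, NA, NA,
           NA, 0, 2,
           NA, 2, 0), nrow = 3, byrow = TRUE)
)

# Sum the available distances (NA contributes 0) ...
sum_distance <- Reduce(`+`, lapply(per_tree, function(d) ifelse(is.na(d), 0, d)))
# ... and count how many trees contributed to each entry
sum_non_na <- Reduce(`+`, lapply(per_tree, function(d) 1 * !is.na(d)))

# Average over the contributing trees only
mean_distance <- sum_distance / sum_non_na
# mean_distance[1, 2] is 1 / 1 = 1; mean_distance[1, 3] is (2 + 1) / 2 = 1.5
```

Averaging over non-NA entries, rather than treating NA as zero, keeps a pair's distance from shrinking just because some trees could not place both samples.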
The function predict_distance_forest_matrix predicts distances between samples based on a survival forest fit.
```r
predict_distance_forest_matrix(survival_forest, matrix_numeric, matrix_factor, missing = "omit")
```
survival_forest |
a fitted survival forest |
matrix_numeric |
numeric predictors, a numeric matrix. |
matrix_factor |
factor predictors, a character matrix. |
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) (Works for raw matrices)
A list.
mean_distance is the mean distance matrix.
sum_distance is the matrix that sums the distances between samples.
sum_non_na is the matrix of the number of non-NA distances being averaged.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_forest <- survival_forest_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]),
  nboot = 20)
a_distance <- predict_distance_forest_matrix(
  a_survival_forest,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
```
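The matrix examples select lung columns by position: columns 4 and 6 to 9 are age, ph.ecog, ph.karno, pat.karno and meal.cal, and column 5 is sex. Selecting them by name is easier to read and survives column reordering; the sketch below builds the same matrices and checks that they match. Note that the manual describes matrix_factor as a character matrix, whereas `data.matrix()` returns the numeric sex codes; the sketch keeps the examples' convention.

```r
library(survival)  # provides the lung data

# Same columns as lung[, c(4, 6:9)] and lung[, 5], selected by name
numeric_cols <- c("age", "ph.ecog", "ph.karno", "pat.karno", "meal.cal")
matrix_numeric <- data.matrix(lung[, numeric_cols, drop = FALSE])
matrix_factor  <- data.matrix(lung[, "sex", drop = FALSE])

# Confirm that the by-name selection matches the positional one
same_numeric <- identical(matrix_numeric, data.matrix(lung[, c(4, 6:9), drop = FALSE]))
same_factor  <- identical(matrix_factor,  data.matrix(lung[, 5, drop = FALSE]))
```

The named matrices can then be passed as matrix_numeric and matrix_factor to any of the *_matrix functions.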
The function predict_distance_tree predicts distances between samples based on a survival tree fit.
```r
predict_distance_tree(survival_tree, numeric_predictor, factor_predictor, data, missing = "omit")
```
survival_tree |
a fitted survival tree |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe)
A list.
node_distance gives the distance matrix between nodes.
ind_distance gives the distance matrix between samples.
ind_weights gives the weights of samples in each node.
```r
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
# Distances between the samples in lung implied by the fitted tree
a_distance <- predict_distance_tree(
  a_survival_tree,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
```
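To see how the three returned components are shaped, the sketch below fits a tree on the complete cases of lung (our choice, so that no sample is dropped by missing-data handling) and inspects their dimensions. node_distance is square over nodes and ind_distance is square over samples; the orientation of ind_weights is not stated in this manual, so we only print its dimensions.

```r
library(survival)
library(SurvivalClusteringTree)

# Complete cases only (assumption), so every sample has all predictors
vars <- c("time", "status", "age", "sex", "ph.ecog", "ph.karno", "pat.karno", "meal.cal")
lung_cc <- lung[complete.cases(lung[, vars]), vars]

tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung_cc)
d <- predict_distance_tree(
  tree,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung_cc)

dim(d$node_distance)  # one row and column per node
dim(d$ind_distance)   # one row and column per sample
dim(d$ind_weights)    # samples by nodes, or the transpose
```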
The function predict_distance_tree_matrix predicts distances between samples based on a survival tree fit.
```r
predict_distance_tree_matrix(survival_tree, matrix_numeric, matrix_factor, missing = "omit")
```
survival_tree |
a fitted survival tree |
matrix_numeric |
numeric predictors, a numeric matrix. |
matrix_factor |
factor predictors, a character matrix. |
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) (Works for raw matrices)
A list.
node_distance gives the distance matrix between nodes.
ind_distance gives the distance matrix between samples.
ind_weights gives the weights of samples in each node.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_tree <- survival_tree_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
a_distance <- predict_distance_tree_matrix(
  a_survival_tree,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
```
The function predict_weights predicts weights of samples in terminal nodes based on a survival tree fit.
```r
predict_weights(survival_tree, numeric_predictor, factor_predictor, data, missing = "omit")
```
survival_tree |
a fitted survival tree |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe)
A weight matrix representing the weights of samples in each node.
```r
library(survival)
library(SurvivalClusteringTree)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
# Weights of each sample in the terminal nodes of the fitted tree
a_weight <- predict_weights(
  a_survival_tree,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
```
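This manual does not state the orientation of the returned weight matrix. Assuming rows index samples and columns index terminal nodes (compare `dim(a_weight)` with `nrow(lung)` and transpose if needed), one terminal-node label per sample can be read off with base R's `max.col()`. The matrix `W` below is made up for illustration, including one row whose weight is split across two nodes.

```r
# Hypothetical weight matrix: 4 samples (rows) by 3 terminal nodes (columns)
W <- matrix(c(1,    0,    0,
              0,    1,    0,
              0.25, 0.75, 0,
              0,    0,    1), nrow = 4, byrow = TRUE)

# Terminal node carrying the largest weight for each sample (first one wins ties)
node <- max.col(W, ties.method = "first")
node
# [1] 1 2 2 3
```

Using `ties.method = "first"` makes the result deterministic; the default `"random"` would break ties at random.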
The function predict_weights_matrix predicts weights of samples in terminal nodes based on a survival tree fit.
```r
predict_weights_matrix(survival_tree, matrix_numeric, matrix_factor, missing = "majority")
```
survival_tree |
a fitted survival tree |
matrix_numeric |
numeric predictors, a numeric matrix. |
matrix_factor |
factor predictors, a character matrix. |
missing |
a character value that specifies the handling of missing data.
If |
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices)
A weight matrix representing the weights of samples in each node.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_tree <- survival_tree_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
a_weight <- predict_weights_matrix(
  a_survival_tree,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
```
The function survival_forest builds a survival forest from the survival outcomes and the numeric and factor predictor variables.
```r
survival_forest(survival_outcome, numeric_predictor, factor_predictor, weights = NULL, data,
  significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate",
  cut_type = 0, nboot = 100, seed = 0, args_miceRanger = NULL)
```
survival_outcome |
a |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
weights |
sample weights, a numeric vector. |
data |
the dataframe that stores the outcome and predictor variables.
Variables in the global environment will be used if |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
nboot |
an integer value that specifies the number of bootstrap replications. |
seed |
an integer value that specifies the seed. |
args_miceRanger |
a list specifying additional arguments to be used to impute missing data using |
Build a Survival Forest (Data Supplied as a Dataframe)
A list containing the information of the survival forest fit.
```r
library(survival)
library(SurvivalClusteringTree)
# Forest of 20 bootstrapped survival trees on the lung data
a_survival_forest <- survival_forest(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung, nboot = 20)
```
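Because the forest is built from bootstrap replications, the seed argument matters for reproducibility. A minimal sketch of checking it: fit twice with the same seed, with significance and min_weights set explicitly to their defaults, and compare the predicted distances. We assume that seed fully determines the bootstrap samples; if `identical()` returns FALSE, that assumption does not hold for your installed version. `nboot = 10` and `seed = 2024` are arbitrary values chosen to keep the run short.

```r
library(survival)
library(SurvivalClusteringTree)

# Fit a forest with an explicit seed and return its mean sample-to-sample distances
forest_distance <- function(seed) {
  fit <- survival_forest(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung, significance = 0.05, min_weights = 50,
    nboot = 10, seed = seed)
  predict_distance_forest(
    fit,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)$mean_distance
}

d_first  <- forest_distance(seed = 2024)
d_second <- forest_distance(seed = 2024)
identical(d_first, d_second)  # TRUE if the seed alone drives the bootstrap (assumption)
```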
The function survival_forest_matrix builds a survival forest from the survival outcomes and the numeric and factor predictor variables.
```r
survival_forest_matrix(time, event, matrix_numeric, matrix_factor, weights = rep(1, length(time)),
  significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate",
  cut_type = 0, nboot = 100, seed = 0, args_miceRanger = NULL)
```
time |
survival times, a numeric vector. |
event |
survival events, a logical vector. |
matrix_numeric |
numeric predictors, a numeric matrix. |
matrix_factor |
factor predictors, a character matrix. |
weights |
sample weights, a numeric vector. |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
nboot |
an integer value that specifies the number of bootstrap replications. |
seed |
an integer value that specifies the seed. |
args_miceRanger |
a list specifying additional arguments to be used to impute missing data using |
Build a Survival Forest (Data Supplied as Matrices)
A list containing the information of the survival forest fit.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_forest <- survival_forest_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]),
  nboot = 20)
```
The function survival_tree builds a survival tree from the survival outcomes and the numeric and factor predictor variables.
```r
survival_tree(survival_outcome, numeric_predictor, factor_predictor, weights = NULL, data,
  significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate",
  cut_type = 0)
```
survival_outcome |
a |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
weights |
sample weights, a numeric vector. |
data |
the dataframe that stores the outcome and predictor variables.
Variables in the global environment will be used if |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
Build a Survival Tree (Data Supplied as a Dataframe)
A list containing the information of the survival tree fit.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)
```
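The significance and min_weights arguments act as stopping rules: splitting stops when no split reaches the p-value threshold, and min_weights sets a minimum weight for nodes. The sketch below fits one tree with the defaults and one with relaxed, hypothetical values (`significance = 0.2`, `min_weights = 30`), then plots the relaxed tree. Under the documented stopping rule we would expect the relaxed settings to allow more splits, but the exact effect depends on the data.

```r
library(survival)
library(SurvivalClusteringTree)

# Defaults: significance = 0.05, min_weights = 50
default_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung)

# Relaxed stopping rules (illustrative values only)
relaxed_tree <- survival_tree(
  survival_outcome = Surv(time, status == 2) ~ 1,
  numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
  factor_predictor = ~ as.factor(sex),
  data = lung, significance = 0.2, min_weights = 30)

plot_survival_tree(relaxed_tree)
```

Looser thresholds yield finer partitions, at the cost of splits that may not replicate in new data.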
The function survival_tree_matrix builds a survival tree from the survival outcomes and the numeric and factor predictor variables.
```r
survival_tree_matrix(time, event, matrix_numeric, matrix_factor, weights = rep(1, length(time)),
  significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate",
  cut_type = 0)
```
time |
survival times, a numeric vector. |
event |
survival events, a logical vector. |
matrix_numeric |
numeric predictors, a numeric matrix. |
matrix_factor |
factor predictors, a character matrix. |
weights |
sample weights, a numeric vector. |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
Build a Survival Tree (Data Supplied as Matrices)
A list containing the information of the survival tree fit.
```r
library(survival)
library(SurvivalClusteringTree)
# status == 2 marks a death in lung (1 = censored)
a_survival_tree <- survival_tree_matrix(
  time = lung$time,
  event = lung$status == 2,
  matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
  matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
```
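With the matrix interface, a single tree can also drive the clustering step: fit the tree, take ind_distance from `predict_distance_tree_matrix()`, cluster with base R's `hclust()`, and compare survival across clusters with the log-rank test from `survival::survdiff()`. This is a minimal sketch; the complete-case subset, average linkage and two clusters are our illustrative choices. The resulting p-value is optimistic, since the clusters were derived from the same survival outcomes they are tested on.

```r
library(survival)
library(SurvivalClusteringTree)

# Complete cases only (our choice), so every sample gets a full row of distances
vars <- c("time", "status", "age", "sex", "ph.ecog", "ph.karno", "pat.karno", "meal.cal")
cc <- lung[complete.cases(lung[, vars]), vars]
mat_num <- data.matrix(cc[, c("age", "ph.ecog", "ph.karno", "pat.karno", "meal.cal"), drop = FALSE])
mat_fac <- data.matrix(cc[, "sex", drop = FALSE])

# Single survival tree and its sample-to-sample distances
tree <- survival_tree_matrix(time = cc$time, event = cc$status == 2,
                             matrix_numeric = mat_num, matrix_factor = mat_fac)
tree_dist <- predict_distance_tree_matrix(tree, matrix_numeric = mat_num,
                                          matrix_factor = mat_fac)

# Average-linkage clustering into two groups, then a log-rank comparison
hc <- hclust(as.dist(tree_dist$ind_distance), method = "average")
cc$cluster <- cutree(hc, k = 2)
logrank <- survdiff(Surv(time, status == 2) ~ cluster, data = cc)
logrank
```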