Title: | Multivariate Random Forest with Compositional Responses |
---|---|
Description: | Nonlinear regression with compositional responses and Euclidean predictors is performed. The compositional data are first transformed using the additive log-ratio transformation, and then the multivariate random forest of Rahman R., Otridge J. and Pal R. (2017), <doi:10.1093/bioinformatics/btw765>, is applied. |
Authors: | Michail Tsagris [aut, cre] |
Maintainer: | Michail Tsagris <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2024-11-02 06:39:57 UTC |
Source: | CRAN |
Multivariate random forest with compositional response variables and continuous predictor variables. The data are first transformed using the additive log-ratio transformation and then the multivariate random forest of Rahman R., Otridge J. and Pal R. (2017), <doi:10.1093/bioinformatics/btw765>, is applied.
Package: | CompositionalRF |
Type: | Package |
Version: | 1.0 |
Date: | 2024-10-01 |
License: | GPL-2 |
Michail Tsagris <[email protected]>
Michail Tsagris [email protected].
Rahman R., Otridge J. and Pal R. (2017). IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics, 33(9): 1407–1410.
Segal M. and Xiao Y. (2011). Multivariate random forests. Wiley Interdisciplinary Reviews: Data mining and Knowledge Discovery, 1(1): 80–87.
Alenazi A. (2023). A review of compositional data analysis and recent advances. Communications in Statistics–Theory and Methods, 52(16): 5535–5567.
Friedman Jerome, Trevor Hastie and Robert Tibshirani (2009). The elements of statistical learning, 2nd edition. Springer, Berlin.
Compositional Random Forests.
comp.rf(xnew = x, y, x, type = "alr", ntrees, nfeatures, minleaf)
xnew |
A matrix with the new predictor variables whose compositional response values are to be predicted. |
y |
The response compositional data. Zero values are not allowed. |
x |
A matrix with the predictor variables data. |
type |
If the responses are already transformed with the additive log-ratio transformation, set this to 0. Otherwise, if they are compositional data, leave it equal to "alr" so that the data will be transformed. |
ntrees |
The number of trees to construct in the random forest. |
nfeatures |
The number of randomly selected predictor variables considered for a split in each regression tree node, which must be less than the number of input predictors. |
minleaf |
Minimum number of observations in a leaf node. If a node has minleaf or fewer observations, it is not split further and is treated as a leaf node. This number must be less than or equal to the sample size. |
The compositional data are first transformed using the additive log-ratio transformation and then the multivariate random forest algorithm of Rahman, Otridge and Pal (2017) is applied.
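The transformation step described above can be sketched in base R. This is only an illustration: the helper names `alr_sketch` and `alr_inv_sketch` are hypothetical and not part of the package, and the choice of the first component as the reference (denominator) is an assumption that may differ from what the package does internally.

```r
# Additive log-ratio (alr) transformation and its inverse, base R only.
# Assumption: the first component is the reference (denominator).
alr_sketch <- function(y) {
  y <- as.matrix(y)
  stopifnot(all(y > 0))                  # zero values are not allowed
  log(y[, -1, drop = FALSE] / y[, 1])    # D parts -> D - 1 log-ratios
}

alr_inv_sketch <- function(z) {
  z <- as.matrix(z)
  ez <- cbind(1, exp(z))                 # reference component gets exp(0) = 1
  ez / rowSums(ez)                       # rescale each row to sum to 1
}

y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)                      # make the rows compositional
z <- alr_sketch(y)                       # unconstrained values, one column fewer
y_back <- alr_inv_sketch(z)              # back on the simplex
```

Predictions made on the transformed scale would be mapped back to compositions with the inverse, which is why the returned matrix has rows that sum to one.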
A matrix with the estimated compositional response values.
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Rahman R., Otridge J. and Pal R. (2017). IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics, 33(9): 1407–1410.
Segal M. and Xiao Y. (2011). Multivariate random forests. Wiley Interdisciplinary Reviews: Data mining and Knowledge Discovery, 1(1): 80–87.
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
x <- matrix( rnorm(150 * 10), ncol = 10 )
mod <- comp.rf(x[1:10, ], y, x, ntrees = 2, nfeatures = 5, minleaf = 10)
mod
Cross-Validation of the Compositional Random Forests.
cv.comprf(y, x, ntrees = c(50, 100, 500, 1000), nfeatures, minleaf, folds = NULL, nfolds = 10, seed = NULL, ncores = 1)
y |
The response compositional data. Zero values are not allowed. |
x |
A matrix with the predictor variables data. |
ntrees |
A vector with the possible number of trees to consider each time. |
nfeatures |
A vector with the number of randomly selected predictor variables considered for a split in each regression tree node. |
minleaf |
A vector with the minimum number of observations in the leaf node. |
folds |
A list with the folds, if you already have one. If left NULL, the folds are created internally. |
nfolds |
The number of folds in the cross validation. |
seed |
You can specify your own seed number here or leave it NULL. |
ncores |
The number of cores to use. If more than 1, parallel computing will take place. This is advisable only if you have many observations and/or many variables; otherwise it will slow down the process. |
K-fold cross-validation for the multivariate random forest with compositional responses is performed.
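As a rough illustration of the set-up behind K-fold cross-validation, the sketch below builds random folds and a grid of hyper-parameter configurations in base R. The candidate values and the list-of-indices layout for the folds are assumptions made for the example; the source does not specify the exact structure the `folds` argument expects.

```r
set.seed(1)
n <- 150        # number of observations
nfolds <- 10

# Randomly assign each observation to one of nfolds folds of equal
# (or near-equal) size, then collect the indices of each fold in a list.
fold_id <- sample(rep_len(seq_len(nfolds), n))
folds <- split(seq_len(n), fold_id)

# Every combination of candidate values is one configuration to evaluate.
# These candidate values are illustrative only.
grid <- expand.grid(ntrees = c(50, 100), nfeatures = c(3, 5), minleaf = c(5, 10))
```

Each configuration in `grid` would then be fitted on all folds but one and evaluated on the held-out fold, rotating through the folds in turn.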
A list including:
kl |
A matrix with the configurations of hyper-parameters tested and the estimated Kullback-Leibler divergence, for each configuration. |
js |
A matrix with the configurations of hyper-parameters tested and the estimated Jensen-Shannon divergence, for each configuration. |
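For reference, the two divergences can be computed between observed and predicted compositions as in the sketch below. The helpers `kl_rows` and `js_rows` are hypothetical names, not package functions, and the direction of the Kullback-Leibler divergence (observed relative to predicted) and the averaging over observations are assumptions about how the reported values are formed.

```r
# Row-wise Kullback-Leibler divergence of the observed compositions (y)
# from the predicted ones (est). Both must be strictly positive.
kl_rows <- function(y, est) rowSums(y * log(y / est))

# Row-wise Jensen-Shannon divergence: a symmetrised KL divergence
# measured against the midpoint of the two compositions.
js_rows <- function(y, est) {
  m <- 0.5 * (y + est)
  0.5 * kl_rows(y, m) + 0.5 * kl_rows(est, m)
}

# Toy data: two observed and two predicted 3-part compositions.
y   <- rbind(c(0.2, 0.3, 0.5), c(0.6, 0.3, 0.1))
est <- rbind(c(0.25, 0.25, 0.5), c(0.5, 0.4, 0.1))

mean(kl_rows(y, est))   # average KL divergence over the observations
mean(js_rows(y, est))   # average JS divergence over the observations
```

Smaller values indicate predictions closer to the observed compositions, so under these assumptions the preferred configuration would be the one with the lowest divergence.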
Michail Tsagris.
R implementation and documentation: Michail Tsagris [email protected].
Rahman R., Otridge J. and Pal R. (2017). IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics, 33(9): 1407–1410.
Segal M. and Xiao Y. (2011). Multivariate random forests. Wiley Interdisciplinary Reviews: Data mining and Knowledge Discovery, 1(1): 80–87.
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
x <- matrix( rnorm(150 * 10), ncol = 10 )
mod <- cv.comprf(y, x, ntrees = 2, nfeatures = 5, minleaf = 10, nfolds = 2)