Package 'CompositionalRF'

Title: Multivariate Random Forest with Compositional Responses
Description: Non-linear regression with compositional responses and Euclidean predictors is performed. The compositional data are first transformed using the additive log-ratio transformation, and then the multivariate random forest of Rahman R., Otridge J. and Pal R. (2017), <doi:10.1093/bioinformatics/btw765>, is applied.
Authors: Michail Tsagris [aut, cre]
Maintainer: Michail Tsagris
License: GPL (>= 2)
Version: 1.0
Built: 2024-11-02 06:39:57 UTC
Source: CRAN



Multivariate Random Forests with Compositional Responses

Description

Multivariate random forest with compositional response variables and continuous predictor variables. The data are first transformed using the additive log-ratio transformation and then the multivariate random forest of Rahman R., Otridge J. and Pal R. (2017), <doi:10.1093/bioinformatics/btw765>, is applied.

Details

Package: CompositionalRF
Type: Package
Version: 1.0
Date: 2024-10-01
License: GPL-2

Maintainers

Michail Tsagris

Author(s)

Michail Tsagris.

References

Rahman R., Otridge J. and Pal R. (2017). IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics, 33(9): 1407–1410.

Segal M. and Xiao Y. (2011). Multivariate random forests. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1): 80–87.

Alenazi A. (2023). A review of compositional data analysis and recent advances. Communications in Statistics–Theory and Methods, 52(16): 5535–5567.

Friedman J., Hastie T. and Tibshirani R. (2009). The Elements of Statistical Learning, 2nd edition. Springer, Berlin.


Compositional Random Forests

Description

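Multivariate random forest regression for compositional responses: the model is fitted to the data and the compositional responses of new observations are predicted.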

Usage

comp.rf(xnew = x, y, x, type = "alr", ntrees, nfeatures, minleaf)

Arguments

xnew

A matrix with the new predictor variables whose compositional response values are to be predicted.

y

The response compositional data. Zero values are not allowed.

x

A matrix with the predictor variables.

type

If the responses have already been transformed with the additive log-ratio transformation, set type = 0. If they are compositional data, leave it equal to "alr" so that they are transformed internally.

ntrees

The number of trees to construct in the random forest.

nfeatures

The number of randomly selected predictor variables considered for a split in each regression tree node. It must be less than the number of input predictors.

minleaf

Minimum number of observations in a leaf node. A node with minleaf or fewer observations is not split further and is treated as a leaf node. Consequently, minleaf must not exceed the sample size.
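The constraints stated in the argument descriptions can be checked before calling comp.rf. The sketch below uses a hypothetical helper, check_comprf_args(), which is not part of CompositionalRF; it only mirrors the rules listed above (no zeros in y, nfeatures smaller than the number of predictors, minleaf no larger than the sample size), plus the requirement that y and x have the same number of rows.

```r
# Hypothetical helper (not exported by CompositionalRF): checks the
# constraints stated in the comp.rf argument descriptions.
check_comprf_args <- function(y, x, nfeatures, minleaf) {
  y <- as.matrix(y)
  x <- as.matrix(x)
  if (nrow(y) != nrow(x))
    stop("y and x must have the same number of rows")
  # Compositions must be strictly positive: zeros (or negatives) are rejected
  if (any(y <= 0))
    stop("zero values are not allowed in y")
  if (nfeatures >= ncol(x))
    stop("nfeatures must be less than the number of predictors")
  if (minleaf > nrow(x))
    stop("minleaf must not exceed the sample size")
  invisible(TRUE)
}

y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
x <- matrix(rnorm(150 * 10), ncol = 10)
check_comprf_args(y, x, nfeatures = 5, minleaf = 10)
```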

Details

The compositional responses are first transformed using the additive log-ratio transformation, and then the multivariate random forest algorithm of Rahman, Otridge and Pal (2017) is applied.
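A minimal base R sketch of the additive log-ratio transformation and its inverse. It assumes the first component is the common divisor; the reference component used internally by comp.rf is not stated here, so check it before supplying pre-transformed responses with type = 0. The functions alr() and alr_inv() are illustrative definitions, not exports of this package.

```r
# Additive log-ratio (alr) transform: log(y_j / y_1) for j = 2, ..., D.
# Assumes the first component is the common divisor (reference).
alr <- function(y) {
  y <- as.matrix(y)
  log(y[, -1, drop = FALSE] / y[, 1])
}

# Inverse alr: prepend the reference (exp(0) = 1) and close each row to sum 1.
alr_inv <- function(z) {
  z <- as.matrix(z)
  e <- cbind(1, exp(z))
  e / rowSums(e)
}

y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
z <- alr(y)          # 150 x 3 matrix of unconstrained log-ratios
y_back <- alr_inv(z) # 150 x 4 matrix with positive rows summing to 1
```

The inverse maps any vector of log-ratios back to a composition with positive parts summing to 1, which is how predictions made on the log-ratio scale can be returned as compositions.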

Value

A matrix with the estimated compositional response values.

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris.

References

Rahman R., Otridge J. and Pal R. (2017). IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics, 33(9): 1407–1410.

Segal M. and Xiao Y. (2011). Multivariate random forests. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1): 80–87.

See Also

cv.comprf

Examples

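library(CompositionalRF)
# Close the four iris measurements so that each row sums to 1
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
# Ten pure-noise predictors for the 150 observations
x <- matrix( rnorm(150 * 10), ncol = 10 )
# Predict the compositions of the first 10 observations
mod <- comp.rf(x[1:10, ], y, x, ntrees = 2, nfeatures = 5, minleaf = 10)
mod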

Cross-Validation of the Compositional Random Forests

Description

Cross-Validation of the Compositional Random Forests.

Usage

cv.comprf(y, x, ntrees = c(50, 100, 500, 1000), nfeatures, minleaf,
folds = NULL, nfolds = 10, seed = NULL, ncores = 1)

Arguments

y

The response compositional data. Zero values are not allowed.

x

A matrix with the predictor variables.

ntrees

A vector with the candidate numbers of trees to consider.

nfeatures

A vector with the number of randomly selected predictor variables considered for a split in each regression tree node.

minleaf

A vector with the minimum number of observations in the leaf node.

folds

If you already have a list with the folds, supply it here. Otherwise, leave it NULL and the folds will be created automatically.

nfolds

The number of folds in the cross-validation.

seed

You can specify your own seed number here or leave it NULL.

ncores

The number of cores to use. If more than 1, parallel computing will take place. This is advisable only when there are many observations and/or many variables; otherwise it may slow the process down.
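To reuse the same folds across several runs, you can build the list yourself. The sketch below assumes that folds is a list of row-index vectors, one per fold; that format is an assumption, and make_folds() is a hypothetical helper, not the routine cv.comprf uses internally.

```r
# Hypothetical helper: randomly assign rows 1..n to nfolds folds of
# (nearly) equal size and return an unnamed list of row-index vectors.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reproducible assignment
  fold_id <- sample(rep_len(seq_len(nfolds), n))
  unname(split(seq_len(n), fold_id))
}

folds <- make_folds(150, nfolds = 10, seed = 1)
lengths(folds)  # 15 rows in each of the 10 folds
```

Under the stated assumption about the expected format, the result could then be passed to cv.comprf through its folds argument.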

Details

K-fold cross-validation for the multivariate random forest with compositional responses is performed.
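Because ntrees, nfeatures and minleaf each accept a vector of candidate values, the configurations evaluated can be pictured as a grid. Assuming every combination is tested, base R's expand.grid() enumerates them; the candidate values below are illustrative only.

```r
# Illustrative candidate values for each hyper-parameter
ntrees <- c(50, 100)
nfeatures <- c(3, 5)
minleaf <- c(5, 10)

# One row per configuration: 2 x 2 x 2 = 8 combinations
grid <- expand.grid(ntrees = ntrees, nfeatures = nfeatures, minleaf = minleaf)
grid
```

Assuming one forest is fitted per configuration and per fold, this grid with nfolds = 10 would require 8 x 10 = 80 forest fits, which is where parallel computing through ncores can pay off.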

Value

A list including:

kl

A matrix with the hyper-parameter configurations tested and the estimated Kullback-Leibler divergence for each configuration.

js

A matrix with the hyper-parameter configurations tested and the estimated Jensen-Shannon divergence for each configuration.
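For reference, a sketch of the two divergences between observed compositions y and predicted compositions yhat (both with strictly positive entries and rows summing to 1). The direction of the Kullback-Leibler divergence (observed relative to predicted) and the averaging over observations are assumptions; the package may sum instead of average or use the other direction. kl_div() and js_div() are hypothetical helpers.

```r
# Hypothetical helpers: per-observation divergences, averaged over rows.
# y and yhat are n x D matrices of strictly positive compositions.
kl_div <- function(y, yhat) {
  # sum_j y_ij * log(y_ij / yhat_ij), then the mean over observations i
  mean(rowSums(y * log(y / yhat)))
}

js_div <- function(y, yhat) {
  m <- (y + yhat) / 2  # pointwise mixture of the two compositions
  mean(0.5 * rowSums(y * log(y / m)) + 0.5 * rowSums(yhat * log(yhat / m)))
}

y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
# Naive prediction: every observation gets the mean composition
yhat <- matrix(colMeans(y), nrow(y), ncol(y), byrow = TRUE)
kl_div(y, yhat)
js_div(y, yhat)
```

Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence is symmetric in its two arguments and bounded above by log(2).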

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris.

References

Rahman R., Otridge J. and Pal R. (2017). IntegratedMRF: random forest-based framework for integrating prediction from different data types. Bioinformatics, 33(9): 1407–1410.

Segal M. and Xiao Y. (2011). Multivariate random forests. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1): 80–87.

See Also

comp.rf

Examples

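library(CompositionalRF)
# Compositional responses: iris measurements closed to sum to 1
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
# Ten pure-noise predictors
x <- matrix( rnorm(150 * 10), ncol = 10 )
# 2-fold cross-validation of a single hyper-parameter configuration
mod <- cv.comprf(y, x, ntrees = 2, nfeatures = 5, minleaf = 10, nfolds = 2)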