Title: | Fit Log-Ratio Lasso Regression for Compositional Data |
---|---|
Description: | Log-ratio Lasso regression for continuous, binary, and survival outcomes with compositional features. See Fei and others (2023) <doi:10.1101/2023.05.02.538599>. |
Authors: | Teng Fei [aut, cre, cph] , Tyler Funnell [aut] , Nicholas Waters [aut] , Sandeep Raj [aut] |
Maintainer: | Teng Fei <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.3.0 |
Built: | 2024-11-19 06:45:28 UTC |
Source: | CRAN |
Summarizing FLORAL outputs from various choices of a
a.FLORAL(
  a = c(0.1, 0.5, 1), ncore = 1, seed = NULL, x, y, ncov = 0,
  family = "gaussian", longitudinal = FALSE, id = NULL, tobs = NULL,
  failcode = NULL, corstr = "exchangeable", scalefix = FALSE, scalevalue = 1,
  pseudo = 1, length.lambda = 100, lambda.min.ratio = NULL,
  ncov.lambda.weight = 0, mu = 1, maxiter = 100, ncv = 5,
  intercept = FALSE, step2 = FALSE, progress = TRUE
)
a |
Vector of scalars between 0 and 1 for comparison. |
ncore |
Number of cores used for parallel computation. Default is to use only 1 core. |
seed |
A random seed for reproducibility of the results. By default the seed is the numeric form of |
x |
Feature matrix, where rows specify subjects and columns specify features. The first |
y |
Outcome. For a continuous or binary outcome, |
ncov |
An integer indicating the number of first |
family |
Available options are |
longitudinal |
|
id |
If |
tobs |
If |
failcode |
If |
corstr |
If a GEE model is specified, then |
scalefix |
|
scalevalue |
Specify the scale parameter if |
pseudo |
Pseudo count to be added to |
length.lambda |
Number of penalty parameters used in the path |
lambda.min.ratio |
Ratio between the minimum and maximum choice of lambda. Default is |
ncov.lambda.weight |
Weight of the penalty lambda applied to the first |
mu |
Value of penalty for the augmented Lagrangian |
maxiter |
Number of iterations needed for the outer loop of the augmented Lagrangian algorithm. |
ncv |
Folds of cross-validation. Use |
intercept |
|
step2 |
|
progress |
|
A ggplot2 object of cross-validated prediction metric versus lambda, stratified by a. Detailed data can be retrieved from the ggplot2 object itself.
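As a minimal sketch of how data can be read back from a ggplot2 object, the example below builds a toy plot and retrieves its data frame. The data frame cv_toy and its column names are hypothetical and are not the actual structure returned by a.FLORAL; the sketch also assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical stand-in for cross-validated metrics; the real columns
# stored in the a.FLORAL plot may differ.
cv_toy <- data.frame(
  lambda = rep(c(0.01, 0.1, 1), times = 2),
  metric = c(1.2, 1.0, 1.5, 1.3, 1.1, 1.4),
  a      = factor(rep(c(0.1, 1), each = 3))
)

p <- ggplot(cv_toy, aes(x = log(lambda), y = metric, colour = a)) +
  geom_line()

# The data frame passed to ggplot() is stored on the plot object.
plot_data <- p$data
head(plot_data)
```

If a plot keeps its data at the layer level instead, ggplot2::layer_data() is the usual alternative for inspecting what a given layer draws.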
Teng Fei. Email: [email protected]
Fei T, Funnell T, Waters N, Raj SS et al. Scalable Log-ratio Lasso Regression Enhances Microbiome Feature Selection for Predictive Models. bioRxiv 2023.05.02.538599.
set.seed(23420)
dat <- simu(n=50,p=30,model="linear")
pmetric <- a.FLORAL(a=c(0.1,1),ncore=1,x=dat$xcount,y=dat$y,family="gaussian",ncv=2,progress=FALSE)
Conduct log-ratio lasso regression for continuous, binary and survival outcomes.
FLORAL(
  x, y, ncov = 0, family = "gaussian", longitudinal = FALSE, id = NULL,
  tobs = NULL, failcode = NULL, corstr = "exchangeable", scalefix = FALSE,
  scalevalue = 1, pseudo = 1, length.lambda = 100, lambda.min.ratio = NULL,
  ncov.lambda.weight = 0, a = 1, mu = 1, maxiter = 100, ncv = 5, ncore = 1,
  intercept = FALSE, foldid = NULL, step2 = TRUE, progress = TRUE, plot = TRUE
)
x |
Feature matrix, where rows specify subjects and columns specify features. The first |
y |
Outcome. For a continuous or binary outcome, |
ncov |
An integer indicating the number of first |
family |
Available options are |
longitudinal |
|
id |
If |
tobs |
If |
failcode |
If |
corstr |
If a GEE model is specified, then |
scalefix |
|
scalevalue |
Specify the scale parameter if |
pseudo |
Pseudo count to be added to |
length.lambda |
Number of penalty parameters used in the path |
lambda.min.ratio |
Ratio between the minimum and maximum choice of lambda. Default is |
ncov.lambda.weight |
Weight of the penalty lambda applied to the first |
a |
A scalar between 0 and 1: |
mu |
Value of penalty for the augmented Lagrangian |
maxiter |
Number of iterations needed for the outer loop of the augmented Lagrangian algorithm. |
ncv |
Folds of cross-validation. Use |
ncore |
Number of cores for parallel computing for cross-validation. Default is 1. |
intercept |
|
foldid |
A vector of fold indicators. Default is |
step2 |
|
progress |
|
plot |
|
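The arguments length.lambda and lambda.min.ratio together describe a penalty path. As a rough sketch, a path of this kind is commonly built as a log-spaced sequence from a largest value down to that value times the ratio. Whether FLORAL spaces its path exactly this way is an assumption, and lambda_max and ratio below are hypothetical illustration values, not package defaults.

```r
# Hypothetical values: lambda_max would normally be derived from the data,
# and ratio is an illustrative stand-in for lambda.min.ratio.
lambda_max    <- 2
ratio         <- 0.01
length_lambda <- 100

# Log-spaced path from lambda_max down to lambda_max * ratio.
lambda_path <- exp(seq(log(lambda_max), log(lambda_max * ratio),
                       length.out = length_lambda))
range(lambda_path)
```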
A list with path-specific estimates (beta), path (lambda), and others. Details can be found in README.md.
Teng Fei. Email: [email protected]
Fei T, Funnell T, Waters N, Raj SS et al. Enhanced Feature Selection for Microbiome Data using FLORAL: Scalable Log-ratio Lasso Regression. bioRxiv 2023.05.02.538599.
set.seed(23420)

# Continuous outcome
dat <- simu(n=50,p=30,model="linear")
fit <- FLORAL(dat$xcount,dat$y,family="gaussian",ncv=2,progress=FALSE,step2=TRUE)

# Binary outcome
# dat <- simu(n=50,p=30,model="binomial")
# fit <- FLORAL(dat$xcount,dat$y,family="binomial",progress=FALSE,step2=TRUE)

# Survival outcome
# dat <- simu(n=50,p=30,model="cox")
# fit <- FLORAL(dat$xcount,survival::Surv(dat$t,dat$d),family="cox",progress=FALSE,step2=TRUE)

# Competing risks outcome
# dat <- simu(n=50,p=30,model="finegray")
# fit <- FLORAL(dat$xcount,survival::Surv(dat$t,dat$d,type="mstate"),failcode=1,
#               family="finegray",progress=FALSE,step2=FALSE)
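The foldid argument takes a vector of fold indicators. A minimal sketch of building one with base R follows. It assumes, as in other cross-validation software, that fold labels are integers from 1 to ncv with one entry per subject; the exact format FLORAL expects should be checked against the package documentation.

```r
set.seed(1)
n   <- 50   # number of subjects (rows of x)
ncv <- 5    # number of folds

# Balanced random assignment: each subject gets a fold label in 1..ncv.
foldid <- sample(rep(seq_len(ncv), length.out = n))
table(foldid)

# Hypothetical usage, assuming dat comes from simu(n = 50, ...):
# fit <- FLORAL(dat$xcount, dat$y, family = "gaussian",
#               foldid = foldid, progress = FALSE)
```

Fixing the fold assignment this way keeps the cross-validation splits identical across repeated fits, which makes results easier to compare.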
Summarizing FLORAL outputs from multiple random k-fold cross validations
mcv.FLORAL(
  mcv = 10, ncore = 1, seed = NULL, x, y, ncov = 0, family = "gaussian",
  longitudinal = FALSE, id = NULL, tobs = NULL, failcode = NULL,
  corstr = "exchangeable", scalefix = FALSE, scalevalue = 1, pseudo = 1,
  length.lambda = 100, lambda.min.ratio = NULL, ncov.lambda.weight = 0,
  a = 1, mu = 1, maxiter = 100, ncv = 5, intercept = FALSE, step2 = TRUE,
  progress = TRUE, plot = TRUE
)
mcv |
Number of random 'ncv'-fold cross-validations to be performed. |
ncore |
Number of cores used for parallel computation. Default is to use only 1 core. |
seed |
A random seed for reproducibility of the results. By default the seed is the numeric form of |
x |
Feature matrix, where rows specify subjects and columns specify features. The first |
y |
Outcome. For a continuous or binary outcome, |
ncov |
An integer indicating the number of first |
family |
Available options are |
longitudinal |
|
id |
If |
tobs |
If |
failcode |
If |
corstr |
If a GEE model is specified, then |
scalefix |
|
scalevalue |
Specify the scale parameter if |
pseudo |
Pseudo count to be added to |
length.lambda |
Number of penalty parameters used in the path |
lambda.min.ratio |
Ratio between the minimum and maximum choice of lambda. Default is |
ncov.lambda.weight |
Weight of the penalty lambda applied to the first |
a |
A scalar between 0 and 1: |
mu |
Value of penalty for the augmented Lagrangian |
maxiter |
Number of iterations needed for the outer loop of the augmented Lagrangian algorithm. |
ncv |
Folds of cross-validation. Use |
intercept |
|
step2 |
|
progress |
|
plot |
|
A list with relative frequencies of a certain feature being selected over mcv ncv-fold cross-validations.
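To illustrate what a relative selection frequency means, the sketch below tallies how often each feature appears across repeated runs. The list selected and the taxon names are hypothetical, and this is not the internal code of mcv.FLORAL; it only shows the arithmetic, assuming each run reports a feature at most once.

```r
# Hypothetical selections from mcv = 3 cross-validation runs.
selected <- list(
  c("taxon1", "taxon2", "taxon5"),
  c("taxon1", "taxon5"),
  c("taxon1", "taxon3")
)
mcv <- length(selected)

# Count how many runs selected each feature, then divide by mcv.
counts   <- table(unlist(selected))
sel_freq <- sort(setNames(as.numeric(counts) / mcv, names(counts)),
                 decreasing = TRUE)
sel_freq
```

Features with a frequency close to 1 are selected in nearly every run, which is the usual reading of such a summary.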
Teng Fei. Email: [email protected]
Fei T, Funnell T, Waters N, Raj SS et al. Scalable Log-ratio Lasso Regression Enhances Microbiome Feature Selection for Predictive Models. bioRxiv 2023.05.02.538599.
set.seed(23420)
dat <- simu(n=50,p=30,model="linear")
fit <- mcv.FLORAL(mcv=2,ncore=1,x=dat$xcount,y=dat$y,ncv=2,progress=FALSE,step2=TRUE,plot=FALSE)
Simulate a dataset from log-ratio model.
simu(
  n = 100, p = 200, model = "linear", weak = 4, strong = 6,
  weaksize = 0.125, strongsize = 0.25, pct.sparsity = 0.5, rho = 0,
  timedep_slope = NULL, timedep_cor = NULL, longitudinal_stability = TRUE,
  ncov = 0, betacov = 0, intercept = FALSE
)
n |
An integer specifying the sample size. |
p |
An integer specifying the number of features (taxa). |
model |
Type of model associated with the outcome variable: one of "linear", "binomial", "cox", "finegray", or "timedep" (survival endpoint with time-dependent features). |
weak |
Number of features with |
strong |
Number of features with |
weaksize |
Actual effect size for |
strongsize |
Actual effect size for |
pct.sparsity |
Percentage of zero counts for each sample. |
rho |
Parameter controlling the correlated structure between taxa. Ranges between 0 and 1. |
timedep_slope |
If |
timedep_cor |
If |
longitudinal_stability |
If |
ncov |
Number of covariates that are not compositional features. |
betacov |
Coefficients corresponding to the covariates that are not compositional features. |
intercept |
Boolean. If TRUE, then a random intercept will be generated in the model. Only works for |
A list with simulated count matrix xcount, log1p-transformed count matrix x, outcome (continuous y, continuous centered y0, binary y, or survival t, d), true coefficient vector beta, list of non-zero features idx, and value of intercept intercept (if applicable).
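The relationship between a count matrix, its log1p transform, and a log-ratio model can be sketched with base R. The toy matrix and coefficients below are hypothetical. The zero-sum constraint on the coefficients is the standard log-contrast formulation and is assumed here for illustration; it is not taken from the simu documentation.

```r
# Hypothetical 3 x 4 count matrix (rows: samples, columns: taxa).
xcount <- matrix(c(0, 5, 12, 3,
                   7, 0,  0, 9,
                   2, 4,  0, 0),
                 nrow = 3, byrow = TRUE)

# log1p-transformed counts, i.e. log(1 + count).
x <- log1p(xcount)

# Proportion of zero counts in each sample.
zero_prop <- rowMeans(xcount == 0)

# With coefficients that sum to zero, the linear predictor depends only on
# log-ratios, so rescaling the shifted abundances leaves it unchanged.
beta  <- c(0.25, -0.25, 0.125, -0.125)
eta   <- drop(x %*% beta)
eta_s <- drop(log(10 * (1 + xcount)) %*% beta)
all.equal(eta, eta_s)
```

The invariance holds because multiplying every entry by a constant adds the same amount to each log value, and that amount is multiplied by the sum of the coefficients, which is zero.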
Teng Fei. Email: [email protected]
Fei T, Funnell T, Waters N, Raj SS et al. Enhanced Feature Selection for Microbiome Data using FLORAL: Scalable Log-ratio Lasso Regression. bioRxiv 2023.05.02.538599.
set.seed(23420)
dat <- simu(n=50,p=30,model="linear")