Title: | Compositional Models to Longitudinal Microbiome Data |
---|---|
Description: | Implementation of models to analyse compositional microbiome time series taking into account the interaction between groups of bacteria. The models implemented are described in Creus-Martí et al (2018, ISBN:978-84-09-07541-6), Creus-Martí et al (2021) <doi:10.1155/2021/9951817> and Creus-Martí et al (2022) <doi:10.1155/2022/4907527>. |
Authors: | Irene Creus Martí [aut, cre] |
Maintainer: | Irene Creus Martí <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-01-20 21:29:37 UTC |
Source: | CRAN |
This package contain functions to model compositional and longitudinal microbiome datasets. This package contains the functions needed to execute the models published in the articles:
Creus-Martí, I., Moya, A., Santonja, F. J. (2018), A Statistical Model with a Lotka-Volterra Structure for Microbiota Data, Lucas Jodar, Juan Carlos Cortes and Luis Acedo, Modelling for engineering and human behavior 2018, Instituto Universitario de Matematica Multidisciplinar, ISBN: 978-84-09-07541-6.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
In addition, the package contains one real dataset extracted from:
Marín-Miret, J., Pérez-Cobas, A. E., Domínguez-Santos, R., Pérez-Rocher, B., Latorre, A., & Moya, A. (2024). Adaptability of the gut microbiota of the German cockroach Blattella germanica to a periodic antibiotic treatment. Microbiological Research, 287, 127863.
We refer to the model described in Creus-Martí (2018) as Dirich-gLV, we refer to the model described in Creus-Martí (2021) as FBM and we refer to the model described in Creus-Martí (2022) as BPBM.
This package includes files in the directory 'inst/extdata'. Users can access these files using 'system.file()'. The following files are available:
README.pdf
, README.Rmd
, README.R
: Basic instructions for using the package.
1-s2.0-S0944501324002647-mmc6.xlsx
: Original cockroach dataset extracted from Marín-Miret et al (2024).
Simulated.R
: Code use to obtain the Simulated dataset.
cockroach.R
: Code use to obtain the cockroach dataset.
To access these files, use:
base::system.file("extdata", "filename", package = "CoDaLoMic")
For instance:
base::system.file("extdata", "README.pdf", package = "CoDaLoMic")
On Windows, the PDF can be opened as follows:
base::shell.exec(system.file("extdata", "README.pdf", package = "CoDaLoMic"))
Maintainer: Irene Creus Martí [email protected] (ORCID)
Defining the part of the ridge regression matrix that carries the information of the bacteria especieII
.
B1MODImatrizP(Tt, especieII, especie, E, EspecieMaxima)
B1MODImatrizP(Tt, especieII, especie, E, EspecieMaxima)
Tt |
Number of time points available |
especieII |
Number. The number of the row in which the bacteria that we want to use is placed in the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
E |
Number of bacteria available |
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
Returns a matrix. The first column contain the number 1 repeated Tt
times. The second column contains
the alr transformation of the especieII
in all time points. The third column
contains the balance (whose numerator has all the bacteria except especieII
and EspecieMaxima
and the denominator
contains the EspecieMaxima
) in all time points.
Tt=2 especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) especieII=1 E=3 EspecieMaxima=3 B1MODImatrizP(Tt, especieII,especie1, E, EspecieMaxima)
Tt=2 especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) especieII=1 E=3 EspecieMaxima=3 B1MODImatrizP(Tt, especieII,especie1, E, EspecieMaxima)
This function calculates the value of the FBM regression, defined by:
B1MODImodel(A, especie, E, EspecieMaxima, Tt)
B1MODImodel(A, especie, E, EspecieMaxima, Tt)
A |
Matrix of dimensions ( |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
E |
Number of bacteria available |
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
Tt |
Number of time points available. |
Returns a matrix. The row i contains the regression values of the bacteria i at all time points.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
df<-data.frame(cbind(c(0.1,0.1,0.8),c(0.2,0.1,0.7))) E=3 EspecieMaxima=3 set.seed(724) A=matrix(c(-2:3),2,3) Tt=2 B1MODImodel(A,df, E, EspecieMaxima,Tt)
df<-data.frame(cbind(c(0.1,0.1,0.8),c(0.2,0.1,0.7))) E=3 EspecieMaxima=3 set.seed(724) A=matrix(c(-2:3),2,3) Tt=2 B1MODImodel(A,df, E, EspecieMaxima,Tt)
Defining a balance where we compare all the bacteria (except the one chosen as reference and the especieI
) with the one chosen as reference.
Balance(A, especieI, especie, E, EspecieMaxima)
Balance(A, especieI, especie, E, EspecieMaxima)
A |
Number of time points for which we calculate the balance |
especieI |
Number. The bacteria that we do not include in the balance. We must write the number of the row in which the bacteria is placed in the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
E |
Number of bacteria available |
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
Returns a vector with the value of the balance for all the time points indicated.
Balance(2,2,cbind(c(0.1,0.1,0.8),c(0.2,0.1,0.7)),3,3)
Balance(2,2,cbind(c(0.1,0.1,0.8),c(0.2,0.1,0.7)),3,3)
This function writes the matrix of covariates of the BPBM.
BPBM_Matrix(rows.position, PB, Tt)
BPBM_Matrix(rows.position, PB, Tt)
rows.position |
Vector. Vector with the number of the rows where the SPBal are in the matrix PB. We write first the row in which the balance with higher variance is placed, then the row in which the balance with second higher variance is placed... |
PB |
Matrix. Each line os the matrix PB contains the values of one principal balance at all time points. |
Tt |
Number of time points available. |
In an example with two SPBal and three time points, the covariates are written in the following order:
1 | 1 | 1 |
|
|
|
|
|
|
Returns a matrix with the covariates of the model.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
matt=matrix(c(1:12),3,4) rows.position=c(2,3) BPBM_Matrix(rows.position,matt,4)
matt=matrix(c(1:12),3,4) rows.position=c(2,3) BPBM_Matrix(rows.position,matt,4)
Gut microbiome dataset of a Blatella germanica cockroach treated by kanamycin during three periods of time (days: 1–10, 36–45, 71–80). The data is extracted from Marín-Miret et al (2024), more specifically, the data is the information of the K3 cockroach in the article. The dataset contains 105 time points and 210 genera.
data(cockroach)
data(cockroach)
A data frame with 105 rows and 211 columns.
Marín-Miret, J., Pérez-Cobas, A. E., Domínguez-Santos, R., Pérez-Rocher, B., Latorre, A., & Moya, A. (2024). Adaptability of the gut microbiota of the German cockroach Blattella germanica to a periodic antibiotic treatment. Microbiological Research, 287, 127863.
This function calculates the estimated parameters of the Dirich-gLV model.
Estimate_Param_EstParmFunc(Iter.EstParmFunc, paramini, especie, seed = NULL)
Estimate_Param_EstParmFunc(Iter.EstParmFunc, paramini, especie, seed = NULL)
Iter.EstParmFunc |
Number. Number of iterations. |
paramini |
Initial values of the parameters. Vector equal to |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of the matrix will be used as reference in the alr transformation. |
seed |
Number. Set a seed. Default
|
Maximum likelihood estimation is used. This function makes an iterative process, it obtains the value of the parameter that maximize the Dirichlet loglikelihood (defined in EstParmFunc) using the Nelder-Mead method and some initial parameters. Then it uses this value as initial parameters and repeats the process Iter.EstParmFunc
times.
Returns a list with:
All.iter: Matrix. Each row has the parameters obtained in each iteration. The parameters are in the columns written in the same order that they are written in paramini
. In this matrix we must observe that in the last iterations the values has really similar or equal values, it not, we need to increase the value of Iter.EstParmFunc
.
Param.Estimates: The estimated parameters. The parameters are in the columns written in the same order that they are written in paramini
.
Creus-Martí, I. and Moya, A. and Santonja, F. J. (2018). A Statistical Model with a Lotka-Volterra Structure for Microbiota Data. Lucas Jodar, Juan Carlos Cortes and Luis Acedo, Modelling for engineering and human behavior 2018, Instituto Universitario de Matematica Multidisciplinar. ISBN: 978-84-09-07541-6
especie=cbind(c(0.5,0.3,0.2),c(0.1,0.3,0.6)) paramini=c(100,2,3,4,5,6,7) Estimate_Param_EstParmFunc(5, paramini , especie,714)
especie=cbind(c(0.5,0.3,0.2),c(0.1,0.3,0.6)) paramini=c(100,2,3,4,5,6,7) Estimate_Param_EstParmFunc(5, paramini , especie,714)
This function estimates the parameters of the FBM model.
Estimate_Param_FBM( tau, ridge.final, Iter.EstParmFunc = 80, especie, EspecieMaxima, Tt, E, seed = NULL )
Estimate_Param_FBM( tau, ridge.final, Iter.EstParmFunc = 80, especie, EspecieMaxima, Tt, E, seed = NULL )
tau |
Number. Value of the tau parameter. |
ridge.final |
Object of class "ridgelm". Values obtained with the ridge regression. |
Iter.EstParmFunc |
Number. Number of iterations. Default: 80 iterations. |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of the matrix will be used as reference in the alr transformation and will be at the denominator of the balance. |
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
Tt |
Number of time points available |
E |
Number. Number of bacteria available. |
seed |
Number. Set a seed. Default |
Maximum likelihood estimation is used. This function makes an iterative process, for a given
value of tau, it obtains the value of the rest of the parameters that maximize the dirichlet loglikelihood
(defined in EstParmFunc_FBM) using the Nelder-Mead method and the values obtained in the ridge regression as initial parameters.
Then it uses the values obtained as initial parameters and repeats the process Iter.EstParmFunc
times.
The regression of this model is defined by
Returns a list with:
All.iter: Matrix. Each row has the parameters obtained in each iteration. The parameters are in the columns written in following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix especie
. In this matrix we must observe that in the last iterations the values has really similar or equal values, it not, we need to increase the value of Iter.EstParmFunc
.
Param.Estimates: Vector with the estimated parameters, in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix especie
.
AIC Number: Value of the AIC.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
set.seed(123) especie=t(gtools::rdirichlet(5,c(1,3,1))) Tt=5 E=3 EspecieMaxima=3 ridge.final=ridgeregression(Tt,especie, E, EspecieMaxima) tau=20 Iter.EstParmFunc=40 Estimate_Param_FBM(tau,ridge.final,Iter.EstParmFunc, especie,EspecieMaxima,Tt,E, 714)
set.seed(123) especie=t(gtools::rdirichlet(5,c(1,3,1))) Tt=5 E=3 EspecieMaxima=3 ridge.final=ridgeregression(Tt,especie, E, EspecieMaxima) tau=20 Iter.EstParmFunc=40 Estimate_Param_FBM(tau,ridge.final,Iter.EstParmFunc, especie,EspecieMaxima,Tt,E, 714)
The estimation of the BPBM model is carried out using MCMC. To execute this function it is necessary to have the program Just Another Gibbs Sampler (JAGS) (Plummer, 2003) program installed.
Estimating_BPBM( especie, Tt, E, MatrizPBmodelo, nn.chain = 3, nn.burnin = 5000, nn.sample = 20000, nn.thin = 10, seed = NULL )
Estimating_BPBM( especie, Tt, E, MatrizPBmodelo, nn.chain = 3, nn.burnin = 5000, nn.sample = 20000, nn.thin = 10, seed = NULL )
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
|||||||||
Tt |
Number of time points available. |
|||||||||
E |
Number of bacteria in the dataset. |
|||||||||
MatrizPBmodelo |
Matrix with the covariates of the model. In an example with two SPBal and three time points, the covariates are written in the following order:
|
|||||||||
nn.chain |
the number of chains to use with the simulation. Default is 3, minimum2. |
|||||||||
nn.burnin |
the number of burnin iterations. Default is 5000. |
|||||||||
nn.sample |
the number of iterations to take. Default: 20000. The markov chain will have ("sample"-"burnin")/"thin" iterations. |
|||||||||
nn.thin |
the thinning interval to be used. Default: 10. |
|||||||||
seed |
Number. Set a seed. Default |
Returns a list with:
List with:
R2jagsOutput: R2jags object with the information of the estimation.
SamplesAllChains: Matrix. Matrix that has the iterations of all the Markov chains joined.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
Plummer, M. (2003, March). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing (Vol. 124, No. 125.10, pp. 1-10).
set.seed(314) especie=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) E=5 Tt=6 MatrizPBmodelo=rbind(c(1,1,1,1,1,1),c(-0.3,0.4,0.3,-0.7,-0.4,-0.6),c(0.3,0.5,-0.3,0.1,0.4,0.1)) Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=2000, nn.thin=10, 714)
set.seed(314) especie=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) E=5 Tt=6 MatrizPBmodelo=rbind(c(1,1,1,1,1,1),c(-0.3,0.4,0.3,-0.7,-0.4,-0.6),c(0.3,0.5,-0.3,0.1,0.4,0.1)) Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=2000, nn.thin=10, 714)
This function calculates the loglikelihood of the dirichlet for the Dirich-gLV model.
EstParmFunc(parms.vector, especie)
EstParmFunc(parms.vector, especie)
parms.vector |
Vector equal to
|
||||||
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points . The bacteria placed in the last row of this matrix is the one used as reference in the alr transfromation that the model apply |
In an example with three bacteria, the regression of this model is defined by
Returns a number with the value of the dirichlet loglikelihood.
Creus-Martí, I. and Moya, A. and Santonja, F. J. (2018). A Statistical Model with a Lotka-Volterra Structure for Microbiota Data. Lucas Jodar, Juan Carlos Cortes and Luis Acedo, Modelling or engineering and human behavior 2018, Instituto Universitario de Matematica Multidisciplinar. ISBN: 978-84-09-07541-6
especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) tau1=0.4 parms1= cbind(c(0.1,0.2),c(-0.2,0.1),c(0.3,0.2)) parms11=c(tau1,as.vector( parms1)) EstParmFunc(parms11,especie1)
especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) tau1=0.4 parms1= cbind(c(0.1,0.2),c(-0.2,0.1),c(0.3,0.2)) parms11=c(tau1,as.vector( parms1)) EstParmFunc(parms11,especie1)
This function calculates the loglikelihood of the dirichlet for the FBM model.
EstParmFunc_FBM(param, especie, E, EspecieMaxima, Tt, especiemodi)
EstParmFunc_FBM(param, especie, E, EspecieMaxima, Tt, especiemodi)
param |
Vector with the parameters in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
E |
Number of bacteria available |
EspecieMaxima |
Row in which the bacterial with maximum mean abundance is in |
Tt |
Number of bacteria available |
especiemodi |
Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,..., |
The regression of this model is defined by
Returns a number with the value of the dirichlet loglikelihood.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) especiemodi=especie1[,-1] tau1=0.4 parms1= cbind(c(0.1,0.2),c(-0.2,0.1),c(0.3,0.2)) parms11=c(as.vector( t(parms1)),tau1) EstParmFunc_FBM(parms11,especie1,3 ,3 , 2,especiemodi)
especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) especiemodi=especie1[,-1] tau1=0.4 parms1= cbind(c(0.1,0.2),c(-0.2,0.1),c(0.3,0.2)) parms11=c(as.vector( t(parms1)),tau1) EstParmFunc_FBM(parms11,especie1,3 ,3 , 2,especiemodi)
This function calculates the value of the Dirichlet parameters, the expected value and the variance for the Dirich-gLV model.
ExpectedValue_EstParmFunc(Param.Estimates, especie)
ExpectedValue_EstParmFunc(Param.Estimates, especie)
Param.Estimates |
Vector with the estimated parameters. This value is the output of the Estimate_Param_EstParmFunc function. |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of this matrix is the one used as reference in the alr transformation that the model applies. |
Returns a list with:
Dirichlet.Param: Matrix. Matrix that contains at row i the Dirichlet parameter of the bacteria i at all time points.
Expected.Value: Matrix. Matrix that contains at row i the expected value of the bacteria i at all time points. The bacterias are placed at the same orden than in especies
.
Variance.Value: Matrix. Matrix that contains at row i the variance of the bacteria i at all time points. The bacterias are placed at the same orden than in especies
.
Creus-Martí, I. and Moya, A. and Santonja, F. J. (2018). A Statistical Model with a Lotka-Volterra Structure for Microbiota Data. Lucas Jodar, Juan Carlos Cortes and Luis Acedo, Modelling or engineering and human behavior 2018, Instituto Universitario de Matematica Multidisciplinar. ISBN: 978-84-09-07541-6
especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) tau1=0.4 parms1= cbind(c(0.1,0.2),c(-0.2,0.1),c(0.3,0.2)) parms11=c(tau1,as.vector( parms1)) ExpectedValue_EstParmFunc(parms11,especie1)
especie1=cbind(c(0.5,0.3,0.2), c(0.1,0.3,0.6)) tau1=0.4 parms1= cbind(c(0.1,0.2),c(-0.2,0.1),c(0.3,0.2)) parms11=c(tau1,as.vector( parms1)) ExpectedValue_EstParmFunc(parms11,especie1)
This function calculates the value of the dirichlet parameters, the expected value and the variance for the FBM model.
ExpectedValues_EstParmFunc_FBM( paramEstimadosFinal, especie, E, EspecieMaxima, Tt )
ExpectedValues_EstParmFunc_FBM( paramEstimadosFinal, especie, E, EspecieMaxima, Tt )
paramEstimadosFinal |
The estimate parameters, in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
E |
Number of bacteria available. |
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
Tt |
Number of time points available. |
The regression of this model is defined by
Returns a list with:
Dirichlet.Param: Matrix. Matrix that contains at row i the dirichlet parameter of the bacteria i at all time points.
Expected.Value: Matrix. Matrix that contains at row i the expected value of the bacteria i at all time points. The bacterias are placed at the same orden than in especies
.
Variance.Value: Matrix. Matrix that contains at row i the variance of the bacteria i at all time points. The bacterias are placed at the same orden than in especies
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
set.seed(123) especie=t(gtools::rdirichlet(2,c(1,1,3))) Tt=2 E=3 tau=5 EspecieMaxima=3 Iter.EstParmFunc=5 parms11=c(0.1,0.2,0.3,0.4,0.5,0.6,tau) ExpectedValues_EstParmFunc_FBM(parms11 , especie,E,EspecieMaxima,Tt)
set.seed(123) especie=t(gtools::rdirichlet(2,c(1,1,3))) Tt=2 E=3 tau=5 EspecieMaxima=3 Iter.EstParmFunc=5 parms11=c(0.1,0.2,0.3,0.4,0.5,0.6,tau) ExpectedValues_EstParmFunc_FBM(parms11 , especie,E,EspecieMaxima,Tt)
This function calculates the value of the dirichlet parameters, the expected value and the variance for the BPBM model.
ExpectedValuess_BPBM(Estimated.Param, MatrizPBmodelo, E, Tt)
ExpectedValuess_BPBM(Estimated.Param, MatrizPBmodelo, E, Tt)
Estimated.Param |
Vector with the estimate parameters. Column "mean" of the output of "StudyingParam" function. |
MatrizPBmodelo |
Matrix. Output of "ObtainingValueSPBal" called "MatrixSPBal". |
E |
Number of bacteria available. |
Tt |
Number of time points available. |
The regression of this model is defined by:
Returns a list with:
Dirichlet.Param: Matrix. Matrix that contains at row i the dirichlet parameter of the bacteria i at all time points.
Expected.Value: Matrix. Matrix that contains at row i the expected value of the bacteria i at all time points. The bacterias are placed at the same orden than in especies
.
Variance.Value: Matrix. Matrix that contains at row i the variance of the bacteria i at all time points. The bacterias are placed at the same orden than in especies
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
Tt=3 E=3 Estimated.Param=c(0.1 ,0.4, 0.7, 0.2 ,0.5, 0.8 ,0.3, 0.6, 0.9, 0.1, 0.4 ,0.7, 0.2, 0.5, 0.8, 0.3 ,0.6, 0.9) MatrizPBmodelo=rbind(c(1,1,1),c(0.3,0.6,-0.1),c(0.2,-0.4,0.3)) ExpectedValuess_BPBM(Estimated.Param,MatrizPBmodelo,E,Tt)
Tt=3 E=3 Estimated.Param=c(0.1 ,0.4, 0.7, 0.2 ,0.5, 0.8 ,0.3, 0.6, 0.9, 0.1, 0.4 ,0.7, 0.2, 0.5, 0.8, 0.3 ,0.6, 0.9) MatrizPBmodelo=rbind(c(1,1,1),c(0.3,0.6,-0.1),c(0.2,-0.4,0.3)) ExpectedValuess_BPBM(Estimated.Param,MatrizPBmodelo,E,Tt)
StudyingParam
returns a matrix where the value of the parameters is in the column "mean". This function inputs this columns and outputs the parameters in the matrix form required by the BPBM model.
FromVectorToMatrix_BPBM(param, MatrizPBmodelo, E)
FromVectorToMatrix_BPBM(param, MatrizPBmodelo, E)
param |
Vector. Column "mean" of the output of "StudyingParam" function. |
|||||||||
MatrizPBmodelo |
Matrix with the covariates of the model. In an example with two SPBal and three time points, the covariates are written in the following order:
|
|||||||||
E |
Number of bacteria in the dataset. |
Returns a matrix with the parameters in the order required by the BPBM model.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
set.seed(314) especie=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) E=5 Tt=6 MatrizPBmodelo=rbind(c(1,1,1,1,1,1), c(-0.3,0.4,0.3,-0.7,-0.4,-0.6), c(0.3,0.5,-0.3,0.1,0.4,0.1)) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) mm=StudyingParam(est$R2jagsOutput$BUGSoutput$summary,est$SamplesAllChains) FromVectorToMatrix_BPBM(mm$Param.Summary[,"mean"],MatrizPBmodelo,5)
set.seed(314) especie=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) E=5 Tt=6 MatrizPBmodelo=rbind(c(1,1,1,1,1,1), c(-0.3,0.4,0.3,-0.7,-0.4,-0.6), c(0.3,0.5,-0.3,0.1,0.4,0.1)) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) mm=StudyingParam(est$R2jagsOutput$BUGSoutput$summary,est$SamplesAllChains) FromVectorToMatrix_BPBM(mm$Param.Summary[,"mean"],MatrizPBmodelo,5)
Calculates a vector with the covariates of the BPBM model in one time point.
FVectorPBmodeloPredi(NumSPBal, DemSPBal, v, MatrizPBmodelo)
FVectorPBmodeloPredi(NumSPBal, DemSPBal, v, MatrizPBmodelo)
NumSPBal |
List. The component i of the list has the number of the row of the matrix |
DemSPBal |
List. The component i of the list has the number of the row of the matrix |
v |
Vector. Vector with a coda composition. The bacteria ar in the same orden than the matrix |
MatrizPBmodelo |
the matrix that contains the covariates of the model. The first line es equal to 1 for all columns. The other rows contain the value of one SPBal at all time points. The selected principal balance of the row i+1 has at its numerator the bacteria placed in the rows |
Returns a vector where the first component is a 1 and the following components have the values of the SPBal. The SPBal in the component i+1 has at its numerator the bacteria placed in the rows Num[[i]]
of the especie
. The SPBal of the component i+1 has at its denominator the bacteria placed in the rows Dem[[i]]
of the especie
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
v=c(0.1,0.1,0.2,0.3,0.3) Num2<-list(3,c(3,5),1,c(3,5,4)) Dem2<-list(5,4,2,c(1,2)) MatrizPBmodelo=rbind(c(1,1,1,1), c(-0.3,0.2,0.5,0.6), c(-0.4,0.3,0.5,0.6), c(0.5,0.3,0.2,0.7), c(-0.2,0.9,0.2,0.1) ) FVectorPBmodeloPredi(Num2,Dem2,v,MatrizPBmodelo)
v=c(0.1,0.1,0.2,0.3,0.3) Num2<-list(3,c(3,5),1,c(3,5,4)) Dem2<-list(5,4,2,c(1,2)) MatrizPBmodelo=rbind(c(1,1,1,1), c(-0.3,0.2,0.5,0.6), c(-0.4,0.3,0.5,0.6), c(0.5,0.3,0.2,0.7), c(-0.2,0.9,0.2,0.1) ) FVectorPBmodeloPredi(Num2,Dem2,v,MatrizPBmodelo)
Plots the time series
Graphics(especie, names.especie, esperanza, Variance, Plot.Tipe, Detail)
Graphics(especie, names.especie, esperanza, Variance, Plot.Tipe, Detail)
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
names.especie |
Vector with the names of the bacteria in the same order that are placed in the |
esperanza |
Matrix that contains at row i the expected value of the bacterial taxa i at all time points. The bacteria must be placed in the same order than in |
Variance |
Matrix that contains at row i the variance of the bacterial taxa i at all time points. The bacteria must be placed in the same order than in |
Plot.Tipe |
Character. If |
Detail |
Character. If |
Returns the indicated plots.
names.especie=c("Bact1", "Bact2", "Bact3") especie=cbind(c(0.5,0.3,0.2), c(0.6,0.3,0.1),c(0.4,0.1,0.5),c(0.4,0.1,0.5)) esperanza=especie[,c(1:3)]+0.1 Variance=matrix(c(runif(9,0.001,0.004)), 3,3) Graphics(especie, names.especie, esperanza, Variance,"Data","no") Graphics(especie, names.especie, esperanza, Variance,"DataExpected","no") Graphics(especie, names.especie, esperanza, Variance,"All","no")
names.especie=c("Bact1", "Bact2", "Bact3") especie=cbind(c(0.5,0.3,0.2), c(0.6,0.3,0.1),c(0.4,0.1,0.5),c(0.4,0.1,0.5)) esperanza=especie[,c(1:3)]+0.1 Variance=matrix(c(runif(9,0.001,0.004)), 3,3) Graphics(especie, names.especie, esperanza, Variance,"Data","no") Graphics(especie, names.especie, esperanza, Variance,"DataExpected","no") Graphics(especie, names.especie, esperanza, Variance,"All","no")
This function takes into account the data used to estimate and the data used to predict.
GraphicsPrediction( especie.All, names.especie, ExpectedValue.All, VarianceValue.All, Pred, Plot.Tipe, Detail )
GraphicsPrediction( especie.All, names.especie, ExpectedValue.All, VarianceValue.All, Pred, Plot.Tipe, Detail )
especie.All |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
names.especie |
Vector with the names of the bacteria in the same order that are placed in the |
ExpectedValue.All |
Matrix that contains at row i the expected value of the bacterial taxa i at all time points. The bacteria must be placed in the same order than in |
VarianceValue.All |
Matrix that contains at row i the variance of the bacterial taxa i at all time points. The bacteria must be placed in the same order than in |
Pred |
Number. Indicates the time point in which we start predicting. |
Plot.Tipe |
Character. If |
Detail |
Character. If |
Returns the indicated plots with a vertical line when the time point is equal to Pred-1, in Pred the prediction has started.
names.especie=c("Bact1", "Bact2", "Bact3") especie.All=cbind(c(0.5,0.3,0.2), c(0.6,0.3,0.1), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5)) ExpectedValue.All=especie.All[,-1]+0.1 VarianceValue.All=matrix(c(runif(15,0.001,0.004)), 3,5) Pred=4 GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "Data", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "DataExpected", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "All", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "Var", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "OnlyVar", "no")
names.especie=c("Bact1", "Bact2", "Bact3") especie.All=cbind(c(0.5,0.3,0.2), c(0.6,0.3,0.1), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5)) ExpectedValue.All=especie.All[,-1]+0.1 VarianceValue.All=matrix(c(runif(15,0.001,0.004)), 3,5) Pred=4 GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "Data", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "DataExpected", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "All", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "Var", "no") GraphicsPrediction(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred, "OnlyVar", "no")
This function takes into account the data used to estimate and the data used to predict. We use this function when we want to observe the results obtained with the BPBM model.
GraphicsPredictionBPBM( especie.All, names.especie, ExpectedValue.All, VarianceValue.All, Pred, Plot.Tipe, Varmas, Varmenos, Detail )
GraphicsPredictionBPBM( especie.All, names.especie, ExpectedValue.All, VarianceValue.All, Pred, Plot.Tipe, Varmas, Varmenos, Detail )
especie.All |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
names.especie |
Vector with the names of the bacteria in the same order that are placed in the |
ExpectedValue.All |
Matrix that contains at row i the expected value of the bacterial taxa i at all time points. The bacteria must be placed in the same order than in |
VarianceValue.All |
Matrix that contains at row i the variance of the bacterial taxa i at all time points. The bacteria must be placed in the same order than in |
Pred |
Number. Indicates the time point in which we start predicting. |
Plot.Tipe |
Character. If |
Varmas |
Matrix. Output of "PredictionBPBM" adding "$ExpVarmas". Matrix that contains at row i the expected value plus two times the sqrt(variance) of the bacteria i at all time points t= |
Varmenos |
Matrix. Output of "PredictionBPBM" adding "$ExpVarmenos". Matrix that contains at row i the expected value minus two times the sqrt(variance) of the bacteria i at all time points t= |
Detail |
Character. If |
Returns the indicated plots with a vertical line when the time point is equal to Tt
=Pred-1, in Pred=Tt
+1 the predicction has started.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
names.especie=c("Bact1", "Bact2", "Bact3") especie.All=cbind(c(0.5,0.3,0.2), c(0.6,0.3,0.1), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5)) ExpectedValue.All=especie.All[,-1]+0.1 VarianceValue.All=matrix(c(runif(15,0.001,0.004)), 3,5) Pred=4 Varmas=cbind(matrix(0,3,2),matrix(c(runif(9,0.001,0.004)) ,3 ,3 )) Varmenos=cbind(matrix(0,3,2),matrix(c(runif(9,0.001,0.004)) ,3 ,3 )) GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "Data", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "DataExpected", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All, Pred , "All", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "Var", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "OnlyVar", Varmas, Varmenos, "no")
names.especie=c("Bact1", "Bact2", "Bact3") especie.All=cbind(c(0.5,0.3,0.2), c(0.6,0.3,0.1), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5), c(0.4,0.1,0.5)) ExpectedValue.All=especie.All[,-1]+0.1 VarianceValue.All=matrix(c(runif(15,0.001,0.004)), 3,5) Pred=4 Varmas=cbind(matrix(0,3,2),matrix(c(runif(9,0.001,0.004)) ,3 ,3 )) Varmenos=cbind(matrix(0,3,2),matrix(c(runif(9,0.001,0.004)) ,3 ,3 )) GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "Data", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "DataExpected", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All, Pred , "All", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "Var", Varmas, Varmenos, "no") GraphicsPredictionBPBM(especie.All, names.especie, ExpectedValue.All, VarianceValue.All , Pred , "OnlyVar", Varmas, Varmenos, "no")
The SPBal (of BPBM model) are ordered from highest to lowest variance percentage. The zero is highlight because the closer the value of the balance is to zero, the more similar (in terms of relative abundance) the numerator and denominator groups will be. The farther away from zero, the more different.
GraphicsSPBal(MatrizPBmodelo)
GraphicsSPBal(MatrizPBmodelo)
MatrizPBmodelo |
Matrix. Output of "ObtainigValueSPBal" function. MatrixSPBal is the matrix that contains the covariates of the model. The first line es equal to 1 for all columns. The other rows contain the value of one SPBal at all time points. The selected principal balance of the row i+1 has at its numerator the bacteria placed in the rows |
Returns a graphic.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
MatrizPBmodelo=rbind(c(1,1,1,1,1,1),c(-0.3,0.4,0.3,-0.7,-0.4,-0.6),c(0.3,0.5,-0.3,0.1,0.4,0.1)) GraphicsSPBal(MatrizPBmodelo)
MatrizPBmodelo=rbind(c(1,1,1,1,1,1),c(-0.3,0.4,0.3,-0.7,-0.4,-0.6),c(0.3,0.5,-0.3,0.1,0.4,0.1)) GraphicsSPBal(MatrizPBmodelo)
This function calculates the loglikelihood of the dirichlet for the BPBM model.
LogVeroFuncBUENA(param, MatrizPBmodelo, E, Tt, especiemodi)
LogVeroFuncBUENA(param, MatrizPBmodelo, E, Tt, especiemodi)
param |
Vector. Column "mean" of the output of "StudyingParam" function. |
|||||||||
MatrizPBmodelo |
Matrix with the covariates of the model. In an example with two SPBal and three time points, the covariates are written in the following order:
|
|||||||||
E |
Number f bacteria in the dataset. |
|||||||||
Tt |
Number of time points available |
|||||||||
especiemodi |
Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,..., |
Returns a number with the value of the dirichlet loglikelihood.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
set.seed(314) especie=t(gtools::rdirichlet(n=2, c(1,2,3))) E=3 Tt=2 MatrizPBmodelo=rbind(c(1,1),c(-0.3,0.4),c(0.3,0.5)) set.seed(314) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) param=est$SamplesAllChains especiemodi=especie[,-1] LogVeroFuncBUENA(param,MatrizPBmodelo,E,Tt,especiemodi)
set.seed(314) especie=t(gtools::rdirichlet(n=2, c(1,2,3))) E=3 Tt=2 MatrizPBmodelo=rbind(c(1,1),c(-0.3,0.4),c(0.3,0.5)) set.seed(314) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) param=est$SamplesAllChains especiemodi=especie[,-1] LogVeroFuncBUENA(param,MatrizPBmodelo,E,Tt,especiemodi)
This function calculates the mean abundance of each bacteria. Then, it creates a matrix where each row contains the abundance of one bacteria at all time points but the bacteria with maximum (or minimum or a bacteria indicated by the user) mean abundance is placed at the last row
MaxBacteria(nombresOriginal, especieOriginal, E, Tt, which.esp)
MaxBacteria(nombresOriginal, especieOriginal, E, Tt, which.esp)
nombresOriginal |
Vector with the bacterial names at the same order than in DaTa. it must be fulfilled that lenght(nombresOriginal)==dim(DaTa)[2]-1 |
especieOriginal |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points |
E |
Number of bacteria available |
Tt |
Number of bacteria available |
which.esp |
If |
Returns a list with
especie
- Matrix that contains at row i the bacterial taxa of bacteria i at all time points but the bacteria with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is placed at the last row.
especiemodi
- Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,...,Tt
but the bacteria with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is placed at the last row.
nombres
- Vector with the bacteria's names placed in the order in which appear in the rows of the matrices especie
and especiemodi
EE
- Row in which the bacterial with maximum (or minimum) mean abundance was (or the value of "which" if which is numerical).
EspecieMaxima
- Row in which the bacterial with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is in especie
.)
#'
names1=c("Bact1","Bact2","Bact3") set.seed(314) esp1=t(gtools::rdirichlet(n=4, c(1,3,1))) e1=3 t1=4 MaxBacteria(names1,esp1,e1,t1,"Max") names3=c("Bact1","Bact2","Bact3","Bact4","Bact5") set.seed(314) esp3=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) e3=5 t3=6 MaxBacteria(names3,esp3,e3,t3,"Min")
names1=c("Bact1","Bact2","Bact3") set.seed(314) esp1=t(gtools::rdirichlet(n=4, c(1,3,1))) e1=3 t1=4 MaxBacteria(names1,esp1,e1,t1,"Max") names3=c("Bact1","Bact2","Bact3","Bact4","Bact5") set.seed(314) esp3=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) e3=5 t3=6 MaxBacteria(names3,esp3,e3,t3,"Min")
This function calculates the mean abundance of each bacteria taking into account the time points used to estimate the model (t=1,2,...,Tt
). Then, it creates a matrix where each row contains the abundance of one bacteria at all time points but the bacteria with maximum (or minimum) mean abundance (or the bacterial indicated by the user) is placed at the last row
MaxBacteriaPred( nombresOriginal, especieOriginal, E, Tt, Pred, K, especieOriginal.All, which.esp )
MaxBacteriaPred( nombresOriginal, especieOriginal, E, Tt, Pred, K, especieOriginal.All, which.esp )
nombresOriginal |
Vector with the bacterial names at the same order than in DaTa. it must be fulfilled that lenght(nombresOriginal)==dim(DaTa)[2]-1 |
especieOriginal |
Matrix that contains at row i the bacterial taxa of bacteria i at t=1,2,..., |
E |
Number of bacteria available |
Tt |
Number of time points used to estimate the model ( |
Pred |
Number. The data at t=1,...,Pred-1 will be used to estimate the model. The rest of the time points will be used to study the capacity of the model to predict. If |
K |
Number of time points at the data |
especieOriginal.All |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
which.esp |
If |
Returns a list with
especie
- Matrix that contains at row i the bacterial taxa of bacteria i at time points t=1,2,...,Tt
but the bacteria with with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is placed at the last row.
especiemodi
- Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,...,Tt
but the bacteria with with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is placed at the last row.
nombres
- Vector with the bacteria's names placed in the order in which appear in the rows of the matrices especie
and especiemodi
EE
- Row in which the bacterial with maximum (or minimum) mean abundance was (or the value of "which" if which is numerical).
EspecieMaxima
- Row in which the bacterial with with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is in especie
.)
#' #'
especie.all
- Matrix that contains at row i the bacterial taxa of bacteria i at all time points (t=1,2,...,K) but the bacteria with with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is placed at the last row.
especiemodi.all
- Matrix that contains at row i the bacterial taxa of bacteria i at all time points (t=2,...,K) but the bacteria with with maximum (or minimum) mean abundance (or the bacteria indicated by the user) is placed at the last row.
names2=c("Bact1","Bact2","Bact3","Bact4","Bact5") set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) e2=5 MaxBacteriaPred(names2,esp2[,-c(4,5,6)],e2,3,Pred=4, 6,esp2, "Max")
names2=c("Bact1","Bact2","Bact3","Bact4","Bact5") set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) e2=5 MaxBacteriaPred(names2,esp2[,-c(4,5,6)],e2,3,Pred=4, 6,esp2, "Max")
Calculates the value of the principal balances (Martín-Fernández et al, 2018) at all time points.
ObtainigValuePB(Num, Dem, especie, Tt)
ObtainigValuePB(Num, Dem, especie, Tt)
Num |
List. The component i of the list has the number of the row of the matrix |
Dem |
List. The component i of the list has the number of the row of the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
Tt |
Number of time points available |
Returns a matrix where the row i has the value of the principal balance at all time points. The principal balance of the row i has at its numerator the bacteria placed in the rows Num[[i]]
of the especie
. The principal balance of the row i has at its denominator the bacteria placed in the rows Dem[[i]]
of the especie
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
Martín-Fernández, J. A., Pawlowsky-Glahn, V., Egozcue, J. J., & Tolosona-Delgado, R. (2018). Advances in principal balances for compositional data. Mathematical Geosciences, 50, 273-298.
set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) Num2<-list(3,c(3,5),1,c(3,5,4)) Dem2<-list(5,4,2,c(1,2)) ObtainigValuePB(Num2,Dem2,esp2,6)
set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) Num2<-list(3,c(3,5),1,c(3,5,4)) Dem2<-list(5,4,2,c(1,2)) ObtainigValuePB(Num2,Dem2,esp2,6)
Calculates the value of the selected principal balances (SPBal) of the BPBM model at all time points.
ObtainigValueSPBal(Num, Dem, especie, Tt)
ObtainigValueSPBal(Num, Dem, especie, Tt)
Num |
List. The component i of the list has the number of the row of the matrix |
Dem |
List. The component i of the list has the number of the row of the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
Tt |
Number of time points available |
Returns a list with:
NumSPBal: List. The component i of the list has the number of the row of the matrix especie
where the bacteria in the numerator of the selected principal balance i are placed.
DemSPBal: List. The component i of the list has the number of the row of the matrix especie
where the bacteria in the denominator of the selected principal balance i are placed.
MatrixSPBal: MatrixSPBal is the matrix that contains the covariates of the model. The first line es equal to 1 for all columns. The other rows contain the value of one SPBal at all time points. The selected principal balance of the row i+1 has at its numerator the bacteria placed in the rows NumSPBal[[i]]
of the "especie". The selected principal balance of the row i+1 has at its denominator the bacteria placed in the rows DemSPBal[[i]]
of the "especie".
PercenVarianceSPBal: Vector. The component of the vector i contains the percentage of variance of the SPBal with numerator NumSPBal[[i]]
and denominator DemSPBal[[i]]
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) Num2<-list(3,c(3,5),1,c(3,5,4)) Dem2<-list(5,4,2,c(1,2)) ObtainigValueSPBal(Num2,Dem2,esp2,6)
set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) Num2<-list(3,c(3,5),1,c(3,5,4)) Dem2<-list(5,4,2,c(1,2)) ObtainigValueSPBal(Num2,Dem2,esp2,6)
This function calculates the loglikelihood of the dirichlet for the BPBM model. Then, it calculates the loglikelihood with the parameters of each iteration of the MCMC chains and introduces the values in a vector called vectorD. DIC=(1/2)*var(vectorD)+mean(VectorD)
ObtainingDIC(cadenas, MatrizPBmodelo, E, Tt, especiemodi)
ObtainingDIC(cadenas, MatrizPBmodelo, E, Tt, especiemodi)
cadenas |
Matrix with the iterations (in rows) of all the Markov Chains obtained in th estimation. It is the output of "StudyingParam" adding "$AllChainsJoined". |
|||||||||
MatrizPBmodelo |
Matrix with the covariates of the model. In an example with two SPBal and three time points, the covariates are written in the following order:
|
|||||||||
E |
Number f bacteria in the dataset. |
|||||||||
Tt |
Number of time points available |
|||||||||
especiemodi |
Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,..., |
Returns a data.frame with the DIC value (using the rule, pD = var/2).
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
set.seed(314) especie=t(gtools::rdirichlet(n=2, c(1,2,3))) E=3 Tt=2 MatrizPBmodelo=rbind(c(1,1),c(-0.3,0.4),c(0.3,0.5)) set.seed(314) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) SumFinal=StudyingParam(est$R2jagsOutput$BUGSoutput$summary ,est$SamplesAllChains) cadenas=SumFinal$AllChainsJoined especiemodi=especie[,-1] ObtainingDIC(cadenas,MatrizPBmodelo,E,Tt,especiemodi)
set.seed(314) especie=t(gtools::rdirichlet(n=2, c(1,2,3))) E=3 Tt=2 MatrizPBmodelo=rbind(c(1,1),c(-0.3,0.4),c(0.3,0.5)) set.seed(314) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) SumFinal=StudyingParam(est$R2jagsOutput$BUGSoutput$summary ,est$SamplesAllChains) cadenas=SumFinal$AllChainsJoined especiemodi=especie[,-1] ObtainingDIC(cadenas,MatrizPBmodelo,E,Tt,especiemodi)
This function calculates the balance that has at the numerator the bacteria placed at Num
and has at the denominator the bacteria placed at Dem
PBalance(A, Num, Dem, especie)
PBalance(A, Num, Dem, especie)
A |
Number. The balance will be calculated for t=1,2,..., |
Num |
vector that contains the position in the matrix |
Dem |
vector that contains the position in the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
Returns a vector with the value of the balance in each time point.
especie1=cbind(c(0.5,0.3,0.1,0.1), c(0.1,0.3,0.6,0.1)) Num=c(1,2) Dem=c(3,4) A=2 PBalance(A,Num,Dem,especie1)
especie1=cbind(c(0.5,0.3,0.1,0.1), c(0.1,0.3,0.6,0.1)) Num=c(1,2) Dem=c(3,4) A=2 PBalance(A,Num,Dem,especie1)
This function calculates the balance that has at the numerator de bacteria placed at Num
and has at the denominator the bacteria placed at Dem
PBalancePredi(Num, Dem, DatosEsperanzas)
PBalancePredi(Num, Dem, DatosEsperanzas)
Num |
vector that contains the position in the matrix |
Dem |
vector that contains the position in the matrix |
DatosEsperanzas |
Vector with a coda composition. The bacteria are in the same orden than the matrix |
Returns the value of the balance
Num=c(1,2) Dem=c(3,4) DatosEsperanzas=c(0.1,0.3,0.4,0.2) PBalancePredi(Num,Dem,DatosEsperanzas)
Num=c(1,2) Dem=c(3,4) DatosEsperanzas=c(0.1,0.3,0.4,0.2) PBalancePredi(Num,Dem,DatosEsperanzas)
This function calculates the value of the BPBM regression, defined by:
PBmodel(A, MatrizPBmodelo, E, Tt)
PBmodel(A, MatrizPBmodelo, E, Tt)
A |
Matrix that contains all the parameters of the model. The parameters are written in the matrix in the following order (in an example with three bacteria):
|
|||||||||||||||
MatrizPBmodelo |
Matrix. Output of "ObtainingValueSPBal" called "MatrixSPBal". |
|||||||||||||||
E |
Number of bacteria in the dataset. |
|||||||||||||||
Tt |
Number of time points. |
Returns a matrix. The row i contains the regression values of the bacteria i at all time points.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
A=rbind(c(1,2,3),c(4,5,6),c(7,8,9)) MatrizPBmodelo=cbind(c(1,2,3),c(4,5,6),c(7,8,9)) E=3 Tt=3 PBmodel(A,MatrizPBmodelo, E,Tt)
A=rbind(c(1,2,3),c(4,5,6),c(7,8,9)) MatrizPBmodelo=cbind(c(1,2,3),c(4,5,6),c(7,8,9)) E=3 Tt=3 PBmodel(A,MatrizPBmodelo, E,Tt)
This function applys a PCA to the estimate parameters (using function "prcomp" with center = TRUE
and scale. = TRUE
). Then uses the ggbiplot function to plot the biplot.
PCAbiplot(paramEstimadosFinal, names, E)
PCAbiplot(paramEstimadosFinal, names, E)
paramEstimadosFinal |
The estimate parameters, in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial especies present in the matrix |
names |
Vector with the name of the bacteria. The component i has the name of the bacteria i, with i=1,...,D. The bacteria in the las position of the vector is the bacteria used as reference in the alr transformation. |
E |
Number of bacteria available. |
Returns a list with the PCA biplot, the variance explained of each Principal Component and an object of class "prcomp" with the PCA. In the biplot, "a" denotes the intercept, "b" denotes the parameter that give information about the importance of the bacteria in defining herself in the next time point and "c" denotes denotes the parameter that give information about the importance of the rest of the community in defining the bacteria in the next time point.
set.seed(123) especie=t(gtools::rdirichlet(10,c(1,3,1,2,4))) names=c("Bact1","Bact2","Bact3","Bact4","Bact5") tau1=0.4 parms1= cbind(c(0.1,0.2,0.4,0.6),c(-0.2,0.1,0.1,0.3),c(0.3,0.2,0.3,0.5)) paramEstimadosFinal=c(as.vector( t(parms1)),tau1) PCAbiplot(paramEstimadosFinal,names,5)
set.seed(123) especie=t(gtools::rdirichlet(10,c(1,3,1,2,4))) names=c("Bact1","Bact2","Bact3","Bact4","Bact5") tau1=0.4 parms1= cbind(c(0.1,0.2,0.4,0.6),c(-0.2,0.1,0.1,0.3),c(0.3,0.2,0.3,0.5)) paramEstimadosFinal=c(as.vector( t(parms1)),tau1) PCAbiplot(paramEstimadosFinal,names,5)
This function calculates the variance of each row of the matrix PB
. Returns the percentage of variance of each row of the matrix PB
.
Percen_Variance(PB)
Percen_Variance(PB)
PB |
Matrix. |
Returns a vector with percentage of variance of each row of the matrix PB
.
matt=matrix(c(1:4),2,2) Percen_Variance(matt)
matt=matrix(c(1:4),2,2) Percen_Variance(matt)
Plots the dendogram obtained using the Ward’s method for obtaining the principal balances. The process follow in this function is explained in Section 3.1 of (Creus-Martí et al, 2022)
PlotDendogram(especie, names)
PlotDendogram(especie, names)
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. |
names |
Vector with the names of the bacteria in the same order that are written in |
Returns a list with: the dendogram.
Num: List. The component i of the list has the number of the row of the matrix especie
where the bacteria in the numerator of the principal balance i are placed.
Dem: List. The component i of the list has the number of the row of the matrix especie
where the bacteria in the denominator of the principal balance i are placed.
dendogram: Plots the dendogram.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
names2=c("Bact1","Bact2","Bact3","Bact4","Bact5") set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) PlotDendogram(esp2,names2)
names2=c("Bact1","Bact2","Bact3","Bact4","Bact5") set.seed(314) esp2=t(gtools::rdirichlet(n=6, c(1,1,5,1,1))) PlotDendogram(esp2,names2)
This function calculates the expected value and variance of the bacteria at time point Tt
. Then, this function calculates the expected value and variance of the bacteria at time point t=(Tt
+1),...,K. It calculates the expected value at each time point for each markov chain iteration. The expected value for each time point is the mean of the expected values of all iterations.mAnalogous with the variance, the dirichlet parameters and the expected valur plus(and minus) two times the sqrt of the variance.
PredictionBPBM( NumSPBal, DemSPBal, MCMC.CHAINS, alpha, K, esperanza, Var, E, Tt, MatrizPBmodelo )
PredictionBPBM( NumSPBal, DemSPBal, MCMC.CHAINS, alpha, K, esperanza, Var, E, Tt, MatrizPBmodelo )
NumSPBal |
List. The component i of the list has the number of the row of the matrix |
DemSPBal |
List. The component i of the list has the number of the row of the matrix |
MCMC.CHAINS |
Matrix with the iterations of all the chains for all the parameters. Each column has all the iteration of one parameter. If the cero is in the center of the credible interval of one parameter all its iteration in the Marchov Chain have the value 0. It is output of the "StudyingParam" function adding "$AllChainsJoined". |
alpha |
Matrix that contains at the row i the Dirichlet parameter of the bacteria i at t=1,2,3,..., |
K |
Number. The function will calculate the value of the expected value and the variance at |
esperanza |
Matrix that contains at row i the expected value of the bacterial taxa of bacteria i at t=1,2,3,..., |
Var |
Matrix that contains at row i the variance of the bacterial taxa of bacteria i at t=1,2,3,..., |
E |
Number of bacteria available |
Tt |
Number of bacteria available |
MatrizPBmodelo |
is the matrix that contains the covariates of the model. The first line es equal to 1 for all columns. The other rows contain the value of one SPBal at all time points. The selected principal balance of the row i+1 has at its numerator the bacteria placed in the rows |
Returns a list with:
ExpectedValue.All: Matrix. Matrix that contains at row i the expected value of the bacteria i at all time points t=1,2,...,K. The bacterias are placed at the same order than in especies
.
VarianceValue.All: Matrix. Matrix that contains at row i the variance of the bacteria i at all time points t=1,2,...,K. The bacterias are placed at the same order than in especies
.
DirichlerParam.All: Matrix. Matrix that contains at row i the dirichlet parameter of the bacteria i at all time points t=1,2,...,K. The bacterias are placed at the same order than in especies
.
ExpVarmas: Matrix. Matrix that contains at row i the expected value plus two times the sqrt(variance) of the bacteria i at all time points t=Tt
,...,K, the rest of the time points has 0 values. The bacterias are placed at the same order than in especies
.
ExpVarmenos: Matrix. Matrix that contains at row i the expected value plus two times the sqrt(variance) of the bacteria i at all time points t=Tt
,...,K,the rest of the time points has 0 values. The bacterias are placed at the same order than in especies
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2022). Bayesian hierarchical compositional models for analysing longitudinal abundance data from microbiome studies. Complexity, 2022.
NumSPBal=list(1,c(1,2)) DemSPBal=list(2,3) MCMC.CHAINS=cbind(c(0.1,0.11), c(0.2,0.21), c(0.3,0.31), c(-0.1,-0.11), c(0.15,0.105), c(0.44,0.41), c(0.3,0.31), c(0.201,0.221), c(0.13,0.113) ) alpha=cbind(c(0.1,0.2,0.1),c(0.1,0.5,0.3)) K=3 esperanza=cbind(c(0.2,0.2,0.6)) Var=cbind(c(0.1,0.01,0.11)) E=3 Tt=2 MatrizPBmodelo=cbind(c(1,0.3,0.2)) PredictionBPBM(NumSPBal,DemSPBal,MCMC.CHAINS, alpha,K,esperanza,Var,E,Tt,MatrizPBmodelo )
NumSPBal=list(1,c(1,2)) DemSPBal=list(2,3) MCMC.CHAINS=cbind(c(0.1,0.11), c(0.2,0.21), c(0.3,0.31), c(-0.1,-0.11), c(0.15,0.105), c(0.44,0.41), c(0.3,0.31), c(0.201,0.221), c(0.13,0.113) ) alpha=cbind(c(0.1,0.2,0.1),c(0.1,0.5,0.3)) K=3 esperanza=cbind(c(0.2,0.2,0.6)) Var=cbind(c(0.1,0.01,0.11)) E=3 Tt=2 MatrizPBmodelo=cbind(c(1,0.3,0.2)) PredictionBPBM(NumSPBal,DemSPBal,MCMC.CHAINS, alpha,K,esperanza,Var,E,Tt,MatrizPBmodelo )
This function calculates the expected value and variance of the bacteria at time point Tt
. Then, this function calculates the expected value and variance of the bacteria at time point t=(Tt
+1),...,K
PredictionEstParmFunc( paramEstimadosFinal, EspecieMaxima, alpha, K, esperanza, Var, E, Tt )
PredictionEstParmFunc( paramEstimadosFinal, EspecieMaxima, alpha, K, esperanza, Var, E, Tt )
paramEstimadosFinal |
The estimate parameters. Vector equal to
|
||||||
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
||||||
alpha |
Matrix that contains at the row i the dirichlet parameter of the bacteria i at t=1,2,3,..., |
||||||
K |
Number. The function will calculate the value of the expected value and the variance at |
||||||
esperanza |
Matrix that contains at row i the expected value of the bacterial taxa of bacteria i at t=1,2,3,..., |
||||||
Var |
Matrix that contains at row i the variance of the bacterial taxa of bacteria i at t=1,2,3,..., |
||||||
E |
Number of bacteria available |
||||||
Tt |
Number of time points available |
The regression of this model, in an example with three bacteria, is defined by
Returns a list with:
ExpectedValue.All: Matrix. Matrix that contains at row i the expected value of the bacteria i at all time points t=2,...,K. The bacteria are placed at the same order than in especies
.
VarianceValue.All: Matrix. Matrix that contains at row i the variance of the bacteria i at all time points t=2,...,K. The bacteria are placed at the same order than in especies
.
DirichlerParam.All: Matrix. Matrix that contains at row i the dirichlet parameter of the bacteria i at all time points t=2,...,K. The bacteria are placed at the same order than in especies
.
Creus-Martí, I. and Moya, A. and Santonja, F. J. (2018). A Statistical Model with a Lotka-Volterra Structure for Microbiota Data. Lucas Jodar, Juan Carlos Cortes and Luis Acedo, Modelling or engineering and human behavior 2018, Instituto Universitario de Matematica Multidisciplinar. ISBN: 978-84-09-07541-6
pam.ini=rbind(c(0.1,0.2,0.3),c(0.4,0.5,0.6)) paramEstimadosFinal=c(5, as.vector(pam.ini)) EspecieMaxima=3 alpha=cbind(c(2,2,3),c(1,1,3)) K=3 esperanza=cbind(c(0.2,0.3,0.5)) Var=cbind(c(0.2,0.3,0.5)) E=3 Tt=2 PredictionEstParmFunc(paramEstimadosFinal,EspecieMaxima, alpha,K,esperanza,Var,E,Tt )
pam.ini=rbind(c(0.1,0.2,0.3),c(0.4,0.5,0.6)) paramEstimadosFinal=c(5, as.vector(pam.ini)) EspecieMaxima=3 alpha=cbind(c(2,2,3),c(1,1,3)) K=3 esperanza=cbind(c(0.2,0.3,0.5)) Var=cbind(c(0.2,0.3,0.5)) E=3 Tt=2 PredictionEstParmFunc(paramEstimadosFinal,EspecieMaxima, alpha,K,esperanza,Var,E,Tt )
This function calculates the expected value and variance of the bacteria at time point Tt
. Then, this function calculates the expected value and variance of the bacteria at time point t=(Tt
+1),...,K
PredictionFBM( paramEstimadosFinal, EspecieMaxima, alpha, K, esperanza, Var, E, Tt )
PredictionFBM( paramEstimadosFinal, EspecieMaxima, alpha, K, esperanza, Var, E, Tt )
paramEstimadosFinal |
The estimate parameters, in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix |
EspecieMaxima |
Row in which the bacteria chosen as reference is in |
alpha |
Matrix that contains at the row i the Dirichlet parameter of the bacteria i at t=1,2,3,..., |
K |
Number. The function will calculate the value of the expected value and the variance at |
esperanza |
Matrix that contains at row i the expected value of the bacterial taxa of bacteria i at t=1,2,3,..., |
Var |
Matrix that contains at row i the variance of the bacterial taxa of bacteria i at t=1,2,3,..., |
E |
Number of bacteria available |
Tt |
Number of bacteria available |
The regression of this model is defined by
Returns a list with:
ExpectedValue.All: Matrix. Matrix that contains at row i the expected value of the bacteria i at all time points t=1,2,...,K. The bacteria are placed at the same order than in especies
.
VarianceValue.All: Matrix. Matrix that contains at row i the variance of the bacteria i at all time points t=1,2,...,K. The bacteria are placed at the same order than in especies
.
DirichlerParam.All: Matrix. Matrix that contains at row i the dirichlet parameter of the bacteria i at all time points t=1,2,...,K. The bacteria are placed at the same order than in especies
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
Tt=2 E=3 tau=5 EspecieMaxima=3 K=3 parms11=c(0.1,0.2,0.3,0.4,0.5,0.6,tau) alpha=cbind(c(1.726793,1.892901,1.380306), c(1,1,3)) Expected=cbind(c(alpha[1,1]/tau, alpha[2,1]/tau, alpha[3,1]/tau ), c(alpha[1,2]/tau,alpha[2,2]/tau,alpha[3,2]/tau)) Variance=cbind(c(0.03768101, 0.03920954, 0.03330857 ), c( 0.03683242,0.02784883, 0.0413761 )) Expected.final=Expected[,-2] Variance.final=Variance[,-2] PredictionFBM(parms11,EspecieMaxima, alpha,K,Expected.final,Variance.final,E,Tt )
Tt=2 E=3 tau=5 EspecieMaxima=3 K=3 parms11=c(0.1,0.2,0.3,0.4,0.5,0.6,tau) alpha=cbind(c(1.726793,1.892901,1.380306), c(1,1,3)) Expected=cbind(c(alpha[1,1]/tau, alpha[2,1]/tau, alpha[3,1]/tau ), c(alpha[1,2]/tau,alpha[2,2]/tau,alpha[3,2]/tau)) Variance=cbind(c(0.03768101, 0.03920954, 0.03330857 ), c( 0.03683242,0.02784883, 0.0413761 )) Expected.final=Expected[,-2] Variance.final=Variance[,-2] PredictionFBM(parms11,EspecieMaxima, alpha,K,Expected.final,Variance.final,E,Tt )
Preparing the dataset to be introduce in the models' functions. In order to introduce the usage of the package there is a README file. You can find the link to the file using base::system.file("extdata", "README.pdf", package = "CoDaLoMic")
. On windows you can open the file with base::shell.exec(system.file("extdata", "README.pdf", package = "CoDaLoMic"))
.
PreparingTheData(DaTa, Pred)
PreparingTheData(DaTa, Pred)
DaTa |
data.frame. The first column contains the time point information (natural numbers 1,2,3...). The rest of the columns contain the relative abundance of each bacteria at the different time points. The values of each column must sum 1. |
Pred |
Number. The data at t=1,...,Pred-1 will be used to estimate the model. The rest of the time points will be used to study the capacity of the model to predict. If |
If Pred==0
returns a list with
Tt
- The number of time points available.
E
- Number of bacteria available
especieOriginal
- Matrix that contains at row i the bacterial taxa of bacteria i at all time points.
especiemodiOriginal
- Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,...,Tt
.
If Pred!=0
returns a list with
Tt
- The number of time points available used to estimate the model (Tt
=Pred-1).
E
- Number of bacteria available
especieOriginal
- Matrix that contains at row i the bacterial taxa of bacteria i at the time points t=1,2,...,Pred-1.
especiemodiOriginal
- Matrix that contains at row i the bacterial taxa of bacteria i at time points t=2,...,Pred-1.
especieOriginal.All
- Matrix that contains at row i the bacterial taxa of bacteria i at the time points.
especiemodiOriginal.All
- Matrix that contains at row i the bacterial taxa of bacteria i at time points.
K
- Number of time points available at the dataset.
df<-data.frame(cbind(c(1,2,3), c(0.5,0.2,0.3), c(0.2,0.1,0.6), c(0.1,0.1,0.8), c(0.3,0.3,0.4))) PreparingTheData(df,Pred=0) df2<-data.frame(cbind(c(1,2,3,4,5), c(0.1,0.1,0.1,0.2,0.5), c(0.2,0.2,0.2,0.2,0.2), c(0.2,0.3,0.1,0.2,0.2))) PreparingTheData(df2,Pred=4)
df<-data.frame(cbind(c(1,2,3), c(0.5,0.2,0.3), c(0.2,0.1,0.6), c(0.1,0.1,0.8), c(0.3,0.3,0.4))) PreparingTheData(df,Pred=0) df2<-data.frame(cbind(c(1,2,3,4,5), c(0.1,0.1,0.1,0.2,0.5), c(0.2,0.2,0.2,0.2,0.2), c(0.2,0.3,0.1,0.2,0.2))) PreparingTheData(df2,Pred=4)
This function calculates the root-mean-square deviation (RMSD), the Nash Sutchiffe Coefficient, the residual sum of squares (RSS) and the mean absolute percentage error (MAPE) for the matrices introduces. This function also calculates the mean of the RMSD, the mean of the Nash Sutchiffe Coefficient and the mean of the RSS.
QualityControl(matrixData, matrixExpected, names.especie)
QualityControl(matrixData, matrixExpected, names.especie)
matrixData |
Matrix that contains at row i the bacterial taxa of bacteria i at the time points that we want take into account to calculate the quality control values. |
matrixExpected |
Matrix that contains at row i the expected value of the bacterial taxa i at the time points that we want take into account to calculate the quality control values. The bacteria must be placed in the same order than in |
names.especie |
Vector with the names of the bacteria in the same order that are placed in the |
Returns a data.frame.
names.especie=c("Bact1", "Bact2", "Bact3") matrixExpected=matrix(c(1:9),3,3) matrixData=matrixExpected+0.1 QualityControl(matrixData, matrixExpected,names.especie)
names.especie=c("Bact1", "Bact2", "Bact3") matrixExpected=matrix(c(1:9),3,3) matrixData=matrixExpected+0.1 QualityControl(matrixData, matrixExpected,names.especie)
Ridge regression
ridgeregression(Tt, especie, E, EspecieMaxima, seed = NULL)
ridgeregression(Tt, especie, E, EspecieMaxima, seed = NULL)
Tt |
Number of time points available |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of the matrix will be used as reference in the alr transformation and will be at the denominator of the balance. |
E |
Number of bacteria available. |
EspecieMaxima |
Row in which the bacteria used as reference is in |
seed |
Number. Set a seed. Default |
Returns the result of the ridge regression, object of class "ridgelm".
set.seed(123) especie=t(gtools::rdirichlet(10,c(1,3,1,2,4))) Tt=10 E=5 EspecieMaxima=5 ridgeregression(Tt,especie, E, EspecieMaxima, 558562316)
set.seed(123) especie=t(gtools::rdirichlet(10,c(1,3,1,2,4))) Tt=10 E=5 EspecieMaxima=5 ridgeregression(Tt,especie, E, EspecieMaxima, 558562316)
This function calculates the right side of the gLV equation.
rxnrate(State, parms)
rxnrate(State, parms)
State |
Vector with a CoDa composition |
||||||||||||
parms |
Matrix. Each row has the parameters of each differential equation. following our example, parms has the parameters placed as follows:
|
For instance, if we want to solve the following gLV equations:
This function returns a vector with the value of:
Returns a vector with the value of the right side of the gLV equations.
cinit1<-c(x1<-0.7,x2<-0.2,x3<-0.1) parms1= cbind(c(0.1,0.2,-0.1),c(-0.2,0.1,-0.1),c(0.3,0.2,0.3),c(0.1,0.22,0.2)) rxnrate(cinit1,parms1)
cinit1<-c(x1<-0.7,x2<-0.2,x3<-0.1) parms1= cbind(c(0.1,0.2,-0.1),c(-0.2,0.1,-0.1),c(0.3,0.2,0.3),c(0.1,0.22,0.2)) rxnrate(cinit1,parms1)
Simulated dataset with 5 microbial taxa and 10 time points. Following the scheme given by Faust et al (2018), we generated the interaction matrix using the algorithm proposed by Klemm and Eguíluz (2002), and we generated the initial abundances using the Poisson distribution. With these two tools, we simulated the data using the generalized Lotka-Volterra structure. We carried out the simulation using the R package seqtime (Faust et al, 2018). Focusing on technical details, to generate the interaction matrix we set the clique size at 4, the diagonal values at -1, the interaction connectance at 0.04, the positive edge percentage at 64
data(Simulated)
data(Simulated)
A data frame with 10 rows and 6 columns.
K. Faust, F. Bauchinger, B. Laroche et al., “Signatures ofecological processes in microbial community time series”. Microbiome, vol. 6, no. 1, p. 120, 2018
K. Klemm and V. M. Eguíluz, “Growing scale-free networkswith small-world behavior”. Physical Review, vol. 65, no. 5,Article ID 057102, 2002.
This function controls that the value of the Rhat is between 0.9 and 1.1. In addition, it controls that the effective sample size is bigger than 100 and that the zero is not at the center of the credible interval (the interval between 2.5 and 97.5). We consider that the zero is in the center of the credible interval when the zero is between the 25 and the 75 quantile of the distribution formed by the limits if the credible interval.
StudyingParam(Sum, MCMC.CHAINS)
StudyingParam(Sum, MCMC.CHAINS)
Sum |
Matrix with the summary of the "Estimating_BPBM".It is the output of the "Estimating_BPBM" adding "$R2jagsOutput$BUGSoutput$summary". |
MCMC.CHAINS |
Matrix with the values of all the Markov chains for all parameters. It is the output of the "Estimating_BPBM" adding "$SamplesAllChains". |
Returns a list with:
Param.Summary: Matrix. The matrix Sum with a zero in the column Sum[,"mean"]
when a parameter has the zero in the center of its credible interval.
AllChainsJoined: Matrix. The matrix MCMC.CHAINS with a zero in all the iterations of the chain when a parameter has the zero in the center of its credible interval.
set.seed(314) especie=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) E=5 Tt=6 MatrizPBmodelo=rbind(c(1,1,1,1,1,1),c(-0.3,0.4,0.3,-0.7,-0.4,-0.6),c(0.3,0.5,-0.3,0.1,0.4,0.1)) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) StudyingParam(est$R2jagsOutput$BUGSoutput$summary,est$SamplesAllChains)
set.seed(314) especie=t(gtools::rdirichlet(n=6, c(6,6,1,6,6))) E=5 Tt=6 MatrizPBmodelo=rbind(c(1,1,1,1,1,1),c(-0.3,0.4,0.3,-0.7,-0.4,-0.6),c(0.3,0.5,-0.3,0.1,0.4,0.1)) est=Estimating_BPBM(especie, Tt, E, MatrizPBmodelo, nn.chain=3, nn.burnin=1000, nn.sample=5000, nn.thin=10) StudyingParam(est$R2jagsOutput$BUGSoutput$summary,est$SamplesAllChains)
This function returns a table with the interpretable parameters of the Dirich-gLV model.
Table_alr_Dirich_glv(Param.Estimates, especie, names, E)
Table_alr_Dirich_glv(Param.Estimates, especie, names, E)
Param.Estimates |
Vector with the estimates parameters. It is equal to
|
||||||
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of this matrix is the one used as reference in the alr transformation that the model applies. |
||||||
names |
Vector with the name of the bacteria in the same order than are present in the |
||||||
E |
Number of bacteria available. |
In an example with three bacteria, the regression of this model is defined by
Returns a table written in latex format with the matrix pam.
Creus-Martí, I. and Moya, A. and Santonja, F. J. (2018). A Statistical Model with a Lotka-Volterra Structure for Microbiota Data. Lucas Jodar, Juan Carlos Cortes and Luis Acedo, Modelling or engineering and human behavior 2018, Instituto Universitario de Matematica Multidisciplinar. ISBN: 978-84-09-07541-6
pam.ini=rbind(c(0.1,0.2,0.3),c(0.4,0.5,0.6)) paramEstimadosFinal=c(5, as.vector(pam.ini)) E=3 especie=cbind(c(0.2,0.4,0.4),c(0.1,0.1,0.8),c(0.5,0.1,0.4)) names=c("a","b","c") Table_alr_Dirich_glv(paramEstimadosFinal,especie,names,E)
pam.ini=rbind(c(0.1,0.2,0.3),c(0.4,0.5,0.6)) paramEstimadosFinal=c(5, as.vector(pam.ini)) E=3 especie=cbind(c(0.2,0.4,0.4),c(0.1,0.1,0.8),c(0.5,0.1,0.4)) names=c("a","b","c") Table_alr_Dirich_glv(paramEstimadosFinal,especie,names,E)
Returns a table with the percentage of variance that each SPBal has, the bacteria that goes in the numerator and denominator of the balance, the relationship between the group in the numerator and the denominator and the bacteria most influenced by this SPBal.
TableBPBM( NumSPBal, DemSPBal, PerVar, MatrizPBmodelo, Estimated.Param, BB = 0.55, names, E )
TableBPBM( NumSPBal, DemSPBal, PerVar, MatrizPBmodelo, Estimated.Param, BB = 0.55, names, E )
NumSPBal |
List. Output of "ObtainigValueSPBal" function.List. The component i of the list has the number of the row of the matrix |
DemSPBal |
List. Output of "ObtainigValueSPBal" function.List. The component i of the list has the number of the row of the matrix |
PerVar |
Vector. Output of "ObtainigValueSPBal" function. The component of the vector i contains the percentage of variance of the SPBal with numerator |
MatrizPBmodelo |
Matrix. Output of "ObtainigValueSPBal" function. MatrixSPBal is the matrix that contains the covariates of the model. The first line es equal to 1 for all columns. The other rows contain the value of one SPBal at all time points. The selected principal balance of the row i+1 has at its numerator the bacteria placed in the rows |
Estimated.Param |
Vector. Column "mean" of the output of StudyingParam function. |
BB |
The bacteria in the numerator and the denominator of the balance are considered similar if the mean of the SPBal is between (-BB,BB).Default: 0.55. |
names |
Vector with the bacteria's names placed in the order in which appear in the rows of the matrices |
E |
number of bacteria in the dataset. |
Returns the table written in latex
NumSPBal=list(c(3,4),3,2) DemSPBal=list(1,5,4) PerVar=c(41.37487, 21.08270, 19.16870) MatrizPBmodelo=rbind(c(1.00000000, 1.0000000, 1.0000000, 1.0000000 , 1.0000000), c(1.84449081, -1.3569851, -0.1388348, -0.5269079, -1.3288684), c(0.27531685, 0.4741394, -1.9045554, -0.3581268, -0.4768543), c( 0.07782991, 0.3492473, -0.7138882, 1.4332828, -0.7203295)) namesOr=c("Bact1", "Bact2","Bact3","Bact4","Bact5") Estimated.Param=c(0.5, 0.3, -0.2,0.1) E=5 TableBPBM(NumSPBal, DemSPBal, PerVar, MatrizPBmodelo, Estimated.Param, BB=0.55, namesOr, E)
NumSPBal=list(c(3,4),3,2) DemSPBal=list(1,5,4) PerVar=c(41.37487, 21.08270, 19.16870) MatrizPBmodelo=rbind(c(1.00000000, 1.0000000, 1.0000000, 1.0000000 , 1.0000000), c(1.84449081, -1.3569851, -0.1388348, -0.5269079, -1.3288684), c(0.27531685, 0.4741394, -1.9045554, -0.3581268, -0.4768543), c( 0.07782991, 0.3492473, -0.7138882, 1.4332828, -0.7203295)) namesOr=c("Bact1", "Bact2","Bact3","Bact4","Bact5") Estimated.Param=c(0.5, 0.3, -0.2,0.1) E=5 TableBPBM(NumSPBal, DemSPBal, PerVar, MatrizPBmodelo, Estimated.Param, BB=0.55, namesOr, E)
This function returns a table with the interpretable parameters of the FBM model.
TableFBM(paramEstimadosFinal, names, E)
TableFBM(paramEstimadosFinal, names, E)
paramEstimadosFinal |
The estimate parameters, in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix |
names |
Vector of length D. The component i has the name of the bacteria i. |
E |
Number of bacteria available. |
Returns a table written in latex format.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
paramEstimadosFinal=c(1,2,3,1,2,3,1,2,3) names=c("Bact1", "Bact2","Bact3") E=3 TableFBM(paramEstimadosFinal,names,E)
paramEstimadosFinal=c(1,2,3,1,2,3,1,2,3) names=c("Bact1", "Bact2","Bact3") E=3 TableFBM(paramEstimadosFinal,names,E)
This function estimates the parameters of the FBM model.
TauAndParameters_EstParmFunc_FBM( ttau = 30, ridge.final, Iter.EstParmFunc = 80, especie, EspecieMaxima, Tt, E, seed = NULL )
TauAndParameters_EstParmFunc_FBM( ttau = 30, ridge.final, Iter.EstParmFunc = 80, especie, EspecieMaxima, Tt, E, seed = NULL )
ttau |
Number. We estimate de FBM model for the values of tau: 1, 2,..., ttau |
ridge.final |
Object of class "ridgelm". Values obtained with the ridge regression. |
Iter.EstParmFunc |
Number. Number of iterations. Default: 80 iterations. |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of the matrix will be used as reference in the alr transformation and will be at the denominator of the balance. |
EspecieMaxima |
Row in which the bacteria used as reference is in |
Tt |
Number of time points available |
E |
Number. Number of bacteria available. |
seed |
Number. Set a seed. Default |
We give to the parameter tau the value 1,2,...,ttau
. We estimate the FBM model for all this values (using the function "Estimate_param_FBM")
and we select the value of tau that minimizes the AIC. The regression of this model is defined by
Returns a list with:
EstimateParameters: Vector with the estimated parameters, in the following order: a11,a12,a13, a21, a22,a23, ...a(D-1)1,a(D-1)2,a(D-1)3,tau. Where D is the number of bacterial species present in the matrix especie
.
AIC Number: Value of the AIC.
All.iter: Matrix. Each row has the parameters obtained in each iteration. The parameters are in the columns written in the same order that they are written in Param.Estimates
. In this matrix we must observe that in the last iterations the values has really similar or equal values, if not, we need to increase the value of Iter.EstParmFunc
.
Creus-Martí, I., Moya, A., Santonja, F. J. (2021). A Dirichlet autoregressive model for the analysis of microbiota time-series data. Complexity, 2021, 1-16.
set.seed(123) especie=t(gtools::rdirichlet(5,c(1,3,1))) Tt=5 E=3 EspecieMaxima=3 ridge.final=ridgeregression(Tt,especie, E, EspecieMaxima) ttau=10 Iter.EstParmFunc=10 TauAndParameters_EstParmFunc_FBM(ttau,ridge.final,Iter.EstParmFunc, especie,EspecieMaxima,Tt,E,714)
set.seed(123) especie=t(gtools::rdirichlet(5,c(1,3,1))) Tt=5 E=3 EspecieMaxima=3 ridge.final=ridgeregression(Tt,especie, E, EspecieMaxima) ttau=10 Iter.EstParmFunc=10 TauAndParameters_EstParmFunc_FBM(ttau,ridge.final,Iter.EstParmFunc, especie,EspecieMaxima,Tt,E,714)
Writtes a vector with the alr transformation of the bacteria i
at time points t=2,...,Tt
.
vecttor(i, especie, Tt, EspecieMaxima)
vecttor(i, especie, Tt, EspecieMaxima)
i |
Number. Position of the bacteria that we make the alr in the matrix |
especie |
Matrix that contains at row i the bacterial taxa of bacteria i at all time points. The bacteria placed in the last row of the matrix will be used as reference in the alr transformation and will be at the denominator of the balance. |
Tt |
Number of time points available |
EspecieMaxima |
Row in which the bacteria used as reference is in |
Returns a vector with the alr transformation of the bacteria i
at time points t=2,...,Tt
.
set.seed(123) especie=t(gtools::rdirichlet(10,c(1,3,1,2,4))) Tt=10 EspecieMaxima=5 i=2 vecttor(i,especie,Tt,EspecieMaxima)
set.seed(123) especie=t(gtools::rdirichlet(10,c(1,3,1,2,4))) Tt=10 EspecieMaxima=5 i=2 vecttor(i,especie,Tt,EspecieMaxima)
In this function the zeros are removed or replaced using functions of "zCompositions" package that can be used with longitudinal data (because they do not use the information of other rows to make the replacement).
ZeroData(DaTa, method = "multKM", seed = NULL)
ZeroData(DaTa, method = "multKM", seed = NULL)
DaTa |
data.frame. The first column contains the time point information (natural numbers 1,2,3...). The rest of the columns contain the relative abundance of each bacteria at the different time points. The values of each column must sum 1. |
method |
Character.
|
seed |
Number. Set a seed. Default |
The dataset without zeros.
Palarea-Albaladejo J. and Martín-Fernandez JA. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 2015; 143: 85-96.
set.seed(2) dat=gtools::rdirichlet(6,c(1,2,3,1,2,3)) dat2=dat dat2[2,1]=0 dat2[2,2]=dat[2,1]+dat[2,2] dat2[4,3]=0 dat2[4,4]=dat[4,3]+dat[4,4] X <- cbind( c(1:6) ,dat2) Final=ZeroData(X,"multKM",1) Final2=ZeroData(X,"multRepl",1)
set.seed(2) dat=gtools::rdirichlet(6,c(1,2,3,1,2,3)) dat2=dat dat2[2,1]=0 dat2[2,2]=dat[2,1]+dat[2,2] dat2[4,3]=0 dat2[4,4]=dat[4,3]+dat[4,4] X <- cbind( c(1:6) ,dat2) Final=ZeroData(X,"multKM",1) Final2=ZeroData(X,"multRepl",1)