Title: | Statistical Approach via Pseudo-Value Information and Estimation |
---|---|
Description: | 'SOHPIE' (pronounced as SOFIE) is a novel pseudo-value regression approach for differential co-abundance network analysis of microbiome data, which can include additional clinical covariate in the model. The full methodological details can be found in Ahn S and Datta S (2023) <arXiv:2303.13702v1>. |
Authors: | Seungjun Ahn [cre, aut, trl] , Somnath Datta [ctb, ths] |
Maintainer: | Seungjun Ahn <[email protected]> |
License: | GPL-3 |
Version: | 1.0.6 |
Built: | 2024-11-16 06:49:43 UTC |
Source: | CRAN |
A function to estimate an association matrix. This function also includes re-estimation for leave-one-out sample.
asso_mat(OTUdat, group)
asso_mat(OTUdat, group)
OTUdat |
An OTU table with subjects in rows and taxa in columns. |
group |
Indices of the subjects in a category of binary group variable. |
A list of an association matrix and reestimated association matrix is returned, which are estimated via SparCC.
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB)
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB)
A function to retrieve a vector of coefficient estimates of all predictor variables in the pseudo-value regression model.
coeff(SOHPIEres)
coeff(SOHPIEres)
SOHPIEres |
An object called after running SOHPIE_DNA. |
A table that includes coefficient estimates for all variables included in the fitted model.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # coeff() function will return coefficient estimates only. coeff(SOHPIEres)
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # coeff() function will return coefficient estimates only. coeff(SOHPIEres)
A function to retrieve a vector of coefficient estimates of each taxa for one specific variable.
coeff_specific_var(coefftab, varname)
coeff_specific_var(coefftab, varname)
coefftab |
A table that includes coefficient estimates for a specific variable. |
varname |
Specify the name of the variable of interest. |
A vector of coefficient estimates for a single variable from the model.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # coeff() function will return coefficient estimates only. coefftab <- coeff(SOHPIEres) # coeff_specific_var() will return coefficient estimates of # a single variable of interest. coeff_specific_var(coefftab = coefftab, varname = "bin_dog")
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # coeff() function will return coefficient estimates only. coefftab <- coeff(SOHPIEres) # coeff_specific_var() will return coefficient estimates of # a single variable of interest. coeff_specific_var(coefftab = coefftab, varname = "bin_dog")
A pre-processed OTU table and clinical data were obtained from the American Gut Project, available in the SpiecEasi R package.
data(combinedamgut)
data(combinedamgut)
combinedamgut
McDonald D, et al. American Gut: an Open Platform for Citizen Science Microbiome Research. mSystems, 2018;3(3):e00031-18. (PubMed)
data(combinedamgut)
data(combinedamgut)
A pre-processed OTU table and clinical data from the geographical epidemiology study (aka the Diet Exchange Study) is available in the microbiome R package.
data(combineddietswap)
data(combineddietswap)
'combineddietswap'
O'Keefe, SJ, et al. Fat, fibre and cancer risk in african americans and rural africans. Nat Commun. 2015;6:6342. (PubMed)
data(combineddietswap)
data(combineddietswap)
A function to obtain a list consisting of taxa that are significantly differentially connected (DC) between two biological or clinical conditions. These DC taxa are resulted from the pseudo-value regression method with additional covariates. In addition, a user can extract the names of DC taxa only.
DCtaxa_tab(qvaltab, groupvar, alpha)
DCtaxa_tab(qvaltab, groupvar, alpha)
qvaltab |
A table with adjusted p-values (or q-value in this package). |
groupvar |
Specify the name of binary indicator variable. |
alpha |
A level of significance (e.g. 0.05). |
q-values and names of significantly DC taxa (e.g. taxa name) based on SOHPIE_DNA function.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with q-values using qval() function. qvaltab <- qval(SOHPIEres) # Please do NOT forget to provide the name of variable in DCtaxa_tab(groupvar = ). DCtaxa_tab <- DCtaxa_tab(qvaltab = qvaltab, groupvar = "bin_dog", alpha = 0.3) DCtaxa_tab
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with q-values using qval() function. qvaltab <- qval(SOHPIEres) # Please do NOT forget to provide the name of variable in DCtaxa_tab(groupvar = ). DCtaxa_tab <- DCtaxa_tab(qvaltab = qvaltab, groupvar = "bin_dog", alpha = 0.3) DCtaxa_tab
A function to regress pseudo-values across set of covariates.
pseudoreg(pseudoval, clindat, c)
pseudoreg(pseudoval, clindat, c)
pseudoval |
Jackknife pseudo-values calculated. |
clindat |
A metadata/phenotypic data consisting of the clinical and demographic variables that the user wants to include in the regression. (e.g., binary group indicator for intervention vs. control, continuous age, ...) |
c |
The choice of trimming proportion for the least trimmed estimator of robust regression. A value has to be between 0.5 and 1 as specified in ltsReg() function in robustbase package. |
A pseudo-value regression is fitted. Please use pseudoreg.summary() to output p-values, q-values, and coefficient estimates.
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] #Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB) # Calculate the network centrality. thetahat_grpA = thetahats(asso_matA$assomat) thetahat_grpB = thetahats(asso_matB$assomat) # Obtain network centrality for the re-estimated association matrices. thetahat_drop_grpA = sapply(asso_matA$reest.assomat, thetahats) thetahat_drop_grpB = sapply(asso_matB$reest.assomat, thetahats) # Sample sizes for each group. n_A <- length(newindex_grpA) n_B <- length(newindex_grpB) # Now calculate jackknife pseudo-values for each group. thetatilde_grpA = thetatildefun(thetahat_grpA, thetahat_drop_grpA, n_A) thetatilde_grpB = thetatildefun(thetahat_grpB, thetahat_drop_grpB, n_B) thetatilde = rbind(thetatilde_grpA, thetatilde_grpB) # Map the column names (taxa names) colnames(thetatilde) = colnames(OTUtab) # Fit a pseudo-value regression using jackknife pseudovalues # and phenotypic data. A reminder that the phenotypic data should # contain a set of predictor variables to be fitted. fitmod = pseudoreg(pseudoval=thetatilde, clindat=phenodat, c=0.5)
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] #Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB) # Calculate the network centrality. thetahat_grpA = thetahats(asso_matA$assomat) thetahat_grpB = thetahats(asso_matB$assomat) # Obtain network centrality for the re-estimated association matrices. thetahat_drop_grpA = sapply(asso_matA$reest.assomat, thetahats) thetahat_drop_grpB = sapply(asso_matB$reest.assomat, thetahats) # Sample sizes for each group. n_A <- length(newindex_grpA) n_B <- length(newindex_grpB) # Now calculate jackknife pseudo-values for each group. thetatilde_grpA = thetatildefun(thetahat_grpA, thetahat_drop_grpA, n_A) thetatilde_grpB = thetatildefun(thetahat_grpB, thetahat_drop_grpB, n_B) thetatilde = rbind(thetatilde_grpA, thetatilde_grpB) # Map the column names (taxa names) colnames(thetatilde) = colnames(OTUtab) # Fit a pseudo-value regression using jackknife pseudovalues # and phenotypic data. A reminder that the phenotypic data should # contain a set of predictor variables to be fitted. fitmod = pseudoreg(pseudoval=thetatilde, clindat=phenodat, c=0.5)
A function to output summary results (p-values, q-values, and coefficient estimates) from the fitted pseudo-value regression.
pseudoreg.summary(pseudo.reg.res, taxanames)
pseudoreg.summary(pseudo.reg.res, taxanames)
pseudo.reg.res |
A fitted pseudo-value regression using pseudoreg() |
taxanames |
Names of taxa from the OTU table. |
A pseudo-value regression is fitted. Please use pseudoreg.summary() to output p-values, q-values, and coefficient estimates.
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] #Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB) # Calculate the network centrality. thetahat_grpA = thetahats(asso_matA$assomat) thetahat_grpB = thetahats(asso_matB$assomat) # Obtain network centrality for the re-estimated association matrices. thetahat_drop_grpA = sapply(asso_matA$reest.assomat, thetahats) thetahat_drop_grpB = sapply(asso_matB$reest.assomat, thetahats) # Sample sizes for each group. n_A <- length(newindex_grpA) n_B <- length(newindex_grpB) # Now calculate jackknife pseudo-values for each group. thetatilde_grpA = thetatildefun(thetahat_grpA, thetahat_drop_grpA, n_A) thetatilde_grpB = thetatildefun(thetahat_grpB, thetahat_drop_grpB, n_B) thetatilde = rbind(thetatilde_grpA, thetatilde_grpB) # Map the column names (taxa names) colnames(thetatilde) = colnames(OTUtab) # Fit a pseudo-value regression using jackknife pseudovalues # and phenotypic data. A reminder that the phenotypic data should # contain a set of predictor variables to be fitted. fitmod = pseudoreg(pseudoval=thetatilde, clindat=phenodat, c=0.5) # Extract summary results from the fitted model from fitmod object above. summary.result = pseudoreg.summary(pseudo.reg.res=fitmod, taxanames=colnames(OTUtab))
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] #Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB) # Calculate the network centrality. thetahat_grpA = thetahats(asso_matA$assomat) thetahat_grpB = thetahats(asso_matB$assomat) # Obtain network centrality for the re-estimated association matrices. thetahat_drop_grpA = sapply(asso_matA$reest.assomat, thetahats) thetahat_drop_grpB = sapply(asso_matB$reest.assomat, thetahats) # Sample sizes for each group. n_A <- length(newindex_grpA) n_B <- length(newindex_grpB) # Now calculate jackknife pseudo-values for each group. thetatilde_grpA = thetatildefun(thetahat_grpA, thetahat_drop_grpA, n_A) thetatilde_grpB = thetatildefun(thetahat_grpB, thetahat_drop_grpB, n_B) thetatilde = rbind(thetatilde_grpA, thetatilde_grpB) # Map the column names (taxa names) colnames(thetatilde) = colnames(OTUtab) # Fit a pseudo-value regression using jackknife pseudovalues # and phenotypic data. A reminder that the phenotypic data should # contain a set of predictor variables to be fitted. fitmod = pseudoreg(pseudoval=thetatilde, clindat=phenodat, c=0.5) # Extract summary results from the fitted model from fitmod object above. summary.result = pseudoreg.summary(pseudo.reg.res=fitmod, taxanames=colnames(OTUtab))
A function to retrieve a vector of p-values of each taxa for all variables that are included in the pseudo-value regression model.
pval(SOHPIEres)
pval(SOHPIEres)
SOHPIEres |
An object called after running SOHPIE_DNA. |
A table that includes p-values for all predictor variables considered in the regresson.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with p-values using qval() function. pvaltab <- pval(SOHPIEres)
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with p-values using qval() function. pvaltab <- pval(SOHPIEres)
A function to retrieve a vector of p-values of each taxa for one specific variable. In other words, this will be useful for quickly accessing the taxa-specific p-values for main binary group variable (or other specific variable/covariate).
pval_specific_var(pvaltab, varname)
pval_specific_var(pvaltab, varname)
pvaltab |
A table that includes p-values for a specific variable. |
varname |
Specify the name of the variable of interest. |
A vector of p-values for a single variable from the model.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with p-values using pval() function. pvaltab <- pval(SOHPIEres) # Retrieve a vector of p-values for a single variable of interest. pval_specific_var(pvaltab = pvaltab, varname = "bin_dog")
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with p-values using pval() function. pvaltab <- pval(SOHPIEres) # Retrieve a vector of p-values for a single variable of interest. pval_specific_var(pvaltab = pvaltab, varname = "bin_dog")
A function to retrieve a vector of q-values of each taxa for all variables that are included in the pseudo-value regression model.
qval(SOHPIEres)
qval(SOHPIEres)
SOHPIEres |
An object called after running SOHPIE_DNA. |
A table that includes q-values for all predictor variables considered in the regresson.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with q-values using qval() function. qvaltab <- qval(SOHPIEres)
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with q-values using qval() function. qvaltab <- qval(SOHPIEres)
A function to retrieve a vector of q-values of each taxa for one specific variable. In other words, this will be useful for quickly accessing the taxa-specific q-values for main binary group variable (or other specific variable/covariate).
qval_specific_var(qvaltab, varname)
qval_specific_var(qvaltab, varname)
qvaltab |
A table that includes q-values for a specific variable. |
varname |
Specify the name of the variable of interest. |
A vector of q-values for a single variable from the model.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with q-values using qval() function. qvaltab <- qval(SOHPIEres) # Retrieve a vector of q-values for a single variable of interest. qval_specific_var(qvaltab = qvaltab, varname = "bin_dog")
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # Create an object to keep the table with q-values using qval() function. qvaltab <- qval(SOHPIEres) # Retrieve a vector of q-values for a single variable of interest. qval_specific_var(qvaltab = qvaltab, varname = "bin_dog")
A pseudo-value regression approach for differential co-abundance network analysis that adjusts for additional covariates.
SOHPIE_DNA(OTUdat, clindat, groupA, groupB, c)
SOHPIE_DNA(OTUdat, clindat, groupA, groupB, c)
OTUdat |
An OTU table with subjects in rows and taxa in columns. |
clindat |
A subdata consisting of the clinical and demographic variables that the user wants to include in the regression. (e.g., binary group indicator for intervention vs. control, continuous age, ...) |
groupA |
Indices of the subjects in the first category (e.g., not living with a dog; see example below with American Gut Project sample data) of binary group variable. |
groupB |
Indices of the subjects in the second category (e.g., living with a dog; see example below with American Gut Project sample data) of binary group variable. |
c |
The choice of trimming proportion for the least trimmed estimator of robust regression. A value has to be between 0.5 and 1 as specified in ltsReg() function in robustbase package. |
A list containing three data frame objects returned from this SOHPIE_DNA main function. A user will see beta coefficients, p-values, and adjusted p-values (q-values) for each predictor variables that are included in the regression model.
Ahn S, Datta S. Differential Co-Abundance Network Analyses for Microbiome Data Adjusted for Clinical Covariates Using Jackknife Pseudo-Values. ArXiv [Preprint]. 2023 Mar 23:arXiv:2303.13702v1. PMID: 36994149; PMCID: PMC10055480.
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] #Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5)
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] #Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5)
SpiecEasi R package, says in his package that this is "a reimplementation of SparCC algorithm (Friedman et Alm, PLoS Comp Bio, 2012)." Installation of SpiecEasi can sometimes generate errors, so I have included Dr. Huaying Fang's sparcc wrapper as one of the functions in this package for the estimation of co-abundance networks. His code was acquired from CCLasso (Fang et al, Bioinformatics, 2015), provided in GitHub (https://github.com/huayingfang/CCLasso).
sparcc(x, imax = 20, kmax = 10, alpha = 0.1, Vmin = 1e-04)
sparcc(x, imax = 20, kmax = 10, alpha = 0.1, Vmin = 1e-04)
x |
count data matrix (OTU table) |
imax |
Number of iterations in the outer loop |
kmax |
max iteration steps for SparCC |
alpha |
threshold for strong correlation |
Vmin |
absolute value of correlations below this threshold are considered zero by the inner SparCC loop. |
This will estimate an association matrix (network) for observed OTU table.
A function to retrieve a vector of standard error (stderr) of coefficient estimates (betahats) all predictor variables in the pseudo-value regression model.
stderrs(SOHPIEres)
stderrs(SOHPIEres)
SOHPIEres |
An object called after running SOHPIE_DNA. |
A table that includes standard error of betahats for all predictors regressed in the fitted model.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # stderrs() function will return standard error of betahats only. stderrs(SOHPIEres)
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # stderrs() function will return standard error of betahats only. stderrs(SOHPIEres)
A function to retrieve a vector of standard error of coefficient estimates (betahats) of each taxa for one specific variable.
stderrs_specific_var(stderrstab, varname)
stderrs_specific_var(stderrstab, varname)
stderrstab |
A table that includes standard error of betahat for a specific variable. |
varname |
Specify the name of the variable of interest. |
A vector of standard error of betahats for a single variable from the model.
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # stderrs() function will return standard error of betahats only. stderrstab <- stderrs(SOHPIEres) # stderrs_specific_var() will return standard error of coefficient estimates of # a single variable of interest. stderrs_specific_var(stderrstab = stderrstab, varname = "bin_dog")
data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Clinical/demographic covariates (phenotypic data): # Note: All of these covariates will be included in the regression, so # please make sure that phenodat includes the variables that will be analyzed only. phenodat = combinedamgut[, 1:7] # first column is ID, so not using it. # Obtain indices of each grouping factor # In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) SOHPIEres <- SOHPIE_DNA(OTUdat = OTUtab, clindat = phenodat, groupA = newindex_grpA, groupB = newindex_grpB, c = 0.5) # stderrs() function will return standard error of betahats only. stderrstab <- stderrs(SOHPIEres) # stderrs_specific_var() will return standard error of coefficient estimates of # a single variable of interest. stderrs_specific_var(stderrstab = stderrstab, varname = "bin_dog")
A function to compute the network centrality (i.e. total connectivity) of each microbial taxa from the association matrix.
thetahats(asso.matinput)
thetahats(asso.matinput)
asso.matinput |
An input is an association matrix that is estimated from the user-provided OTU data. |
A vector containing network centrality of each taxa.
A function to calculate jackknife pseudo-values
thetatildefun(thetahatinput, thetahatdropinput, sizegroup)
thetatildefun(thetahatinput, thetahatdropinput, sizegroup)
thetahatinput |
A network centrality calculated from association matrix for whole sample. |
thetahatdropinput |
Network centralities calculated from re-estimated association matrices for leave-one-out samples. |
sizegroup |
Sample size for group. |
A jackknife pseudo-value will be returned.
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB) # Calculate the network centrality. thetahat_grpA = thetahats(asso_matA$assomat) thetahat_grpB = thetahats(asso_matB$assomat) # Obtain network centrality for the re-estimated association matrices. thetahat_drop_grpA = sapply(asso_matA$reest.assomat, thetahats) thetahat_drop_grpB = sapply(asso_matB$reest.assomat, thetahats) # Sample sizes for each group. n_A <- length(newindex_grpA) n_B <- length(newindex_grpB) # Now calculate jackknife pseudo-values for each group. thetatilde_grpA = thetatildefun(thetahat_grpA, thetahat_drop_grpA, n_A) thetatilde_grpB = thetatildefun(thetahat_grpB, thetahat_drop_grpB, n_B)
# In this example, the subset of the American Gut Project data will be used. data(combinedamgut) # A complete data containing columns with taxa and clinical covariates. # Note: The line below will use a toy example with the first 30 out of 138 taxa. OTUtab = combinedamgut[ , 8:37] # Obtain indices of each grouping factor # In this example, a variable indicating the status of living # with a dog was chosen (i.e. bin_dog). # Accordingly, Groups A and B imply living without and with a dog, respectively. newindex_grpA = which(combinedamgut$bin_dog == 0) newindex_grpB = which(combinedamgut$bin_dog == 1) # Now, we estimate (and re-estimate) association matrices # for each group separately. asso_matA = asso_mat(OTUdat=OTUtab, group=newindex_grpA) asso_matB = asso_mat(OTUdat=OTUtab, group=newindex_grpB) # Calculate the network centrality. thetahat_grpA = thetahats(asso_matA$assomat) thetahat_grpB = thetahats(asso_matB$assomat) # Obtain network centrality for the re-estimated association matrices. thetahat_drop_grpA = sapply(asso_matA$reest.assomat, thetahats) thetahat_drop_grpB = sapply(asso_matB$reest.assomat, thetahats) # Sample sizes for each group. n_A <- length(newindex_grpA) n_B <- length(newindex_grpB) # Now calculate jackknife pseudo-values for each group. thetatilde_grpA = thetatildefun(thetahat_grpA, thetahat_drop_grpA, n_A) thetatilde_grpB = thetatildefun(thetahat_grpB, thetahat_drop_grpB, n_B)