Title: | Meta-Analysis for MicroArrays |
---|---|
Description: | Combination of either p-values or moderated effect sizes from different studies to find differentially expressed genes. |
Authors: | Guillemette Marot [aut,cre] |
Maintainer: | Samuel Blanck <[email protected]> |
License: | GPL |
Version: | 3.1.3 |
Built: | 2024-10-28 07:04:11 UTC |
Source: | CRAN |
Combines either p-values or moderated effect sizes from different studies to find differentially expressed genes.
Package: | metaMA |
Type: | Package |
Version: | 3.1.2 |
Date: | 2015-01-28 |
License: | GPL |
LazyLoad: | yes |
pvalcombination and EScombination are the most important functions for combining unpaired data:
pvalcombination combines p-values from individual studies, and
EScombination combines effect sizes from individual studies.
pvalcombination.paired and EScombination.paired are to be used for paired data.
IDDIRR can help interpret the gain and loss of information due to meta-analysis.
Guillemette Marot <[email protected]>
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
library(metaMA)
data(Singhdata)
EScombination(esets=Singhdata$esets,classes=Singhdata$classes)
pvalcombination(esets=Singhdata$esets,classes=Singhdata$classes)
# More details are provided in the vignette; only open it in interactive R sessions
if(interactive()){ vignette("metaMA") }
Computes empirical Bayes statistics from limma analysis with only one group effect.
calcfit2Diffrep(C1, C2)
C1 |
Gene expression data of the arrays in the first condition. Each row of C1 corresponds to a gene. |
C2 |
Gene expression data of the arrays in the second condition. Each row of C2 corresponds to a gene. |
Returns the fit2 object described in the limma vignette. To be used with unpaired data.
fit2
See the Bioconductor limma vignette.
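As a rough illustration of what a limma analysis "with only one group effect" involves, the sketch below fits a design with an intercept and a single group indicator and then applies limma's empirical Bayes moderation. It is an assumption about the general workflow, not the metaMA source of calcfit2Diffrep; calcfit2_sketch is a hypothetical name, and the code needs the Bioconductor package limma.

```r
# Hypothetical sketch, not the metaMA implementation of calcfit2Diffrep.
# Requires the Bioconductor package limma (lmFit() and eBayes()).
calcfit2_sketch <- function(C1, C2) {
  expr <- as.matrix(cbind(C1, C2))            # genes in rows, arrays in columns
  group <- factor(rep(c("C1", "C2"), times = c(ncol(C1), ncol(C2))))
  design <- model.matrix(~ group)             # intercept + one group effect
  fit <- limma::lmFit(expr, design)           # gene-wise linear models
  limma::eBayes(fit)                          # moderated t-statistics stored in $t
}

# Usage: the group effect is the second coefficient
# fit2 <- calcfit2_sketch(C1, C2); head(fit2$t[, 2])
```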
Combines effect sizes already calculated.
directEScombi(ES, varES, BHth = 0.05, useREM = TRUE)
ES |
Matrix of effect sizes. Each column of ES corresponds to one study. |
varES |
Matrix of effect size variances. Each column of varES corresponds to one study. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
useREM |
A logical value indicating whether or not to include the between-study variance in the model. |
Combines effect sizes with the method presented in Choi et al. (2003).
List
DEindices |
Indices of differentially expressed genes at the chosen Benjamini Hochberg threshold. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
Choi, J. K., Yu, U., Kim, S., and Yoo, O. J. (2003). Combining multiple microarray studies and modeling interstudy variation. Bioinformatics, 19 Suppl 1.
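The combination step can be sketched with inverse-variance weights and, when useREM is TRUE, a between-study variance added to each study's variance. The sketch assumes the DerSimonian-Laird moment estimator for that variance, a standard normal reference for the combined statistic, and Benjamini-Hochberg adjustment through p.adjust(); rem_combine is a hypothetical name and the code is not the metaMA source.

```r
# Hypothetical sketch of an effect size combination (not the metaMA source).
# ES, varES: genes in rows, studies in columns.
rem_combine <- function(ES, varES, BHth = 0.05, useREM = TRUE) {
  ES <- as.matrix(ES)
  varES <- as.matrix(varES)
  k <- ncol(ES)                                   # number of studies
  w <- 1 / varES                                  # fixed-effect (inverse-variance) weights
  if (useREM) {
    mu_fem <- rowSums(w * ES) / rowSums(w)        # fixed-effect mean per gene
    Q <- rowSums(w * (ES - mu_fem)^2)             # Cochran's Q heterogeneity statistic
    denom <- rowSums(w) - rowSums(w^2) / rowSums(w)
    tau2 <- pmax(0, (Q - (k - 1)) / denom)        # DerSimonian-Laird between-study variance
    w <- 1 / (varES + tau2)                       # random-effects weights
  }
  mu <- rowSums(w * ES) / rowSums(w)              # combined effect size per gene
  z <- mu / sqrt(1 / rowSums(w))                  # standardized test statistic
  p <- 2 * pnorm(-abs(z))                         # two-sided p-value
  list(DEindices = which(p.adjust(p, method = "BH") <= BHth),
       TestStatistic = z)
}
```

With heterogeneous studies the between-study variance inflates the variance of the combined mean, so the random-effects statistic is smaller in magnitude than the fixed-effect one.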
Combines one-sided p-values with the inverse normal method.
directpvalcombi(pvalonesided, nrep, BHth = 0.05)
pvalonesided |
List of vectors of one-sided p-values to be combined. |
nrep |
Vector giving the number of replicates used in each study to calculate the one-sided p-values. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
List
DEindices |
Indices of differentially expressed genes at the chosen Benjamini Hochberg threshold. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
One-sided p-values are required to avoid directional conflicts (a gene up-regulated in one study and down-regulated in another). A two-sided test is then performed on the combined statistic to find differentially expressed genes.
Guillemette Marot
Hedges, L. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press.
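The inverse normal method turns each one-sided p-value into a z-score, forms a weighted sum across studies, and rescales it to unit variance under the null hypothesis. The sketch below assumes weights proportional to sqrt(nrep) (check the package source for the exact weighting) and BH adjustment through p.adjust(); inverse_normal_combine is a hypothetical name.

```r
# Hypothetical sketch of inverse normal p-value combination (not the metaMA source).
# pvalonesided: list with one vector of one-sided p-values per study (same genes, same order).
inverse_normal_combine <- function(pvalonesided, nrep, BHth = 0.05) {
  k <- length(pvalonesided)
  # Probit transform: one column of z-scores per study
  z_studies <- sapply(pvalonesided, function(p) qnorm(p, lower.tail = FALSE))
  z_studies <- matrix(z_studies, ncol = k)          # keeps a matrix even for one gene
  w <- sqrt(nrep)                                   # assumed weights
  stat <- drop(z_studies %*% w) / sqrt(sum(w^2))    # N(0, 1) under H0
  p <- 2 * pnorm(-abs(stat))                        # two-sided test on the combined statistic
  list(DEindices = which(p.adjust(p, method = "BH") <= BHth),
       TestStatistic = stat)
}

# Gene 2 is up in study 1 and down in study 2, so its z-scores cancel out
inverse_normal_combine(list(c(0.01, 0.001), c(0.01, 0.999)), nrep = c(10, 10))
```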
This function is not intended to be used on its own.
effectsize(tstat, ntilde, m)
tstat |
Vector of test statistics from which the effect sizes are computed. |
ntilde |
Proportionality factor between a test statistic and its corresponding effect size. |
m |
Number of degrees of freedom. |
Matrix with one row per gene and the following columns:
d |
Commonly used effect size (which is biased) |
vard |
Variance of the commonly used effect size |
dprime |
Unbiased effect size |
vardprime |
Variance of the unbiased effect size |
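The four columns can be obtained from standard Hedges and Olkin formulas. The sketch below assumes d = tstat/sqrt(ntilde), the exact bias-correction factor c(m) = gamma(m/2) / (sqrt(m/2) * gamma((m-1)/2)), and plug-in variances in which the estimated effect size replaces the true one; effectsize_sketch is a hypothetical name and the code is not claimed to match metaMA line by line. The variance exists only for m > 2.

```r
# Hypothetical sketch of the Hedges-Olkin effect size and its variance.
effectsize_sketch <- function(tstat, ntilde, m) {
  # Bias-correction factor c(m), computed via lgamma() to avoid overflow for large m
  cm <- exp(lgamma(m / 2) - lgamma((m - 1) / 2)) / sqrt(m / 2)
  # Variance of tstat/sqrt(ntilde) for a true effect size delta (requires m > 2)
  var_d <- function(delta) m / ((m - 2) * ntilde) * (1 + ntilde * delta^2) - delta^2 / cm^2
  d <- tstat / sqrt(ntilde)            # commonly used (biased) effect size
  dprime <- cm * d                     # unbiased effect size
  vard <- var_d(d)                     # plug-in variance of d
  vardprime <- cm^2 * var_d(dprime)    # plug-in variance of dprime
  res <- cbind(d, vard, dprime, vardprime)
  colnames(res) <- c("d", "vard", "dprime", "vardprime")
  res
}

effectsize_sketch(tstat = c(2, -1), ntilde = 5, m = 18)
```

For moderate m the factor c(m) is close to the familiar approximation 1 - 3/(4m - 1), so dprime shrinks d slightly toward zero.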
Guillemette Marot with contribution from Ankur Ravinarayana Chakravarthy
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. Moderated effect size combination for microarray meta-analyses and comparison study. Submitted.
#for SMVar:
#effectsize(stati$TestStat[order(stati$GeneId)],length(classes[[i]]),stati$DegOfFreedom[order(stati$GeneId)])
#for Limma
#effectsize(fit2i$t,length(classes[[i]]),(fit2i$df.prior+fit2i$df.residual))
For each study, calculates effect sizes for unpaired data from either classical or moderated t-tests (limma, SMVar), then combines these effect sizes.
EScombination(esets, classes, moderated = c("limma", "SMVar", "t")[1], BHth = 0.05)
esets |
List of matrices (or data frames), one matrix per study. Each matrix has one row per gene and one column per replicate and gives the expression data for both conditions in the order specified in the classes argument. |
classes |
List of class memberships, one per study. Each vector or factor of the list can only contain two levels which correspond to the two conditions studied. |
moderated |
Method to calculate the test statistic inside each study from which the effect size is computed. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
List
Study1 |
Vector of indices of differentially expressed genes in study 1. Similar names are given for the other individual studies. |
AllIndStudies |
Vector of indices of differentially expressed genes found by at least one of the individual studies. |
Meta |
Vector of indices of differentially expressed genes in the meta-analysis. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
While the invisible object returned by this function contains the values described above,
other quantities of interest are printed: DE, IDD, Loss, IDR and IRR.
All these quantities are defined in function IDDIRR and in Marot et al. (2009).
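The examples convert TestStatistic into raw two-sided p-values with the standard normal distribution, so the indices in Meta can presumably be recovered by applying the Benjamini-Hochberg adjustment at BHth. The hypothetical helper below sketches that step. It uses 2 * pnorm(-abs(z)), which gives the same values as 2 * (1 - pnorm(abs(z))) but does not round to zero for very large |z|.

```r
# Hypothetical helper: DE indices from meta-analysis z statistics at a BH threshold
meta_indices_from_stat <- function(TestStatistic, BHth = 0.05) {
  rawpval <- 2 * pnorm(-abs(TestStatistic))       # two-sided p-values under N(0, 1)
  which(p.adjust(rawpval, method = "BH") <= BHth) # indices passing the FDR threshold
}

meta_indices_from_stat(c(5, 0.1, -4, 1))  # 1 3
# With real output: compare meta_indices_from_stat(res$TestStatistic) with res$Meta
```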
Guillemette Marot
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
data(Singhdata)
# Meta-analysis
res=EScombination(esets=Singhdata$esets,classes=Singhdata$classes)
# Number of differentially expressed genes in the meta-analysis
length(res$Meta)
# To plot a histogram of raw p-values
rawpval=2*(1-pnorm(abs(res$TestStatistic)))
hist(rawpval,nclass=100)
For each study, calculates effect sizes for paired data from either classical or moderated t-tests (limma, SMVar), then combines these effect sizes.
EScombination.paired(logratios, moderated = c("limma", "SMVar", "t")[1], BHth = 0.05)
logratios |
List of matrices (or data frames). Each matrix has one row per gene and one column per replicate and gives the logratios of one study. All studies must have the same genes. |
moderated |
Method to calculate the test statistic inside each study from which the effect size is computed. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
List
Study1 |
Vector of indices of differentially expressed genes in study 1. Similar names are given for the other individual studies. |
AllIndStudies |
Vector of indices of differentially expressed genes found by at least one of the individual studies. |
Meta |
Vector of indices of differentially expressed genes in the meta-analysis. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
While the invisible object returned by this function contains the values described above,
other quantities of interest are printed: DE, IDD, Loss, IDR and IRR.
All these quantities are defined in function IDDIRR and in Marot et al. (2009).
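For paired data, a per-study effect size reduces to a standardized mean of the log-ratios. Assuming (this page does not confirm it) that the paired analysis plugs a one-sample t-statistic into the effect size with ntilde = n, the biased effect size d = t/sqrt(n) equals the row mean divided by the row standard deviation, as the hypothetical helper below checks.

```r
# Hypothetical sketch: biased effect size d for paired log-ratios (one row per gene)
paired_d <- function(logratios) {
  n <- rowSums(!is.na(logratios))               # replicates per gene
  m <- rowMeans(logratios, na.rm = TRUE)
  s <- apply(logratios, 1, sd, na.rm = TRUE)
  tstat <- m / (s / sqrt(n))                    # one-sample t-statistic
  tstat / sqrt(n)                               # d = t / sqrt(ntilde) with ntilde = n
}

set.seed(4)
x <- matrix(rnorm(30, mean = 1), nrow = 3)      # simulated log-ratios
all.equal(paired_d(x), rowMeans(x) / apply(x, 1, sd))  # TRUE: d is mean / sd
```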
Guillemette Marot
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
data(Singhdata)
# Create artificially paired data:
artificialdata=lapply(Singhdata$esets,FUN=function(x) (x[,1:10]-x[,11:20]))
# Meta-analysis
res=EScombination.paired(artificialdata)
# Number of differentially expressed genes in the meta-analysis
length(res$Meta)
# To plot a histogram of raw p-values
rawpval=2*(1-pnorm(abs(res$TestStatistic)))
hist(rawpval,nclass=100)
Calculates the gain or the loss of differentially expressed genes due to meta-analysis compared to individual studies.
IDDIRR(finalde, deindst)
finalde |
Vector of indices of differentially expressed genes after meta-analysis |
deindst |
Vector of indices of differentially expressed genes found in at least one study |
DE |
Number of Differentially Expressed (DE) genes |
IDD |
Integration Driven Discoveries: number of genes that are declared DE in the meta-analysis that were not identified in any of the individual studies alone. |
Loss |
Number of genes that are declared DE in individual studies but not in meta-analysis. |
IDR |
Integration-driven Discovery Rate: percentage of genes that are identified as DE in the meta-analysis that were not identified in any of the individual studies alone. |
IRR |
Integration-driven Revision Rate: percentage of genes that are declared DE in individual studies but not in meta-analysis. |
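To see how the five quantities relate, the sketch below reuses the definition quoted in this page's example, renamed IDDIRR_copy so that it does not mask the package function, on two small made-up index vectors.

```r
# Copy of the IDDIRR definition quoted in the example, renamed to avoid masking metaMA
IDDIRR_copy <- function(finalde, deindst) {
  DE <- length(finalde)
  gains <- finalde[which(!(finalde %in% deindst))]   # DE only after meta-analysis
  IDD <- length(gains)
  IDR <- IDD / DE * 100                              # percentage of meta DE genes that are new
  perte <- which(!(deindst %in% finalde))            # DE in some study, lost in meta-analysis
  Loss <- length(perte)
  IRR <- Loss / length(deindst) * 100                # percentage of study DE genes lost
  res <- c(DE, IDD, Loss, IDR, IRR)
  names(res) <- c("DE", "IDD", "Loss", "IDR", "IRR")
  res
}

IDDIRR_copy(finalde = c(1, 2, 3, 4), deindst = c(3, 4, 5))
# DE = 4, IDD = 2 (genes 1 and 2), Loss = 1 (gene 5), IDR = 50, IRR = 33.33
```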
Guillemette Marot
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
data(Singhdata)
out=EScombination(esets=Singhdata$esets,classes=Singhdata$classes)
IDDIRR(out$Meta,out$AllIndStudies)
## The function is currently defined as
#function(finalde,deindst)
#{
#DE=length(finalde)
#gains=finalde[which(!(finalde %in% deindst))]
#IDD=length(gains)
#IDR=IDD/DE*100
#perte=which(!(deindst %in% finalde))
#Loss=length(perte)
#IRR=Loss/length(deindst)*100
#res=c(DE,IDD,Loss,IDR,IRR)
#names(res)=c("DE","IDD","Loss","IDR","IRR")
#res
#}
For each study, calculates differential expression p-values for unpaired data from either classical or moderated t-tests (limma, SMVar), then combines these p-values with the inverse normal method.
pvalcombination(esets, classes, moderated = c("limma", "SMVar", "t")[1], BHth = 0.05)
esets |
List of matrices (or data frames), one matrix per study. Each matrix has one row per gene and one column per replicate and gives the expression data for both conditions in the order specified in the classes argument. |
classes |
List of class memberships, one per study. Each vector or factor of the list can only contain two levels which correspond to the two conditions studied. |
moderated |
Method to calculate the test statistic inside each study from which the p-value is computed. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
List
Study1 |
Vector of indices of differentially expressed genes in study 1. Similar names are given for the other individual studies. |
AllIndStudies |
Vector of indices of differentially expressed genes found by at least one of the individual studies. |
Meta |
Vector of indices of differentially expressed genes in the meta-analysis. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
While the invisible object returned by this function contains the values described above,
other quantities of interest are printed: DE, IDD, Loss, IDR and IRR.
All these quantities are defined in function IDDIRR and in Marot et al. (2009).
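Why the per-study p-values must be one-sided can be seen with two hypothetical studies whose t-statistics have equal magnitude but opposite signs. The sketch assumes classical t-tests with 8 degrees of freedom, 10 replicates per study, and sqrt(replicates) weights in the inverse normal combination; all of these values are illustrative only.

```r
# Two hypothetical studies: the gene is up-regulated in one and down-regulated in the other
tstats <- c(study1 = 3, study2 = -3)
df <- c(8, 8)                    # assumed degrees of freedom per study
w <- sqrt(c(10, 10))             # assumed weights: sqrt(number of replicates)

# Inverse normal combination of a vector of per-study p-values
inv_normal <- function(p) sum(w * qnorm(p, lower.tail = FALSE)) / sqrt(sum(w^2))

p_one <- pt(tstats, df, lower.tail = FALSE)            # one-sided, H1: up-regulation
p_two <- 2 * pt(abs(tstats), df, lower.tail = FALSE)   # two-sided, direction discarded

inv_normal(p_one)   # close to 0: opposite directions cancel out
inv_normal(p_two)   # about 3: the conflict is hidden and the gene looks DE
```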
Guillemette Marot
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
data(Singhdata)
# Meta-analysis
res=pvalcombination(esets=Singhdata$esets,classes=Singhdata$classes)
# Number of differentially expressed genes in the meta-analysis
length(res$Meta)
# To plot a histogram of raw p-values
rawpval=2*(1-pnorm(abs(res$TestStatistic)))
hist(rawpval,nclass=100)
For each study, calculates differential expression p-values for paired data from either classical or moderated t-tests (limma, SMVar), then combines these p-values with the inverse normal method.
pvalcombination.paired(logratios, moderated = c("limma", "SMVar", "t")[1], BHth = 0.05)
logratios |
List of matrices. Each matrix has one row per gene and one column per replicate and gives the logratios of one study. All studies must have the same genes. |
moderated |
Method to calculate the test statistic inside each study from which the p-value is computed. |
BHth |
Benjamini Hochberg threshold. By default, the False Discovery Rate is controlled at 5%. |
List
Study1 |
Vector of indices of differentially expressed genes in study 1. Similar names are given for the other individual studies. |
AllIndStudies |
Vector of indices of differentially expressed genes found by at least one of the individual studies. |
Meta |
Vector of indices of differentially expressed genes in the meta-analysis. |
TestStatistic |
Vector with test statistics for differential expression in the meta-analysis. |
While the invisible object returned by this function contains the values described above,
other quantities of interest are printed: DE, IDD, Loss, IDR and IRR.
All these quantities are defined in function IDDIRR and in Marot et al. (2009).
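For paired data, each study's one-sided p-values come from a test on the log-ratios. Assuming a classical one-sample t-test (as with moderated = "t") with n - 1 degrees of freedom and an up-regulation alternative, the hypothetical helper below computes them for every row at once and agrees with t.test(alternative = "greater").

```r
# Hypothetical helper: one-sided p-values (H1: mean log-ratio > 0), one per row
row_onesided_paired <- function(logratios) {
  n <- rowSums(!is.na(logratios))                      # replicates per gene
  tstat <- rowMeans(logratios, na.rm = TRUE) /
    (apply(logratios, 1, sd, na.rm = TRUE) / sqrt(n))  # one-sample t-statistic
  pt(tstat, df = n - 1, lower.tail = FALSE)            # upper-tail p-value
}

set.seed(3)
x <- matrix(rnorm(40, mean = 0.5), nrow = 4)           # simulated log-ratios
ref <- sapply(1:4, function(i) t.test(x[i, ], alternative = "greater")$p.value)
all.equal(row_onesided_paired(x), ref)  # TRUE
```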
Guillemette Marot
Marot, G., Foulley, J.-L., Mayer, C.-D., Jaffrezic, F. (2009) Moderated effect size and p-value combinations for microarray meta-analyses. Bioinformatics. 25 (20): 2692-2699.
data(Singhdata)
# Create artificially paired data:
artificialdata=lapply(Singhdata$esets,FUN=function(x) (x[,1:10]-x[,11:20]))
# Meta-analysis
res=pvalcombination.paired(artificialdata)
# Number of differentially expressed genes in the meta-analysis
length(res$Meta)
# To plot a histogram of raw p-values
rawpval=2*(1-pnorm(abs(res$TestStatistic)))
hist(rawpval,nclass=100)
Performs t-tests for unpaired data row by row.
row.ttest.stat(mat1, mat2)
mat1 |
Matrix with data for the first condition |
mat2 |
Matrix with data for the second condition |
This function is much faster than calling t.test on each row through apply.
Vector with t-test statistics
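The listing for this function computes a pooled-variance (Student) t-statistic as the mean of mat2 minus the mean of mat1. A quick way to check that sign convention is to compare a copy of the listing with t.test(..., var.equal = TRUE) given the mat2 row first. The copy below uses a hypothetical apply()-based stand-in for rowVars, which is slower but numerically equivalent here.

```r
# Hypothetical stand-in for rowVars: row variances via apply()
rowVars_standin <- function(x, na.rm = TRUE) apply(x, 1, var, na.rm = na.rm)

# Copy of the row.ttest.stat listing
row_ttest_stat_copy <- function(mat1, mat2) {
  n1 <- dim(mat1)[2]; n2 <- dim(mat2)[2]; n <- n1 + n2
  m1 <- rowMeans(mat1, na.rm = TRUE); m2 <- rowMeans(mat2, na.rm = TRUE)
  v1 <- rowVars_standin(mat1); v2 <- rowVars_standin(mat2)
  vpool <- (n1 - 1) / (n - 2) * v1 + (n2 - 1) / (n - 2) * v2   # pooled variance
  sqrt(n1 * n2 / n) * (m2 - m1) / sqrt(vpool)                  # sign: mat2 minus mat1
}

set.seed(1)
mat1 <- matrix(rnorm(30), nrow = 3)
mat2 <- matrix(rnorm(30, mean = 1), nrow = 3)
slow <- sapply(1:3, function(i) unname(t.test(mat2[i, ], mat1[i, ], var.equal = TRUE)$statistic))
all.equal(row_ttest_stat_copy(mat1, mat2), slow)  # TRUE
```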
## The function is currently defined as
function(mat1,mat2){
  n1<-dim(mat1)[2]
  n2<-dim(mat2)[2]
  n<-n1+n2
  m1<-rowMeans(mat1,na.rm=TRUE)
  m2<-rowMeans(mat2,na.rm=TRUE)
  v1<-rowVars(mat1,na.rm=TRUE)
  v2<-rowVars(mat2,na.rm=TRUE)
  vpool<-(n1-1)/(n-2)*v1 + (n2-1)/(n-2)*v2
  tstat<-sqrt(n1*n2/n)*(m2-m1)/sqrt(vpool)
  return(tstat)
}
Performs t-tests for paired data row by row.
row.ttest.statp(mat)
mat |
Matrix with data to be tested (for example, log-ratios in microarray experiments). |
This function is much faster than calling t.test on each row through apply.
Vector with t-test statistics.
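The listing for this function is a one-sample t-statistic against zero. It calls rowSds, which is not part of base R and is not documented on this page, so the sketch below assumes it returns row standard deviations and substitutes a hypothetical apply()-based stand-in, then compares the result with t.test on each row.

```r
# Hypothetical stand-in for rowSds: row standard deviations via apply()
rowSds_standin <- function(x, na.rm = TRUE) apply(x, 1, sd, na.rm = na.rm)

# Copy of the row.ttest.statp listing using the stand-in
row_ttest_statp_copy <- function(mat) {
  m <- rowMeans(mat, na.rm = TRUE)
  s <- rowSds_standin(mat, na.rm = TRUE)
  m / (s * sqrt(1 / dim(mat)[2]))      # one-sample t-statistic against 0
}

set.seed(5)
lr <- matrix(rnorm(24, mean = 0.3), nrow = 4)   # simulated log-ratios
slow <- sapply(1:4, function(i) unname(t.test(lr[i, ])$statistic))
all.equal(row_ttest_statp_copy(lr), slow)  # TRUE
```

Both row-wise listings use the total number of columns (dim(mat)[2]) as the sample size, so with missing values their statistics can differ from t.test, which counts only non-missing values.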
## The function is currently defined as
function(mat){
  m<-rowMeans(mat,na.rm=TRUE)
  sd<-rowSds(mat,na.rm=TRUE)
  tstat<-m/(sd*sqrt(1/dim(mat)[2]))
  return(tstat)
}
Calculates variances of each row of an array
rowVars(x, na.rm = TRUE)
x |
Array of two or more dimensions, containing numeric, complex, integer or logical values, or a numeric data frame. |
na.rm |
Logical. Should missing values (including NaN) be omitted from the calculations? |
This function is equivalent to applying var to each row with apply, but much faster.
A numeric or complex array of suitable size, or a vector if the result is one-dimensional. The dimnames (or names for a vector result) are taken from the original array.
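The listing for rowVars divides each row's sum of squared deviations by the number of non-missing values minus one, and returns NA when a row has fewer than two values. It relies on x - rowMeans(x) recycling the row means down each column, which works because the vector length equals nrow(x). The copy below checks it against apply() on a small matrix with a missing value.

```r
# Copy of the rowVars listing
rowVars_copy <- function(x, na.rm = TRUE) {
  sqr <- function(x) x * x
  n <- rowSums(!is.na(x))          # non-missing values per row
  n[n <= 1] <- NA                  # variance undefined with fewer than 2 values
  rowSums(sqr(x - rowMeans(x, na.rm = na.rm)), na.rm = na.rm) / (n - 1)
}

x <- matrix(c(1, 2, 3, 4,
              2, NA, 6, 8,
              5, 5, 5, 5), nrow = 3, byrow = TRUE)
rowVars_copy(x)                    # 1.666667 9.333333 0.000000
all.equal(rowVars_copy(x), apply(x, 1, var, na.rm = TRUE))  # TRUE
```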
## The function is currently defined as
function (x,na.rm = TRUE) {
  sqr = function(x) x * x
  n = rowSums(!is.na(x))
  n[n <= 1] = NA
  return(rowSums(sqr(x - rowMeans(x,na.rm = na.rm)), na.rm = na.rm)/(n - 1))
}
Publicly available microarray dataset artificially split into 5 studies
data(Singhdata)
List of 3 elements:
List of 5 data frames corresponding to 5 artificial studies, each with 12625 genes and 20 replicates (10 normal samples and 10 tumoral samples)
List of 5 numeric vectors with class memberships, one per study
Factor with 12625 levels corresponding to gene names
These data are available on the website http://www.bioinf.ucd.ie/people/ian/. We considered 50 normal samples and 50 tumoral samples, leaving out the last 2 tumoral samples. The data are already normalized.
Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D'Amico, A. V., Richie, J. P., Lander, E. S., Loda, M., Kantoff, P. W., Golub, T. R., and Sellers, W. R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2): 203-209.
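Code written against this dataset's layout can be tried on a smaller simulated object with the same two components used in the examples, esets and classes. The hypothetical helper below assumes that the first half of each study's columns is one condition and the second half the other, as the paired examples' x[,1:10]-x[,11:20] suggests, and codes membership as 0/1; that coding is an assumption, and the values are random noise rather than expression data.

```r
# Hypothetical helper: random data shaped like Singhdata$esets and Singhdata$classes
make_toy_studies <- function(n_genes = 100, n_studies = 5, n_per_group = 10) {
  esets <- lapply(seq_len(n_studies), function(s)
    as.data.frame(matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)))
  # Assumed coding: 0 for the first block of columns, 1 for the second
  classes <- lapply(seq_len(n_studies), function(s) rep(c(0, 1), each = n_per_group))
  list(esets = esets, classes = classes)
}

set.seed(6)
toy <- make_toy_studies()
sapply(toy$esets, dim)   # 100 rows (genes) and 20 columns (replicates) per study
```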
data(Singhdata)