Meta analysis with special GMCMs

This is a quick tutorial for using the special GMCM for meta analysis.

Initialization

The GMCM1 package is loaded.

#install.packages("GMCM")  # Uncomment to install the GMCM package
library("GMCM")

If GMCM is not installed, please uncomment the above line and rerun the script.

Loading data

The data is loaded and the first rows are printed. To illustrate we load the u133VsExon dataset within the package. The dataset contains 19,577 p-values for the null hypothesis of no differential gene expression between two cell types for each of two different experiments called u133 and exon.

x <- get(data("u133VsExon"))
head(x, n = 4)
##                         u133         exon
## ENSG00000265096 0.1756103672 1.072572e-01
## ENSG00000152495 0.0017797571 6.741108e-10
## ENSG00000198040 0.0053705738 1.505019e-03
## ENSG00000229092 0.0006693361 6.755118e-05

See help("u133VsExon") for more information.

Next, we subset the data to simply reduce computation time. After that, we will rank it, and visualize it.

x <- x[1:5000, ]

The values above are p-values where small values indicate strong evidence—contrary to what is expected by the special model. In the special model, large values should be critical to the null hypothesis.

Therefore we ranks and scale 1 - p:

u <- Uhat(1 - x)

The original and ranked data is plotted:

par(mfrow = c(1,2))  # Visualizing P-values and the ranked P-values
plot(x, cex = 0.5, pch = 4, col = "tomato", main = "P-values",
     xlab = "P-value (U133)", ylab = "P-value (Exon)")
plot(u, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values",
     xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")

Here each point represent a gene. The genes in the lower left of the first panel and correspondingly in the upper right of the second panel are the seemingly reproducible genes. They have a low p-value and thus a high rank in both experiments. The genes in the upper left and lower right are the ones that are apparently spurious results; they have a low p-value in only one experiment.

Initial parameters

The initial parameters are set to

init_par <- c(pie1 = 0.6, mu = 1, sigma = 1, rho = 0.2)

Model fitting

With the data loaded and prepared, the model is can be fitted with the defined initial parameters.

par <- fit.meta.GMCM(u = u,
                     init.par = init_par,
                     method = "NM",
                     max.ite = 1000,
                     verbose = FALSE)
print(par)
##      pie1        mu     sigma       rho 
## 0.7489036 1.9494403 1.1066673 0.7360319

We here use the Nelder-Mead fitting method with with a maximum number of iterations of 1000.

Meta analysis with unsupervised clustering

The estimated parameters are used to calculate the local and adjusted irreproducibility discovery rates:

meta_IDR_thres <- 0.05
out <- get.IDR(x, par = par,
               threshold = meta_IDR_thres) # Compute IDR
str(out)
## List of 5
##  $ idr      : num [1:5000] 0.872 0.99 0.976 0.981 0.203 ...
##  $ IDR      : num [1:5000] 0.4894 0.769 0.7032 0.7252 0.0645 ...
##  $ l        : int 431
##  $ threshold: num 0.05
##  $ Khat     : num [1:5000] 1 1 1 1 1 1 1 1 1 1 ...
out <- get.IDR(u, par = par, threshold = meta_IDR_thres)
below <- out[["IDR"]] < meta_IDR_thres
out$l <- sum(below)
out$Khat <- ifelse(below, 2, 1)

By default, get.IDR computes thresholds on the adjusted IDR. The local irreproducibility discovery rate correspond to the posterior probability of the point originating from the irreproducible component.

Results

The classes are counted by

table(out$Khat)
## 
##    1    2 
## 4136  864

Where we see how many of the 5000 genes are deemed reproducible and irreproducible.

The results are also displayed by plotting

plot(x, col = out$Khat, asp = 1) # Plot of raw values

plot(u, col = out$Khat, asp = 1) # Plot of copula values

z <- GMCM:::qgmm.marginal(u, theta = meta2full(par, d = ncol(u))) # Estimate latent process
plot(z, col = out$Khat, asp = 1) # Plot of estimated latent process

The model indeed capture genes in the upper right and classify them as reproducible.

The fitted par object can be converted to a theta object which can be plotted directly:

plot(meta2full(par, d = ncol(u)))

Session information

This report was generated using rmarkdown2 and knitr3 under the session given below. The report utilizes parameterized reports and knitr::spin.

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.49     GMCM_1.4       rmarkdown_2.29
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49        
##  [5] maketools_1.3.1   cachem_1.1.0      htmltools_0.5.8.1 buildtools_1.0.0 
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.2    sys_3.4.3         tools_4.4.2       evaluate_1.0.1   
## [17] bslib_0.8.0       ellipse_0.5.0     Rcpp_1.0.13-1     yaml_2.3.10      
## [21] jsonlite_1.8.9    rlang_1.1.4

References

Please cite the GMCM paper1 if you use the package or shiny app.

  1. Bilgrau AE, Eriksen PS, Rasmussen JG, Johnsen HE, Dybkaer K, Boegsted M (2016). “GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models.” Journal of Statistical Software, 70(2), 1-23. doi:10.18637/jss.v070.i02 https://doi.org/10.18637/jss.v070.i02.
  2. Xie Y (2024). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.49, https://yihui.org/knitr/.
  3. Xie Y (2015). Dynamic Documents with R and knitr, 2nd edition. Chapman and Hall/CRC, Boca Raton, Florida. ISBN 978-1498716963, https://yihui.org/knitr/.
  4. Xie Y (2014). “knitr: A Comprehensive Tool for Reproducible Research in R.” In Stodden V, Leisch F, Peng RD (eds.), Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595.
  5. Allaire J, Xie Y, Dervieux C, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W, Iannone R (2024). rmarkdown: Dynamic Documents for R. R package version 2.29, https://github.com/rstudio/rmarkdown.
  6. Xie Y, Allaire J, Grolemund G (2018). R Markdown: The Definitive Guide. Chapman and Hall/CRC, Boca Raton, Florida. ISBN 9781138359338, https://bookdown.org/yihui/rmarkdown.
  7. Xie Y, Dervieux C, Riederer E (2020). R Markdown Cookbook. Chapman and Hall/CRC, Boca Raton, Florida. ISBN 9780367563837, https://bookdown.org/yihui/rmarkdown-cookbook.