This is a quick tutorial for using the special GMCM for meta analysis.
The GMCM1 package is loaded.
If GMCM is not installed, please uncomment the above line and rerun the script.
The data is loaded and the first rows are printed. To illustrate we
load the u133VsExon
dataset within the package. The dataset
contains 19,577 p-values for the null hypothesis of no
differential gene expression between two cell types for each of two
different experiments called u133
and
exon
.
## u133 exon
## ENSG00000265096 0.1756103672 1.072572e-01
## ENSG00000152495 0.0017797571 6.741108e-10
## ENSG00000198040 0.0053705738 1.505019e-03
## ENSG00000229092 0.0006693361 6.755118e-05
See help("u133VsExon")
for more information.
Next, we subset the data to simply reduce computation time. After that, we will rank it, and visualize it.
The values above are p-values where small values indicate strong evidence—contrary to what is expected by the special model. In the special model, large values should be critical to the null hypothesis.
Therefore we ranks and scale 1 - p:
The original and ranked data is plotted:
par(mfrow = c(1,2)) # Visualizing P-values and the ranked P-values
plot(x, cex = 0.5, pch = 4, col = "tomato", main = "P-values",
xlab = "P-value (U133)", ylab = "P-value (Exon)")
plot(u, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values",
xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)")
Here each point represent a gene. The genes in the lower left of the first panel and correspondingly in the upper right of the second panel are the seemingly reproducible genes. They have a low p-value and thus a high rank in both experiments. The genes in the upper left and lower right are the ones that are apparently spurious results; they have a low p-value in only one experiment.
The initial parameters are set to
With the data loaded and prepared, the model is can be fitted with the defined initial parameters.
par <- fit.meta.GMCM(u = u,
init.par = init_par,
method = "NM",
max.ite = 1000,
verbose = FALSE)
print(par)
## pie1 mu sigma rho
## 0.7489036 1.9494403 1.1066673 0.7360319
We here use the Nelder-Mead fitting method with with a maximum number
of iterations of 1000
.
The estimated parameters are used to calculate the local and adjusted irreproducibility discovery rates:
meta_IDR_thres <- 0.05
out <- get.IDR(x, par = par,
threshold = meta_IDR_thres) # Compute IDR
str(out)
## List of 5
## $ idr : num [1:5000] 0.872 0.99 0.976 0.981 0.203 ...
## $ IDR : num [1:5000] 0.4894 0.769 0.7032 0.7252 0.0645 ...
## $ l : int 431
## $ threshold: num 0.05
## $ Khat : num [1:5000] 1 1 1 1 1 1 1 1 1 1 ...
out <- get.IDR(u, par = par, threshold = meta_IDR_thres)
below <- out[["IDR"]] < meta_IDR_thres
out$l <- sum(below)
out$Khat <- ifelse(below, 2, 1)
By default, get.IDR
computes thresholds on the adjusted
IDR. The local irreproducibility discovery rate correspond to the
posterior probability of the point originating from the irreproducible
component.
The classes are counted by
##
## 1 2
## 4136 864
Where we see how many of the 5000 genes are deemed reproducible and irreproducible.
The results are also displayed by plotting
z <- GMCM:::qgmm.marginal(u, theta = meta2full(par, d = ncol(u))) # Estimate latent process
plot(z, col = out$Khat, asp = 1) # Plot of estimated latent process
The model indeed capture genes in the upper right and classify them as reproducible.
The fitted par
object can be converted to a
theta
object which can be plotted directly:
This report was generated using rmarkdown2
and knitr3 under the
session given below. The report utilizes parameterized
reports and knitr::spin
.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.48 GMCM_1.4 rmarkdown_2.29
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.49
## [5] maketools_1.3.1 cachem_1.1.0 htmltools_0.5.8.1 buildtools_1.0.0
## [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
## [13] compiler_4.4.2 highr_0.11 sys_3.4.3 tools_4.4.2
## [17] evaluate_1.0.1 bslib_0.8.0 ellipse_0.5.0 Rcpp_1.0.13-1
## [21] yaml_2.3.10 jsonlite_1.8.9 rlang_1.1.4
Please cite the GMCM paper1 if you use the package or shiny app.