--- title: "Meta analysis with special GMCMs" author: "Anders Ellern Bilgrau" date: "2019-08-31" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Meta analysis with special GMCMs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r knit-int, echo=FALSE, include=FALSE} library("knitr") set.seed(5828993) opts_knit$set(self.contained = TRUE) ``` This is a quick tutorial for using the special GMCM for meta analysis. ## Initialization The **GMCM**^[1][1]^ package is loaded. ```{r load-packages, include=TRUE} #install.packages("GMCM") # Uncomment to install the GMCM package library("GMCM") ``` If **GMCM** is *not* installed, please uncomment the above line and rerun the script. ## Loading data The data is loaded and the first rows are printed. To illustrate we load the `u133VsExon` dataset within the package. The dataset contains 19,577 *p*-values for the null hypothesis of no differential gene expression between two cell types for each of two different experiments called `u133` and `exon`. ```{r load-data, include=TRUE, echo=TRUE} x <- get(data("u133VsExon")) head(x, n = 4) ``` See `help("u133VsExon")` for more information. Next, we subset the data to simply reduce computation time. After that, we will rank it, and visualize it. ```{r} x <- x[1:5000, ] ``` The values above are p-values where *small* values indicate strong evidence---contrary to what is expected by the special model. In the special model, *large* values should be critical to the null hypothesis. Therefore we ranks and scale 1 - p: ```{r preprocess-data} u <- Uhat(1 - x) ``` The original and ranked data is plotted: ```{r} par(mfrow = c(1,2)) # Visualizing P-values and the ranked P-values plot(x, cex = 0.5, pch = 4, col = "tomato", main = "P-values", xlab = "P-value (U133)", ylab = "P-value (Exon)") plot(u, cex = 0.5, pch = 4, col = "tomato", main = "Ranked P-values", xlab = "rank(1-P) (U133)", ylab = "rank(1-P) (Exon)") ``` Here each point represent a gene. The genes in the lower left of the first panel and correspondingly in the upper right of the second panel are the seemingly *reproducible* genes. They have a low *p*-value and thus a high rank in *both* experiments. The genes in the upper left and lower right are the ones that are apparently spurious results; they have a low *p*-value in only one experiment. ## Initial parameters The initial parameters are set to ```{r show-initial-params, include=TRUE, echo=TRUE} init_par <- c(pie1 = 0.6, mu = 1, sigma = 1, rho = 0.2) ``` ## Model fitting With the data loaded and prepared, the model is can be fitted with the defined initial parameters. ```{r fit_model, error=TRUE} par <- fit.meta.GMCM(u = u, init.par = init_par, method = "NM", max.ite = 1000, verbose = FALSE) print(par) ``` We here use the Nelder-Mead fitting method with with a maximum number of iterations of `1000`. ## Meta analysis with unsupervised clustering The estimated parameters are used to calculate the local and adjusted irreproducibility discovery rates: ```{r compute_probs} meta_IDR_thres <- 0.05 out <- get.IDR(x, par = par, threshold = meta_IDR_thres) # Compute IDR str(out) out <- get.IDR(u, par = par, threshold = meta_IDR_thres) below <- out[["IDR"]] < meta_IDR_thres out$l <- sum(below) out$Khat <- ifelse(below, 2, 1) ``` By default, `get.IDR` computes thresholds on the adjusted IDR. The local irreproducibility discovery rate correspond to the posterior probability of the point originating from the irreproducible component. ## Results The classes are counted by ```{r classes_table} table(out$Khat) ``` Where we see how many of the 5000 genes are deemed reproducible and irreproducible. The results are also displayed by plotting ```{r plot_results} plot(x, col = out$Khat, asp = 1) # Plot of raw values plot(u, col = out$Khat, asp = 1) # Plot of copula values z <- GMCM:::qgmm.marginal(u, theta = meta2full(par, d = ncol(u))) # Estimate latent process plot(z, col = out$Khat, asp = 1) # Plot of estimated latent process ``` The model indeed capture genes in the upper right and classify them as reproducible. The fitted `par` object can be converted to a `theta` object which can be plotted directly: ```{r plot_theta} plot(meta2full(par, d = ncol(u))) ``` ### Session information This report was generated using **rmarkdown**^[2][2]^ and **knitr**^[3][3]^ under the session given below. The report utilizes [parameterized reports][2] and [`knitr::spin`][3]. ```{r session-info} sessionInfo() ``` ### References Please cite the **GMCM** paper^[1][1]^ if you use the package or shiny app. ```{r citation, echo=FALSE, results='asis'} cites <- lapply(c("GMCM", "knitr", "rmarkdown"), citation) fmt_cites <- unlist(lapply(cites, format, style = "text")) cat(paste0(" ", seq_along(fmt_cites), ". ", fmt_cites, "\n")) ``` [1]: http://doi.org/10.18637/jss.v070.i02 [2]: https://bookdown.org/yihui/rmarkdown/parameterized-reports.html [3]: https://yihui.name/knitr/demo/stitch/