metaGE-vignette

This package provides functionalities to perform a meta-analysis of genetic-wide association studies results in plant quantitative genetics. The present document displays the code and results corresponding to the analysis of the package data set.

Package installation

First one need to install the metaGE package.

if (!require('metaGE')){
  install.packages('metaGE')
} 
#> Loading required package: metaGE
library('metaGE')

The analysis requires the following R packages:


library(tidyverse)
library(DT)

## For graphical displays
library(data.table)
library(corrplot)
library(ggplot2)

Build the dataset

The starting point of the analysis is the list of files corresponding to the association studies results performed in the different environments. Here we consider only three association files for simplification. These files are included in the package.

## Get the folder containing the association file
RepData <- system.file("extdata", package = "metaGE")

## Get the complete list of association files
File.list <- list.files(RepData ,full.names = TRUE) %>% 
  tibble(Names = .)%>% 
  mutate(ShortNames = Names %>%
           str_remove(pattern = paste0(RepData,"/")) %>%
           str_remove(pattern = "_3DF.txt"))  %>%
  select(ShortNames,Names) %>% 
  deframe

File.list
#>                                                            Gai12R 
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12R_3DF.txt" 
#>                                                            Gai12W 
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12W_3DF.txt" 
#>                                                            Gai13R 
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai13R_3DF.txt"

Here is what the data look like in a single file:

## Have a look at the first one
fread(File.list[1]) %>% head()  %>% datatable()

One now needs to collect the results of all association files into a single dataset using the metaGE.collect function.

Notice that all files may not have the same SNPs (due to filtering per environment). This will result in NAs when files are merged. By default, any marker with NAs is discarded. Specify NA.rmv = FALSE if you want to keep the marker with NAs.

###Build the dataset
## First provide the list of variable names 
Names.list <- list(MARKER="Marker_Name",
                   CHR="Chromosome",
                   POS="Marker_Position",
                   FREQ="Maf",
                   EFFECT="SNP_Weight",
                   PVAL="Pvalue",
                   ALLELE0="Allele1",
                   ALLELE1="Allele2")

MinFreq <- 0.07

## Now collect
MetaData <- metaGE.collect(FileNames = File.list, VariableNames = Names.list, MinFreq = MinFreq)
#> 106 duplicated markers removed
#> 4 markers with NAs removed
head(MetaData$Data) %>% datatable()

The association files can be in several folders. In this case, one can define the FileNames argument as a list of lists. The argument VariableNames may also be a list of lists.

## Get the list of the association files
File.list1 <- File.list[1:2]

## Get the list of the other association files
File.list2 <- File.list[3]

File.listc <- list("List1" = File.list1, "List2" = File.list2)
File.listc
#> $List1
#>                                                            Gai12R 
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12R_3DF.txt" 
#>                                                            Gai12W 
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12W_3DF.txt" 
#> 
#> $List2
#>                                                            Gai13R 
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai13R_3DF.txt"

###Build the dataset
## First provide the list of variable names 
Names.listc <- list(Names.list, Names.list)

MinFreq <- 0.07

## Now collect
MetaData <- metaGE.collect(FileNames = File.listc, VariableNames = Names.listc, MinFreq = MinFreq)
#> 106 duplicated markers removed
#> 4 markers with NAs removed

head(MetaData$Data)  %>% datatable()

From now on, we consider the dataset metaData included in the package metaGE. This dataset contains the results of 10 different genetic association studies on maize lines testing the association between a set of 27 411 markers and the grain yield.

# Import the data
data("metaData")

Accounting for correlations between individual GWAS

One can compute the correlations between the individual GWAS using the metaGE.cor function.

Threshold <- 0.8
MatCorr <- metaGE.cor(metaData, Threshold = Threshold)

Here is what the correlation matrix looks like:

corrplot(MatCorr,order = "hclust")

Global meta-analysis procedures

One may fit different models with the metaGE.fit function depending on the argument Method :

  • if Method = 'Fe', then the Fixed Effect (Fe) model is fitted, and a test to identify globally significant markers is performed.
  • if Method ='Re', then Random Effect (Re) model is fitted, and a test that allows some heterogeneity on the effect of the marker is performed.

Here are the two different models:

## Fixed Effect
FeDF <- metaGE.fit(metaData, MatCorr, Method = "Fe")
head(FeDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE))  %>% datatable()

## Random Effect
ReDF <- metaGE.fit(metaData, MatCorr, Method = "Re")
head(ReDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE))  %>% datatable()

One can have a look at the pvalues one gets for the different sets of tests to check for any problem. This can be done using the metaGE.pvalplot function, which displays the p-value distribution and the QQplot of the -log10(pvalues).

Pvalue.list <- list('PVALUE.Fe'= FeDF$PVALUE, 'PVALUE.Re'= ReDF$PVALUE)
plott <- map(names(Pvalue.list),~metaGE.pvalplot(Pvalue.list[[.x]], Main= .x) )

Check the candidates

First apply some multiple testing procedure (here Benjamini-Hochberg with H0 prior estimation).

CorrectingPValues <- function(Pvalues){

  ## Get a p0 estimate
  p0 = min(2*sum(Pvalues > 0.5)/length(Pvalues),1-1/length(Pvalues))

  ## Get the corrected p-values
  CorrectedPval <- stats::p.adjust(Pvalues, method = 'BH')*p0

  return(CorrectedPval)
}

Here is the number of significant markers at a 0.05 threshold for the different sets of tests :

## FDR control
Alpha <- 0.05
Signif <- lapply(Pvalue.list, FUN = function(i){
                  return(CorrectingPValues(i) %>% 
                    `<`(Alpha) %>% 
                    which)})

lapply(X = Signif,FUN = length)
#> $PVALUE.Fe
#> [1] 40
#> 
#> $PVALUE.Re
#> [1] 167

Manhattan plot and heatmap

One can draw the corresponding manhattan plot using the metaGE.manhattan :


PvalThresholdFe <-FeDF[Signif$PVALUE.Fe,]$PVALUE%>% max %>% max(.,0)
manhattanFe <- metaGE.manhattan(Data = FeDF,VarName = 'PVALUE', Threshold = PvalThresholdFe,Main = '-log10(Pval) alongside the chromosome Fe method' )
print(manhattanFe)


PvalThresholdRe <- ReDF[Signif$PVALUE.Re,]$PVALUE%>% max %>% max(.,0)
manhattanRe <- metaGE.manhattan(Data = ReDF,VarName = 'PVALUE', Threshold = PvalThresholdRe,Main = '-log10(Pval) alongside the chromosome Re method')
print(manhattanRe)

One can draw the corresponding heatmap using the metaGE.heatmap :

heatmapFe <- metaGE.heatmap(Data = FeDF[Signif$PVALUE.Fe,],Prefix = "Z.",Main = "Z-scores of Fe significant markers across environments")



heatmapRe <- metaGE.heatmap(Data = ReDF[Signif$PVALUE.Re,],Prefix = "Z.", Main = "Z-scores of Re significant markers across environments")

Score local

One can use the score local approach developed by Fariello MI, Boitard S, Mercier S, et al.(2017) using the metaGE.lscore. This approach aims to detect significant regions in a genome sequence by accumulating single marker p-values. The technical details of the computation can be found in Fariello MI, Boitard S, Mercier S, et al. Accounting for linkage disequilibrium in genome scans for selection without individual genotypes: The local score approach. Mol Ecol. 2017;00:1–15. https://doi.org/10.1111/mec.14141.

x <- 3
FeDF_ls <- metaGE.lscore(Data = FeDF, PvalName = "PVALUE", xi = x)

Here are the significant regions founded by the score local approach :

FeDF_ls$SigZones %>% datatable()

One can draw the manhattan plot of the score local and highlight the significtives markers using the metaGE.manhattan :

manhattanFe_lscore <- metaGE.manhattan(Data = FeDF_ls$Data,
                              VarName = "SCORE",
                              Score = T,
                              SigZones = FeDF_ls$SigZones )
print(manhattanFe_lscore+ggtitle('Score local alongside the chromosome Fe method'))

Tests for GxE interactions

Different tests may be performed with the metaGE.test function depending on the argument :

  • if Incidence is provided to the function and Contrast = NULL, then a test of contrast to identify markers significant for at least one subclass of environments is performed.
  • if Incidence and Contrast are provided to the function, then the test of contrast specified is performed.
  • if Covariate is provided to the function, then Random Effect regression, a test to identify significant markers correlated to environments covariate, is performed.

One can perform several tests by providing a list of the arguments Contrast and/or Incidence and/or Covariate.

Some covariates describing the environments are available in the envDesc data set :

data("envDesc")
envDesc %>% datatable()

Tests of contrast

First, one must build the incidence matrix using the metaGE.incidence function.

## Build the incidence matrix 
(Incidence.Temp <- metaGE.incidence(VarName = "Temp",Covariate = envDesc,EnvName = "ShortName", Data = metaData))
#>        Cool Hot Hot(day)
#> Cam12R    0   1        0
#> Cam12W    0   1        0
#> Deb12R    1   0        0
#> Deb12W    1   0        0
#> Gai12R    1   0        0
#> Gai12W    1   0        0
#> Kar12R    1   0        0
#> Kar12W    1   0        0
#> Ner12R    0   0        1
#> Ner12W    0   0        1

One can test whether the markers are significant in at least one environment subclass by setting Contrast to NULL. One can also identify significant markers with a distinct effect for the different subclasses of environments by specifying the appropriate Contrast.

One can use the metaGE.test function to perform tests of contrast.

## Build the list of Incidence
Incidence.list <- list(Temp = Incidence.Temp,
                       Diff.Temp = Incidence.Temp)

#Build the list of Contrast
Contrast.list <- list(Temp = NULL,
                      Diff.Temp = matrix(c(1,-1,0,0,1,-1),2,byrow = T)) 


ContrastDF <- metaGE.test(Data = metaData, MatCorr = MatCorr,
                          Incidence = Incidence.list,
                          Contrast = Contrast.list)

Here are the number of significant markers at a 0.05 threshold on control FDR for the different sets of tests : (Benjamini-Hochberg with H0 prior estimation)

Alpha <- 0.05
SignifContrast <- apply(ContrastDF %>% select(contains("PVALUE.")),2,
                        FUN = function(i){return(CorrectingPValues(i) %>% 
                                                    `<`(Alpha) %>% 
                                                      which)})

lapply(SignifContrast,length)
#> $PVALUE.Temp
#> [1] 662
#> 
#> $PVALUE.Diff.Temp
#> [1] 611

One can draw the corresponding heatmap using the metaGE.heatmap :

# Specify the groups of environment
heatmap_Temp <- metaGE.heatmap(Data = ContrastDF[SignifContrast$PVALUE.Temp,],EnvGroups = envDesc[,1:2],Prefix = "Z.",Main = "Z-scores of markers with contrasted effect according the temperature")

Tests of meta-regression

One may want to identify markers correlated to environments covariate. These can be done by performing a meta-regression test thanks to the function metaGE.test by providing a data frame containing one column corresponding to the environments (the first one) and column(s) corresponding to the covariate(s) in the argument Covariate.One can test whether the markers are significant in at least one environment subclass by setting Contrast to NULL. One can also identify significant markers with a distinct effect for the different subclasses of environments by specifying the appropriate Contrast.

One can use the metaGE.test function to perform tests of contrast.

RegressionDF <- metaGE.test(Data=metaData,MatCorr = MatCorr,Covariate = envDesc[,c(1,5,6)],EnvName = "ShortName")

Here are the number of significant markers at a 0.05 threshold on control FDR for the meta-regression tests : (Benjamini-Hochberg with H0 prior estimation)

SignifRegression <- apply(RegressionDF %>% select(contains("PVALUE.")),2,
                        FUN = function(i){return(CorrectingPValues(i) %>% 
                                                    `<`(Alpha) %>% 
                                                      which)})

lapply(SignifRegression,length)
#> $PVALUE.Tnight.mean
#> [1] 345
#> 
#> $PVALUE.ET0.mean
#> [1] 0

Here are the five most significant markers correlated to the Tnight.mean covariate.

RegressionDF[SignifRegression$PVALUE.Tnight.mean,] %>% select(CHR, POS, MARKER, PVALUE.Tnight.mean) %>% arrange(PVALUE.Tnight.mean) %>% head()  %>% datatable()

One can draw the graph of the marker effects according to the covariate using the metaGE.regplot.

metaGE.regplot(Data = metaData, Covariate = envDesc,VarName = "Tnight.mean",EnvName = "ShortName", MarkerName = "AX-90548528", aesCol = "Classification")