This package provides functionalities to perform a meta-analysis of genetic-wide association studies results in plant quantitative genetics. The present document displays the code and results corresponding to the analysis of the package data set.
First one need to install the metaGE
package.
if (!require('metaGE')){
install.packages('metaGE')
}
#> Loading required package: metaGE
library('metaGE')
The analysis requires the following R packages:
The starting point of the analysis is the list of files corresponding to the association studies results performed in the different environments. Here we consider only three association files for simplification. These files are included in the package.
## Get the folder containing the association file
RepData <- system.file("extdata", package = "metaGE")
## Get the complete list of association files
File.list <- list.files(RepData ,full.names = TRUE) %>%
tibble(Names = .)%>%
mutate(ShortNames = Names %>%
str_remove(pattern = paste0(RepData,"/")) %>%
str_remove(pattern = "_3DF.txt")) %>%
select(ShortNames,Names) %>%
deframe
File.list
#> Gai12R
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12R_3DF.txt"
#> Gai12W
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12W_3DF.txt"
#> Gai13R
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai13R_3DF.txt"
Here is what the data look like in a single file:
One now needs to collect the results of all association files into a
single dataset using the metaGE.collect
function.
Notice that all files may not have the same SNPs (due to filtering per environment). This will result in NAs when files are merged. By default, any marker with NAs is discarded. Specify NA.rmv = FALSE if you want to keep the marker with NAs.
###Build the dataset
## First provide the list of variable names
Names.list <- list(MARKER="Marker_Name",
CHR="Chromosome",
POS="Marker_Position",
FREQ="Maf",
EFFECT="SNP_Weight",
PVAL="Pvalue",
ALLELE0="Allele1",
ALLELE1="Allele2")
MinFreq <- 0.07
## Now collect
MetaData <- metaGE.collect(FileNames = File.list, VariableNames = Names.list, MinFreq = MinFreq)
#> 106 duplicated markers removed
#> 4 markers with NAs removed
head(MetaData$Data) %>% datatable()
The association files can be in several folders. In this case, one can define the FileNames argument as a list of lists. The argument VariableNames may also be a list of lists.
## Get the list of the association files
File.list1 <- File.list[1:2]
## Get the list of the other association files
File.list2 <- File.list[3]
File.listc <- list("List1" = File.list1, "List2" = File.list2)
File.listc
#> $List1
#> Gai12R
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12R_3DF.txt"
#> Gai12W
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai12W_3DF.txt"
#>
#> $List2
#> Gai13R
#> "/tmp/Rtmp6wi4SL/Rinst1a0770b9b5a0/metaGE/extdata/Gai13R_3DF.txt"
###Build the dataset
## First provide the list of variable names
Names.listc <- list(Names.list, Names.list)
MinFreq <- 0.07
## Now collect
MetaData <- metaGE.collect(FileNames = File.listc, VariableNames = Names.listc, MinFreq = MinFreq)
#> 106 duplicated markers removed
#> 4 markers with NAs removed
head(MetaData$Data) %>% datatable()
From now on, we consider the dataset metaData
included
in the package metaGE
. This dataset contains the results of
10 different genetic association studies on maize lines testing the
association between a set of 27 411 markers and the grain yield.
One can compute the correlations between the individual GWAS using
the metaGE.cor
function.
Here is what the correlation matrix looks like:
One may fit different models with the metaGE.fit
function depending on the argument Method
:
Method = 'Fe'
, then the Fixed Effect (Fe) model is
fitted, and a test to identify globally significant markers is
performed.Method ='Re'
, then Random Effect (Re) model is
fitted, and a test that allows some heterogeneity on the effect of the
marker is performed.Here are the two different models:
## Fixed Effect
FeDF <- metaGE.fit(metaData, MatCorr, Method = "Fe")
head(FeDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE)) %>% datatable()
## Random Effect
ReDF <- metaGE.fit(metaData, MatCorr, Method = "Re")
head(ReDF %>% select(CHR, POS, MARKER, Mu, Tau, PVALUE)) %>% datatable()
One can have a look at the pvalues one gets for the different sets of
tests to check for any problem. This can be done using the
metaGE.pvalplot
function, which displays the p-value
distribution and the QQplot of the -log10(pvalues).
Pvalue.list <- list('PVALUE.Fe'= FeDF$PVALUE, 'PVALUE.Re'= ReDF$PVALUE)
plott <- map(names(Pvalue.list),~metaGE.pvalplot(Pvalue.list[[.x]], Main= .x) )
First apply some multiple testing procedure (here Benjamini-Hochberg with H0 prior estimation).
CorrectingPValues <- function(Pvalues){
## Get a p0 estimate
p0 = min(2*sum(Pvalues > 0.5)/length(Pvalues),1-1/length(Pvalues))
## Get the corrected p-values
CorrectedPval <- stats::p.adjust(Pvalues, method = 'BH')*p0
return(CorrectedPval)
}
Here is the number of significant markers at a 0.05 threshold for the different sets of tests :
One can draw the corresponding manhattan plot using the
metaGE.manhattan
:
PvalThresholdFe <-FeDF[Signif$PVALUE.Fe,]$PVALUE%>% max %>% max(.,0)
manhattanFe <- metaGE.manhattan(Data = FeDF,VarName = 'PVALUE', Threshold = PvalThresholdFe,Main = '-log10(Pval) alongside the chromosome Fe method' )
print(manhattanFe)
PvalThresholdRe <- ReDF[Signif$PVALUE.Re,]$PVALUE%>% max %>% max(.,0)
manhattanRe <- metaGE.manhattan(Data = ReDF,VarName = 'PVALUE', Threshold = PvalThresholdRe,Main = '-log10(Pval) alongside the chromosome Re method')
print(manhattanRe)
One can draw the corresponding heatmap using the
metaGE.heatmap
:
One can use the score local approach developed by Fariello MI,
Boitard S, Mercier S, et al.(2017) using the metaGE.lscore
.
This approach aims to detect significant regions in a genome sequence by
accumulating single marker p-values. The technical details of the
computation can be found in Fariello MI, Boitard S, Mercier S, et
al. Accounting for linkage disequilibrium in genome scans for selection
without individual genotypes: The local score approach. Mol Ecol.
2017;00:1–15. https://doi.org/10.1111/mec.14141.
Here are the significant regions founded by the score local approach :
One can draw the manhattan plot of the score local and highlight the
significtives markers using the metaGE.manhattan
:
Different tests may be performed with the metaGE.test
function depending on the argument :
Incidence
is provided to the function and
Contrast = NULL
, then a test of contrast to identify
markers significant for at least one subclass of environments is
performed.Incidence
and Contrast
are provided to
the function, then the test of contrast specified is performed.Covariate
is provided to the function, then Random
Effect regression, a test to identify significant markers correlated to
environments covariate, is performed.One can perform several tests by providing a list of the arguments
Contrast
and/or Incidence
and/or
Covariate
.
Some covariates describing the environments are available in the
envDesc
data set :
First, one must build the incidence matrix using the
metaGE.incidence
function.
## Build the incidence matrix
(Incidence.Temp <- metaGE.incidence(VarName = "Temp",Covariate = envDesc,EnvName = "ShortName", Data = metaData))
#> Cool Hot Hot(day)
#> Cam12R 0 1 0
#> Cam12W 0 1 0
#> Deb12R 1 0 0
#> Deb12W 1 0 0
#> Gai12R 1 0 0
#> Gai12W 1 0 0
#> Kar12R 1 0 0
#> Kar12W 1 0 0
#> Ner12R 0 0 1
#> Ner12W 0 0 1
One can test whether the markers are significant in at least one
environment subclass by setting Contrast
to
NULL
. One can also identify significant markers with a
distinct effect for the different subclasses of environments by
specifying the appropriate Contrast
.
One can use the metaGE.test
function to perform tests of
contrast.
## Build the list of Incidence
Incidence.list <- list(Temp = Incidence.Temp,
Diff.Temp = Incidence.Temp)
#Build the list of Contrast
Contrast.list <- list(Temp = NULL,
Diff.Temp = matrix(c(1,-1,0,0,1,-1),2,byrow = T))
ContrastDF <- metaGE.test(Data = metaData, MatCorr = MatCorr,
Incidence = Incidence.list,
Contrast = Contrast.list)
Here are the number of significant markers at a 0.05 threshold on control FDR for the different sets of tests : (Benjamini-Hochberg with H0 prior estimation)
Alpha <- 0.05
SignifContrast <- apply(ContrastDF %>% select(contains("PVALUE.")),2,
FUN = function(i){return(CorrectingPValues(i) %>%
`<`(Alpha) %>%
which)})
lapply(SignifContrast,length)
#> $PVALUE.Temp
#> [1] 662
#>
#> $PVALUE.Diff.Temp
#> [1] 611
One can draw the corresponding heatmap using the
metaGE.heatmap
:
One may want to identify markers correlated to environments
covariate. These can be done by performing a meta-regression test thanks
to the function metaGE.test
by providing a data frame
containing one column corresponding to the environments (the first one)
and column(s) corresponding to the covariate(s) in the argument
Covariate
.One can test whether the markers are significant
in at least one environment subclass by setting Contrast
to
NULL
. One can also identify significant markers with a
distinct effect for the different subclasses of environments by
specifying the appropriate Contrast
.
One can use the metaGE.test
function to perform tests of
contrast.
RegressionDF <- metaGE.test(Data=metaData,MatCorr = MatCorr,Covariate = envDesc[,c(1,5,6)],EnvName = "ShortName")
Here are the number of significant markers at a 0.05 threshold on control FDR for the meta-regression tests : (Benjamini-Hochberg with H0 prior estimation)
SignifRegression <- apply(RegressionDF %>% select(contains("PVALUE.")),2,
FUN = function(i){return(CorrectingPValues(i) %>%
`<`(Alpha) %>%
which)})
lapply(SignifRegression,length)
#> $PVALUE.Tnight.mean
#> [1] 345
#>
#> $PVALUE.ET0.mean
#> [1] 0
Here are the five most significant markers correlated to the Tnight.mean covariate.
RegressionDF[SignifRegression$PVALUE.Tnight.mean,] %>% select(CHR, POS, MARKER, PVALUE.Tnight.mean) %>% arrange(PVALUE.Tnight.mean) %>% head() %>% datatable()
One can draw the graph of the marker effects according to the
covariate using the metaGE.regplot
.