Title: | Fine-Scale Population Analysis (Rewrite for Gene-Trait-Environment Interaction Analysis) |
---|---|
Description: | Statistical tool set for population genetics. The package provides following functions: 1) estimators of genetic differentiation (FST), 2) regression analysis of environmental effects on genetic differentiation using generalized least squares (GLS) method, 3) interfaces to read and manipulate 'GENEPOP' format data files). For more information, see Kitada, Nakamichi and Kishino (2020) <doi:10.1101/2020.01.30.927186>. |
Authors: | Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino |
Maintainer: | Reiichiro Nakamichi <[email protected]> |
License: | GPL (>= 2.0) |
Version: | 0.4 |
Built: | 2024-12-11 06:49:51 UTC |
Source: | CRAN |
Statistical tool set for population genetics. The package provides following functions: 1) estimators of genetic differentiation (FST), 2) regression analysis of environmental effects on genetic differentiation using generalized least squares (GLS) method, 3) interfaces to read and manipulate 'GENEPOP' format data files). For more information, see Kitada, Nakamichi and Kishino (2020) <doi:10.1101/2020.01.30.927186>.
Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino
Maintainer: Reiichiro Nakamichi <[email protected]>
Kitada S, Nakamichi R, Kishino H (2017) The empirical Bayes estimators of fine-scale population structure in high gene flow species. Mol. Ecol. Resources, 17, 1210-1222.
Nei M, Chesser RK (1983) Estimation of fixation indices and gene diversity. Annals of Human Genetics, 47, 253-259.
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358-1370.
Weir BS, Goudet J (2017) A Unified Characterization of Population Structure and Relatedness. Genetics, 206, 2085-2103.
This function reads a GENEPOP file (Rousset 2008), remove designated markers, and write a GENEPOP file of clipped data. The user can directly designate the names of the markers to be removed. The user also can set the filtering threshold of major allele frequency.
clip.GENEPOP(infile, outfile, remove.list = NULL, major.af = NULL)
clip.GENEPOP(infile, outfile, remove.list = NULL, major.af = NULL)
infile |
A character value specifying the name of the GENEPOP file to be clipped. |
outfile |
A character value specifying the name of the clipped GENEPOP file. |
remove.list |
A character value or vector specifying the names of the markers to be removed. The names must be included in the target GENEPOP file. |
major.af |
A numeric value specifying the threshold of major allele frequency for marker removal. Markers with major allele frequencies higher than this value will be removed. This value must be between 0 and 1. |
Reiichiro Nakamichi
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
# Example of GENEPOP file data(jsmackerel) jsm.genepop.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.genepop.file, sep="\n") # Remove markers designated by their names clipped_by_name.jsm.genepop.file <- tempfile() clip.GENEPOP(infile=jsm.genepop.file, outfile=clipped_by_name.jsm.genepop.file, remove.list=c("Sni21","Sni26")) # Remove markers with high major allele frequencies (in this example, > 0.5) clipped_by_af.jsm.genepop.file <- tempfile() clip.GENEPOP(infile=jsm.genepop.file, outfile=clipped_by_af.jsm.genepop.file, major.af=0.5) # Remove markers both by their names and by major allele frequencies clipped_by_both.jsm.genepop.file <- tempfile() clip.GENEPOP(infile=jsm.genepop.file, outfile=clipped_by_both.jsm.genepop.file, remove.list=c("Sni21","Sni26"), major.af=0.5) # See four text files in temporary directory. # jsm.genepop.file : original data of five markers # clipped_by_name.jsm.genepop.file : clipped data by marker names # clipped_by_af.jsm.genepop.file : clipped data by allele frequency # clipped_by_both.jsm.genepop.file : clipped data by both names and frequency
# Example of GENEPOP file data(jsmackerel) jsm.genepop.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.genepop.file, sep="\n") # Remove markers designated by their names clipped_by_name.jsm.genepop.file <- tempfile() clip.GENEPOP(infile=jsm.genepop.file, outfile=clipped_by_name.jsm.genepop.file, remove.list=c("Sni21","Sni26")) # Remove markers with high major allele frequencies (in this example, > 0.5) clipped_by_af.jsm.genepop.file <- tempfile() clip.GENEPOP(infile=jsm.genepop.file, outfile=clipped_by_af.jsm.genepop.file, major.af=0.5) # Remove markers both by their names and by major allele frequencies clipped_by_both.jsm.genepop.file <- tempfile() clip.GENEPOP(infile=jsm.genepop.file, outfile=clipped_by_both.jsm.genepop.file, remove.list=c("Sni21","Sni26"), major.af=0.5) # See four text files in temporary directory. # jsm.genepop.file : original data of five markers # clipped_by_name.jsm.genepop.file : clipped data by marker names # clipped_by_af.jsm.genepop.file : clipped data by allele frequency # clipped_by_both.jsm.genepop.file : clipped data by both names and frequency
This function estimates genom-wide global FST based on Weir and Cockerham's theta (Weir & Cockerham 1984) from a GENEPOP data object (Rousset 2008). Missing genotype values in the GENEPOP file ("0000" or "000000") are simply ignored.
globalFST(popdata)
globalFST(popdata)
popdata |
Population data object created by |
fst |
Estimated genome-wide global FST |
se |
Standard error of estimated FST |
Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358-1370.
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # theta estimation result.globalFST <- globalFST(popdata) print(result.globalFST)
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # theta estimation result.globalFST <- globalFST(popdata) print(result.globalFST)
This function provides a multiple regression analysis considering auto correlation of response variable using generalized least squres method (Aitken 1934). It supports lm
like format of model
. A typical model has the form response ~ terms
. Terms specification supports only first + second
form. Cross term specification of first * second
form is not supported.
GLS(model, data, omega = NULL)
GLS(model, data, omega = NULL)
model |
Symbolic description of the model to be fitted. |
data |
Data frame containing variables in the |
omega |
A numeric matrix of auto correlation of responce variable. |
coefficients |
Estimated coefficient, standard error, Z value and p value of each factor. |
variance |
Variance-covariance matrix of estimated coefficients. |
logL |
Log likelihood of fitted model. |
Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino
Aitken AC (1934) On Least-squares and Linear Combinations of Observations. Proceedings of the Royal Society of Edinburgh, 55, 42-48.
# Example data of Atlantic herring data(herring) ah.genepop.file <- tempfile() ah.popname.file <- tempfile() cat(herring$genepop, file=ah.genepop.file, sep="\n") cat(herring$popname, file=ah.popname.file, sep=" ") # Data load popdata <- read.GENEPOP(ah.genepop.file, ah.popname.file) # Pop-specific FST and correlation among populations fst.popsp <- pop_specificFST(popdata, cov=TRUE) cov.fst.popsp <- fst.popsp$cov sd.fst.popsp <- sqrt(diag(cov.fst.popsp)) cov2.fst.popsp <- apply(cov.fst.popsp, 2, function(x){x / sd.fst.popsp}) cor.fst.popsp <- apply(cov2.fst.popsp, 1, function(x){x / sd.fst.popsp}) # Pop-pairwise FST and population structure fst.poppair <- pop_pairwiseFST(popdata) fst.md <- cmdscale(fst.poppair) # GLS analysis of FST and environmental factors test.data <- data.frame(fst=fst.md[,1], herring$environment) GLS(fst~., scale(test.data), omega=cor.fst.popsp)
# Example data of Atlantic herring data(herring) ah.genepop.file <- tempfile() ah.popname.file <- tempfile() cat(herring$genepop, file=ah.genepop.file, sep="\n") cat(herring$popname, file=ah.popname.file, sep=" ") # Data load popdata <- read.GENEPOP(ah.genepop.file, ah.popname.file) # Pop-specific FST and correlation among populations fst.popsp <- pop_specificFST(popdata, cov=TRUE) cov.fst.popsp <- fst.popsp$cov sd.fst.popsp <- sqrt(diag(cov.fst.popsp)) cov2.fst.popsp <- apply(cov.fst.popsp, 2, function(x){x / sd.fst.popsp}) cor.fst.popsp <- apply(cov2.fst.popsp, 1, function(x){x / sd.fst.popsp}) # Pop-pairwise FST and population structure fst.poppair <- pop_pairwiseFST(popdata) fst.md <- cmdscale(fst.poppair) # GLS analysis of FST and environmental factors test.data <- data.frame(fst=fst.md[,1], herring$environment) GLS(fst~., scale(test.data), omega=cor.fst.popsp)
An example of a genetic data for Atlantic herring population (Limborg et al. 2012). It contains genotypic information of 281 SNPs from 18 subpopulations of 607 individuals. GENEPOP format (Rousset 2008) text file is available. Subpopulation names, environmental factors (longitude, latitude, temperature and salinity) at each subpopulation are attached.
data("herring")
data("herring")
$ genepop : Genotypic information of 281 SNPs in GENEPOP format text data.
$ popname : Names of subpopulations.
$ environment : Table of temperature and salinity at each subpopulation.
Limborg MT, Helyar SJ, de Bruyn M et al. (2012) Environmental selection on transcriptome-derived SNPs in a high gene flow marine fish, the Atlantic herring (Clupea harengus). Molecular Ecology, 21, 3686-3703.
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
data(herring) ah.genepop.file <- tempfile() ah.popname.file <- tempfile() cat(herring$genepop, file=ah.genepop.file, sep="\n") cat(herring$popname, file=ah.popname.file, sep=" ") # See two text files in temporary directory. # ah.genepop.file : GENEPOP format file of 281SNPs in 18 subpopulations # ah.popname.file : plain text file of subpopulation names print(herring$environment)
data(herring) ah.genepop.file <- tempfile() ah.popname.file <- tempfile() cat(herring$genepop, file=ah.genepop.file, sep="\n") cat(herring$popname, file=ah.popname.file, sep=" ") # See two text files in temporary directory. # ah.genepop.file : GENEPOP format file of 281SNPs in 18 subpopulations # ah.popname.file : plain text file of subpopulation names print(herring$environment)
An example of a genetic data for a Japanese Spanish mackerel population (Nakajima et al. 2014). It contains genotypic information of 5 microsatellite markers from 8 subpopulations of 715 individuals. GENEPOP format (Rousset 2008) text files are available. Name list of subpopulations also is attached.
data("jsmackerel")
data("jsmackerel")
$ MS.genepop: Genotypic information of 5 microsatellites in GENEPOP format text data.
$ popname: Names of subpopulations.
Nakajima K et al. (2014) Genetic effects of marine stock enhancement: a case study based on the highly piscivorous Japanese Spanish mackerel. Canadian Journal of Fisheries and Aquatic Sciences, 71, 301-314.
Kitada S, Kitakado T, Kishino H (2007) Empirical Bayes inference of pairwise FST and its distribution in the genome. Genetics, 177, 861-873.
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # See two text files in temporary directory. # jsm.ms.genepop.file : GENEPOP format file of microsatellite data # jsm.popname.file : plain text file of subpopulation names
data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # See two text files in temporary directory. # jsm.ms.genepop.file : GENEPOP format file of microsatellite data # jsm.popname.file : plain text file of subpopulation names
This function estimates locus-specific global FST among subpopulations using empirical Bayes method (Kitada et al. 2007, 2017) from a GENEPOP data object (Rousset 2008). Missing genotype values in the GENEPOP file ("0000" or "000000") are simply ignored.
locus_specificFST(popdata)
locus_specificFST(popdata)
popdata |
Population data object created by |
Estimated locus-specific global FST.
Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino
Kitada S, Kitakado T, Kishino H (2007) Empirical Bayes inference of pairwise FST and its distribution in the genome. Genetics, 177, 861-873.
Kitada S, Nakamichi R, Kishino H (2017) The empirical Bayes estimators of fine-scale population structure in high gene flow species. Mol. Ecol. Resources, DOI: 10.1111/1755-0998.12663
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # FST estimation locspFST <- locus_specificFST(popdata) print(locspFST)
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # FST estimation locspFST <- locus_specificFST(popdata) print(locspFST)
This function estimates genome-wide poppulation-paiwise FST among subpopulations based on Nei and Chesser's corrected GST (Nei&Chesser 1983) from a GENEPOP data object (Rousset 2008). Missing genotype values in the GENEPOP file ("0000" or "000000") are simply ignored.
pop_pairwiseFST(popdata)
pop_pairwiseFST(popdata)
popdata |
Population data object created by |
Estimated genome-wide population-pairwise FST.
Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino
Nei M, Chesser RK (1983) Estimation of fixation indices and gene diversity. Annals of Human Genetics, 47, 253-259.
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
read.GENEPOP
,
as.dist
, as.dendrogram
,
hclust
, cmdscale
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # GST estimation result.poppairFST <- pop_pairwiseFST(popdata) poppairFST.d <- as.dist(result.poppairFST) print(poppairFST.d) # dendrogram poppairFST.hc <- hclust(poppairFST.d,method="average") plot(as.dendrogram(poppairFST.hc), xlab="",ylab="",main="", las=1) # MDS plot mds <- cmdscale(poppairFST.d) plot(mds, type="n", xlab="",ylab="") text(mds[,1],mds[,2], popdata$pop_names)
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # GST estimation result.poppairFST <- pop_pairwiseFST(popdata) poppairFST.d <- as.dist(result.poppairFST) print(poppairFST.d) # dendrogram poppairFST.hc <- hclust(poppairFST.d,method="average") plot(as.dendrogram(poppairFST.hc), xlab="",ylab="",main="", las=1) # MDS plot mds <- cmdscale(poppairFST.d) plot(mds, type="n", xlab="",ylab="") text(mds[,1],mds[,2], popdata$pop_names)
This function estimates genome-wide poppulation-specific FST based on Weir and Goudet's Method (Weir&Goudet 2017) from a GENEPOP data object (Rousset 2008). Missing genotype values in the GENEPOP file ("0000" or "000000") are simply ignored.
pop_specificFST(popdata, cov = FALSE)
pop_specificFST(popdata, cov = FALSE)
popdata |
Population data object created by |
cov |
A Logical argument indicating whether variance-covariance matrix of estimated FST should be calculated. |
fst |
Estimated genome-wide population-specific FST and standard error. |
cov |
Variance-covariance matrix of estimated FST |
Reiichiro Nakamichi, Shuichi Kitada, Hirohisa Kishino
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
Weir BS, Goudet J (2017) A Unified Characterization of Population Structure and Relatedness. Genetics, 206, 2085-2103.
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # FST estimation result.popspFST <- pop_specificFST(popdata) print(result.popspFST$fst)
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Data load # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # FST estimation result.popspFST <- pop_specificFST(popdata) print(result.popspFST$fst)
This function reads a GENEPOP format file (Rousset 2008) and parse it into an R data object. This data object provides a summary of genotype/haplotype of each sample, allele frequency in each population, and marker status. This data object is used in downstream analysis of this package.
read.GENEPOP(genepop, popname = NULL)
read.GENEPOP(genepop, popname = NULL)
genepop |
A character value specifying the name of the GENEPOP file to be analyzed. |
popname |
A character value specifying the name of the plain text file containing the names of subpopulations to be analyzed. This text file must not contain other than subpopulation names. The names must be separated by spaces, tabs or line breaks. If this argument is omitted, serial numbers will be assigned as subpopulation names. |
num_pop |
Number of subpopulations. |
pop_sizes |
Number of samples in each subpopulation. |
pop_names |
Names of subpopulations. |
ind_names |
Names of samples in each subpopulation. |
num_loci |
Number of loci. |
loci_names |
Names of loci. |
num_allele |
Number of alleles at each locus. |
allele_list |
A list of alleles at each locus. |
ind_count |
Observed count of genotyped samples in each subpopulation at each locus. |
allele_count |
Observed count of genotyped alleles in each subpopulation at each locus. |
allele_freq |
Observed allele frequencies in each subpopulation at each locus. |
genotype |
Genotypes of each sample at each locus in haploid designation. |
call_rate_loci |
Call rate of each locus (rate of genotyped samples at each locus). |
call_rate_ind |
Call rate of each sample (rate of genotyped markers for each sample). |
He |
Expected heterozigosity in each subpopulation. |
Ho |
Observed heterozigosity in each subpopulation. |
Reiichiro Nakamichi
Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol. Ecol. Resources, 8, 103-106.
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Read GENEPOP file with subpopulation names. # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # Read GENEPOP file without subpopulation names. popdata.noname <- read.GENEPOP(genepop=jsm.ms.genepop.file)
# Example of GENEPOP file data(jsmackerel) jsm.ms.genepop.file <- tempfile() jsm.popname.file <- tempfile() cat(jsmackerel$MS.genepop, file=jsm.ms.genepop.file, sep="\n") cat(jsmackerel$popname, file=jsm.popname.file, sep=" ") # Read GENEPOP file with subpopulation names. # Prepare your GENEPOP file and population name file in the working directory. # Replace "jsm.ms.genepop.file" and "jsm.popname.file" by your file names. popdata <- read.GENEPOP(genepop=jsm.ms.genepop.file, popname=jsm.popname.file) # Read GENEPOP file without subpopulation names. popdata.noname <- read.GENEPOP(genepop=jsm.ms.genepop.file)