Title: | Statistical Tests and Graphics for Hardy-Weinberg Equilibrium |
---|---|
Description: | Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) for bi and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots. The functionality of the package is explained in detail in a related JSS paper <doi:10.18637/jss.v064.i03>. |
Authors: | Jan Graffelman [aut, cre], Christopher Chang [ctb], Xavi Puig [ctb], Jan Wigginton [ctb], Leonardo Ortoleva [ctb], William R. Engels [ctb] |
Maintainer: | Jan Graffelman <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.7.8 |
Built: | 2024-11-03 06:37:06 UTC |
Source: | CRAN |
Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) for bi and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots. The functionality of the package is explained in detail in a related paper (Graffelman, 2015).
Package: | HardyWeinberg |
Type: | Package |
Version: | 1.7.7 |
Date: | 2024-03-31 |
License: | GPL Version 2 or later. |
With function HWTernaryPlot
one can create ternary plots with
acceptance regions for HWE. Several routines implement statistical tests
for HWE such as HWChisq
, HWExact
, HWLratio
and
HWPerm
. Bayesian procedures are available using
HWPosterior
. Aikaike information criterion for various scenarios
can be calculated with HWAIC
. Power computations are possible
with HWPower
.
Jan Graffelman
Maintainer: Jan Graffelman <[email protected]>
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates, Massachusetts.
Graffelman, J. and Morales, J. (2008) Graphical Tests for Hardy-Weinberg Equilibrium Based on the Ternary Plot. Human Heredity 65(2):77-84.
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6): 558-568. doi:10.1038/hdy.2016.20 Nature Publishing Group.
Hardy, G.H. (1908) Mendelian Proportions in a Mixed Population. Science 28(706): 49-50.
Weinberg, W. (1908) On the Demonstration of Heredity in Man. In: Papers on Human Genetics. S. H. Boyer (Ed.) Englewood Cliffs, NJ:Prentice Hall, translated, 1963.
library(HardyWeinberg) # draw random SNPs from a population that is in HWE set.seed(123) m <- 100 # number of markers n <- 100 # sample size X <- HWData(n,m) out <- HWTernaryPlot(X,100,region=1,vertex.cex=2,signifcolour=TRUE)
library(HardyWeinberg) # draw random SNPs from a population that is in HWE set.seed(123) m <- 100 # number of markers n <- 100 # sample size X <- HWData(n,m) out <- HWTernaryPlot(X,100,region=1,vertex.cex=2,signifcolour=TRUE)
Function af
computes the allele frequency for a vector with autosomal or X-chromosomal genotype counts or compositions.
af(x)
af(x)
x |
a vector or matrix with counts or compositions. |
Function af
calculates the A allele frequency for the given order of autosomal genotype counts (AA,AB,BB) or the given order of
X chromosomal genotype counts (A,B,AA,AB,BB). The genotypes must be supplied this order. Use maf
to calculate the minor allele
frequency and more flexible coding.
a vector with allele frequencies
Jan Graffelman ([email protected])
# # A simulated random marker # X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1))) X <- X/sum(X) print(X) print(af(X)) # # MN blood group counts # x <- c(MM=298,MN=489,NN=213) # # Calculate frequency M allele # af(x) # # Calculate frequency N allele # af(rev(x)) 1 - af(x) # # Calculated allele frequencies (p) for a matrix of genotype counts. # X <- HWData(nm=10,n=100) X p <- af(X) p
# # A simulated random marker # X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1))) X <- X/sum(X) print(X) print(af(X)) # # MN blood group counts # x <- c(MM=298,MN=489,NN=213) # # Calculate frequency M allele # af(x) # # Calculate frequency N allele # af(rev(x)) 1 - af(x) # # Calculated allele frequencies (p) for a matrix of genotype counts. # X <- HWData(nm=10,n=100) X p <- af(X) p
Function AFtest
tests equality of allele frequencies for males
and females for bi-allelic marker data by means of a Fisher exact test.
AFtest(x, verbose = TRUE, ...)
AFtest(x, verbose = TRUE, ...)
x |
a vector containg the genotypic counts c(A,B,AA,AB,BB) for a bi-allelic X-chromosomal markers. |
verbose |
verbose = TRUE prints results, verbose = FALSE is silent. |
... |
additional arguments for function |
Function AFtest
constructs the contingency table of sex by
allele, and call fisher.test
to test for equality of allele
frequencies. The test assumes Hardy-Weinberg equilibrium.
AC |
Two-way table of sex by allele |
pval |
p-value of the test |
Jan Graffelman [email protected]
rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) AFtest(rs5968922)
rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) AFtest(rs5968922)
Function afx
computes the allele frequency for a vector or matrix of X-chromosomal genotype counts or frequencies.
afx(x)
afx(x)
x |
a vector or matrix with X-chromosomal counts or compositions. |
Function afx
calculates the A allele frequency for the given order of X-chromosomal genotype counts (A,B,AA,AB,BB). The genotypes must be supplied this order. Use maf
to calculate the minor allele frequency and more flexible coding.
a vector with allele frequencies
Jan Graffelman ([email protected])
# # A simulated random X-chromosomal marker # X <- as.vector(rmultinom(1,100,c(0.3,0.2,0.1,0.15,0.25))) X <- X/sum(X) print(X) print(afx(X)) # # # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) afx(rs5968922) # # # y <- c(C=337, G=129, CC=277, CG=209, GG=48) afx(y) # # Calculated allele frequencies (p) for a matrix of X-chromosomal genotype counts. # X <- HWData(nm=10,n=1000,x.linked=TRUE) X p <- afx(X) p
# # A simulated random X-chromosomal marker # X <- as.vector(rmultinom(1,100,c(0.3,0.2,0.1,0.15,0.25))) X <- X/sum(X) print(X) print(afx(X)) # # # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) afx(rs5968922) # # # y <- c(C=337, G=129, CC=277, CG=209, GG=48) afx(y) # # Calculated allele frequencies (p) for a matrix of X-chromosomal genotype counts. # X <- HWData(nm=10,n=1000,x.linked=TRUE) X p <- afx(X) p
Function agcounts
determines sample size, minor are major allele counts,
allele counts in females, numbers of males and females and allele
frequencies for a vector of genotypes counts of an X-chromosomal markers.
agcounts(x, verbose = FALSE)
agcounts(x, verbose = FALSE)
x |
a vector of X-chromosomal genotype counts (A,B,AA,AB,BB) |
verbose |
print the counts if ( |
n |
sample size |
nA |
number of A alleles |
nB |
number of B alleles |
nf |
number of females |
nm |
number of males |
nAf |
number of A alleles in females |
nBf |
number of B alleles in females |
nt |
total number of alleles |
fAA |
number of AA females |
fAB |
number of AB females |
fBB |
number of BB females |
pA |
overall A allele frequency |
pB |
overall B allele frequency |
Jan Graffelman [email protected]
mac
, link{maf}
rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) counts <- agcounts(rs5968922)
rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) counts <- agcounts(rs5968922)
Function alleles
extracts the names of the alleles from a named
genotype vector.
alleles(x, fromlabels = TRUE)
alleles(x, fromlabels = TRUE)
x |
A named or unnamed genotype vector (e.g. c(AA=10,AB=20,BB=5)) |
fromlabels |
extract genotypes from the labels of the vector elements, or from the vector elements themselves. |
A character vector with the alleles
Jan Graffelman ([email protected])
x <- c(AA=10,AG=10,GG=10,AT=5) als.x <- alleles(x) print(als.x) y <- rep(names(x),x) als.y <- alleles(y,fromlabels = FALSE) print(als.y)
x <- c(AA=10,AG=10,GG=10,AT=5) als.x <- alleles(x) print(als.x) y <- rep(names(x),x) als.y <- alleles(y,fromlabels = FALSE) print(als.y)
AllelesToTriangular
constructs a lower triangular matrix of genotype counts from one or two vectors of alleles. It is particularly useful to create genotype counts for microsatellite data (STRs).
AllelesToTriangular(A1, A2 = NULL, given=NULL)
AllelesToTriangular(A1, A2 = NULL, given=NULL)
A1 |
The first allele of each individual, or a vector with all alleles, two consecutive ones for each individual. |
A2 |
The second allele of each individual (optional). |
given |
A vector of known alleles (optional). This argument can be used to specify alleles that may not exist in the data. |
If the data is a single column vector with two succesive alleles for each individual, then specify A1
only. If data consists of two columns, each holding one allele of each individual, then specify A1
and A2
. Typical STR data that comes in the format of two repeat lengths for a set of individuals can be transformed into a lower triangular matrix with genotype counts. See the examples below.
A lower triangular matrix with genotype counts.
Jan Graffelman [email protected]
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
data(NistSTRs) A1 <- NistSTRs[,1] A2 <- NistSTRs[,2] GM <- AllelesToTriangular(A1,A2) print(GM)
data(NistSTRs) A1 <- NistSTRs[,1] A2 <- NistSTRs[,2] GM <- AllelesToTriangular(A1,A2) print(GM)
The dataframe contains the genotype frequencies MM, Mm and mm for the 70 SNPs for both cases and controls. The data are taken from table 7.11 in Laird & Lange.
data(Alzheimer)
data(Alzheimer)
A data frame containing 70 observations.
Laird, N. M. and Lange, C. Table 7.11, p. 124
Laird, N. M. and Lange, C. (2011) The fundamentals of modern statistical genetics. Springer.
Matrix CEUchr22
contains 10000 single nucleotide polymorphisms sampled from chromosome 22 of sample of
99 Utah residents with Northern and Western European ancestry.
data("CEUchr22")
data("CEUchr22")
Matrix
The polymorphisms are coded in (0,1,2) format representing the count of the reference allele.
https://www.internationalgenome.org/
The 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature 526(7571), pp. 68–74.
data(CEUchr22) str(CEUchr22)
data(CEUchr22) str(CEUchr22)
Function dgraffelmanweir
calculate the probability P(NAB=nab and
MA=ma|NA=na) for a bi-allelic X-chromosomal variant.
dgraffelmanweir.bi(x, y)
dgraffelmanweir.bi(x, y)
x |
vector with male genotype counts (A,B) |
y |
vector with female genotype counts (AA,AB,BB) |
a single real number
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6) pp. 558–568. doi:10.1038/hdy.2016.20
males <- c(A=392, B=212) females <- c(AA=275, AB=296, BB=80) prob <- dgraffelmanweir.bi(males,females) print(prob)
males <- c(A=392, B=212) females <- c(AA=275, AB=296, BB=80) prob <- dgraffelmanweir.bi(males,females) print(prob)
Function dlevene
calculates Levene's exact density for a diploid
system with k alleles.
dlevene(N)
dlevene(N)
N |
A lower triangular matrix with genotype counts |
The supplied matrix of genotype counts should be triangular, with the homozygote counts on the diagonal, and all heterozygote counts below the diagonal.
a single real number
Jan Graffelman ([email protected])
Levene, H. (1949) On a matching problem arising in genetics. Annals of Mathematical Statistics, 20, pp. 91-94.
x <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0) x <- toTriangular(x) prob <- dlevene(x) print(prob)
x <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0) x <- toTriangular(x) prob <- dlevene(x) print(prob)
Program dlevene.bi
calculates Levene's density (P(AB|A)) for a
bi-allelic variant.
dlevene.bi(x)
dlevene.bi(x)
x |
a vector of genotype counts (AA,AB,BB) |
a single real number
Jan Graffelman ([email protected])
Levene, H. (1949) On a matching problem arising in genetics. Annals of Mathematical Statistics, 20, pp. 91-94.
x <- c(AA=298,AB=489,BB=213) prob <- dlevene.bi(x) print(prob)
x <- c(AA=298,AB=489,BB=213) prob <- dlevene.bi(x) print(prob)
EAFExact
uses a Fisher Exact test to compare allele frequencies
in males and females for variants with k alleles (k >= 2).
EAFExact(m, f, verbose = TRUE, ...)
EAFExact(m, f, verbose = TRUE, ...)
m |
vector or triangular matrix with male genotype counts |
f |
vector or triangular matrix with female genotype counts |
verbose |
print output (TRUE) or not (FALSE) |
... |
additional arguments for |
For bi-allelic autosomal variants the genotype counts can be supplied as vectors ((AA,AB,BB) for males, and (AA,AB,BB) for females). For X-chromosomal bi-allelic variants the genotype counts can also supplied as vectors ((A,B) for males, and (AA,AB,BB) for females). For multi-allelic autosomal variants male and genotype counts can be supplied as vectors (AA,AB,AC,BB,BC,CC,...) or as a triangular matrix, where matrix rows and colums are labelled with the allele name (A,B,C,...). For multi-allelic X-chromosomal variants, male genotype counts must be supplied as a vector (A,B,C,...) and female genotype counts must be supplied as a triangular matrix. See the examples below.
pval |
p-value |
tab |
table with allele counts |
Jan Graffelman [email protected]
# # bi-allelic autosomal # m <- c(AA=60,AB=96,BB=44) f <- c(AA=44,AB=97,BB=59) EAFtest <- EAFExact(m,f) # # bi-allelic X-chromosomal # males <- c(A=392, B=212) females <- c(AA=275, AB=296, BB=80) EAFtest <- EAFExact(males,females,verbose=TRUE) # # tri-allelic autosomal # males <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26) females <- c(AA=28,AB=55,AC=33,BB=18,BC=50,CC=16) EAFtest <- EAFExact(males,females,verbose=TRUE) # # tri-allelic X-chromosomal # males <- c(A=15,B=17,C=24) females <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4)) EAFtest <- EAFExact(males,females,verbose=TRUE)
# # bi-allelic autosomal # m <- c(AA=60,AB=96,BB=44) f <- c(AA=44,AB=97,BB=59) EAFtest <- EAFExact(m,f) # # bi-allelic X-chromosomal # males <- c(A=392, B=212) females <- c(AA=275, AB=296, BB=80) EAFtest <- EAFExact(males,females,verbose=TRUE) # # tri-allelic autosomal # males <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26) females <- c(AA=28,AB=55,AC=33,BB=18,BC=50,CC=16) EAFtest <- EAFExact(males,females,verbose=TRUE) # # tri-allelic X-chromosomal # males <- c(A=15,B=17,C=24) females <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4)) EAFtest <- EAFExact(males,females,verbose=TRUE)
Calculates Fisher's z transformation for a correlation coefficient
fisherz(r)
fisherz(r)
r |
a correlation coefficient |
a real number
Jan Graffelman ([email protected])
r <- 0.5 print(fisherz(r))
r <- 0.5 print(fisherz(r))
The function fold
sums corresponding below and above diagonal elements of a square matrix to form a triangular matrix.
fold(X, lower = TRUE)
fold(X, lower = TRUE)
X |
a square matrix |
lower |
logical. If |
Useful for constructing triangular matrices of genotype counts
A matrix
Jan Graffelman [email protected]
allelenames <- paste("A",11:13,sep="") allele1 <- factor(c("A11","A11","A12","A12","A13","A12"),levels=allelenames) allele2 <- factor(c("A11","A12","A12","A13","A13","A11"),levels=allelenames) GC <- table(allele1,allele2) GC <- as.matrix(unclass(GC)) GCf <- fold(GC)
allelenames <- paste("A",11:13,sep="") allele1 <- factor(c("A11","A11","A12","A12","A13","A12"),levels=allelenames) allele2 <- factor(c("A11","A12","A12","A13","A13","A11"),levels=allelenames) GC <- table(allele1,allele2) GC <- as.matrix(unclass(GC)) GCf <- fold(GC)
GenerateSamples
generates all possible genotypic compositions
(AA,AB,BB) for a given sample size n
.
GenerateSamples(n = 5)
GenerateSamples(n = 5)
n |
the desired sample size |
returns a matrix with in each row a possible genotypic compostion for the given sample size.
Jan Graffelman [email protected]
GenerateSamples(5)
GenerateSamples(5)
Function genlabels
sets the names of a vector or matrix of
genotype counts.
genlabels(X)
genlabels(X)
X |
a 3 (or 5) element vector with genotype counts, a matrix of genotype counts (3 or 5 columns) |
A vector or a matrix
Jan Graffelman ([email protected])
x <- c(25,50,25) x <- genlabels(x)
x <- c(25,50,25) x <- genlabels(x)
Biallelic glyoxalase genotype data for 17 populations from India
data("Glyoxalase")
data("Glyoxalase")
A data frame with 17 observations on the following 3 variables.
AA
number of homozygote AA individuals
AB
number of heterozygote AB individuals
BB
number of homozygote BB individuals
Olson, J.M. (1993) Table 3.
Olson, J.M. (1993) Testing the Hardy-Weinberg law across strata. Annals of Human Genetics, 57, pp. 291-295.
data(Glyoxalase)
data(Glyoxalase)
The dataframe contains the genotype frequencies in generic notation, AA, AB and BB the first 225 polymorphic SNPs without missing data on chromosome 1 of the Han Chinese in Beijing. The data are compiled from the HapMap project, phase 3.2, containing genotype information of 84 individuals.
data(HapMapCHBChr1)
data(HapMapCHBChr1)
A matrix containing 225 rows and 3 columns (AA, AB, BB).
http://hapmap.ncbi.nlm.nih.gov/
The International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs Nature 449, pp. 851–861.
Function he
calculates the expected heterozygosity for a biallelic genetic variant.
he(x, bias.correct = TRUE)
he(x, bias.correct = TRUE)
x |
a vector or matrix with genotype counts. |
bias.correct |
if |
x
can be a vector of genotype counts for a single marker (AA,AB,BB) or a three-column matrix of
genotype counts for multiple markers.
a vector of expected heterozygosities.
Jan Graffelman ([email protected])
# # He for a single marker # x <- c(MM=298,MN=489,NN=213) he(x) # # He for a matrix of rmarkers # set.seed(123) X <- HWData(10,100) he(X) he(X,bias.correct = FALSE)
# # He for a single marker # x <- c(MM=298,MN=489,NN=213) he(x) # # He for a matrix of rmarkers # set.seed(123) X <- HWData(10,100) he(X) he(X,bias.correct = FALSE)
Function HWABO
takes four genotype counts ("A","B","AB","OO") and
estimates the three allele frequencies using the EM algorithm.
HWABO(x, p = c(1/3, 1/3, 1/3), maxiter = 50, tol = 1e-10, verbose = TRUE)
HWABO(x, p = c(1/3, 1/3, 1/3), maxiter = 50, tol = 1e-10, verbose = TRUE)
x |
a vector with genotype counts ("A","B","AB","OO"). |
p |
a vector with initial allele frequencies (by default (1/3,1/3,1/3)). |
maxiter |
maximum number of iterations. |
tol |
tolerance for convergence, 1e-10 by default |
verbose |
print iteration history or not. |
pn |
vector with estimated allele frequencies. |
It.hist |
iteration history with log-likelihood. |
expected |
expected genotype counts under HWE. |
Jan Graffelman [email protected]
x <- c(fA=182,fB=60,nAB=17,nOO=176) al.fre <- HWABO(x) #al2 <- HWABO(x,p=c(0.99,0.01,0.01),maxiter=25) #al3 <- HWABO(x,p=c(0.01,0.99,0.01),maxiter=25) #al4 <- HWABO(x,p=c(0.01,0.01,0.99),maxiter=25)
x <- c(fA=182,fB=60,nAB=17,nOO=176) al.fre <- HWABO(x) #al2 <- HWABO(x,p=c(0.99,0.01,0.01),maxiter=25) #al3 <- HWABO(x,p=c(0.01,0.99,0.01),maxiter=25) #al4 <- HWABO(x,p=c(0.01,0.01,0.99),maxiter=25)
Function HWAIC
calculates Akaike's Information Criterion for
ten different models that describe a bi-allelic genetic variant: M11:
Hardy-Weinberg proportions and equality of allele frequencies in the
sexes (HWP & EAF); M12: EAF and HWP in males only; M13: EAF and HWP in
females only; M14: EAF and equality of inbreeding coefficients in
the sexes (EIC); M15: EAF only; M21: HWP in both sexes; M22: HWP for
males only; M23: HWP for females only; M24: EIC only; M25: None of the previous.
HWAIC(x, y, tracing = 0, tol = 0.000001)
HWAIC(x, y, tracing = 0, tol = 0.000001)
x |
Male genotype counts (AA,AB,BB) |
y |
Female genotype counts (AA,AB,BB) |
tracing |
Activate tracing in the maximization of some likelihoods (0=no tracing; 1:tracing) |
tol |
tolerance for iterative maximization of some likelihoods |
The log-likelihood for the six models is calculated. For two models (C
and E) this is done numerically using package RSolnp
.
A named vector containing 6 values for AIC
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2018) On the testing of Hardy-Weinberg proportions and equality of allele frequencies in males and females at bi-allelic genetic markers. Genetic Epidemiology 42(1) pp. 34-48. doi:10.1002/gepi.22079
males <- c(AA=11,AB=32,BB=13) females <- c(AA=14,AB=23,BB=11) stats <- HWAIC(males,females) print(stats)
males <- c(AA=11,AB=32,BB=13) females <- c(AA=14,AB=23,BB=11) stats <- HWAIC(males,females) print(stats)
HWAlltests
performs all classical frequentists tests for
Hardy-Weinberg equilibrium and lists their p-values.
HWAlltests(x, verbose = TRUE, include.permutation.test = FALSE, x.linked = FALSE)
HWAlltests(x, verbose = TRUE, include.permutation.test = FALSE, x.linked = FALSE)
x |
a vector with a set of genotype counts (AA, AB, BB) |
verbose |
print output if set to TRUE |
include.permutation.test |
turns on the permutation test if set to TRUE |
x.linked |
|
By default the permutation test is not performed in order to reduce computing time.
A dataframe with test statistics and p-values.
Jan Graffelman [email protected]
x <- c(298,489,213) names(x) <- c("MM","MN","NN") HWAlltests(x,verbose=TRUE)
x <- c(298,489,213) names(x) <- c("MM","MN","NN") HWAlltests(x,verbose=TRUE)
HWAlr
computes the additive log-ratio transformation for
genotype counts of bi-allelic genetic markers.
HWAlr(X, zeroadj = 0.5, denominator = 2)
HWAlr(X, zeroadj = 0.5, denominator = 2)
X |
A matrix of genotype counts (columns AA, AB and BB) |
zeroadj |
A zero adjustment parameter (0.5 by default) |
denominator |
The genotype count put in the denominator of the log-ratio (1=AA, 2=AB, 3=BB) |
A matrix or vector of log-ratio coordinates
Jan Graffelman ([email protected])
Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a non-parametric compositional approach. In: Vera Pawlowsky-Glahn and Antonella Buccianti (eds.) Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, pp. 207-215
X <- HWData(100,100) Y <- HWAlr(X)
X <- HWData(100,100) Y <- HWAlr(X)
HWAlrPlot
creates a scatter plot of the log-ratio coordinates of
bi-allelic genetic markers. Hardy-Weinberg equilibrium is indicated by
a straigh line in the plot.
HWAlrPlot(X, zeroadj = 0.5)
HWAlrPlot(X, zeroadj = 0.5)
X |
A matrix of genotype counts (columns AA, AB, BB) |
zeroadj |
Zero-adjustment parameter. Zero counts in the count
matrix are substituted by |
NULL
Jan Graffelman ([email protected])
Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a non-parametric compositional approach. In: Vera Pawlowsky-Glahn and Antonella Buccianti (eds.) Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, pp. 207-215
X <- HWClo(HWData(100,100)) HWAlrPlot(X)
X <- HWClo(HWData(100,100)) HWAlrPlot(X)
HWChisq
performs the chi-square test for Hardy-Weinberg
equilibrium both for autosomal and X-chromosomal markers; it can dealt both with bi-allelic and multi-allelic variants.
HWChisq(X, cc = 0.5, verbose = TRUE, x.linked = FALSE, phifixed = NULL)
HWChisq(X, cc = 0.5, verbose = TRUE, x.linked = FALSE, phifixed = NULL)
X |
For bi-allelic variants, |
cc |
|
verbose |
|
x.linked |
|
phifixed |
(For X-chromosomal markers only)
|
HWChisq
does a chi-square test for Hardy-Weinberg equilibrium,
and by default applies a continuity correction. For extreme allele
frequencies, the continuity correction can lead to excessive type 1
error rates, and is better turned off in that case. The continuity
correction can be turned off by specifying cc=0
.
HWChisq
can do the chi-square test for both autosomal and
X-chrosomal markers. By setting x.linked = TRUE
the marker
will be assumed to be on the X-chromosome, and the count vector
supplied should have 5 elements instead of 3 elements for an
autosomal marker. For X-chromsomal markers argument phifixed
is in general best left to its default value (NULL
). Only in
specific situations where the theoretical population sex ratio is known (e.g. in
simulation studies where a universe with known gender ratio is
sampled) phifixed
could be set to the theoretical ratio of interest.
With bi-allelic variants, when alternative
is set to less
, a one-sided test for
against a negative inbreeding coefficient (heterozygote excess) is
performed. When alternative
is set to greater
a one-sided test for
against a positive inbreeding coefficient (lack of heterozygotes) is
performed.
For multi-allelic variants, which typically do have some rare alleles and rare genotypes, the asymptotic chi-square test is in
general not recommended, and exact test procedures or permutation tests are recommended (see HWExact
or HWPerm.mult
).
HWChisq
returns a list with the components:
chisq |
value of the chi-square statistic. NA is returned if the marker is monomorphic. |
pval |
p-value of the chi-square test for Hardy-Weinberg equilibrium. |
D |
Half the deviation from Hardy-Weinberg equilibrium for the AB genotype. |
p |
the allele frequency of A. |
f |
the inbreeding coefficient. |
expected |
the expected counts under Hardy-Weinberg equilibrium. |
chi.contrib |
the contributions of the different genotypes to the chi-square statistic. |
Jan Graffelman [email protected]
Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.
For the chi-square test for X-linked markers:
Graffelman, J. & Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6) pp. 558–568. doi:10.1038/hdy.2016.20
# # Test for an autosomal blood group marker # x <- c(MM=298,MN=489,NN=213) HW.test <- HWChisq(x,verbose=TRUE) # # Same test without continuity correction # HW.test <- HWChisq(x,cc=0,verbose=TRUE) # # Test for X-chromsomal SNPs. # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) HW.test <- HWChisq(rs5968922,cc=0,x.linked=TRUE,verbose=TRUE) # # # y <- c(GG=48, CG=209, CC=277, G=129, C=337) HWChisq(y,x.linked=TRUE,cc=0) # # Test a multi-allelic microsatellite # data(NistSTRs) A1 <- NistSTRs[,1] A2 <- NistSTRs[,2] GC <- AllelesToTriangular(A1,A2) HW.test <- HWChisq(GC) # # retaining only the three most common alleles # ii <- (A1 == 10 | A1 == 11 | A1 == 12) & (A2 == 10 | A2 == 11 | A2 == 12) A1s <- A1[ii] A2s <- A2[ii] GC <- AllelesToTriangular(A1s,A2s) HW.test <- HWChisq(GC)
# # Test for an autosomal blood group marker # x <- c(MM=298,MN=489,NN=213) HW.test <- HWChisq(x,verbose=TRUE) # # Same test without continuity correction # HW.test <- HWChisq(x,cc=0,verbose=TRUE) # # Test for X-chromsomal SNPs. # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) HW.test <- HWChisq(rs5968922,cc=0,x.linked=TRUE,verbose=TRUE) # # # y <- c(GG=48, CG=209, CC=277, G=129, C=337) HWChisq(y,x.linked=TRUE,cc=0) # # Test a multi-allelic microsatellite # data(NistSTRs) A1 <- NistSTRs[,1] A2 <- NistSTRs[,2] GC <- AllelesToTriangular(A1,A2) HW.test <- HWChisq(GC) # # retaining only the three most common alleles # ii <- (A1 == 10 | A1 == 11 | A1 == 12) & (A2 == 10 | A2 == 11 | A2 == 12) A1s <- A1[ii] A2s <- A2[ii] GC <- AllelesToTriangular(A1s,A2s) HW.test <- HWChisq(GC)
Function HWChisqMat
executes the Chisquare test for HWE for each row in a matrix. This function is
deprecated and it is better to use the faster HWChisqStats
instead.
HWChisqMat(X, ...)
HWChisqMat(X, ...)
X |
A n times 3 matrix of genotypic counts (AA,AB,BB) |
... |
extra arguments that are passed on to |
a vector with chi-square statistics or p-values
Jan Graffelman [email protected]
X <- HWData(100,100) colnames(X) <- c("MM","MN","NN") Results <- HWChisqMat(X)
X <- HWData(100,100) colnames(X) <- c("MM","MN","NN") Results <- HWChisqMat(X)
HWChisqStats
is a function for the fast computation of chi-square
statistics (or the corresponding p-values) for a large set of bi-allelic markers (typically SNPs).
HWChisqStats(X, x.linked = FALSE, pvalues = FALSE)
HWChisqStats(X, x.linked = FALSE, pvalues = FALSE)
X |
A matrix with genotype counts, one row per marker. |
x.linked |
Logical indicating whether the markers are autosomal ( |
pvalues |
Logical indicated whether chi-square statistics should be returned ( |
Matrix X
should strictly comply with the following format. For
an autosomal dataset is should contain the 3 genotype counts in order
(AA,AB,BB). For an X-chromosomal dataset it should contain the 5
genotype counts in order (A,B,AA,AB,BB) where A and B are the male
counts and AA, AB and BB the female counts.
This function was written for speed improvement, and should be much
faster than looping over the rows of X
with HWChisq
. There
is no error checking on the supplied data matrix.
A vector of chi-square statistics
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at 2 bi-allelic genetic markers on the X chromosome.
# # Autosomal example # set.seed(123) X <- HWData(1000,100) monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0) X <- X[!monom,] # exclude monomorphics Chisq.stats <- HWChisqStats(X,x.linked=FALSE,pvalues=FALSE) Chisq.pvals <- HWChisqStats(X,x.linked=FALSE,pvalues=TRUE) # # X-chromosomal example # X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE) Chisq.stats <- HWChisqStats(X,x.linked=TRUE,pvalues=FALSE) Chisq.pvals <- HWChisqStats(X,x.linked=TRUE,pvalues=TRUE)
# # Autosomal example # set.seed(123) X <- HWData(1000,100) monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0) X <- X[!monom,] # exclude monomorphics Chisq.stats <- HWChisqStats(X,x.linked=FALSE,pvalues=FALSE) Chisq.pvals <- HWChisqStats(X,x.linked=FALSE,pvalues=TRUE) # # X-chromosomal example # X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE) Chisq.stats <- HWChisqStats(X,x.linked=TRUE,pvalues=FALSE) Chisq.pvals <- HWChisqStats(X,x.linked=TRUE,pvalues=TRUE)
Function HWClo
divides each row of a matrix by its total, and so
produces matrix of compositions.
HWClo(X)
HWClo(X)
X |
A matrix of (genotype) counts |
A matrix
Jan Graffelman [email protected]
X <- HWData(2,100) Y <- HWClo(X)
X <- HWData(2,100) Y <- HWClo(X)
HWClr
computes the centred log-ratio transformation for
genotype counts of bi-allelic genetic markers.
HWClr(X, zeroadj = 0.5)
HWClr(X, zeroadj = 0.5)
X |
A matrix of genotype counts (columns AA, AB and BB) |
zeroadj |
A zero adjustment parameter (0.5 by default) |
A matrix or vector of log-ratio coordinates
Jan Graffelman ([email protected])
Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a non-parametric compositional approach. In: Vera Pawlowsky-Glahn and Antonella Buccianti (eds.) Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, pp. 207-215
X <- HWData(100,100) Y <- HWClr(X)
X <- HWData(100,100) Y <- HWClr(X)
HWClrPlot
creates a scatter plot of the centred log-ratio coordinates of
bi-allelic genetic markers. Hardy-Weinberg equilibrium is indicated by
a straigh line in the plot.
HWClrPlot(X, zeroadj = 0.5)
HWClrPlot(X, zeroadj = 0.5)
X |
A matrix of genotype counts (columns AA, AB, BB) |
zeroadj |
Zero-adjustment parameter. Zero counts in the count
matrix are substituted by |
NULL
Jan Graffelman ([email protected])
Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a non-parametric compositional approach. In: Vera Pawlowsky-Glahn and Antonella Buccianti (eds.) Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, pp. 207-215
X <- HWClo(HWData(100,100)) HWClrPlot(X)
X <- HWClo(HWData(100,100)) HWClrPlot(X)
Computes the probability of a particular genotypic sample given the allele count, sample size and number of heterozygotes.
HWCondProbAB(n, nA, nAB)
HWCondProbAB(n, nA, nAB)
n |
|
nA |
|
nAB |
|
p |
probability of the particular sample |
Jan Graffelman ([email protected])
x <- c(298,489,213) names(x) <- c("MM","MN","NN") n <- sum(x) nM <- 2*x[1]+x[2] nMN <- x[2] p <- HWCondProbAB(n,nM,nMN)
x <- c(298,489,213) names(x) <- c("MM","MN","NN") n <- sum(x) nM <- 2*x[1]+x[2] nMN <- x[2] p <- HWCondProbAB(n,nM,nMN)
Function HWD
computes Weir's disequilibrium coefficient D.
HWD(X)
HWD(X)
X |
a vector of genotype counts (AA, AB, BB) |
Returns the disequilibrium coefficient
Jan Graffelman [email protected]
Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.
x <- c(MM=298,MN=489,NN=213) D <- HWD(x) cat("Disequilibrium coefficient: ",D,"\n")
x <- c(MM=298,MN=489,NN=213) D <- HWD(x) cat("Disequilibrium coefficient: ",D,"\n")
HWData generates samples of genotypic counts under various schemes. It mainly uses sampling from the multinomial distribution for given or random allele frequencies, either assuming Hardy-Weinberg proportions or a specified degree of inbreeding. Sampling can also be performed conditional on the allele frequency. The same procedures are also available for X linked markers.
HWData(nm = 100, n = rep(100, nm), f = rep(0, nm), p = NULL, conditional = FALSE, exactequilibrium = FALSE, x.linked = FALSE, nA = NULL, n.males = round(0.5 * n), shape1 = 1, shape2 = 1, counts = TRUE)
HWData(nm = 100, n = rep(100, nm), f = rep(0, nm), p = NULL, conditional = FALSE, exactequilibrium = FALSE, x.linked = FALSE, nA = NULL, n.males = round(0.5 * n), shape1 = 1, shape2 = 1, counts = TRUE)
nm |
The number of bi-allelic markers. |
n |
The sample sizes. |
f |
The inbreeding coefficients (only for autosomal markers) |
p |
a vector of allele frequencies |
conditional |
if |
exactequilibrium |
generates data in exact HWE if set to
|
x.linked |
Simulated autosomal markers ( |
nA |
A vector of minor allele counts, one for each marker. If not
specified, it will be calculated from |
n.males |
The number of males (only relevant if |
shape1 |
First shape parameter of the beta distribution used to generate allele frequencies |
shape2 |
Second shape parameter of the beta distribution used to generate allele frequencies |
counts |
If |
Option pfixed
is deprecated and replaced by conditional
Option pdist
is deprecated and replaced by parameters shape1
and shape2
HWData
returns a matrix of genotype counts, nm
by 3 for
autsomal markers or nm
by 5 for X-chromosomal markers.
If the inbreeding coefficient is specified (f
) it will only
take effect for autosomal markers (x.linked=FALSE
) and
multinomial sampling (conditional=FALSE
).
X |
A matrix containing the genotype counts. |
Jan Graffelman ([email protected])
# # Generate 100 SNPs with uniform allele frequency under the equilibrium assumption. # out <- HWData(nm=100,n=100) # # Generate genotype frequencies of 100 SNPs with uniform allele frequency assuming exact equilbrium. # X <- HWData(nm=100, exactequilibrium = TRUE, counts = FALSE) # # Generate 100 SNPs (as counts), all having an expected A allele frequency of 0.50 # X <- HWData(nm=100,p=0.5) # # Generate 100 SNPs, 50 with A allele frequency 0.25 and 50 with A allele frequency 0.75, # assuming fixed allele frequencies. # X <- HWData(nm=100,p=rep(c(0.25,0.75),50), conditional = TRUE) # # Generate 100 SNPs with a skewed (beta) distribution of the allele frequency, # rich in variants with a low minor allele frequency. # X <- HWData(nm=100,shape1=1,shape2=10) # # Generate 100 X chromosomal SNPs with uniformly distributed allele frequency. # X <- HWData(nm=100, x.linked = TRUE)
# # Generate 100 SNPs with uniform allele frequency under the equilibrium assumption. # out <- HWData(nm=100,n=100) # # Generate genotype frequencies of 100 SNPs with uniform allele frequency assuming exact equilbrium. # X <- HWData(nm=100, exactequilibrium = TRUE, counts = FALSE) # # Generate 100 SNPs (as counts), all having an expected A allele frequency of 0.50 # X <- HWData(nm=100,p=0.5) # # Generate 100 SNPs, 50 with A allele frequency 0.25 and 50 with A allele frequency 0.75, # assuming fixed allele frequencies. # X <- HWData(nm=100,p=rep(c(0.25,0.75),50), conditional = TRUE) # # Generate 100 SNPs with a skewed (beta) distribution of the allele frequency, # rich in variants with a low minor allele frequency. # X <- HWData(nm=100,shape1=1,shape2=10) # # Generate 100 X chromosomal SNPs with uniformly distributed allele frequency. # X <- HWData(nm=100, x.linked = TRUE)
Function HWEM
estimates the relative contributions of two populations with different allele frequencies that contributed
to the genotype frequencies of sample which is a mixture of these two populations.
HWEM(x, p = NULL, G = NULL, delta.init = c(0.5, 0.5), itmax = 50, eps = 1e-06, verbose = FALSE)
HWEM(x, p = NULL, G = NULL, delta.init = c(0.5, 0.5), itmax = 50, eps = 1e-06, verbose = FALSE)
x |
a 3x1 vector of genotype frequencies of the mixed sample, either counts or proportions. |
p |
a vector with two allele frequencies. |
G |
a 3x2 matrix with genotype frequencies as columns. |
delta.init |
initial values for the contributions of each populations |
itmax |
maximum number of iterations |
eps |
tolerance criterion for convergence |
verbose |
|
HWEM
employs the EM algorithm to estimate the contribution of two populations to the genotype frequencies of a bi-allelic
marker, assumed to be typed for a mixed sample with individuals coming from two different populations with different allele
frequencies. To estimate the contributions, both the mixed sample genotype frequencies (x) and either the genotype frequencies (G)
or the allele frequencies (p) of the original populations must be specified. In case only allele frequencies are specified,
Hardy-Weinberg proportions are assumed for the genotype frequencies.
a vector with two proportions, ordered according to the specified allele or genotype frequencies.
Jan Graffelman [email protected]
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39(1) pp. 1–38.
# genotype frequencies population 1 g1 <- c(0.034, 0.330, 0.636) # genotype frequencies population 2 g2 <- c(0.349, 0.493, 0.158) # sample from the mixed population x <- c(0.270, 0.453,0.277) # # estimation based on genotype frequencies # G <- cbind(g1,g2) contributions <- HWEM(x,G=G) # # estimation based on allele frequencies # p <- c(af(g1),af(g2)) contributions <- HWEM(x,p=p)
# genotype frequencies population 1 g1 <- c(0.034, 0.330, 0.636) # genotype frequencies population 2 g2 <- c(0.349, 0.493, 0.158) # sample from the mixed population x <- c(0.270, 0.453,0.277) # # estimation based on genotype frequencies # G <- cbind(g1,g2) contributions <- HWEM(x,G=G) # # estimation based on allele frequencies # p <- c(af(g1),af(g2)) contributions <- HWEM(x,p=p)
HWExact
performs an exact test for Hardy-Weinberg equilibrium
HWExact(X, alternative = "two.sided", pvaluetype = "selome", eps=1e-10, x.linked = FALSE, verbose = TRUE)
HWExact(X, alternative = "two.sided", pvaluetype = "selome", eps=1e-10, x.linked = FALSE, verbose = TRUE)
X |
vector with the genotype counts AA, AB, BB |
alternative |
|
pvaluetype |
if |
x.linked |
|
eps |
a tolerance that can be set for comparing probabilities in order to include tied outcomes |
verbose |
print results or not. |
HWExact
uses the recursion equations described by Wigginton
et. al.
For testing large sets of bi-allelic variants, use the faster code in HWExactStats
.
For large samples, HWExact
may give the error message:
"evaluation nested too deeply: infinite recursion". This can usually
be resolved by increasing R's limit on nested expressions with
options(expressions=10000)
or a higher limit. With higher
limits, the error message "protect(): protection stack overflow" can
occur. This error can usually be resolved by increasing R's
protection stack with the command line option
--max-ppsize 100000
or higer values. However, with such large
samples the exact test will give virtually the same result as a
chi-square test, and it may be easier to use HWChisq
in these
circumstances.
pval |
p-value of the exact test |
prob |
probabilities of all possible samples with the same sample size and minor allele count |
pofthesample |
probability of the observed sample |
Jan Graffelman ([email protected])
Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.
Wigginton, J.E., Cutler, D.J. and Abecasis, G.R. (2005) A note on exact tests of Hardy-Weinberg equilibrium, American Journal of Human Genetics (76) pp. 887-893.
Graffelman, J. and Moreno, V. (2013) The mid p-value in exact tests for Hardy-Weinberg equilibrium, Statistical Applications in Genetics and Molecular Biology 12(4) pp. 433-448.
HWLratio
, HWChisq
, HWExactStats
# # Example for an autosomal marker using the standard exact p-value # x <- c(298,489,213) names(x) <- c("MM","MN","NN") HW.test <- HWExact(x,verbose=TRUE) # # Example for an autosomal marker using the mid p-value # HW.test <- HWExact(x,verbose=TRUE,pvaluetype="midp") # # Example x-linked markers # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80 ) HWExact(rs5968922,x.linked=TRUE,verbose=TRUE) # # # y <- c(GG=48, CG=209, CC=277, G=129, C=337) HWExact(y,x.linked=TRUE) # #
# # Example for an autosomal marker using the standard exact p-value # x <- c(298,489,213) names(x) <- c("MM","MN","NN") HW.test <- HWExact(x,verbose=TRUE) # # Example for an autosomal marker using the mid p-value # HW.test <- HWExact(x,verbose=TRUE,pvaluetype="midp") # # Example x-linked markers # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80 ) HWExact(rs5968922,x.linked=TRUE,verbose=TRUE) # # # y <- c(GG=48, CG=209, CC=277, G=129, C=337) HWExact(y,x.linked=TRUE) # #
Function HWExactMat
is deprecated; use HWExactStats
instead.
HWExactMat(X, ...)
HWExactMat(X, ...)
X |
A n times 3 matrix of genotypic counts (AA,AB,BB) |
... |
extra arguments that are passed on to |
pvalvec |
Vector with the p-values of each test |
Jan Graffelman [email protected]
X <- HWData(100,100) colnames(X) <- c("MM","MN","NN") Results <- HWExactMat(X) Output <- cbind(X,Results) print(Output)
X <- HWData(100,100) colnames(X) <- c("MM","MN","NN") Results <- HWExactMat(X) Output <- cbind(X,Results) print(Output)
HWExactPrevious
performs an exact test for Hardy-Weinberg equilibrium
HWExactPrevious(X, alternative = "two.sided", pvaluetype = "selome", x.linked = FALSE, verbose = FALSE)
HWExactPrevious(X, alternative = "two.sided", pvaluetype = "selome", x.linked = FALSE, verbose = FALSE)
X |
vector with the genotype counts AA, AB, BB |
alternative |
|
pvaluetype |
if |
x.linked |
|
verbose |
print results or not. |
HWExactPrevious
uses the recursion equations described by Wigginton
et. al.
For large samples, HWExactPrevious
may give the error message:
"evaluation nested too deeply: infinite recursion". This can usually
be resolved by increasing R's limit on nested expressions with
options(expressions=10000)
or a higher limit. With higher
limits, the error message "protect(): protection stack overflow" can
occur. This error can usually be resolved by increasing R's
protection stack with the command line option
--max-ppsize 100000
or higer values. However, with such large
samples the exact test will give virtually the same result as a
chi-square test, and it may be easier to use HWChisq
in these
circumstances.
pval |
p-value of the exact test |
prob |
probabilities of all possible samples with the same sample size and minor allele count |
pofthesample |
probability of the observed sample |
Jan Graffelman ([email protected])
Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.
Wigginton, J.E., Cutler, D.J. and Abecasis, G.R. (2005) A note on exact tests of Hardy-Weinberg equilibrium, American Journal of Human Genetics (76) pp. 887-893.
# # Example autosomal marker # x <- c(298,489,213) names(x) <- c("MM","MN","NN") ## Not run: HW.test <- HWExactPrevious(x,verbose=TRUE) # # Example x-linked marker # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80 ) ## Not run: HWExactPrevious(rs5968922,x.linked=TRUE,verbose=TRUE)
# # Example autosomal marker # x <- c(298,489,213) names(x) <- c("MM","MN","NN") ## Not run: HW.test <- HWExactPrevious(x,verbose=TRUE) # # Example x-linked marker # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80 ) ## Not run: HWExactPrevious(rs5968922,x.linked=TRUE,verbose=TRUE)
HWExactStats
is a function for the computation of Exact p-values
for a large set of bi-allelic markers (typically SNPs).
HWExactStats(X, x.linked = FALSE, plinkcode = TRUE, midp = FALSE,...)
HWExactStats(X, x.linked = FALSE, plinkcode = TRUE, midp = FALSE,...)
X |
A matrix with genotype counts, one row per marker. |
x.linked |
Logical indicating whether the markers are autosomal ( |
plinkcode |
Logical indicating whether to use faster C++ code from the PLINK software. |
midp |
Logical indicating whether to use the mid p-value for the C++ code or not |
... |
extra arguments that are passed on to |
Matrix X
should strictly comply with the following format. For
an autosomal dataset is should contain the 3 genotype counts in order
(AA,AB,BB). For an X-chromosomal dataset it should contain the 5
genotype counts in order (A,B,AA,AB,BB) where A and B are the male
counts and AA, AB and BB the female counts.
Argument plinkcode=TRUE
(the default) will use C++ code for faster calculation (functions SNPHWE2
and
SNPHWEX
) with larger datasets. The C++ code was generously shared by Christopher
Chang, and the same code is used in the program
PLINK (2.0).
A vector of p-values
Jan Graffelman [email protected] (R code) and Christopher Chang [email protected] (C++ code)
Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6) pp. 558–568. doi:10.1038/hdy.2016.20
Purcell et al. (2007) PLINK: A Toolset for Whole-Genome Association and Population-Based Linkage Analysis. American Journal of Human Genetics 81(3) pp. 559–575.
# # Autosomal example # set.seed(123) X <- HWData(1000,100) monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0) X <- X[!monom,] # exclude monomorphics Exact.pvalues <- HWExactStats(X,x.linked=FALSE) # # X-chromosomal example # X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE) Exact.pvalues <- HWExactStats(X,x.linked=TRUE)
# # Autosomal example # set.seed(123) X <- HWData(1000,100) monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0) X <- X[!monom,] # exclude monomorphics Exact.pvalues <- HWExactStats(X,x.linked=FALSE) # # X-chromosomal example # X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE) Exact.pvalues <- HWExactStats(X,x.linked=TRUE)
HWf
computes the inbreeding coefficient for sample
of genotype counts, or a matrix of genotype counts.
HWf(X)
HWf(X)
X |
a vector or matrix of genotype counts (AA, AB, BB) |
For monomorphic markers a warning is issued, and the estimate for the inbreeding coefficient is NaN.
Returns a single inbreeding coefficient (intraclass correlation coefficient), if X
is a single sample, or a vector of inbreeding coefficients, if X
is a matrix with genotype counts.
Jan Graffelman [email protected]
Crow, J. F. and Kimura, M. (1970) An introduction to population genetics theory. Harper & Row, publishers, New York
# # A single sample # x <- c(MM=298,MN=489,NN=213) fhat <- HWf(x) fhat # # Multiple samples # set.seed(123) X <- HWData(nm=100,n=1000) fhat <- HWf(X)
# # A single sample # x <- c(MM=298,MN=489,NN=213) fhat <- HWf(x) fhat # # Multiple samples # set.seed(123) X <- HWData(nm=100,n=1000) fhat <- HWf(X)
HWGenotypePlot
makes a scatterplots of the AB or BB frequency
versus the AA frequency and represents a blue curve indicating the
Hardy-Weinberg equilibrium condition.
HWGenotypePlot(X, plottype = 1, xlab = expression(f[AA]), ylab = ifelse(plottype == 1, expression(f[AB]), expression(f[BB])), asp = 1, pch = 19, xlim = c(0, 1), ylim = c(0, 1), cex = 1, cex.axis = 2, cex.lab = 2, ...)
HWGenotypePlot(X, plottype = 1, xlab = expression(f[AA]), ylab = ifelse(plottype == 1, expression(f[AB]), expression(f[BB])), asp = 1, pch = 19, xlim = c(0, 1), ylim = c(0, 1), cex = 1, cex.axis = 2, cex.lab = 2, ...)
X |
A matrix of genotype counts or frequencies with three columns (AA, AB, BB) |
plottype |
|
xlab |
A label for the x axis |
ylab |
A label for the y axis |
asp |
Aspec ratio (1 by default) |
pch |
Plotting charachter (19 by default) |
xlim |
Limits for the x axis (0-1 by default) |
ylim |
Limits for the y axis (0-1 by default) |
cex |
Character expansion factor (1 by default) |
cex.axis |
Character expansion factor for the axes (2 by default) |
cex.lab |
Character expansion factor for labels of axis (2 by default) |
... |
Additional arguments for the |
NULL
Jan Graffelman [email protected]
n <- 100 # sample size m <- 100 # number of markers Xc <- HWClo(HWData(n,m)) HWGenotypePlot(Xc,plottype=1,main="Heterozygote-homozygote scatterplot")
n <- 100 # sample size m <- 100 # number of markers Xc <- HWClo(HWData(n,m)) HWGenotypePlot(Xc,plottype=1,main="Heterozygote-homozygote scatterplot")
HWIlr
computes isometric log ratio coordinates for genotypic
compositions (AA, AB, BB)
HWIlr(X, zeroadj = 0.5)
HWIlr(X, zeroadj = 0.5)
X |
A matrix of genotype counts, markers in rows, counts for AA, AB and BB in three columns |
zeroadj |
Adjustment for zeros (0.5 by defaults) |
A matrix of log ratio coordinates.
Jan Graffelman ([email protected])
Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G. and Barcelo-Vidal, C. (2003) Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology 35(3), pp. 279-300.
Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a non-parametric compositional approach. In: Vera Pawlowsky-Glahn and Antonella Buccianti (eds.) Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, pp. 207-215
X <- HWData(100,100) Y <- HWIlr(X)
X <- HWData(100,100) Y <- HWIlr(X)
HWIlrPlot
makes a scatter plot of the isometric log ratio
coordinates for bia-llelic markers.
HWIlrPlot(X, zeroadj = 0.5, ...)
HWIlrPlot(X, zeroadj = 0.5, ...)
X |
Matrix of genotype counts, one marker per row, AA, AB and BB in three columns |
zeroadj |
Adjustment for zero values (0.5 by default) |
... |
Additional arguments for function |
A matrix of log ratio coordinates.
Jan Graffelman ([email protected])
Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a nonparametric compositional approach. In Pawlowsky-Glahn, V. and Buccianti A., editors, Compositional Data Analysis: Theory and Applications, pages 208-217, John Wiley & Sons, Ltd.
X <- HWClo(HWData(100,100)) HWIlrPlot(X)
X <- HWClo(HWData(100,100)) HWIlrPlot(X)
Function HWLindley
calculates the posterior density for disequilibrium measure alpha, as defined by Lindley (1988).
HWLindley(alphaseq = seq(-3, 3, by = 0.01), x)
HWLindley(alphaseq = seq(-3, 3, by = 0.01), x)
alphaseq |
a single value or a sequence of values for alpha |
x |
the genotype count vector in format (AA,AB,BB) |
Numerical integration is used to compute the density.
a vector with values of the density for each value in alphaseq
Jan Graffelman [email protected]
Lindley, D.V. (1988) Statistical Inference Concerning Hardy-Weinberg Equilibrium. In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V. and Smith, A.F.M. Bayesian Statistics, 3, pp. 307-326. Oxford University Press.
x <- c(MM=298,MN=489,NN=213) post.dens <- HWLindley(seq(-1,1,by=0.01),x) ## Not run: plot(seq(-1,1,by=0.01),post.dens,type="l",xlab=expression(alpha), ylab=expression(pi(alpha))) segments(0,0,0,HWLindley(0,x),lty="dotted",col="red") ## End(Not run)
x <- c(MM=298,MN=489,NN=213) post.dens <- HWLindley(seq(-1,1,by=0.01),x) ## Not run: plot(seq(-1,1,by=0.01),post.dens,type="l",xlab=expression(alpha), ylab=expression(pi(alpha))) segments(0,0,0,HWLindley(0,x),lty="dotted",col="red") ## End(Not run)
Function HWLindley.cri
calculates a Bayesian credible interval using Lindley's
posterior density for equilibrium paramater alpha.
HWLindley.cri(x, verbose = TRUE, limits = c(0.025, 0.975))
HWLindley.cri(x, verbose = TRUE, limits = c(0.025, 0.975))
x |
a vector of three genotype counts in order (AA,AB,BB). |
verbose |
print output ( |
limits |
upper and lower probability limits of the interval |
The limits are found by numerical integration over Lindley's density.
a vector with the lower and upper limit of the credible interval
Jan Graffelman [email protected]
Lindley, D.V. (1988) Statistical Inference Concerning Hardy-Weinberg Equilibrium. In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V. and Smith, A.F.M. Bayesian Statistics, 3, pp. 307-326. Oxford University Press.
Graffelman, J. (2020) Statistical tests for the Hardy-Weinberg equilibrium. Wiley StatsRef: Statistics Reference Online doi:10.1002/9781118445112.stat08274.
# # MN blood group data # x <- c(MM=298,MN=489,NN=213) # # credible interval of 95% # HWLindley.cri(x) # # credible interval of 90% # HWLindley.cri(x,limits=c(0.05,0.95))
# # MN blood group data # x <- c(MM=298,MN=489,NN=213) # # credible interval of 95% # HWLindley.cri(x) # # credible interval of 90% # HWLindley.cri(x,limits=c(0.05,0.95))
Function HWLRAllTests
performs a set of likelihood ratio tests in
relation with Hardy-Weinberg proportions (HWP) and equality of allele
frequencies (EAF) for autosomal bi-allelic genetic variants.
HWLRAllTests(x, y)
HWLRAllTests(x, y)
x |
Male genotype counts (AA,AB,BB) |
y |
Female genotype counts (AA,AB,BB) |
Function HWLRAllTests
calls HWLRtest
and calculates the
p-value of six different tests: 1) joint HWP and EAF (A-F); 2) EAF
irrespective of HWP (C-F); 3) HWP irrespective of EAF (D-F); 4) HWP
versus EIC (given EAF) (A-B); 5) EIC irrespective of EAF (E-F) and 6)
HWP versus EIC. Letters refer to scenarios described by Graffelman &
Weir (2018).
A named vector with six p-values
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2018) On the testing of Hardy-Weinberg proportions and equality of allele frequencies in males and females at bi-allelic genetic markers. Genetic Epidemiology 42(1): pp. 34–48. doi:10.1002/gepi.22079
males <- c(AA=11,AB=32,BB=13) females <- c(AA=14,AB=23,BB=11) pvalues <- HWLRAllTests(males,females) print(pvalues)
males <- c(AA=11,AB=32,BB=13) females <- c(AA=14,AB=23,BB=11) pvalues <- HWLRAllTests(males,females) print(pvalues)
HWLratio
performs the Likelihood ratio test for Hardy Weinberg
equilibrium, both for autosomal and X-chromosomal markers.
HWLratio(X, verbose = TRUE, x.linked = FALSE)
HWLratio(X, verbose = TRUE, x.linked = FALSE)
X |
|
verbose |
|
x.linked |
|
HWLratio
returns a list with the components:
Lambda |
the likelihood ratio |
G2 |
-2*log(Lambda) |
pval |
the p-value |
Jan Graffelman [email protected]
Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter 3.
x <- c(298,489,213) names(x) <- c("MM","MN","NN") HW.test <- HWLratio(x,verbose=TRUE) # # Test for X-chromsomal SNPs. # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) HW.test <- HWLratio(rs5968922,x.linked=TRUE,verbose=TRUE) # # # y <- c(GG=48, CG=209, CC=277, G=129, C=337) HWLratio(y,x.linked=TRUE) # #
x <- c(298,489,213) names(x) <- c("MM","MN","NN") HW.test <- HWLratio(x,verbose=TRUE) # # Test for X-chromsomal SNPs. # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) HW.test <- HWLratio(rs5968922,x.linked=TRUE,verbose=TRUE) # # # y <- c(GG=48, CG=209, CC=277, G=129, C=337) HWLratio(y,x.linked=TRUE) # #
Program HWLRtest
performs a likelihood ratio test comparing two
scenarios for an autosomal bi-allelic genetic variant. The scenarios
concern Hardy-Weinberg proportions (HWP) and equality of allele
frequencies (EAF) in both sexes. The different scenarios are described
by Graffelman & Weir (2017).
HWLRtest(x, y, scene.null = "S1", scene.alt = "S6", verbose = TRUE, tracing = 0)
HWLRtest(x, y, scene.null = "S1", scene.alt = "S6", verbose = TRUE, tracing = 0)
x |
Male genotype counts |
y |
Female genotype counts |
scene.null |
Scenario under the null hypothesis (E.g. "S1") |
scene.alt |
Scenario under the alternative hypothesis (E.g. "S6") |
verbose |
print output or not |
tracing |
Show tracing of the numeric likelihood maximization (1) or not (0). |
The different scenarios are indicated with S1, S2, S3, S4, S6 and S6. S1 refers to Hardy-Weinber proportions and equality of allele frequencies. S2 refers to equality of allele frequencies and equality of inbreeding coefficients for the two sexes. S3 refers to equality of allele frequencies irrespective of HWP. S4 refers to HWP irrespective of allele frequencies. S5 refers to equality of inbreeding coefficients irrespective of allele frequencies. S6 is unrestricted.
G2 |
Likelihood ratio statistic |
df |
Degrees of freedom of the likelihood ratio statistic |
pval |
p-value |
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2018) On the testing of Hardy-Weinberg proportions and equality of allele frequencies in males and females at bi-allelic genetic markers. Genetic Epidemiology 42(1) pp. 34–48 doi:10.1002/gepi.22079
males <- c(AA=11,AB=32,BB=13) females <- c(AA=14,AB=23,BB=11) # # test EAF # lr1.out <- HWLRtest(males,females,scene.null="S3",scene.alt="S6") # # test EIC given EAF # lr2.out <- HWLRtest(males,females,scene.null="S2",scene.alt="S3") # # test HWP versus EIC, given EAF. # lr3.out <- HWLRtest(males,females,scene.null="S1",scene.alt="S2")
males <- c(AA=11,AB=32,BB=13) females <- c(AA=14,AB=23,BB=11) # # test EAF # lr1.out <- HWLRtest(males,females,scene.null="S3",scene.alt="S6") # # test EIC given EAF # lr2.out <- HWLRtest(males,females,scene.null="S2",scene.alt="S3") # # test HWP versus EIC, given EAF. # lr3.out <- HWLRtest(males,females,scene.null="S1",scene.alt="S2")
Function HWMissing
imputes missing genotype data with a
multinomial logit model that uses information from allele intensities
and/or neighbouring markers. Multiple imputation algorithms
implemented in the Mice package are used to obtain imputed data sets.
Inference for HWE is carried out by estimating the inbreeding
coefficient or exact p-values for each imputed data set, and
by combining all estimates
using Rubin's pooling rules.
HWMissing(X, imputecolumn = 1, m = 50, coding = c(0,1,2), verbose = FALSE, alpha = 0.05, varest = "oneovern", statistic = "chisquare", alternative = "two.sided", ...)
HWMissing(X, imputecolumn = 1, m = 50, coding = c(0,1,2), verbose = FALSE, alpha = 0.05, varest = "oneovern", statistic = "chisquare", alternative = "two.sided", ...)
X |
An input data frame. By default, the first column should contain the SNP with missing values. |
imputecolumn |
Indicates which column of the supplied data frame
is to be imputed (by default, the first colum, |
m |
The number of imputations (50 by default) |
coding |
Indicates how the genotype data is coded (e.g. 0 for AA, 1 for AB, and 2 for BB). |
verbose |
|
alpha |
significance level (0.05 by default) used when computing confidence intervals |
varest |
Estimator for the variance of the inbreeding
coefficient. |
statistic |
If |
alternative |
|
... |
additional options for function |
The function HWMissing
tests one genetic marker (e.g. a SNP)
with missings for HWE. By default, this marker is supposed to be the
first column of dataframe X
. The other columns of X
contain covariates to be used in the imputation model. Covariates
will typically be other, correlated markers or allele intensities of
the SNP to be imputed. Covariate markers should be coded as factor
variables whereas allele intensities should be numerical
variables. By default, a polytomous regression model will be used to
impute the missings. If the covariates also contain missings, an
imputation method for each column of X
can be specified by
using the method
of mice (see example below).
If there are no covariates, missings can be imputed under the MCAR
assumption. In that case, missings are imputed by taking a random
sample from the observed data. This is what HWMissing
will do
if no covariates are supplied, X
being a single factor
variable.
Several estimators for the variance of the inbreeding coefficient
have been described in the literature. The asymptotic variance of the
inbreeding coefficient under the null hypothesis is 1/n, and is used
if varest = "oneovern"
is used. This is the recommended
option. Alternatively, the approximation described in Weir (p. 66) can be used
with varest = "bailey"
.
Res |
A vector with the inbreeding coefficient, a confidence interval for the inbreeding coefficient, a p-value for a HWE test and missing data statistics. |
Xmat |
A matrix with the genotypic composition of each of the
|
Jan Graffelman [email protected]
Little, R. J. A. and Rubin, D. B. (2002) Statistical analysis with missing data. Second edition, New York, John Wiley & sons.
Graffelman, J., S\'anchez, M., Cook, S. and Moreno, V. (2013) Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information. PLoS ONE 8(12): e83316. doi:10.1371/journal.pone.0083316
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
data(Markers) ## Not run: set.seed(123) Results <- HWMissing(Markers[,1],m=50,verbose=TRUE)$Res # no covariates, imputation assuming MCAR. set.seed(123) Results <- HWMissing(Markers[,1:3],m=50,verbose=TRUE)$Res # impute with two allele intensities. set.seed(123) Results <- HWMissing(Markers[,c(1,4,5)],m=50,verbose=TRUE)$Res # impute with two covariate SNPs ## End(Not run)
data(Markers) ## Not run: set.seed(123) Results <- HWMissing(Markers[,1],m=50,verbose=TRUE)$Res # no covariates, imputation assuming MCAR. set.seed(123) Results <- HWMissing(Markers[,1:3],m=50,verbose=TRUE)$Res # impute with two allele intensities. set.seed(123) Results <- HWMissing(Markers[,c(1,4,5)],m=50,verbose=TRUE)$Res # impute with two covariate SNPs ## End(Not run)
Program HWNetwork
implements a network algorithm for efficient calculation of exact test p-values in HWE tests with multiple alleles.
HWNetwork(a1, a2, ma = NULL, fe = NULL, gender = NULL, verbose = TRUE)
HWNetwork(a1, a2, ma = NULL, fe = NULL, gender = NULL, verbose = TRUE)
a1 |
the first allele (expressed as a number) |
a2 |
the second allele (expressed as a number; NA if the variant is X chromosomal) |
ma |
alternative format: vector of male X chromosomal allele counts. |
fe |
triangular matrix of female genotype counts |
gender |
gender of the individual (1=male; 2=female) |
verbose |
be silent ( |
Function HWNetwork
accepts data in two formats. Original genotype data (e.g. repeat numbers of microsatellites) can be supplied, or the data can be supplied in summarized form as a male genotype count vector and a female genotype count matrix. If one of the two male alleles is missing (NA) the variant will taken to be X-chromosomal. If all males have two alleles, the variant will taken to be autosomal.
the exact p-value of the test.
Jan Graffelman [email protected]
Aoki, S. (2003) Network algorithm for the Exact Test of Hardy-Weinberg Proportion for Multiple Alleles. Biometrical Journal 45(4), pp. 471-490.
Engels, W. R. (2009) Exact Tests for Hardy-Weinberg Proportions. Genetics 183, pp. 1431-1441.
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
# # From vectors with counts of genotypes # data(TSIXTriAllelics) ma <- as.matrix(TSIXTriAllelics[1,2:4]) names(ma) <- c("A","B","C") fe <- TSIXTriAllelics[1,5:10] names(fe) <- c("AA","AB","AC","BB","BC","CC") fe <- HardyWeinberg:::toTriangularfixed(fe) HWNetwork(ma=ma,fe=fe)
# # From vectors with counts of genotypes # data(TSIXTriAllelics) ma <- as.matrix(TSIXTriAllelics[1,2:4]) names(ma) <- c("A","B","C") fe <- TSIXTriAllelics[1,5:10] names(fe) <- c("AA","AB","AC","BB","BC","CC") fe <- HardyWeinberg:::toTriangularfixed(fe) HWNetwork(ma=ma,fe=fe)
Function HWPerm
does a permutation test for Hardy-Weinberg
equilibrium using a user-supplied test statistic.
HWPerm(x, nperm = 17000, verbose = TRUE, x.linked = FALSE, FUN = ifelse(x.linked,Chisquare.x,Chisquare), eps=1e-10, ...)
HWPerm(x, nperm = 17000, verbose = TRUE, x.linked = FALSE, FUN = ifelse(x.linked,Chisquare.x,Chisquare), eps=1e-10, ...)
x |
A vector of genotype counts (AA,AB,BB) |
nperm |
The number of permutations |
verbose |
|
x.linked |
|
FUN |
An function call for calculating the test statistic for HWE (see examples below) |
eps |
Tolerance for comparison of floating point numbers (1e-10 by default) |
... |
Additional parameters for the function call argument |
The set of alleles for the observed sample is permuted. Consequently, the test is conditional on allele frequency.
HWPerm
returns a list with the components:
stat |
value of the chosen test statistic for the observed sample. |
pval |
p-value of the permutation test. |
Jan Graffelman [email protected]
Ziegler, A. & Konig, I.R. (2006) A statistical approach to genetic epidemiology. Wiley.
x <- c(MM=298,MN=489,NN=213) ## Not run: # # use a default chi-square statistic # HW.test <- HWPerm(x,nperm=10000,verbose=TRUE) # # use a chi-square statistic with continuity correction. # HW.test <- HWPerm(x,nperm=10000,verbose=TRUE, FUN=function(z) HWChisq(z,verbose=FALSE)$chisq,cc=0.5) # # # use a likelihood ratio statistic. # HW.test <- HWPerm(x,nperm=10000,verbose=TRUE, FUN=function(y) HWLratio(y,verbose=FALSE)$G2) # # use an exact test p-value # HWPerm(x,nperm=10000,verbose=TRUE,FUN=function(y) 1-HWExact(y,verbose=FALSE)$pval) # # # Permutation test for a marker on the X chromosome # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) HW.test <- HWPerm(rs5968922,nperm=10000,x.linked=TRUE,verbose=TRUE) ## End(Not run)
x <- c(MM=298,MN=489,NN=213) ## Not run: # # use a default chi-square statistic # HW.test <- HWPerm(x,nperm=10000,verbose=TRUE) # # use a chi-square statistic with continuity correction. # HW.test <- HWPerm(x,nperm=10000,verbose=TRUE, FUN=function(z) HWChisq(z,verbose=FALSE)$chisq,cc=0.5) # # # use a likelihood ratio statistic. # HW.test <- HWPerm(x,nperm=10000,verbose=TRUE, FUN=function(y) HWLratio(y,verbose=FALSE)$G2) # # use an exact test p-value # HWPerm(x,nperm=10000,verbose=TRUE,FUN=function(y) 1-HWExact(y,verbose=FALSE)$pval) # # # Permutation test for a marker on the X chromosome # rs5968922 <- c(A=392, B=212, AA=275, AB=296, BB=80) HW.test <- HWPerm(rs5968922,nperm=10000,x.linked=TRUE,verbose=TRUE) ## End(Not run)
Function HWPerm.mult
implements permutation tests for
Hardy-Weinberg equilibrium for autosomal and X-chromosomal variants.
HWPerm.mult(x, y = NULL, nperm = 17000, eps = 1e-10, verbose = TRUE, ...)
HWPerm.mult(x, y = NULL, nperm = 17000, eps = 1e-10, verbose = TRUE, ...)
x |
vector or triangular matrix with male genotype counts |
y |
vector or triangular matrix with female genotype counts |
nperm |
number of permutations (17.000 by default) |
eps |
a tolerance for the comparison of floating point numbers |
verbose |
print output or not |
... |
addtional arguments |
This function approximates exact test probabilities for joint tests
for HWE and equality of allele frequencies for variants with multiple
alleles. For purely bi-allelic variant HWPerm
can be used which
allows for more statistics than just probabilities.
If argument y
is not specified, gender is considered
irrelevant, and x
contains total genotype counts. If x
and y
are specified, x
should contain male genotype
counts and y
female genotype counts. x
and y
can
be vectors if the variant is bi-allelic, but are assumed lower
triangular if there are more than two alleles. x
is still a
vector if there are multiple alleles but the variant is
X-chromosomal. See the examples given below.
pofthesample |
probability of the observed sample |
pseudodist |
probabilities of simulated samples |
pval |
p-value |
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. Molecular Ecology Resources. 18(3) pp. 461–473. doi:10.1111/1755-0998.12748
# # bi-allelic autosomal # x1 <- c(AA=298,AB=489,BB=213) ## Not run: out <- HWPerm.mult(x1) ## End(Not run) # # bi-allelic X-chromosomal # x2.m <- c(A=39, B=21) x2.f <- toTriangular(c(AA=28, AB=30, BB=8)) ## Not run: out <- HWPerm.mult(x2.m,x2.f) ## End(Not run) # # autosomal k alleles not accounting for gender # x3 <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0) x3 <- toTriangular(x3) ## Not run: out <- HWPerm.mult(x3) ## End(Not run) # # X-chromosomal k alleles # x4.m <- c(A=15,B=17,C=24) x4.f <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4)) ## Not run: out <- HWPerm.mult(x4.m,x4.f) ## End(Not run) # # Autosomal k alleles accounting for gender # x5.m <- toTriangular(c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0)) x5.f <- toTriangular(c(AA=8,AB=12,AC=13,BB=8,BC=7,CC=0)) ## Not run: out <- HWPerm.mult(x5.m,x5.f) ## End(Not run) # # Autosomal STR with multipe alleles # data(NistSTRs) A1 <- NistSTRs[,1] A2 <- NistSTRs[,2] GenotypeCounts <- AllelesToTriangular(A1,A2) print(GenotypeCounts) ## Not run: out <- HWPerm.mult(GenotypeCounts) ## End(Not run)
# # bi-allelic autosomal # x1 <- c(AA=298,AB=489,BB=213) ## Not run: out <- HWPerm.mult(x1) ## End(Not run) # # bi-allelic X-chromosomal # x2.m <- c(A=39, B=21) x2.f <- toTriangular(c(AA=28, AB=30, BB=8)) ## Not run: out <- HWPerm.mult(x2.m,x2.f) ## End(Not run) # # autosomal k alleles not accounting for gender # x3 <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0) x3 <- toTriangular(x3) ## Not run: out <- HWPerm.mult(x3) ## End(Not run) # # X-chromosomal k alleles # x4.m <- c(A=15,B=17,C=24) x4.f <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4)) ## Not run: out <- HWPerm.mult(x4.m,x4.f) ## End(Not run) # # Autosomal k alleles accounting for gender # x5.m <- toTriangular(c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0)) x5.f <- toTriangular(c(AA=8,AB=12,AC=13,BB=8,BC=7,CC=0)) ## Not run: out <- HWPerm.mult(x5.m,x5.f) ## End(Not run) # # Autosomal STR with multipe alleles # data(NistSTRs) A1 <- NistSTRs[,1] A2 <- NistSTRs[,2] GenotypeCounts <- AllelesToTriangular(A1,A2) print(GenotypeCounts) ## Not run: out <- HWPerm.mult(GenotypeCounts) ## End(Not run)
Function HWPosterior
calculates posterior probabilities and Bayes
factors for tests for Hardy-Weinberg equilibrium of autosomal and X-chromosomal
variants.
HWPosterior(males, females, verbose = TRUE, prior.af = c(0.5,0.5), prior.gf = c(0.333,0.333,0.333), x.linked = FALSE, precision = 0.05)
HWPosterior(males, females, verbose = TRUE, prior.af = c(0.5,0.5), prior.gf = c(0.333,0.333,0.333), x.linked = FALSE, precision = 0.05)
males |
A vector of male genotype counts. For autosomal variants, this should be a three-element named vector like (AA,AB,BB); for X-chromosomal variants it should be a two-element vector giving the counts of the hemizygous genotypes like (A,B). |
females |
A vector of female genotype counts. This should be a three-element named vector like (AA,AB,BB) |
verbose |
prints results if |
prior.af |
Beta prior parameters for male and female allele frequencies |
prior.gf |
Dirichlet prior parameters for female genotype frequencies |
x.linked |
logical indicating whether the variant is autosomal or X-chromosomal |
precision |
precision parameter for marginal likelihoods that require numeric integration |
For X-chromosomal variants, four possible models are considered, and the posterior probabilities and Bayes factors for each model are calculated.
For autosomal variants, ten possible scenarios are considered, and the posterior probabilities for all models are calculated.
In general, default Dirichlet priors are used for genotype frequencies, and beta prior are used for allele frequencies.
For X-chromosomal variants, a matrix with posterior probabilities and Bayes factors will be produced. For autosomal variants, a vector of posterior probabilities is produced.
Xavi Puig [email protected] and Jan Graffelman [email protected]
Puig, X., Ginebra, J. and Graffelman, J. (2017) A Bayesian test for Hardy-Weinberg equilibrium of bi-allelic X-chromosomal markers. Heredity 119(4):226–236. doi:10.1038/hdy.2017.30.
Puig, X., Ginebra, J. and Graffelman, J. (2019) Bayesian model selection for the study of Hardy-Weinberg proportions and homogeneity of gender allele frequencies. Heredity 123(5), pp. 549-564. doi:10.1038/s41437-019-0232-0
HWChisq
, HWExact
, HWExactStats
# # An X-chromosomal example # males <- c(A=43,B=13) females <- c(AA=26,AB=19,BB=3) out <- HWPosterior(males,females,verbose=TRUE,x.linked=TRUE) # # An autosomal example # data(JPTsnps) males <- JPTsnps[1,1:3] females <- JPTsnps[1,4:6] post.prob <- HWPosterior(males,females,x.linked=FALSE)
# # An X-chromosomal example # males <- c(A=43,B=13) females <- c(AA=26,AB=19,BB=3) out <- HWPosterior(males,females,verbose=TRUE,x.linked=TRUE) # # An autosomal example # data(JPTsnps) males <- JPTsnps[1,1:3] females <- JPTsnps[1,4:6] post.prob <- HWPosterior(males,females,x.linked=FALSE)
HWPower
is a function that computes the power of a test for
Hardy-Weinberg equilibrium.
HWPower(n = 100, nA = 100, pA = 0.5, y = c(AA=25,AB=50,BB=25), alpha = 0.05, theta = 4, f = NULL, test = "exact", alternative = "two.sided", pvaluetype = "selome", cc = 0.5)
HWPower(n = 100, nA = 100, pA = 0.5, y = c(AA=25,AB=50,BB=25), alpha = 0.05, theta = 4, f = NULL, test = "exact", alternative = "two.sided", pvaluetype = "selome", cc = 0.5)
n |
The sample size |
nA |
The minor allele count |
pA |
The minor allele frequency |
y |
A sample of genotype counts (AA,AB,BB) |
alpha |
The significance level (0.05 by default) |
theta |
The degree of disequilibrium ( |
f |
The inbreeding coefficient. Overrules |
test |
The type of test for which power is to be computed. Can be "exact" (default) or "chisq" (chi-square) |
alternative |
The nature of the alternative hypothesis ("two.sided" (default), "greater" or "less") |
pvaluetype |
The type of p-value used in an exact test ("selome", "dost" or "midp") |
cc |
Continuity correction parameter for the chi-square test (0.5 by default) |
HWPower
uses the Levene-Haldane distribution (distribution of the
number of heterzygotes given the minor allele count) for computing
power.
HWPower
can be used in three different way. In principle, the
power is calcuted on the basis of the sample size (n
) and the minor
allele count (nA
). Alternatively, the user may specifiy sample
size (n
) and minor allele frequency (pA
). Finally, power
can also be calculated directly from a sample of genotype counts. In
that case the calculated power is the power for a sample of the given
sample size and minor allele count. The three ways to use HWPower
are illustrated in the example section.
if test
= "exact" the power of the exact test is computed for
the given significance level and minor allele count.
if test
= "chisq" the power of the chi-square test is computed for
the given significance level and minor allele count.
Jan Graffelman ([email protected])
Graffelman, J. and Moreno, V. (2013) The Mid p-value in exact tests for Hardy-Weinberg proportions. Statistical Applications in Genetics and Molecular Biology 12(4): 433-448.
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
pw.chisq <- HWPower(n=100,nA=100,alpha=0.05,test="chisq",theta=16) print(pw.chisq) pw.exact <- HWPower(n=100,nA=100,alpha=0.05,test="exact",theta=16,pvaluetype="selome") print(pw.exact) pw.exact <- HWPower(n=100,nA=100) print(pw.exact) pw.exact <- HWPower(n=100,pA=0.5) print(pw.exact) pw.exact <- HWPower(y=c(AA=25,AB=50,BB=25)) print(pw.exact)
pw.chisq <- HWPower(n=100,nA=100,alpha=0.05,test="chisq",theta=16) print(pw.chisq) pw.exact <- HWPower(n=100,nA=100,alpha=0.05,test="exact",theta=16,pvaluetype="selome") print(pw.exact) pw.exact <- HWPower(n=100,nA=100) print(pw.exact) pw.exact <- HWPower(n=100,pA=0.5) print(pw.exact) pw.exact <- HWPower(y=c(AA=25,AB=50,BB=25)) print(pw.exact)
HWQqplot
creates a Q-Q plot for the p-values obtained in an Exact
test for Hardy-Weinberg equilibrium. Empirical p-values are plotted
against multiple simulated quantiles of the theoretical p-value distribution.
HWQqplot(X, nsim = 100, fit = "curve", logplot = FALSE, main = "Q-Q plot for HWE", mm = NULL, pvaluetype = "selome", ...)
HWQqplot(X, nsim = 100, fit = "curve", logplot = FALSE, main = "Q-Q plot for HWE", mm = NULL, pvaluetype = "selome", ...)
X |
Data matrix with genotype counts, one row for each sample, 3 columns |
nsim |
Number of samples drawn from the null distribution (100 by default) |
fit |
If |
logplot |
If |
main |
Title for the plot |
mm |
Maximal value for x and y axis in the plot |
pvaluetype |
Type of p-value to be used in an exact test. Can be "selome" (default), "midp" or "dost". |
... |
Any additional arguments for the |
HWQqplot
constructs a Q-Q plot of the p-values of an exact test
for Hardy-Weinberg equilibrium. Under the null, this p-value is not
uniform. HWQqplot samples from the theoretical null distribution, taking
into account that markers may vary in allele frequency and in sample
size (due to missing values). For each simulated sample a grey curve or
line is shown. A green reference line with intercept 0 and slope 1 is
also shown in the plot.
NULL
Jan Graffelman [email protected]
Rohlfs, R.V. and Weir, B.S. (2008) Distributions of Hardy-Weinberg equilibrium test statistics. Genetics 180, pp. 1609-1616.
HWTernaryPlot
, HWExact
, qqplot
## Not run: set.seed(1234) n <- 200 # sample size m <- 100 # number of markers X <- HWData(n,m) HWQqplot(X,logplot=TRUE,pvaluetype="selome",main="Q-Q Plot for HWE") ## End(Not run)
## Not run: set.seed(1234) n <- 200 # sample size m <- 100 # number of markers X <- HWData(n,m) HWQqplot(X,logplot=TRUE,pvaluetype="selome",main="Q-Q Plot for HWE") ## End(Not run)
Function HWStr
is a wrapper function arount HWPerm.mult
and HWChisq
in order to test a set of STRs for Hardy-Weinberg equilibrium.
HWStr(X, test = "permutation", verbose = FALSE, ...)
HWStr(X, test = "permutation", verbose = FALSE, ...)
X |
A data matrix with STRs in columns, with there two alleles coded in successive columns. |
test |
|
verbose |
be silent if set to |
... |
possible extra parameters to be passed on to |
It is recommended to test by a permutation test. By default, 17.000 permutations are used. Exact testing is not implemented. Missing genotypes are excluded on a per STR basis.
A data frame with the
strnames |
STR name |
N |
sample size |
Nt |
number of alleles |
MinorAF |
minor allele frequency |
MajorAF |
major allele frequency |
Ho |
observed heterozygosity |
He |
expected heterozygosity (without bias correction) |
Hp |
Shannon index of allele frequencies |
pval |
p-value of the HWE test |
Jan Graffelman [email protected]
data(NistSTRs) ## Not run: Results <- HWStr(NistSTRs,test="permutation") ## End(Not run)
data(NistSTRs) ## Not run: Results <- HWStr(NistSTRs,test="permutation") ## End(Not run)
Function HWStrata
implements Olson's asymptotic test for HWE for a stratified sample of
single biallelic polymorphism.
HWStrata(X, verbose = TRUE)
HWStrata(X, verbose = TRUE)
X |
A three-column matrix of genotype counts (e.g. with columns AA, AB, BB) |
verbose |
print output if |
See the references for the related homogeneity assumption.
T2 |
The test statistic |
pval |
The p-value |
Jan Graffelman [email protected]
Olson J.M. (1993) Testing the Hardy-Weinberg law across strata. Annals of Human Genetics 57(4):291-295.
Olson, J.M. and Foley, M. (1996) Testing for homogeneity of Hardy-Weinberg disequilibrium using data sampled from several populations. Biometrics 52(3) pp. 971-979.
# # Test across strata # data("Glyoxalase") Glyoxalase <- as.matrix(Glyoxalase) HWStrata(Glyoxalase) # # Stratified exact testing, testing each sample # HWExactStats(Glyoxalase)
# # Test across strata # data("Glyoxalase") Glyoxalase <- as.matrix(Glyoxalase) HWStrata(Glyoxalase) # # Stratified exact testing, testing each sample # HWExactStats(Glyoxalase)
HWTernaryPlot
is a routine that draws a ternary plot for three-way genotypic compositions (AA,AB,BB), and represents
the acceptance region for different tests for Hardy-Weinberg equilibrium (HWE) in the plot. This allows for graphical
testing of a large set of markers (e.g. SNPs) for HWE. The (non) significance of the test
for HWE can be inferred from the position of the marker in the ternary plot. Different statistical tests for HWE
can be done graphically with this routine: the ordinary chisquare test, the chisquare test with continuity
correction and the Haldane's exact test.
HWTernaryPlot(X, n = NA, addmarkers = TRUE, newframe = TRUE, hwcurve = TRUE, vbounds = FALSE, mafbounds = FALSE, mafvalue = 0.05, axis = 0, region = 1, vertexlab = colnames(X), alpha = 0.05, vertex.cex = 1, pch = 19, cc = 0.5, markercol = "black", markerbgcol = "black", cex = 0.75, axislab = "", verbose = FALSE, markerlab = NULL, markerpos = NULL, mcex = 1, connect = FALSE, curvecols = rep("black",5), signifcolour = TRUE, patternsigsymbol = 19, curtyp = "solid", ssf = "max", pvaluetype = "selome", grid = FALSE, gridlabels = TRUE, patternramp = FALSE, axisticklabels = FALSE, ...)
HWTernaryPlot(X, n = NA, addmarkers = TRUE, newframe = TRUE, hwcurve = TRUE, vbounds = FALSE, mafbounds = FALSE, mafvalue = 0.05, axis = 0, region = 1, vertexlab = colnames(X), alpha = 0.05, vertex.cex = 1, pch = 19, cc = 0.5, markercol = "black", markerbgcol = "black", cex = 0.75, axislab = "", verbose = FALSE, markerlab = NULL, markerpos = NULL, mcex = 1, connect = FALSE, curvecols = rep("black",5), signifcolour = TRUE, patternsigsymbol = 19, curtyp = "solid", ssf = "max", pvaluetype = "selome", grid = FALSE, gridlabels = TRUE, patternramp = FALSE, axisticklabels = FALSE, ...)
X |
a matrix of |
n |
the samples size (for a complete composition with no missing data). |
addmarkers |
represent markers by dots in the triangle ( |
newframe |
allows for plotting additional markers in an already existing ternary plot. Overplotting
is achieved by setting |
hwcurve |
draw the HW parabola in the plot ( |
vbounds |
indicate the area corresponding to expected counts > 5 ( |
mafbounds |
indicate the area corresponding to MAF < |
mafvalue |
a critical value for the minor allele frequency (MAF). |
axis |
draw a vertex axis |
region |
the type of acceptance region to be delimited in the triangle |
vertexlab |
labels for the three vertices of the triangle |
alpha |
significance level (0.05 by default) |
vertex.cex |
character expansion factor for the labels of the vertices of the triangle. |
pch |
the plotting character used to represent the markers. |
cc |
value for the continuity correction parameter (0.5 by default). |
markercol |
vector with colours for the marker points in the triangle. |
markerbgcol |
vector with background colours for the marker points in the triangle. |
cex |
expansion factor for the marker points in the triangle. |
axislab |
a label to be put under the horizontal axis. |
verbose |
print information on the numerically found cut-points between curves of the acceptance region and the edges of the triangle. |
markerlab |
labels for the markers in the triangle. |
markerpos |
positions for the marker labels in the triangle (1,2,3 or 4). |
mcex |
character expansion factor for the labels of the markers in the ternary plot. |
connect |
connect the represented markers by a line in the ternary plot. |
curvecols |
a vector with four colour specifications for the different curves that can be used
to delimit the HW acceptance region. E.g. |
signifcolour |
colour the marker points automatically according to the result of a signifance test
(green markers non-siginficant, red markers significant).
|
patternsigsymbol |
plotting character used to represent significant makers when |
curtyp |
style of the drawn curves ( |
ssf |
sample size function ( |
pvaluetype |
method to compute p-values in an exact test
( |
grid |
draw a reference grid for genotype frequencies at (0.2,0.4,0.6,0.8) |
gridlabels |
if |
patternramp |
paint a colour ramp for patterns of genotype frequencies. This will ignore |
axisticklabels |
place numeric labels for allele frequencies on the A-B axis. |
... |
other arguments passed on to the plot function (e.g. |
HWTernaryPlot
automatically colours significant markers in
red, and non-significant markers in green if region
is set to
1, 2 or 7.
minp |
minimum allele frequency above which testing for HWE is appropriate (expected counts exceeding 5). |
maxp |
maximum allele frequency below which testing for HWE is appropriate. |
inrange |
number of markers in the appropriate range. |
percinrange |
percentage of markers in the appropriate. |
nsignif |
number of significant markers (only if |
Jan Graffelman [email protected]
Graffelman, J. and Morales, J. (2008) Graphical Tests for Hardy-Weinberg Equilibrium Based on the Ternary Plot. Human Heredity 65(2):77-84.
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
# # Ternary plot with 1000 SNPs and HWE curve # X <- HWData(nm=1000,n=100) HWTernaryPlot(X,100,region=0,vertex.cex=2,pch=1) # # Genotype frequency pattern of simulated SNPs with uniform # allele frequency distribution # X <- HWData(nm=1000,n=100) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of simulated SNPs with skewed # allele frequency distribution # X <- HWData(nm=1000,n=100,shape1=1,shape2=20) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of SNPs simulated under HWE with # allele frequency of 0.50 # X <- HWData(nm=1000,n=100,p=0.25) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of SNPs simulated under HWE with # allele frequency of 0.25 # X <- HWData(nm=1000,n=100,p=0.25) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of SNPs simulated under HWE with # allele frequency of 0.25, using a triangle to mark siginficant markers # X <- HWData(nm=1000,n=100,p=0.25) HWTernaryPlot(X,patternramp = TRUE, region=1, patternsigsymbol = 2) # # Ternary plot of 1000 SNPs simulated under HWE and uniform allele frequency, with # acceptance region for a chi-square test and significant markers in red. # X <- HWData(nm=1000,n=100) HWTernaryPlot(X) # # Ternary plot of 100 SNPs simulated under HWE and uniform allele frequency, with # acceptance region for an exact test and significant markers in red. # X <- HWData(nm=100,n=100) ## Not run: HWTernaryPlot(X,region=7)
# # Ternary plot with 1000 SNPs and HWE curve # X <- HWData(nm=1000,n=100) HWTernaryPlot(X,100,region=0,vertex.cex=2,pch=1) # # Genotype frequency pattern of simulated SNPs with uniform # allele frequency distribution # X <- HWData(nm=1000,n=100) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of simulated SNPs with skewed # allele frequency distribution # X <- HWData(nm=1000,n=100,shape1=1,shape2=20) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of SNPs simulated under HWE with # allele frequency of 0.50 # X <- HWData(nm=1000,n=100,p=0.25) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of SNPs simulated under HWE with # allele frequency of 0.25 # X <- HWData(nm=1000,n=100,p=0.25) HWTernaryPlot(X,patternramp = TRUE) # # Genotype frequency pattern of SNPs simulated under HWE with # allele frequency of 0.25, using a triangle to mark siginficant markers # X <- HWData(nm=1000,n=100,p=0.25) HWTernaryPlot(X,patternramp = TRUE, region=1, patternsigsymbol = 2) # # Ternary plot of 1000 SNPs simulated under HWE and uniform allele frequency, with # acceptance region for a chi-square test and significant markers in red. # X <- HWData(nm=1000,n=100) HWTernaryPlot(X) # # Ternary plot of 100 SNPs simulated under HWE and uniform allele frequency, with # acceptance region for an exact test and significant markers in red. # X <- HWData(nm=100,n=100) ## Not run: HWTernaryPlot(X,region=7)
Function HWTriExact
does a standard exact test for Hardy-Weinberg
equilibrium of a tri-allelic variant, and also does joint exact tests
for equilibrium and equality of allele frequencies if the genotype
counts are given separately for both sexes
HWTriExact(x, y = NULL, eps = 1e-10, nperm = 17000, verbose = TRUE)
HWTriExact(x, y = NULL, eps = 1e-10, nperm = 17000, verbose = TRUE)
x |
vector with 6 genotype counts (AA,AB,AC,BB,BC,CC) |
y |
vector with 6 or 3 genotype counts (AA,AB,AC,BB,BC,CC) or (A,B,C) |
eps |
a tolerance that can be set for comparing exact probabilities |
nperm |
number of permutations (only relevant for autosomal stratified by gender) |
verbose |
print output or not |
If only x
is specified, an exact test for an autosomal variant
with three alleles will be performed.
If both x
and y
are supplied as vectors with 6 elements,
a permutation test for HWE and equality of allele frequencies (EAF) for an
autosomal variant is performed, using nperm
permutations. The
distribution of the probabilities is returned in pseudodist
. The
computational cost of a completed enumeration algorithm can be
prohibitive in this case.
If x
is supplied as a length 6 vector, and y
as a length 3
vector, the variant is assumed to be X-chromosomal, x
containing
female genotype counts and y
containing male genotyep counts. In
this case a joint exact test for HWE and EAF for an X-chromosomal
tri-allelic variant is executed.
See the examples in the example section below.
pval |
The p-value of the sample |
pseudodist |
Distribution of probabilities obtained by simulation |
pofthesample |
The probability of the observed sample |
Jan Graffelman [email protected]
Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. Molecular Ecology Resources. 18(3) pp. 461–473. doi:10.1111/1755-0998.12748.
# # Autosomal tri-allelic (not accounting for gender) # x <- c(AA=20,AB=31,AC=26,BB=15,BC=12,CC=0) ## Not run: out <- HWTriExact(x) # # Autosomal tri-allelic accounting for gender # males <- c(A=1,B=21,C=34) females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15) ## Not run: out <- HWTriExact(females,males) # # X-chromosomal tri-allelic accounting for gender # males <- c(A=1,B=21,C=34) females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15) ## Not run: out <- HWTriExact(females,males)
# # Autosomal tri-allelic (not accounting for gender) # x <- c(AA=20,AB=31,AC=26,BB=15,BC=12,CC=0) ## Not run: out <- HWTriExact(x) # # Autosomal tri-allelic accounting for gender # males <- c(A=1,B=21,C=34) females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15) ## Not run: out <- HWTriExact(females,males) # # X-chromosomal tri-allelic accounting for gender # males <- c(A=1,B=21,C=34) females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15) ## Not run: out <- HWTriExact(females,males)
Calculates the inverse of Fisher's z transformation
ifisherz(y)
ifisherz(y)
y |
a real number |
a correlation coefficient in the range (-1,1)
Jan Graffelman ([email protected])
r <- 0.5 print(ifisherz(fisherz(r)))
r <- 0.5 print(ifisherz(fisherz(r)))
Function is.mono
tests if a bi-allelic variant is monomorphic or not
is.mono(x)
is.mono(x)
x |
a vector of three or five genotype counts ((AA,AB,BB) or (A,B,AA,AB,BB)), or a three-column or five-column matrix with genotype counts (variants in rows, columns) |
is.mono
assumes autosomal variants are coded in three-element vectors or
three-column matrices, whereas X-chromosomal variants are coded in five-element
vectors or five-column matrices.
A logical or vector of logicals
Jan Graffelman ([email protected])
# # a polymorphic autosomal marker # x <- c(AA=10,AB=20,BB=10) print(is.mono(x)) # # a monomorphic autosomal marker # x <- c(AT=0,AA=100,TT=0) print(is.mono(x)) # # an autosomal marker with only heterozygotes # x <- c(AT=100,AA=0,TT=0) print(is.mono(x)) # # a matrix with low maf autosomal markers # set.seed(123) X <- HWData(100,50,shape1=1,shape2=20) number.monomorphics <- sum(is.mono(X)) print(number.monomorphics) # # a polymorphic X chromosomal marker # x <- c(G=24,C=26,GG=12,CC=13,GC=25) print(is.mono(x)) # # another polymorphic X chromosomal marker # x <- c(G=24,C=1,GG=25,CC=0,GC=0) is.mono(x) # # a monomorphic X chromosomal marker # x <- c(G=24,C=0,GG=12,CC=0,GC=0) is.mono(x) # # a matrix with low maf X-chromosomal markers # set.seed(123) Y <- HWData(100,50,shape=1,shape2=20,x.linked = TRUE) number.monomorphics <- sum(is.mono(Y)) print(number.monomorphics)
# # a polymorphic autosomal marker # x <- c(AA=10,AB=20,BB=10) print(is.mono(x)) # # a monomorphic autosomal marker # x <- c(AT=0,AA=100,TT=0) print(is.mono(x)) # # an autosomal marker with only heterozygotes # x <- c(AT=100,AA=0,TT=0) print(is.mono(x)) # # a matrix with low maf autosomal markers # set.seed(123) X <- HWData(100,50,shape1=1,shape2=20) number.monomorphics <- sum(is.mono(X)) print(number.monomorphics) # # a polymorphic X chromosomal marker # x <- c(G=24,C=26,GG=12,CC=13,GC=25) print(is.mono(x)) # # another polymorphic X chromosomal marker # x <- c(G=24,C=1,GG=25,CC=0,GC=0) is.mono(x) # # a monomorphic X chromosomal marker # x <- c(G=24,C=0,GG=12,CC=0,GC=0) is.mono(x) # # a matrix with low maf X-chromosomal markers # set.seed(123) Y <- HWData(100,50,shape=1,shape2=20,x.linked = TRUE) number.monomorphics <- sum(is.mono(Y)) print(number.monomorphics)
JPTtriallelicsChrX
contains three selected multi-allelic variants
on chromosome 7 from the Japanese sample of the 1000 genomes project.
data("JPTmultiallelicsChr7")
data("JPTmultiallelicsChr7")
List object with fields m4,f4; m5,f5; m6,f6;
The list object contains male and female genotype counts for 3 multi-allelic variants on chromosome 7 of the JPT sample of the 1000 genomes project.
Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 10.
data(JPTmultiallelicsChr7) str(JPTmultiallelicsChr7)
data(JPTmultiallelicsChr7) str(JPTmultiallelicsChr7)
JPTtriallelicsChrX
contains four selected multi-allelic variants on the
X chromosome from the Japanese sample of the 1000 genomes project.
data("JPTmultiallelicsChrX")
data("JPTmultiallelicsChrX")
List object with fields m4,f4; m5,f5; m6,f6; m7,f7
The list object contains male and female genotype counts for four multi-allelic variants of the JPT sample of the 1000 genomes project.
Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 7.
data(JPTmultiallelicsChrX) m4 <- JPTmultiallelicsChrX$m4 f4 <- JPTmultiallelicsChrX$f4
data(JPTmultiallelicsChrX) m4 <- JPTmultiallelicsChrX$m4 f4 <- JPTmultiallelicsChrX$f4
JPTsnps
contains genotype counts for the two sexes of ten single
nucleotide polymorphisms of the Japanese (JPT) sample of the 1000
Genomes project, consisting of 56 males and 48 females.
The first three columns contain the male genotype counts, and the last three columns contain the female genotype counts.
data("JPTsnps")
data("JPTsnps")
data frame
Puig, X., Ginebra, J. and Graffelman, J. (2019) Bayesian model selection for the study of Hardy-Weinberg proportions and homogeneity of gender allele frequencies.
data(JPTsnps)
data(JPTsnps)
JPTtriallelics
contains six selected tri-allelic variants on
chromosome 7 from the Japanese sample of the 1000 genomes project.
data("JPTtriallelicsChr7")
data("JPTtriallelicsChr7")
A data frame with 6 observations on the following 14 variables.
id
RS identifier
pos
position in base pairs
mAA
number of AA males
mAB
number of AB males
mAC
number of AC males
mBB
number of BB males
mBC
number of BC males
mCC
number of CC males
fAA
number of AA females
fAB
number of AB females
fAC
number of AC females
fBB
number of BB females
fBC
number of BC females
fCC
number of CC females
Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 9.
data(JPTtriallelicsChr7) str(JPTtriallelicsChr7)
data(JPTtriallelicsChr7) str(JPTtriallelicsChr7)
JPTtriallelicsChrX
contains five selected tri-allelic variants on the
X chromosome from the Japanese sample of the 1000 genomes project.
data("JPTtriallelicsChrX")
data("JPTtriallelicsChrX")
A data frame with 5 observations on the following 12 variables.
id
Identifier of the polymorphism
pos
Position of the polymorphism in base pairs
chr
Chromosome
A
Number of males with A genotype
B
Number of males with B genotype
C
Number of males with C genotype
AA
Number of AA females
AB
Number of AB females
AC
Number of AC females
BB
Number of BB females
BC
Number of BC females
CC
Number of CC females
Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 6.
data(JPTtriallelicsChrX) str(JPTtriallelicsChrX)
data(JPTtriallelicsChrX) str(JPTtriallelicsChrX)
mac
computes the smallest allele count for a given vector of
genotype counts.
mac(X)
mac(X)
X |
a vector or matrix with genotype counts (AA, AB, BB) |
a vector of the minor allele counts
Jan Graffelman ([email protected])
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1))) names(X) <- c("AA","AB","BB") print(X) print(mac(X))
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1))) names(X) <- c("AA","AB","BB") print(X) print(mac(X))
Function maf
computes the minor allele frequency for
a matrix or vector of genotype counts.
maf(x, option = 1, verbose = FALSE)
maf(x, option = 1, verbose = FALSE)
x |
a vector or matrix of with genotype counts |
option |
determines output that is returned. |
verbose |
be silent ( |
a vector of minor allele frequencies or minor allele counts.
Jan Graffelman ([email protected])
# # MAF of a single random marker # set.seed(123) X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1))) names(X) <- c("AA","AB","BB") print(X) print(maf(X)) # # MAF of MN bloodgroup counts # x <- c(MM=298,MN=489,NN=213) maf(x) # # Both allele frqeuencies, ordered from minor to major # maf(x,2) # # allele counts of MN bloodgroup counts, order from minor to major # maf(x,3) # # MAF of single triallelic marker in triangular format # x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26) print(x) GT <- toTriangular(x) print(GT) maf(GT) # # extract all allele frequencies # maf(GT,option=2) # # extract all allele counts # maf(GT,option=3) # # Calculate the MAF for 10 random SNPs under HWE # set.seed(123) Z <- HWData(nm=10) print(Z) # # vector with minor allele frequencies, one per marker # maf(Z) # # Matrix with minor and major allele frequencies, one row per marker # maf(Z,2) # # Matrix with minor and major allele count, one row per marker # maf(Z,3)
# # MAF of a single random marker # set.seed(123) X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1))) names(X) <- c("AA","AB","BB") print(X) print(maf(X)) # # MAF of MN bloodgroup counts # x <- c(MM=298,MN=489,NN=213) maf(x) # # Both allele frqeuencies, ordered from minor to major # maf(x,2) # # allele counts of MN bloodgroup counts, order from minor to major # maf(x,3) # # MAF of single triallelic marker in triangular format # x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26) print(x) GT <- toTriangular(x) print(GT) maf(GT) # # extract all allele frequencies # maf(GT,option=2) # # extract all allele counts # maf(GT,option=3) # # Calculate the MAF for 10 random SNPs under HWE # set.seed(123) Z <- HWData(nm=10) print(Z) # # vector with minor allele frequencies, one per marker # maf(Z) # # Matrix with minor and major allele frequencies, one row per marker # maf(Z,2) # # Matrix with minor and major allele count, one row per marker # maf(Z,3)
MakeCounts
creates a matrix of genotype counts, with one row for
each bi-allelic marker, containing 4 columns with the counts AA, AB, BB
and NA (missings) respectively
MakeCounts(X, alleles, pos1 = 1, pos2 = 3, coding = c(AA=0,AB=1,BB=2), sep = "")
MakeCounts(X, alleles, pos1 = 1, pos2 = 3, coding = c(AA=0,AB=1,BB=2), sep = "")
X |
A matrix or dataframe with bi-allelic genotyping information, markers in columns, individuals in rows |
alleles |
a vector of alleles for each marker
(e.g. c("A/T","A/G",...)). Only relevanit if |
pos1 |
position of the first allele in the allele string (1 by default) |
pos2 |
position of the second allele in the allele string (3 by default) |
coding |
indicates how homozygotes and heterozygote are coded as
numbers. Only relevant if |
sep |
allele separator character for genotype data in text format ("" for AA; "/" for "A/A") |
MakeCounts
is thought for bi-allelic marker data only. Missings
are should be coded by NA. It produces the right input for
HWTernaryPlot
.
Heterozygotes may be coded in the data as "AB" or "BA". Both entries will be counted as a heterozygote.
A matrix of 4 columns
Jan Graffelman [email protected]
SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG") SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA") SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT") X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5) Y <- MakeCounts(X,c("A/G","C/G","A/G","A/G","C/T")) print(Y) W <- matrix(sample(c(0,1,2,NA),100,replace=TRUE),ncol=5) Z <- MakeCounts(W,coding=c(0,1,2))
SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG") SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA") SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT") X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5) Y <- MakeCounts(X,c("A/G","C/G","A/G","A/G","C/T")) print(Y) W <- matrix(sample(c(0,1,2,NA),100,replace=TRUE),ncol=5) Z <- MakeCounts(W,coding=c(0,1,2))
MakeFactor
converts bi-allelic genetic marker data, whether coded numerically
as (0,1,2) or as (GG,GT,TT), etc. into standard factors coded as AA,
AB, BB.
MakeFactor(x, coding = c(0, 1, 2))
MakeFactor(x, coding = c(0, 1, 2))
x |
A vector containing genotyping results |
coding |
Describes the numerical coding of the genotype data in
order AA, AB and BB. Only relevant if |
If x
is a factor, it will be coerced to a factor with
levels AA, AB and BB. Important detail: the produced factors will have
only those levels that are observed in the data. E.g., if genotyping
results only consist of (0,1), then the resulting factor will not have
level BB (which would be an empty category)
A factor variable
Jan Graffelman [email protected]
y <- c(1,1,0,0,2,2) data.frame(y,MakeFactor(y)) y <- c(2,2,3,3,1,1) data.frame(y,MakeFactor(y,coding=c(1,2,3))) data(Markers) data.frame(Markers[,1],MakeFactor(Markers[,1],coding=c(1,2,3)))
y <- c(1,1,0,0,2,2) data.frame(y,MakeFactor(y)) y <- c(2,2,3,3,1,1) data.frame(y,MakeFactor(y,coding=c(1,2,3))) data(Markers) data.frame(Markers[,1],MakeFactor(Markers[,1],coding=c(1,2,3)))
The dataframe contains the genotypes of 3 SNPs and two allele intensities of 146 individuals. The first column is a GT polymorphism that has missing values for several individuals. The second and third column (iG and iG) are the allele intensities of this polymorphism. Column 4 and 5 are covariate SNPs (an AC and an AG polymorphism) that have no missing values.
data(Markers)
data(Markers)
A data frame containing 146 rows and 5 columns
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software, 64(3), 1-23. doi:10.18637/jss.v064.i03.
The dataframe contains the genotype frequencies MM, MN and NN for the MN blood group locus for 216 populations. The data are taken from table 2.5 in Mourant et al., using only entries with a sample size of at least 500.
data(Mourant)
data(Mourant)
A data frame containing 216 observations.
Mourant et al, Table 2.5
Mourant, A. E. and Kope\'c, A. C. and Domaniewska-Sobczak, K. (1976) The Distribution of the Human Blood Groups and other Polymorphisms. Second edition. Oxford University Press, London.
Function n.alleles
determines the number of alleles in a named
genotype vector.
n.alleles(x, ...)
n.alleles(x, ...)
x |
A named genotype vector (e.g. c(AA=10,AB=20,BB=5)) |
... |
extra arguments that are passed on to |
integer
Jan Graffelman ([email protected])
x <- c(AA=25,AB=50,BB=25) k <- n.alleles(x) print(k)
x <- c(AA=25,AB=50,BB=25) k <- n.alleles(x) print(k)
NistSTRs
contains 29 autosomal microsatellites (STRs) of individuals of Caucasian ancestry. The two alleles of an individual are separated into two columns for each STR.
data("NistSTRs")
data("NistSTRs")
A data frame with 361 observations on the following 58 variables.
First allele of CSF1PO
Second allele of CSF1PO
First allele of D10S1248
Second allele of D10S1248
First allele of D12S391
Second allele of D12S391
First allele of D13S317
Second allele of D13S317
First allele of D16S539
Second allele of D16S539
First allele of D18S51
Second allele of D18S51
First allele of D19S433
Second allele of D19S433
First allele of D1S1656
Second allele of D1S1656
First allele of D21S11
Second allele of D21S11
First allele of D22S1045
Second allele of D22S1045
First allele of D2S1338
Second allele of D2S1338
First allele of D2S441
Second allele of D2S441
First allele of D3S1358
Second allele of D3S1358
First allele of D5S818
Second allele of D5S818
First allele of D6S1043
Second allele of D6S1043
First allele of D7S820
Second allele of D7S820
First allele of D8S1179
Second allele of D8S1179
First allele of F13A01
Second allele of F13A01
First allele of F13B
Second allele of F13B
First allele of FESFPS
Second allele of FESFPS
First allele of FGA
Second allele of FGA
First allele of LPL
Second allele of LPL
First allele of Penta_C
Second allele of Penta_C
First allele of Penta_D
Second allele of Penta_D
First allele of Penta_E
Second allele of Penta_E
First allele of SE33
Second allele of SE33
First allele of TH01
Second allele of TH01
First allele of TPOX
Second allele of TPOX
First allele of vWA
Second allele of vWA
http://strbase.nist.gov
Hill, C. et al. (2013) U.S. population data for 29 autosomal STR loci. Forensic science international: Genetics 497 7(3):e82-3.
data(NistSTRs)
data(NistSTRs)
Function order.auto
reorders a named vector of three genotype
counts, such that the sequence (minor homozygote, heterozygote,
major heterzygote) is establised.
order.auto(X)
order.auto(X)
X |
a named vector of genotype counts (e.g. c(AA=25,AB=50,BB=25)) |
a vector
Jan Graffelman [email protected]
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
x <- c(MN=489,MM=298,NN=213) print(x) y <- order.auto(x) print(y)
x <- c(MN=489,MM=298,NN=213) print(x) y <- order.auto(x) print(y)
Function order.x
reorders a named vector of three genotype
counts, such that the sequence (minor hemizygote, major hemizygote,
minor homozygote, heterozygote, major heterzygote) is establised.
order.x(X)
order.x(X)
X |
a named vector of genotype counts (e.g. c(A=10,B=10,AA=25,AB=50,BB=25)) |
a vector
Jan Graffelman [email protected]
Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.
x <- c(A=392, B=212, AA=275, AB=296, BB=80) print(x) y <- order.x(x) print(y)
x <- c(A=392, B=212, AA=275, AB=296, BB=80) print(x) y <- order.x(x) print(y)
qqunif
makes a Q-Q plot against a uniform distribution for the supplied data vector.
qqunif(x, logplot = FALSE, lbs = 1:length(x), texton = FALSE, xylim = NULL, main = "Q-Q plot for a uniform distribution", plotline = 0, xlab = "Expected p-value", ylab = "Observed p-value", colvec=rep("black",length(x)), colline = "black", ...)
qqunif(x, logplot = FALSE, lbs = 1:length(x), texton = FALSE, xylim = NULL, main = "Q-Q plot for a uniform distribution", plotline = 0, xlab = "Expected p-value", ylab = "Observed p-value", colvec=rep("black",length(x)), colline = "black", ...)
x |
The data vector |
logplot |
If |
lbs |
A vector of labels for the points in the Q-Q plot |
texton |
Logical indicating whether labels should be plotted or not |
xylim |
Shared upper limit x and y axis in the plot |
main |
Title for the plot |
plotline |
Setting |
xlab |
label for the x axis |
ylab |
label for the y axis |
colvec |
vector with colours for the points in the QQ plot |
colline |
colour for the line in the plot |
... |
Any additional arguments for the |
pvals |
observed probabilities |
epvals |
expected probabilities |
Jan Graffelman ([email protected])
x <- runif(1000) z <- qqunif(x)
x <- runif(1000) z <- qqunif(x)
function recode
recodes bi-allelic genetic marker information
expressed as strings (e.g. "AA", "AB", "BB") into numerical form.
recode(X, alleles, values = c(0, 1, 2), pos1 = 1, pos2 = 3, minor = FALSE, verbose = FALSE)
recode(X, alleles, values = c(0, 1, 2), pos1 = 1, pos2 = 3, minor = FALSE, verbose = FALSE)
X |
A matrix or dataframe of bi-allelic markers, individuals in rows, markers in columns |
alleles |
a vector with the alleles for each marker (e.g. c("A/T", "A/G", etc)) |
values |
a vector of numerica values for AA, AB and BB, ((0,1,2) by default). |
pos1 |
position of the first allele in the allele string (1 by default). |
pos2 |
position of the second allele in the allele string (3 by default). |
minor |
coding is according to the number of copies of the minor
allele. if |
verbose |
print progress on the conversion or not. |
recode
is written for bi-allelic marker data
only. Heterozygotes may be coded both as AB or BA. By default, the
second allele specified (e.g. "T" in "A/T") is counted in the
recoding, and homozygotes AA are coded as 0 and homozygotes TT as 2.
A numerical matrix, individuals in rows, markers in columns
Jan Graffelman [email protected]
SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG") SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA") SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT") X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5) Y <- recode(X,c("A/G","C/G","A/G","A/G","C/T")) print(Y)
SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG") SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA") SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG") SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT") X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5) Y <- recode(X,c("A/G","C/G","A/G","A/G","C/T")) print(Y)
Function shannon
calculates the Shannon index and its variance for a vector of counts.
shannon(x)
shannon(x)
x |
a vector of counts |
Hp |
the sample Shannon index |
VHp |
the sample variance of the Shannon index |
Jan Graffelman ([email protected])
# # Shannon index for allele frequencies of a biallelic MN blood group polymorphism # x <- c(MM=298,MN=489,NN=213) p <- af(x) shannon(c(p,1-p))$Hp # # Shannon index for the allele frequencies of an STR # data("NistSTRs") AlleleTable <- table(c(NistSTRs[,1],NistSTRs[,2])) AlleleFreq <- AlleleTable/sum(AlleleTable) shannon(AlleleFreq)$Hp
# # Shannon index for allele frequencies of a biallelic MN blood group polymorphism # x <- c(MM=298,MN=489,NN=213) p <- af(x) shannon(c(p,1-p))$Hp # # Shannon index for the allele frequencies of an STR # data("NistSTRs") AlleleTable <- table(c(NistSTRs[,1],NistSTRs[,2])) AlleleFreq <- AlleleTable/sum(AlleleTable) shannon(AlleleFreq)$Hp
Function strsort
collapses all tokens of a vector of strings in a
single string with sorted tokens
strsort(s)
strsort(s)
s |
a vector of character strings |
a string
Jan Graffelman [email protected]
x <- c("AA","AB","BB","AC","CC") print(strsort(x))
x <- c("AA","AB","BB","AC","CC") print(strsort(x))
Function ThetatoF
converts disequilibrium measure theta to an
inbreeding coefficient.
ThetatoF(p, theta = 4)
ThetatoF(p, theta = 4)
p |
the allele frequency |
theta |
the disequilibrium parameter |
a real number
Jan Graffelman ([email protected])
Rohlfs, R.V. and Weir, B.S. (2008) Distributions of Hardy-Weinberg Equilibrium Test Statistics, Genetics, 180(3) pp. 1609-1616.
f <- ThetatoF(0.5,4)
f <- ThetatoF(0.5,4)
Function toTriangular
converts a named vector of genotype counts
into a triangular matrix format, with homozygotes on the diagonal and
heterozygotes below the diagonal.
toTriangular(x)
toTriangular(x)
x |
A vector of genotype counts |
a matrix
Jan Graffelman [email protected]
x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26) print(x) X <- toTriangular(x) print(X)
x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26) print(x) X <- toTriangular(x) print(X)
This dataframe contains genotype counts for six three-allelic polymorphisms (A,B,C) on chromosome X of a sample of individuals from the TSI population (Tuscany, Italy) of the 1,000 genomes project.
data(TSIXTriAllelics)
data(TSIXTriAllelics)
A data frame with 6 observations on the following 10 variables.
ID
Identifier of the polymorphism
A
Male A genotype count
B
Male B genotype count
C
Male C genotype count
AA
Female AA genotype count
AB
Female AB genotype count
AC
Female AC genotype count
BB
Female BB genotype count
BC
Female BC genotype count
CC
Female CC genotype count
Data taken from the 1,000 genomes project at www.internationalgenome.org
Graffelman, J. and Ortoleva, L. (2020) A network algorithm for the X chromosomal exact test for Hardy-Weinberg equilibrium with multiple alleles. Under review.
data(TSIXTriAllelics)
data(TSIXTriAllelics)
Function UniqueGenotypeCounts
creates a matrix containing
only the unique rows in the given matrix, together with their frequency
of occurrence
UniqueGenotypeCounts(X, verbose = TRUE)
UniqueGenotypeCounts(X, verbose = TRUE)
X |
A n by 3 matrix with genotypic counts (AA,AB,BB) |
verbose |
If TRUE then print some statistics |
A matrix with 4 columns, AA, AB, BB, and frequency of occurrence
Jan Graffelman [email protected]
set.seed(123) X <- HWData(n=100,nm=100) print(nrow(X)) Y <- UniqueGenotypeCounts(X) print(nrow(Y)) print(sum(Y$w))
set.seed(123) X <- HWData(n=100,nm=100) print(nrow(X)) Y <- UniqueGenotypeCounts(X) print(nrow(Y)) print(sum(Y$w))
Function vaf
computes the sample variance of the allele frequencies
of a single sample or a matrix of samples.
vaf(X, hw = FALSE)
vaf(X, hw = FALSE)
X |
vector or matrix with genotype counts (AA,AB,BB) |
hw |
assume Hardy-Weinberg proportions ( |
For biallelic markers the variance of the minor allele frequency equals the variance of the major allele frequency.
a numeric vector of variances.
Jan Graffelman [email protected]
Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter 2.
x <- c(MM=298,MN=489,NN=213) pA <- af(x) vA <- vaf(x) cat("allele frequency:",pA,"\n") cat("sample variance allele frequency:",vA,"\n")
x <- c(MM=298,MN=489,NN=213) pA <- af(x) vA <- vaf(x) cat("allele frequency:",pA,"\n") cat("sample variance allele frequency:",vA,"\n")