Package 'HardyWeinberg'

Title:	Statistical Tests and Graphics for Hardy-Weinberg Equilibrium
Description:	Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) for bi and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) <doi:10.1038/hdy.2016.20>, including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots. The functionality of the package is explained in detail in a related JSS paper <doi:10.18637/jss.v064.i03>.
Authors:	Jan Graffelman [aut, cre], Christopher Chang [ctb], Xavi Puig [ctb], Jan Wigginton [ctb], Leonardo Ortoleva [ctb], William R. Engels [ctb]
Maintainer:	Jan Graffelman <[email protected]>
License:	GPL (>= 2)
Version:	1.7.8
Built:	2025-03-03 07:17:44 UTC
Source:	CRAN

Help Index

Statistical Tests and Graphics for Hardy-Weinberg Equilibrium
Function to compute allele frequencies
Exact test of equality of allele frequencies for males and females
Function to compute X-chromosomal allele frequencies
Calculate allele and genotype counts for X-chromosomal markers
Extract alleles
Calculate triangular genotype matrix for vector(s) of alleles.
Genotype frequencies for 70 SNPs related to Alzheimer's disease
Biallelic polymorphisms sampled from chromosome 22 of the CEU population of the 1000 Genomes project.
Calculate Graffelman-Weir exact density for bi-allelic X-chromosomal variant
Calculate Levene's exact density for k alleles
Calculate Levene's density for a bi-allelic variant
Exact test for equality of allele frequencies in males and females
Fisher's z transformation
Fold a square matrix
Generate genotypic compositions
Label genotype counts of a vector or matrix
Glyoxalase genotype data
Genotype frequencies for 225 SNPs on chromosome 1 of the CHB population.
Calculate expected heterozygosity (He)
Estimate allele frequencies and test for Hardy-Weinberg equilibrium with a tri-allelic ABO system.
Compute Akaike's Information Criterion (AIC) for HWP and EAF models
Perform all tests for Hardy-Weinberg equilibrium
Compute additive log-ratio transformation
Plot genetic markers in additive log-ratio coordinates
Chi square tests for Hardy Weinberg equilibrium
Matrix version of HWChisq
Fast computation of chi-square statistics for Hardy-Weinberg equilibrium
Convert genotype counts to compositions
Compute the centred log-ratio transformation
Plot genetic markers in centred log-ratio coordinates
Compute probability of a genotypic sample
Compute disequilibrium statistic D
Generate genetic marker data in or out of Hardy-Weinberg Equilibrium
Estimation of contributions of two populations to a sample of genotype frequencies with the EM algorithm.
Exact test for Hardy-Weinberg equilibrium
Matrix version of HWExact
Exact test for Hardy-Weinberg equilibrium
Computation of Exact p-values for Hardy-Weinberg equilibrium for sets of SNPs
Computation of inbreeding coefficient
Scatter plot of the genotype frequencies
Compute isometric log ratio coordinates.
Plot bi-allelic genetic markers in isometric log ratio coordinates
Calculate a posteriori density for Lindley's alpha
Calculate a credible interval for Lindley's alpha for HWE,
Perform most relevant likelihood ratio test for Hardy-Weinberg equilibrium and equality of allele frequencies
Likelihood ratio test for Hardy Weinberg equilibrium
Perform likelihood ratio test comparing two nested scenarios for a bi-allelic genetic variant, distinguishing the two sexes.
Test a bi-allelic marker for Hardy-Weinberg equilibrium in the presence of missing genotype information.
Autosomal and X-chromosomal exact tests for HWE via a Network algorithm
Permutation test for Hardy-Weinberg equilibrium
Permutation tests fo variants with multiple alleles
Calculation of posterior probabilities and Bayes factors for Hardy-Weinberg tests at X-chromosomal variants.
Compute the power of a test for Hardy-Weinberg equilibrium.
A Q-Q plot for Hardy-Weinberg equilibrium
Testing a set of microsatellites (STRs) for Hardy-Weinberg equilibrium
Asymptotic test for HWE across strata for a single biallelic marker
Ternary plot with the Hardy-Weinberg acceptance region
Exact test for Hardy-Weinberg equilibrium and equality of allele frequencies for tri-allelic variants.
Inverse Fisher z transformation
Detects autosomal and X-chromosomal monomorphic variants
Multi-allelic autosomal variants of the Japanese population of the 1000 genomes project
Multi-allelic X-chromosomal variants of the Japanese population of the 1000 genomes project
Bi-allelic SNPs from a Japanese population stratified by gender
Tri-allelic variants on chromosome 7 of the Japanese (JPT) sample of the 1000 genomes project
Tri-allelic variants on the X-chromosome of the Japanese (JPT) sample of the 1000 genomes project
Compute the minor allele count.
Function to compute minor allele frequencies
Create genotype counts from bi-allelic marker data
Make factors from genotyping data
SNP data and intensities
Genotype frequencies for blood group locus MN
Number of alleles
NIST autosomal STR data
Reordering of autosomal genotype counts
Reordering of X-chromosomal genotype counts
Q-Q plot for a uniform distribution.
Recode genotype information
Shannon index
Sort tokens of a set of strings
Convert theta to an inbreeding coefficient
Convert a vector of genotype counts to triangular format
Tri-allelic polymorphisms on the X chromosome of the TSI population
Extract unique genotypic compositions from a matrix
Computes the sample variance of the allele frequency for a biallelic marker.

Statistical Tests and Graphics for Hardy-Weinberg Equilibrium

Description

Contains tools for exploring Hardy-Weinberg equilibrium (Hardy, 1908; Weinberg, 1908) for bi and multi-allelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) with bi-allelic variants are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included (Graffelman & Weir, 2016) including Bayesian procedures. Some exact and permutation procedures also work with multi-allelic variants. Special test procedures that jointly address Hardy-Weinberg equilibrium and equality of allele frequencies in both sexes are supplied, for the bi and multi-allelic case. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of bi-allelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots. The functionality of the package is explained in detail in a related paper (Graffelman, 2015).

Details

Package:	HardyWeinberg
Type:	Package
Version:	1.7.7
Date:	2024-03-31
License:	GPL Version 2 or later.

With function HWTernaryPlot one can create ternary plots with acceptance regions for HWE. Several routines implement statistical tests for HWE such as HWChisq, HWExact, HWLratio and HWPerm. Bayesian procedures are available using HWPosterior. Aikaike information criterion for various scenarios can be calculated with HWAIC. Power computations are possible with HWPower.

Author(s)

Jan Graffelman

Maintainer: Jan Graffelman <[email protected]>

References

Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates, Massachusetts.

Graffelman, J. and Morales, J. (2008) Graphical Tests for Hardy-Weinberg Equilibrium Based on the Ternary Plot. Human Heredity 65(2):77-84.

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6): 558-568. doi:10.1038/hdy.2016.20 Nature Publishing Group.

Hardy, G.H. (1908) Mendelian Proportions in a Mixed Population. Science 28(706): 49-50.

Weinberg, W. (1908) On the Demonstration of Heredity in Man. In: Papers on Human Genetics. S. H. Boyer (Ed.) Englewood Cliffs, NJ:Prentice Hall, translated, 1963.

Examples


library(HardyWeinberg)

# draw random SNPs from a population that is in HWE

set.seed(123)

m <- 100 # number of markers
n <- 100 # sample size

X <- HWData(n,m)
out <- HWTernaryPlot(X,100,region=1,vertex.cex=2,signifcolour=TRUE)

library(HardyWeinberg)

# draw random SNPs from a population that is in HWE

set.seed(123)

m <- 100 # number of markers
n <- 100 # sample size

X <- HWData(n,m)
out <- HWTernaryPlot(X,100,region=1,vertex.cex=2,signifcolour=TRUE)

Function to compute allele frequencies

Description

Function af computes the allele frequency for a vector with autosomal or X-chromosomal genotype counts or compositions.

Usage

af(x)
af(x)

Arguments

`x`	a vector or matrix with counts or compositions. `x` must have either three elements (autosomal) or five elements (X-chromosomal)

Details

Function af calculates the A allele frequency for the given order of autosomal genotype counts (AA,AB,BB) or the given order of X chromosomal genotype counts (A,B,AA,AB,BB). The genotypes must be supplied this order. Use maf to calculate the minor allele frequency and more flexible coding.

Value

a vector with allele frequencies

Author(s)

Jan Graffelman ([email protected])

Examples

#
# A simulated random marker
#
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1)))
X <- X/sum(X)
print(X)
print(af(X))
#
# MN blood group counts
#
x <- c(MM=298,MN=489,NN=213)
#
# Calculate frequency M allele
#
af(x)
#
# Calculate frequency N allele
#
af(rev(x))
1 - af(x)
#
# Calculated allele frequencies (p) for a matrix of genotype counts.
#
X <- HWData(nm=10,n=100)
X
p <- af(X)
p
#
# A simulated random marker
#
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1)))
X <- X/sum(X)
print(X)
print(af(X))
#
# MN blood group counts
#
x <- c(MM=298,MN=489,NN=213)
#
# Calculate frequency M allele
#
af(x)
#
# Calculate frequency N allele
#
af(rev(x))
1 - af(x)
#
# Calculated allele frequencies (p) for a matrix of genotype counts.
#
X <- HWData(nm=10,n=100)
X
p <- af(X)
p

Exact test of equality of allele frequencies for males and females

Description

Function AFtest tests equality of allele frequencies for males and females for bi-allelic marker data by means of a Fisher exact test.

Usage

AFtest(x, verbose = TRUE, ...)
AFtest(x, verbose = TRUE, ...)

Arguments

`x`	a vector containg the genotypic counts c(A,B,AA,AB,BB) for a bi-allelic X-chromosomal markers.
`verbose`	verbose = TRUE prints results, verbose = FALSE is silent.
`...`	additional arguments for function `fisher.test`.

Details

Function AFtest constructs the contingency table of sex by allele, and call fisher.test to test for equality of allele frequencies. The test assumes Hardy-Weinberg equilibrium.

Value

`AC`	Two-way table of sex by allele
`pval`	p-value of the test

Author(s)

Jan Graffelman [email protected]

Examples

  rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
  AFtest(rs5968922)
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
  AFtest(rs5968922)

Function to compute X-chromosomal allele frequencies

Description

Function afx computes the allele frequency for a vector or matrix of X-chromosomal genotype counts or frequencies.

Usage

afx(x)
afx(x)

Arguments

`x`	a vector or matrix with X-chromosomal counts or compositions. `x` must have five elements or columns.

Details

Function afx calculates the A allele frequency for the given order of X-chromosomal genotype counts (A,B,AA,AB,BB). The genotypes must be supplied this order. Use maf to calculate the minor allele frequency and more flexible coding.

Value

a vector with allele frequencies

Author(s)

Jan Graffelman ([email protected])

Examples

#
# A simulated random X-chromosomal marker
#
X <- as.vector(rmultinom(1,100,c(0.3,0.2,0.1,0.15,0.25)))
X <- X/sum(X)
print(X)
print(afx(X))
#
# 
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
afx(rs5968922)
#
#
#
y <- c(C=337, G=129, CC=277, CG=209, GG=48) 
afx(y)
#
# Calculated allele frequencies (p) for a matrix of X-chromosomal genotype counts.
#
X <- HWData(nm=10,n=1000,x.linked=TRUE)
X
p <- afx(X)
p
#
# A simulated random X-chromosomal marker
#
X <- as.vector(rmultinom(1,100,c(0.3,0.2,0.1,0.15,0.25)))
X <- X/sum(X)
print(X)
print(afx(X))
#
# 
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
afx(rs5968922)
#
#
#
y <- c(C=337, G=129, CC=277, CG=209, GG=48) 
afx(y)
#
# Calculated allele frequencies (p) for a matrix of X-chromosomal genotype counts.
#
X <- HWData(nm=10,n=1000,x.linked=TRUE)
X
p <- afx(X)
p

Calculate allele and genotype counts for X-chromosomal markers

Description

Function agcounts determines sample size, minor are major allele counts, allele counts in females, numbers of males and females and allele frequencies for a vector of genotypes counts of an X-chromosomal markers.

Usage

agcounts(x, verbose = FALSE)
agcounts(x, verbose = FALSE)

Arguments

`x`	a vector of X-chromosomal genotype counts (A,B,AA,AB,BB)
`verbose`	print the counts if (`verbose = TRUE`)

Value

`n`	sample size
`nA`	number of A alleles
`nB`	number of B alleles
`nf`	number of females
`nm`	number of males
`nAf`	number of A alleles in females
`nBf`	number of B alleles in females
`nt`	total number of alleles
`fAA`	number of AA females
`fAB`	number of AB females
`fBB`	number of BB females
`pA`	overall A allele frequency
`pB`	overall B allele frequency

Author(s)

Jan Graffelman [email protected]

Examples

rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
counts <- agcounts(rs5968922)
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
counts <- agcounts(rs5968922)

Extract alleles

Description

Function alleles extracts the names of the alleles from a named genotype vector.

Usage

alleles(x, fromlabels = TRUE)
alleles(x, fromlabels = TRUE)

Arguments

`x`	A named or unnamed genotype vector (e.g. c(AA=10,AB=20,BB=5))
`fromlabels`	extract genotypes from the labels of the vector elements, or from the vector elements themselves.

Value

A character vector with the alleles

Author(s)

Jan Graffelman ([email protected])

Examples

   x <- c(AA=10,AG=10,GG=10,AT=5)
   als.x <- alleles(x)
   print(als.x)

   y <- rep(names(x),x)
   als.y <- alleles(y,fromlabels = FALSE)
   print(als.y)
x <- c(AA=10,AG=10,GG=10,AT=5)
   als.x <- alleles(x)
   print(als.x)

   y <- rep(names(x),x)
   als.y <- alleles(y,fromlabels = FALSE)
   print(als.y)

Calculate triangular genotype matrix for vector(s) of alleles.

Description

AllelesToTriangular constructs a lower triangular matrix of genotype counts from one or two vectors of alleles. It is particularly useful to create genotype counts for microsatellite data (STRs).

Usage

AllelesToTriangular(A1, A2 = NULL, given=NULL) 
AllelesToTriangular(A1, A2 = NULL, given=NULL)

Arguments

`A1`	The first allele of each individual, or a vector with all alleles, two consecutive ones for each individual.
`A2`	The second allele of each individual (optional).
`given`	A vector of known alleles (optional). This argument can be used to specify alleles that may not exist in the data.

Details

If the data is a single column vector with two succesive alleles for each individual, then specify A1 only. If data consists of two columns, each holding one allele of each individual, then specify A1 and A2. Typical STR data that comes in the format of two repeat lengths for a set of individuals can be transformed into a lower triangular matrix with genotype counts. See the examples below.

Value

A lower triangular matrix with genotype counts.

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

data(NistSTRs)
A1 <- NistSTRs[,1]
A2 <- NistSTRs[,2]
GM <- AllelesToTriangular(A1,A2)
print(GM)
data(NistSTRs)
A1 <- NistSTRs[,1]
A2 <- NistSTRs[,2]
GM <- AllelesToTriangular(A1,A2)
print(GM)

Genotype frequencies for 70 SNPs related to Alzheimer's disease

Description

The dataframe contains the genotype frequencies MM, Mm and mm for the 70 SNPs for both cases and controls. The data are taken from table 7.11 in Laird & Lange.

Usage

data(Alzheimer)data(Alzheimer)

Format

A data frame containing 70 observations.

Source

Laird, N. M. and Lange, C. Table 7.11, p. 124

References

Laird, N. M. and Lange, C. (2011) The fundamentals of modern statistical genetics. Springer.

Biallelic polymorphisms sampled from chromosome 22 of the CEU population of the 1000 Genomes project.

Description

Matrix CEUchr22 contains 10000 single nucleotide polymorphisms sampled from chromosome 22 of sample of 99 Utah residents with Northern and Western European ancestry.

Usage

data("CEUchr22")data("CEUchr22")

Format

Matrix

Details

The polymorphisms are coded in (0,1,2) format representing the count of the reference allele.

Source

https://www.internationalgenome.org/

References

The 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature 526(7571), pp. 68–74.

Examples

data(CEUchr22)
str(CEUchr22)
data(CEUchr22)
str(CEUchr22)

Calculate Graffelman-Weir exact density for bi-allelic X-chromosomal variant

Description

Function dgraffelmanweir calculate the probability P(NAB=nab and MA=ma|NA=na) for a bi-allelic X-chromosomal variant.

Usage

dgraffelmanweir.bi(x, y)dgraffelmanweir.bi(x, y)

Arguments

`x`	vector with male genotype counts (A,B)
`y`	vector with female genotype counts (AA,AB,BB)

Value

a single real number

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6) pp. 558–568. doi:10.1038/hdy.2016.20

Examples

males <- c(A=392, B=212)
females <- c(AA=275, AB=296, BB=80)
prob <- dgraffelmanweir.bi(males,females)
print(prob)
males <- c(A=392, B=212)
females <- c(AA=275, AB=296, BB=80)
prob <- dgraffelmanweir.bi(males,females)
print(prob)

Calculate Levene's exact density for k alleles

Description

Function dlevene calculates Levene's exact density for a diploid system with k alleles.

Usage

dlevene(N)
dlevene(N)

Arguments

`N`	A lower triangular matrix with genotype counts

Details

The supplied matrix of genotype counts should be triangular, with the homozygote counts on the diagonal, and all heterozygote counts below the diagonal.

Value

a single real number

Author(s)

Jan Graffelman ([email protected])

References

Levene, H. (1949) On a matching problem arising in genetics. Annals of Mathematical Statistics, 20, pp. 91-94.

Examples

x <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0)
x <- toTriangular(x)
prob <- dlevene(x)
print(prob)
x <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0)
x <- toTriangular(x)
prob <- dlevene(x)
print(prob)

Calculate Levene's density for a bi-allelic variant

Description

Program dlevene.bi calculates Levene's density (P(AB|A)) for a bi-allelic variant.

Usage

dlevene.bi(x)
dlevene.bi(x)

Arguments

`x`	a vector of genotype counts (AA,AB,BB)

Value

a single real number

Author(s)

Jan Graffelman ([email protected])

References

Levene, H. (1949) On a matching problem arising in genetics. Annals of Mathematical Statistics, 20, pp. 91-94.

Examples

x <- c(AA=298,AB=489,BB=213)
prob <- dlevene.bi(x)
print(prob)
x <- c(AA=298,AB=489,BB=213)
prob <- dlevene.bi(x)
print(prob)

Exact test for equality of allele frequencies in males and females

Description

EAFExact uses a Fisher Exact test to compare allele frequencies in males and females for variants with k alleles (k >= 2).

Usage

EAFExact(m, f, verbose = TRUE, ...)
EAFExact(m, f, verbose = TRUE, ...)

Arguments

`m`	vector or triangular matrix with male genotype counts
`f`	vector or triangular matrix with female genotype counts
`verbose`	print output (TRUE) or not (FALSE)
`...`	additional arguments for `fisher.test`

Details

For bi-allelic autosomal variants the genotype counts can be supplied as vectors ((AA,AB,BB) for males, and (AA,AB,BB) for females). For X-chromosomal bi-allelic variants the genotype counts can also supplied as vectors ((A,B) for males, and (AA,AB,BB) for females). For multi-allelic autosomal variants male and genotype counts can be supplied as vectors (AA,AB,AC,BB,BC,CC,...) or as a triangular matrix, where matrix rows and colums are labelled with the allele name (A,B,C,...). For multi-allelic X-chromosomal variants, male genotype counts must be supplied as a vector (A,B,C,...) and female genotype counts must be supplied as a triangular matrix. See the examples below.

Value

`pval`	p-value
`tab`	table with allele counts

Author(s)

Jan Graffelman [email protected]

Examples

#
# bi-allelic autosomal
#

m <- c(AA=60,AB=96,BB=44) 
f <- c(AA=44,AB=97,BB=59) 
EAFtest <- EAFExact(m,f)

#
# bi-allelic X-chromosomal
#

males   <- c(A=392, B=212)
females <- c(AA=275, AB=296, BB=80)
EAFtest <- EAFExact(males,females,verbose=TRUE)

#
# tri-allelic autosomal
#

males   <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26)
females <- c(AA=28,AB=55,AC=33,BB=18,BC=50,CC=16)
EAFtest <- EAFExact(males,females,verbose=TRUE)

#
# tri-allelic X-chromosomal
#

males   <- c(A=15,B=17,C=24) 
females <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4)) 
EAFtest <- EAFExact(males,females,verbose=TRUE)

#
# bi-allelic autosomal
#

m <- c(AA=60,AB=96,BB=44) 
f <- c(AA=44,AB=97,BB=59) 
EAFtest <- EAFExact(m,f)

#
# bi-allelic X-chromosomal
#

males   <- c(A=392, B=212)
females <- c(AA=275, AB=296, BB=80)
EAFtest <- EAFExact(males,females,verbose=TRUE)

#
# tri-allelic autosomal
#

males   <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26)
females <- c(AA=28,AB=55,AC=33,BB=18,BC=50,CC=16)
EAFtest <- EAFExact(males,females,verbose=TRUE)

#
# tri-allelic X-chromosomal
#

males   <- c(A=15,B=17,C=24) 
females <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4)) 
EAFtest <- EAFExact(males,females,verbose=TRUE)

Fisher's z transformation

Description

Calculates Fisher's z transformation for a correlation coefficient

Usage

fisherz(r)
fisherz(r)

Arguments

`r`	a correlation coefficient

Value

a real number

Author(s)

Jan Graffelman ([email protected])

Examples

r <- 0.5
print(fisherz(r))
r <- 0.5
print(fisherz(r))

Fold a square matrix

Description

The function fold sums corresponding below and above diagonal elements of a square matrix to form a triangular matrix.

Usage

fold(X, lower = TRUE)
fold(X, lower = TRUE)

Arguments

`X`	a square matrix
`lower`	logical. If `lower=TRUE` a lower triangular matrix is formed, if not an upper triangular matrix.

Details

Useful for constructing triangular matrices of genotype counts

Value

A matrix

Author(s)

Jan Graffelman [email protected]

Examples

allelenames <- paste("A",11:13,sep="")
allele1 <- factor(c("A11","A11","A12","A12","A13","A12"),levels=allelenames)
allele2 <- factor(c("A11","A12","A12","A13","A13","A11"),levels=allelenames)
GC  <- table(allele1,allele2)
GC  <- as.matrix(unclass(GC))
GCf <- fold(GC)
allelenames <- paste("A",11:13,sep="")
allele1 <- factor(c("A11","A11","A12","A12","A13","A12"),levels=allelenames)
allele2 <- factor(c("A11","A12","A12","A13","A13","A11"),levels=allelenames)
GC  <- table(allele1,allele2)
GC  <- as.matrix(unclass(GC))
GCf <- fold(GC)

Generate genotypic compositions

Description

GenerateSamples generates all possible genotypic compositions (AA,AB,BB) for a given sample size n.

Usage

GenerateSamples(n = 5)
GenerateSamples(n = 5)

Arguments

`n`	the desired sample size

Value

returns a matrix with in each row a possible genotypic compostion for the given sample size.

Author(s)

Jan Graffelman [email protected]

Examples

GenerateSamples(5)
GenerateSamples(5)

Label genotype counts of a vector or matrix

Description

Function genlabels sets the names of a vector or matrix of genotype counts.

Usage

genlabels(X)
genlabels(X)

Arguments

`X`	a 3 (or 5) element vector with genotype counts, a matrix of genotype counts (3 or 5 columns)

Value

A vector or a matrix

Author(s)

Jan Graffelman ([email protected])

Examples

x <- c(25,50,25)
x <- genlabels(x)
x <- c(25,50,25)
x <- genlabels(x)

Glyoxalase genotype data

Description

Biallelic glyoxalase genotype data for 17 populations from India

Usage

data("Glyoxalase")data("Glyoxalase")

Format

A data frame with 17 observations on the following 3 variables.

AA: number of homozygote AA individuals
AB: number of heterozygote AB individuals
BB: number of homozygote BB individuals

Source

Olson, J.M. (1993) Table 3.

References

Olson, J.M. (1993) Testing the Hardy-Weinberg law across strata. Annals of Human Genetics, 57, pp. 291-295.

Examples

data(Glyoxalase)
data(Glyoxalase)

Genotype frequencies for 225 SNPs on chromosome 1 of the CHB population.

Description

The dataframe contains the genotype frequencies in generic notation, AA, AB and BB the first 225 polymorphic SNPs without missing data on chromosome 1 of the Han Chinese in Beijing. The data are compiled from the HapMap project, phase 3.2, containing genotype information of 84 individuals.

Usage

data(HapMapCHBChr1)data(HapMapCHBChr1)

Format

A matrix containing 225 rows and 3 columns (AA, AB, BB).

Source

http://hapmap.ncbi.nlm.nih.gov/

References

The International HapMap Consortium (2007). A second generation human haplotype map of over 3.1 million SNPs Nature 449, pp. 851–861.

Calculate expected heterozygosity (He)

Description

Function he calculates the expected heterozygosity for a biallelic genetic variant.

Usage

he(x, bias.correct = TRUE)
he(x, bias.correct = TRUE)

Arguments

`x`	a vector or matrix with genotype counts.
`bias.correct`	if `bias.correct = TRUE` a correction for bias will be applied.

Details

x can be a vector of genotype counts for a single marker (AA,AB,BB) or a three-column matrix of genotype counts for multiple markers.

Value

a vector of expected heterozygosities.

Author(s)

Jan Graffelman ([email protected])

Examples

#
# He for a single marker
#
x <- c(MM=298,MN=489,NN=213)
he(x)
#
# He for a matrix of rmarkers
#
set.seed(123)
X <- HWData(10,100)
he(X)
he(X,bias.correct = FALSE)
#
# He for a single marker
#
x <- c(MM=298,MN=489,NN=213)
he(x)
#
# He for a matrix of rmarkers
#
set.seed(123)
X <- HWData(10,100)
he(X)
he(X,bias.correct = FALSE)

Estimate allele frequencies and test for Hardy-Weinberg equilibrium with a tri-allelic ABO system.

Description

Function HWABO takes four genotype counts ("A","B","AB","OO") and estimates the three allele frequencies using the EM algorithm.

Usage

HWABO(x, p = c(1/3, 1/3, 1/3), maxiter = 50, tol = 1e-10, verbose = TRUE) 
HWABO(x, p = c(1/3, 1/3, 1/3), maxiter = 50, tol = 1e-10, verbose = TRUE)

Arguments

`x`	a vector with genotype counts ("A","B","AB","OO").
`p`	a vector with initial allele frequencies (by default (1/3,1/3,1/3)).
`maxiter`	maximum number of iterations.
`tol`	tolerance for convergence, 1e-10 by default
`verbose`	print iteration history or not.

Value

`pn`	vector with estimated allele frequencies.
`It.hist`	iteration history with log-likelihood.
`expected`	expected genotype counts under HWE.

Author(s)

Jan Graffelman [email protected]

Examples

   x <- c(fA=182,fB=60,nAB=17,nOO=176)
   al.fre <- HWABO(x)
#al2 <- HWABO(x,p=c(0.99,0.01,0.01),maxiter=25)
#al3 <- HWABO(x,p=c(0.01,0.99,0.01),maxiter=25)
#al4 <- HWABO(x,p=c(0.01,0.01,0.99),maxiter=25)
x <- c(fA=182,fB=60,nAB=17,nOO=176)
   al.fre <- HWABO(x)
#al2 <- HWABO(x,p=c(0.99,0.01,0.01),maxiter=25)
#al3 <- HWABO(x,p=c(0.01,0.99,0.01),maxiter=25)
#al4 <- HWABO(x,p=c(0.01,0.01,0.99),maxiter=25)

Compute Akaike's Information Criterion (AIC) for HWP and EAF models

Description

Function HWAIC calculates Akaike's Information Criterion for ten different models that describe a bi-allelic genetic variant: M11: Hardy-Weinberg proportions and equality of allele frequencies in the sexes (HWP & EAF); M12: EAF and HWP in males only; M13: EAF and HWP in females only; M14: EAF and equality of inbreeding coefficients in the sexes (EIC); M15: EAF only; M21: HWP in both sexes; M22: HWP for males only; M23: HWP for females only; M24: EIC only; M25: None of the previous.

Usage

HWAIC(x, y, tracing = 0, tol = 0.000001)
HWAIC(x, y, tracing = 0, tol = 0.000001)

Arguments

`x`	Male genotype counts (AA,AB,BB)
`y`	Female genotype counts (AA,AB,BB)
`tracing`	Activate tracing in the maximization of some likelihoods (0=no tracing; 1:tracing)
`tol`	tolerance for iterative maximization of some likelihoods

Details

The log-likelihood for the six models is calculated. For two models (C and E) this is done numerically using package RSolnp.

Value

A named vector containing 6 values for AIC

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. and Weir, B.S. (2018) On the testing of Hardy-Weinberg proportions and equality of allele frequencies in males and females at bi-allelic genetic markers. Genetic Epidemiology 42(1) pp. 34-48. doi:10.1002/gepi.22079

Examples

males <- c(AA=11,AB=32,BB=13) 
females <- c(AA=14,AB=23,BB=11) 
stats <- HWAIC(males,females)
print(stats)
males <- c(AA=11,AB=32,BB=13) 
females <- c(AA=14,AB=23,BB=11) 
stats <- HWAIC(males,females)
print(stats)

Perform all tests for Hardy-Weinberg equilibrium

Description

HWAlltests performs all classical frequentists tests for Hardy-Weinberg equilibrium and lists their p-values.

Usage

HWAlltests(x, verbose = TRUE, include.permutation.test = FALSE, x.linked = FALSE) 
HWAlltests(x, verbose = TRUE, include.permutation.test = FALSE, x.linked = FALSE)

Arguments

`x`	a vector with a set of genotype counts (AA, AB, BB)
`verbose`	print output if set to TRUE
`include.permutation.test`	turns on the permutation test if set to TRUE
`x.linked`	`x.linked=FALSE` indicates the marker is autosomal (default), and `x.linked=TRUE` indicates it resides on the X-chromosome.

Details

By default the permutation test is not performed in order to reduce computing time.

Value

A dataframe with test statistics and p-values.

Author(s)

Jan Graffelman [email protected]

Examples

x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
HWAlltests(x,verbose=TRUE)
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
HWAlltests(x,verbose=TRUE)

Compute additive log-ratio transformation

Description

HWAlr computes the additive log-ratio transformation for genotype counts of bi-allelic genetic markers.

Usage

HWAlr(X, zeroadj = 0.5, denominator = 2)
HWAlr(X, zeroadj = 0.5, denominator = 2)

Arguments

`X`	A matrix of genotype counts (columns AA, AB and BB)
`zeroadj`	A zero adjustment parameter (0.5 by default)
`denominator`	The genotype count put in the denominator of the log-ratio (1=AA, 2=AB, 3=BB)

Value

A matrix or vector of log-ratio coordinates

Author(s)

Jan Graffelman ([email protected])

References

Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a non-parametric compositional approach. In: Vera Pawlowsky-Glahn and Antonella Buccianti (eds.) Compositional Data Analysis: Theory and Applications, John Wiley & Sons, Ltd, pp. 207-215

Examples

   X <- HWData(100,100)
   Y <- HWAlr(X)
X <- HWData(100,100)
   Y <- HWAlr(X)

Plot genetic markers in additive log-ratio coordinates

Description

HWAlrPlot creates a scatter plot of the log-ratio coordinates of bi-allelic genetic markers. Hardy-Weinberg equilibrium is indicated by a straigh line in the plot.

Usage

HWAlrPlot(X, zeroadj = 0.5)
HWAlrPlot(X, zeroadj = 0.5)

Arguments

`X`	A matrix of genotype counts (columns AA, AB, BB)
`zeroadj`	Zero-adjustment parameter. Zero counts in the count matrix are substituted by `zeroadj` which is 0.5 by default.

Value

NULL

Author(s)

Jan Graffelman ([email protected])

References

Examples

   X <- HWClo(HWData(100,100))
   HWAlrPlot(X)
X <- HWClo(HWData(100,100))
   HWAlrPlot(X)

Chi square tests for Hardy Weinberg equilibrium

Description

HWChisq performs the chi-square test for Hardy-Weinberg equilibrium both for autosomal and X-chromosomal markers; it can dealt both with bi-allelic and multi-allelic variants.

Usage

HWChisq(X, cc = 0.5, verbose = TRUE, x.linked = FALSE, phifixed = NULL) 
HWChisq(X, cc = 0.5, verbose = TRUE, x.linked = FALSE, phifixed = NULL)

Arguments

`X`	For bi-allelic variants, `X` is a vector containg the genotypic counts (AA,AB,BB for autosomal markers c(A,B,AA,AB,BB) for X-chromosomal markers). For multi-allelic variants, `X` is a lower triangular matrix with genotype counts, homozygotes on the diagonal and heterozygotes below the diagonal.
`cc`	`cc` the continuity correction parameter, the correction is only applied to bi-allelic markers (default `cc = 0.5`).
`verbose`	`verbose = TRUE` prints results, `verbose = FALSE` is silent.
`x.linked`	`x.linked=FALSE` indicates the marker is autosomal (default), and `x.linked=TRUE` indicates it resides on the X-chromosome.
`phifixed`	(For X-chromosomal markers only) `phifixed=NULL` indicates that the fraction of males (females) should be estimated from the data (default). If set to any other value (e.g. `phifixed=0.5`) then the sample is assumed to come from a population with the specified fraction of males.

Details

HWChisq does a chi-square test for Hardy-Weinberg equilibrium, and by default applies a continuity correction. For extreme allele frequencies, the continuity correction can lead to excessive type 1 error rates, and is better turned off in that case. The continuity correction can be turned off by specifying cc=0.

HWChisq can do the chi-square test for both autosomal and X-chrosomal markers. By setting x.linked = TRUE the marker will be assumed to be on the X-chromosome, and the count vector supplied should have 5 elements instead of 3 elements for an autosomal marker. For X-chromsomal markers argument phifixed is in general best left to its default value (NULL). Only in specific situations where the theoretical population sex ratio is known (e.g. in simulation studies where a universe with known gender ratio is sampled) phifixed could be set to the theoretical ratio of interest.

With bi-allelic variants, when alternative is set to less, a one-sided test for against a negative inbreeding coefficient (heterozygote excess) is performed. When alternative is set to greater a one-sided test for against a positive inbreeding coefficient (lack of heterozygotes) is performed.

For multi-allelic variants, which typically do have some rare alleles and rare genotypes, the asymptotic chi-square test is in general not recommended, and exact test procedures or permutation tests are recommended (see HWExact or HWPerm.mult).

Value

HWChisq returns a list with the components:

`chisq`	value of the chi-square statistic. NA is returned if the marker is monomorphic.
`pval`	p-value of the chi-square test for Hardy-Weinberg equilibrium.
`D`	Half the deviation from Hardy-Weinberg equilibrium for the AB genotype.
`p`	the allele frequency of A.
`f`	the inbreeding coefficient.
`expected`	the expected counts under Hardy-Weinberg equilibrium.
`chi.contrib`	the contributions of the different genotypes to the chi-square statistic.

Author(s)

Jan Graffelman [email protected]

References

Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.

For the chi-square test for X-linked markers:

Graffelman, J. & Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6) pp. 558–568. doi:10.1038/hdy.2016.20

Examples

#
# Test for an autosomal blood group marker
#
x <- c(MM=298,MN=489,NN=213)
HW.test <- HWChisq(x,verbose=TRUE)
#
# Same test without continuity correction
#
HW.test <- HWChisq(x,cc=0,verbose=TRUE)
#
# Test for X-chromsomal SNPs.
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
HW.test <- HWChisq(rs5968922,cc=0,x.linked=TRUE,verbose=TRUE)
#
#
#
y <- c(GG=48, CG=209, CC=277, G=129, C=337) 
HWChisq(y,x.linked=TRUE,cc=0)
#
# Test a multi-allelic microsatellite
#
data(NistSTRs)
A1 <- NistSTRs[,1]
A2 <- NistSTRs[,2]
GC <- AllelesToTriangular(A1,A2)
HW.test <- HWChisq(GC)
#
# retaining only the three  most common alleles
#
ii <- (A1 == 10 | A1 == 11 | A1 == 12) & (A2 == 10 | A2 == 11 | A2 == 12)
A1s <- A1[ii]
A2s <- A2[ii]
GC <- AllelesToTriangular(A1s,A2s)
HW.test <- HWChisq(GC)
#
# Test for an autosomal blood group marker
#
x <- c(MM=298,MN=489,NN=213)
HW.test <- HWChisq(x,verbose=TRUE)
#
# Same test without continuity correction
#
HW.test <- HWChisq(x,cc=0,verbose=TRUE)
#
# Test for X-chromsomal SNPs.
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
HW.test <- HWChisq(rs5968922,cc=0,x.linked=TRUE,verbose=TRUE)
#
#
#
y <- c(GG=48, CG=209, CC=277, G=129, C=337) 
HWChisq(y,x.linked=TRUE,cc=0)
#
# Test a multi-allelic microsatellite
#
data(NistSTRs)
A1 <- NistSTRs[,1]
A2 <- NistSTRs[,2]
GC <- AllelesToTriangular(A1,A2)
HW.test <- HWChisq(GC)
#
# retaining only the three  most common alleles
#
ii <- (A1 == 10 | A1 == 11 | A1 == 12) & (A2 == 10 | A2 == 11 | A2 == 12)
A1s <- A1[ii]
A2s <- A2[ii]
GC <- AllelesToTriangular(A1s,A2s)
HW.test <- HWChisq(GC)

Matrix version of HWChisq

Description

Function HWChisqMat executes the Chisquare test for HWE for each row in a matrix. This function is deprecated and it is better to use the faster HWChisqStats instead.

Usage

HWChisqMat(X, ...)
HWChisqMat(X, ...)

Arguments

`X`	A n times 3 matrix of genotypic counts (AA,AB,BB)
`...`	extra arguments that are passed on to `HWChisqStats`

Value

a vector with chi-square statistics or p-values

Author(s)

Jan Graffelman [email protected]

Examples

X <- HWData(100,100)
colnames(X) <- c("MM","MN","NN")
Results <- HWChisqMat(X)
X <- HWData(100,100)
colnames(X) <- c("MM","MN","NN")
Results <- HWChisqMat(X)

Fast computation of chi-square statistics for Hardy-Weinberg equilibrium

Description

HWChisqStats is a function for the fast computation of chi-square statistics (or the corresponding p-values) for a large set of bi-allelic markers (typically SNPs).

Usage

HWChisqStats(X, x.linked = FALSE, pvalues = FALSE)
HWChisqStats(X, x.linked = FALSE, pvalues = FALSE)

Arguments

`X`	A matrix with genotype counts, one row per marker. `X` should have 5 columns for an X-chromosomal data set and 3 columns for an autosomal data set.
`x.linked`	Logical indicating whether the markers are autosomal (`x.linked=FALSE`) or X-chromosomal (`x.linked=TRUE`).
`pvalues`	Logical indicated whether chi-square statistics should be returned (`pvalues=FALSE`) or whether p-values should be returned (`pvalues=TRUE`).

Details

Matrix X should strictly comply with the following format. For an autosomal dataset is should contain the 3 genotype counts in order (AA,AB,BB). For an X-chromosomal dataset it should contain the 5 genotype counts in order (A,B,AA,AB,BB) where A and B are the male counts and AA, AB and BB the female counts.

This function was written for speed improvement, and should be much faster than looping over the rows of X with HWChisq. There is no error checking on the supplied data matrix.

Value

A vector of chi-square statistics

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at 2 bi-allelic genetic markers on the X chromosome.

Examples

#
# Autosomal example
#
set.seed(123)
X <- HWData(1000,100)
monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0)
X <- X[!monom,] # exclude monomorphics
Chisq.stats <- HWChisqStats(X,x.linked=FALSE,pvalues=FALSE)
Chisq.pvals <- HWChisqStats(X,x.linked=FALSE,pvalues=TRUE)
#
# X-chromosomal example
#
X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE)
Chisq.stats <- HWChisqStats(X,x.linked=TRUE,pvalues=FALSE)
Chisq.pvals <- HWChisqStats(X,x.linked=TRUE,pvalues=TRUE)
#
# Autosomal example
#
set.seed(123)
X <- HWData(1000,100)
monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0)
X <- X[!monom,] # exclude monomorphics
Chisq.stats <- HWChisqStats(X,x.linked=FALSE,pvalues=FALSE)
Chisq.pvals <- HWChisqStats(X,x.linked=FALSE,pvalues=TRUE)
#
# X-chromosomal example
#
X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE)
Chisq.stats <- HWChisqStats(X,x.linked=TRUE,pvalues=FALSE)
Chisq.pvals <- HWChisqStats(X,x.linked=TRUE,pvalues=TRUE)

Convert genotype counts to compositions

Description

Function HWClo divides each row of a matrix by its total, and so produces matrix of compositions.

Usage

HWClo(X)
HWClo(X)

Arguments

`X`	A matrix of (genotype) counts

Value

A matrix

Author(s)

Jan Graffelman [email protected]

Examples

  X <- HWData(2,100)
  Y <- HWClo(X)
X <- HWData(2,100)
  Y <- HWClo(X)

Compute the centred log-ratio transformation

Description

HWClr computes the centred log-ratio transformation for genotype counts of bi-allelic genetic markers.

Usage

HWClr(X, zeroadj = 0.5)
HWClr(X, zeroadj = 0.5)

Arguments

`X`	A matrix of genotype counts (columns AA, AB and BB)
`zeroadj`	A zero adjustment parameter (0.5 by default)

Value

A matrix or vector of log-ratio coordinates

Author(s)

Jan Graffelman ([email protected])

References

Examples

   X <- HWData(100,100)
   Y <- HWClr(X)
X <- HWData(100,100)
   Y <- HWClr(X)

Plot genetic markers in centred log-ratio coordinates

Description

HWClrPlot creates a scatter plot of the centred log-ratio coordinates of bi-allelic genetic markers. Hardy-Weinberg equilibrium is indicated by a straigh line in the plot.

Usage

HWClrPlot(X, zeroadj = 0.5)
HWClrPlot(X, zeroadj = 0.5)

Arguments

`X`	A matrix of genotype counts (columns AA, AB, BB)
`zeroadj`	Zero-adjustment parameter. Zero counts in the count matrix are substituted by `zeroadj` which is 0.5 by default.

Value

NULL

Author(s)

Jan Graffelman ([email protected])

References

Examples

   X <- HWClo(HWData(100,100))
   HWClrPlot(X)
X <- HWClo(HWData(100,100))
   HWClrPlot(X)

Compute probability of a genotypic sample

Description

Computes the probability of a particular genotypic sample given the allele count, sample size and number of heterozygotes.

Usage

HWCondProbAB(n, nA, nAB)
HWCondProbAB(n, nA, nAB)

Arguments

`n`	`n` is the total sample size (total number of individuals)
`nA`	`nA` is the number of A alleles in the sample
`nAB`	`nAB` is the number of heterozygotes in the sample

Value

`p`	probability of the particular sample

Author(s)

Jan Graffelman ([email protected])

Examples

x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
n <- sum(x)
nM <- 2*x[1]+x[2]
nMN <- x[2]
p <- HWCondProbAB(n,nM,nMN)
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
n <- sum(x)
nM <- 2*x[1]+x[2]
nMN <- x[2]
p <- HWCondProbAB(n,nM,nMN)

Compute disequilibrium statistic D

Description

Function HWD computes Weir's disequilibrium coefficient D.

Usage

HWD(X)
HWD(X)

Arguments

`X`	a vector of genotype counts (AA, AB, BB)

Value

Returns the disequilibrium coefficient

Author(s)

Jan Graffelman [email protected]

References

Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.

Examples

x <- c(MM=298,MN=489,NN=213)
D <- HWD(x)
cat("Disequilibrium coefficient: ",D,"\n")
x <- c(MM=298,MN=489,NN=213)
D <- HWD(x)
cat("Disequilibrium coefficient: ",D,"\n")

Generate genetic marker data in or out of Hardy-Weinberg Equilibrium

Description

HWData generates samples of genotypic counts under various schemes. It mainly uses sampling from the multinomial distribution for given or random allele frequencies, either assuming Hardy-Weinberg proportions or a specified degree of inbreeding. Sampling can also be performed conditional on the allele frequency. The same procedures are also available for X linked markers.

Usage

HWData(nm = 100, n = rep(100, nm), f = rep(0, nm), p = NULL, conditional = FALSE,
exactequilibrium = FALSE, x.linked = FALSE, nA = NULL, n.males = round(0.5 * n),
shape1 = 1, shape2 = 1, counts = TRUE)
HWData(nm = 100, n = rep(100, nm), f = rep(0, nm), p = NULL, conditional = FALSE,
exactequilibrium = FALSE, x.linked = FALSE, nA = NULL, n.males = round(0.5 * n),
shape1 = 1, shape2 = 1, counts = TRUE)

Arguments

`nm`	The number of bi-allelic markers.
`n`	The sample sizes.
`f`	The inbreeding coefficients (only for autosomal markers)
`p`	a vector of allele frequencies
`conditional`	if `TRUE` The Levene-Haldane distribution will be used for sampling autosomal markers, the Graffelman-Weir distribution for sampling X chromosomal markers. If `FALSE` a multinomial distribution is used
`exactequilibrium`	generates data in exact HWE if set to `TRUE`
`x.linked`	Simulated autosomal markers (`x.linked=FALSE`, the default) or X-chromosomal markers (`x.linked=TRUE`)
`nA`	A vector of minor allele counts, one for each marker. If not specified, it will be calculated from `p`
`n.males`	The number of males (only relevant if `x.linked = TRUE`)
`shape1`	First shape parameter of the beta distribution used to generate allele frequencies
`shape2`	Second shape parameter of the beta distribution used to generate allele frequencies
`counts`	If `counts=TRUE` the output are genotype counts, if `counts=TRUE` relative genotype frequencies

Details

Option pfixed is deprecated and replaced by conditional Option pdist is deprecated and replaced by parameters shape1 and shape2

HWData returns a matrix of genotype counts, nm by 3 for autsomal markers or nm by 5 for X-chromosomal markers.

If the inbreeding coefficient is specified (f) it will only take effect for autosomal markers (x.linked=FALSE) and multinomial sampling (conditional=FALSE).

Value

`X`	A matrix containing the genotype counts.

Author(s)

Jan Graffelman ([email protected])

Examples

#
# Generate 100 SNPs with uniform allele frequency under the equilibrium assumption.
#
out <- HWData(nm=100,n=100)
#
# Generate genotype frequencies of 100 SNPs with uniform allele frequency assuming exact equilbrium.
#
X <- HWData(nm=100, exactequilibrium = TRUE, counts = FALSE)
#
# Generate 100 SNPs (as counts), all having an expected A allele frequency of 0.50
#
X <- HWData(nm=100,p=0.5)
#
#  Generate 100 SNPs, 50 with A allele frequency 0.25 and 50 with A allele frequency 0.75,
#  assuming fixed allele frequencies. 
#
X <- HWData(nm=100,p=rep(c(0.25,0.75),50), conditional = TRUE)
#
#  Generate 100 SNPs with a skewed (beta) distribution of the allele frequency,
#  rich in variants with a low minor allele frequency.
#
X <- HWData(nm=100,shape1=1,shape2=10)
#
# Generate 100 X chromosomal SNPs with uniformly distributed allele frequency.
#
X <- HWData(nm=100, x.linked = TRUE)
#
# Generate 100 SNPs with uniform allele frequency under the equilibrium assumption.
#
out <- HWData(nm=100,n=100)
#
# Generate genotype frequencies of 100 SNPs with uniform allele frequency assuming exact equilbrium.
#
X <- HWData(nm=100, exactequilibrium = TRUE, counts = FALSE)
#
# Generate 100 SNPs (as counts), all having an expected A allele frequency of 0.50
#
X <- HWData(nm=100,p=0.5)
#
#  Generate 100 SNPs, 50 with A allele frequency 0.25 and 50 with A allele frequency 0.75,
#  assuming fixed allele frequencies. 
#
X <- HWData(nm=100,p=rep(c(0.25,0.75),50), conditional = TRUE)
#
#  Generate 100 SNPs with a skewed (beta) distribution of the allele frequency,
#  rich in variants with a low minor allele frequency.
#
X <- HWData(nm=100,shape1=1,shape2=10)
#
# Generate 100 X chromosomal SNPs with uniformly distributed allele frequency.
#
X <- HWData(nm=100, x.linked = TRUE)

Estimation of contributions of two populations to a sample of genotype frequencies with the EM algorithm.

Description

Function HWEM estimates the relative contributions of two populations with different allele frequencies that contributed to the genotype frequencies of sample which is a mixture of these two populations.

Usage

HWEM(x, p = NULL, G = NULL, delta.init = c(0.5, 0.5), itmax = 50,
     eps = 1e-06, verbose = FALSE)
HWEM(x, p = NULL, G = NULL, delta.init = c(0.5, 0.5), itmax = 50,
     eps = 1e-06, verbose = FALSE)

Arguments

`x`	a 3x1 vector of genotype frequencies of the mixed sample, either counts or proportions.
`p`	a vector with two allele frequencies.
`G`	a 3x2 matrix with genotype frequencies as columns.
`delta.init`	initial values for the contributions of each populations
`itmax`	maximum number of iterations
`eps`	tolerance criterion for convergence
`verbose`	`verbose = TRUE` prints results, `verbose = FALSE` is silent.

Details

HWEM employs the EM algorithm to estimate the contribution of two populations to the genotype frequencies of a bi-allelic marker, assumed to be typed for a mixed sample with individuals coming from two different populations with different allele frequencies. To estimate the contributions, both the mixed sample genotype frequencies (x) and either the genotype frequencies (G) or the allele frequencies (p) of the original populations must be specified. In case only allele frequencies are specified, Hardy-Weinberg proportions are assumed for the genotype frequencies.

Value

a vector with two proportions, ordered according to the specified allele or genotype frequencies.

Author(s)

Jan Graffelman [email protected]

References

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39(1) pp. 1–38.

Examples

# genotype frequencies population 1
g1 <- c(0.034, 0.330, 0.636)
# genotype frequencies population 2
g2 <- c(0.349, 0.493, 0.158)
# sample from the mixed population
x  <- c(0.270, 0.453,0.277)
#
# estimation based on genotype frequencies
#
G <- cbind(g1,g2)
contributions <- HWEM(x,G=G)
#
# estimation based on allele frequencies
#
p <- c(af(g1),af(g2))
contributions <- HWEM(x,p=p)
# genotype frequencies population 1
g1 <- c(0.034, 0.330, 0.636)
# genotype frequencies population 2
g2 <- c(0.349, 0.493, 0.158)
# sample from the mixed population
x  <- c(0.270, 0.453,0.277)
#
# estimation based on genotype frequencies
#
G <- cbind(g1,g2)
contributions <- HWEM(x,G=G)
#
# estimation based on allele frequencies
#
p <- c(af(g1),af(g2))
contributions <- HWEM(x,p=p)

Exact test for Hardy-Weinberg equilibrium

Description

HWExact performs an exact test for Hardy-Weinberg equilibrium

Usage

HWExact(X, alternative = "two.sided", pvaluetype = "selome", eps=1e-10, x.linked =
FALSE, verbose = TRUE)
HWExact(X, alternative = "two.sided", pvaluetype = "selome", eps=1e-10, x.linked =
FALSE, verbose = TRUE)

Arguments

`X`	vector with the genotype counts AA, AB, BB
`alternative`	`two.sided` (default) will perform a two-sided test where both an excess and a dearth of heterozygotes count as evidence against HWE. `less` is a one-sided test where only dearth of heterozygotes counts a evidence against HWE, `greater` is a one-sided test where only excess of heterozygotes counts as evidence against HWE.
`pvaluetype`	if `pvaluetype` is set to `dost` then the p-value of a two-sided test is computed as twice the tail area of a one-sided test. When set to `selome`, the p-value is computed as the sum of the probabilities of all samples less or equally likely as the current sample. When set to `midp`, the p-value is computed as half the probability of the current sample + the probabilities of all samples that are more extreme.
`x.linked`	`x.linked=FALSE` indicates the marker is autosomal (default), and `x.linked=TRUE` indicates it resides on the X-chromosome.
`eps`	a tolerance that can be set for comparing probabilities in order to include tied outcomes
`verbose`	print results or not.

Details

HWExact uses the recursion equations described by Wigginton et. al.

For testing large sets of bi-allelic variants, use the faster code in HWExactStats.

For large samples, HWExact may give the error message: "evaluation nested too deeply: infinite recursion". This can usually be resolved by increasing R's limit on nested expressions with options(expressions=10000) or a higher limit. With higher limits, the error message "protect(): protection stack overflow" can occur. This error can usually be resolved by increasing R's protection stack with the command line option --max-ppsize 100000 or higer values. However, with such large samples the exact test will give virtually the same result as a chi-square test, and it may be easier to use HWChisq in these circumstances.

Value

`pval`	p-value of the exact test
`prob`	probabilities of all possible samples with the same sample size and minor allele count
`pofthesample`	probability of the observed sample

Author(s)

Jan Graffelman ([email protected])

References

Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.

Wigginton, J.E., Cutler, D.J. and Abecasis, G.R. (2005) A note on exact tests of Hardy-Weinberg equilibrium, American Journal of Human Genetics (76) pp. 887-893.

Graffelman, J. and Moreno, V. (2013) The mid p-value in exact tests for Hardy-Weinberg equilibrium, Statistical Applications in Genetics and Molecular Biology 12(4) pp. 433-448.

Examples

#
# Example for an autosomal marker using the standard exact p-value
#
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
HW.test <- HWExact(x,verbose=TRUE)
#
# Example for an autosomal marker using the mid p-value
#
HW.test <- HWExact(x,verbose=TRUE,pvaluetype="midp")
#
# Example x-linked markers
#
rs5968922  <- c(A=392, B=212, AA=275, AB=296, BB=80 )
HWExact(rs5968922,x.linked=TRUE,verbose=TRUE)
#
#
#
y <- c(GG=48, CG=209, CC=277, G=129, C=337) 
HWExact(y,x.linked=TRUE)
#
#
#
# Example for an autosomal marker using the standard exact p-value
#
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
HW.test <- HWExact(x,verbose=TRUE)
#
# Example for an autosomal marker using the mid p-value
#
HW.test <- HWExact(x,verbose=TRUE,pvaluetype="midp")
#
# Example x-linked markers
#
rs5968922  <- c(A=392, B=212, AA=275, AB=296, BB=80 )
HWExact(rs5968922,x.linked=TRUE,verbose=TRUE)
#
#
#
y <- c(GG=48, CG=209, CC=277, G=129, C=337) 
HWExact(y,x.linked=TRUE)
#
#

Matrix version of HWExact

Description

Function HWExactMat is deprecated; use HWExactStats instead.

Usage

HWExactMat(X, ...)
HWExactMat(X, ...)

Arguments

`X`	A n times 3 matrix of genotypic counts (AA,AB,BB)
`...`	extra arguments that are passed on to `HWExact`

Value

pvalvec

Vector with the p-values of each test

Author(s)

Jan Graffelman [email protected]

Examples

X <- HWData(100,100)
colnames(X) <- c("MM","MN","NN")
Results <- HWExactMat(X)
Output <- cbind(X,Results)
print(Output)
X <- HWData(100,100)
colnames(X) <- c("MM","MN","NN")
Results <- HWExactMat(X)
Output <- cbind(X,Results)
print(Output)

Exact test for Hardy-Weinberg equilibrium

Description

HWExactPrevious performs an exact test for Hardy-Weinberg equilibrium

Usage

HWExactPrevious(X, alternative = "two.sided", pvaluetype = "selome",
x.linked = FALSE, verbose = FALSE)
HWExactPrevious(X, alternative = "two.sided", pvaluetype = "selome",
x.linked = FALSE, verbose = FALSE)

Arguments

`X`	vector with the genotype counts AA, AB, BB
`alternative`	`two.sided` (default) will perform a two-sided test where both an excess and a dearth of heterozygotes count as evidence against HWE. `less` is a one-sided test where only dearth of heterozygotes counts a evidence against HWE, `greater` is a one-sided test where only excess of heterozygotes counts as evidence against HWE.
`pvaluetype`	if `pvaluetype` is set to `dost` then the p-value of a two-sided test is computed as twice the tail area of a one-sided test. When set to `selome`, the p-value is computed as the sum of the probabilities of all samples less or equally likely as the current sample. When set to `midp`, the p-value is computed as half the probability of the current sample + the probabilities of all samples that are more extreme.
`x.linked`	`x.linked=FALSE` indicates the marker is autosomal (default), and `x.linked=TRUE` indicates it resides on the X-chromosome.
`verbose`	print results or not.

Details

HWExactPrevious uses the recursion equations described by Wigginton et. al.

For large samples, HWExactPrevious may give the error message: "evaluation nested too deeply: infinite recursion". This can usually be resolved by increasing R's limit on nested expressions with options(expressions=10000) or a higher limit. With higher limits, the error message "protect(): protection stack overflow" can occur. This error can usually be resolved by increasing R's protection stack with the command line option --max-ppsize 100000 or higer values. However, with such large samples the exact test will give virtually the same result as a chi-square test, and it may be easier to use HWChisq in these circumstances.

Value

`pval`	p-value of the exact test
`prob`	probabilities of all possible samples with the same sample size and minor allele count
`pofthesample`	probability of the observed sample

Author(s)

Jan Graffelman ([email protected])

References

Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter3.

Wigginton, J.E., Cutler, D.J. and Abecasis, G.R. (2005) A note on exact tests of Hardy-Weinberg equilibrium, American Journal of Human Genetics (76) pp. 887-893.

Examples

#
# Example autosomal marker
#
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
## Not run: HW.test <- HWExactPrevious(x,verbose=TRUE)
#
# Example x-linked marker
#
rs5968922  <- c(A=392, B=212, AA=275, AB=296, BB=80 )
## Not run: HWExactPrevious(rs5968922,x.linked=TRUE,verbose=TRUE) 
#
# Example autosomal marker
#
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
## Not run: HW.test <- HWExactPrevious(x,verbose=TRUE)
#
# Example x-linked marker
#
rs5968922  <- c(A=392, B=212, AA=275, AB=296, BB=80 )
## Not run: HWExactPrevious(rs5968922,x.linked=TRUE,verbose=TRUE)

Computation of Exact p-values for Hardy-Weinberg equilibrium for sets of SNPs

Description

HWExactStats is a function for the computation of Exact p-values for a large set of bi-allelic markers (typically SNPs).

Usage

HWExactStats(X, x.linked = FALSE, plinkcode = TRUE, midp = FALSE,...)
HWExactStats(X, x.linked = FALSE, plinkcode = TRUE, midp = FALSE,...)

Arguments

`X`	A matrix with genotype counts, one row per marker. `X` should have 5 columns for an X-chromosomal data set and 3 columns for an autosomal data set.
`x.linked`	Logical indicating whether the markers are autosomal (`x.linked=FALSE`) or X-chromosomal (`x.linked=TRUE`).
`plinkcode`	Logical indicating whether to use faster C++ code from the PLINK software.
`midp`	Logical indicating whether to use the mid p-value for the C++ code or not
`...`	extra arguments that are passed on to `HWExact`

Details

Argument plinkcode=TRUE (the default) will use C++ code for faster calculation (functions SNPHWE2 and SNPHWEX) with larger datasets. The C++ code was generously shared by Christopher Chang, and the same code is used in the program PLINK (2.0).

Value

A vector of p-values

Author(s)

Jan Graffelman [email protected] (R code) and Christopher Chang [email protected] (C++ code)

References

Graffelman, J. and Weir, B.S. (2016) Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity 116(6) pp. 558–568. doi:10.1038/hdy.2016.20

Purcell et al. (2007) PLINK: A Toolset for Whole-Genome Association and Population-Based Linkage Analysis. American Journal of Human Genetics 81(3) pp. 559–575.

Examples

#
# Autosomal example
#
set.seed(123)
X <- HWData(1000,100)
monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0)
X <- X[!monom,] # exclude monomorphics
Exact.pvalues <- HWExactStats(X,x.linked=FALSE)
#
# X-chromosomal example
#
X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE)
Exact.pvalues <- HWExactStats(X,x.linked=TRUE)
#
# Autosomal example
#
set.seed(123)
X <- HWData(1000,100)
monom <- (X[,2]==0 & X[,1]==0) | (X[,2]==0 & X[,3]==0)
X <- X[!monom,] # exclude monomorphics
Exact.pvalues <- HWExactStats(X,x.linked=FALSE)
#
# X-chromosomal example
#
X <- HWData(1000,100,n.males=50,nA=75,x.linked=TRUE)
Exact.pvalues <- HWExactStats(X,x.linked=TRUE)

Computation of inbreeding coefficient

Description

HWf computes the inbreeding coefficient for sample of genotype counts, or a matrix of genotype counts.

Usage

HWf(X)
HWf(X)

Arguments

`X`	a vector or matrix of genotype counts (AA, AB, BB)

Details

For monomorphic markers a warning is issued, and the estimate for the inbreeding coefficient is NaN.

Value

Returns a single inbreeding coefficient (intraclass correlation coefficient), if X is a single sample, or a vector of inbreeding coefficients, if X is a matrix with genotype counts.

Author(s)

Jan Graffelman [email protected]

References

Crow, J. F. and Kimura, M. (1970) An introduction to population genetics theory. Harper & Row, publishers, New York

Examples

#
# A single sample
#
x <- c(MM=298,MN=489,NN=213)
fhat <- HWf(x)
fhat
#
# Multiple samples
#
set.seed(123)
X <- HWData(nm=100,n=1000)
fhat <- HWf(X)
#
# A single sample
#
x <- c(MM=298,MN=489,NN=213)
fhat <- HWf(x)
fhat
#
# Multiple samples
#
set.seed(123)
X <- HWData(nm=100,n=1000)
fhat <- HWf(X)

Scatter plot of the genotype frequencies

Description

HWGenotypePlot makes a scatterplots of the AB or BB frequency versus the AA frequency and represents a blue curve indicating the Hardy-Weinberg equilibrium condition.

Usage

HWGenotypePlot(X, plottype = 1, xlab = expression(f[AA]), ylab =
ifelse(plottype == 1, expression(f[AB]), expression(f[BB])), asp = 1,
pch = 19, xlim = c(0, 1), ylim = c(0, 1), cex = 1, cex.axis = 2, cex.lab = 2, ...)
HWGenotypePlot(X, plottype = 1, xlab = expression(f[AA]), ylab =
ifelse(plottype == 1, expression(f[AB]), expression(f[BB])), asp = 1,
pch = 19, xlim = c(0, 1), ylim = c(0, 1), cex = 1, cex.axis = 2, cex.lab = 2, ...)

Arguments

`X`	A matrix of genotype counts or frequencies with three columns (AA, AB, BB)
`plottype`	`plottype=1` produces a plot of AB versus AA, `plottype=2` produced a plot of BB versus AA.
`xlab`	A label for the x axis
`ylab`	A label for the y axis
`asp`	Aspec ratio (1 by default)
`pch`	Plotting charachter (19 by default)
`xlim`	Limits for the x axis (0-1 by default)
`ylim`	Limits for the y axis (0-1 by default)
`cex`	Character expansion factor (1 by default)
`cex.axis`	Character expansion factor for the axes (2 by default)
`cex.lab`	Character expansion factor for labels of axis (2 by default)
`...`	Additional arguments for the `plot` function

Value

NULL

Author(s)

Jan Graffelman [email protected]

Examples

n <- 100 # sample size
m <- 100 # number of markers
Xc <- HWClo(HWData(n,m))
HWGenotypePlot(Xc,plottype=1,main="Heterozygote-homozygote scatterplot")
n <- 100 # sample size
m <- 100 # number of markers
Xc <- HWClo(HWData(n,m))
HWGenotypePlot(Xc,plottype=1,main="Heterozygote-homozygote scatterplot")

Compute isometric log ratio coordinates.

Description

HWIlr computes isometric log ratio coordinates for genotypic compositions (AA, AB, BB)

Usage

HWIlr(X, zeroadj = 0.5)
HWIlr(X, zeroadj = 0.5)

Arguments

`X`	A matrix of genotype counts, markers in rows, counts for AA, AB and BB in three columns
`zeroadj`	Adjustment for zeros (0.5 by defaults)

Value

A matrix of log ratio coordinates.

Author(s)

Jan Graffelman ([email protected])

References

Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G. and Barcelo-Vidal, C. (2003) Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology 35(3), pp. 279-300.

Examples

X <- HWData(100,100)
Y <- HWIlr(X)
X <- HWData(100,100)
Y <- HWIlr(X)

Plot bi-allelic genetic markers in isometric log ratio coordinates

Description

HWIlrPlot makes a scatter plot of the isometric log ratio coordinates for bia-llelic markers.

Usage

HWIlrPlot(X, zeroadj = 0.5, ...)
HWIlrPlot(X, zeroadj = 0.5, ...)

Arguments

`X`	Matrix of genotype counts, one marker per row, AA, AB and BB in three columns
`zeroadj`	Adjustment for zero values (0.5 by default)
`...`	Additional arguments for function `plot`

Value

A matrix of log ratio coordinates.

Author(s)

Jan Graffelman ([email protected])

References

Graffelman, J. and Egozcue, J. J. (2011) Hardy-Weinberg equilibrium: a nonparametric compositional approach. In Pawlowsky-Glahn, V. and Buccianti A., editors, Compositional Data Analysis: Theory and Applications, pages 208-217, John Wiley & Sons, Ltd.

Examples

X <- HWClo(HWData(100,100))
HWIlrPlot(X)
X <- HWClo(HWData(100,100))
HWIlrPlot(X)

Calculate a posteriori density for Lindley's alpha

Description

Function HWLindley calculates the posterior density for disequilibrium measure alpha, as defined by Lindley (1988).

Usage

HWLindley(alphaseq = seq(-3, 3, by = 0.01), x)
HWLindley(alphaseq = seq(-3, 3, by = 0.01), x)

Arguments

`alphaseq`	a single value or a sequence of values for alpha
`x`	the genotype count vector in format (AA,AB,BB)

Details

Numerical integration is used to compute the density.

Value

a vector with values of the density for each value in alphaseq

Author(s)

Jan Graffelman [email protected]

References

Lindley, D.V. (1988) Statistical Inference Concerning Hardy-Weinberg Equilibrium. In: Bernardo, J.M., DeGroot, M.H., Lindley, D.V. and Smith, A.F.M. Bayesian Statistics, 3, pp. 307-326. Oxford University Press.

Examples

x <- c(MM=298,MN=489,NN=213)
post.dens <- HWLindley(seq(-1,1,by=0.01),x)
## Not run: 
plot(seq(-1,1,by=0.01),post.dens,type="l",xlab=expression(alpha),
     ylab=expression(pi(alpha)))
segments(0,0,0,HWLindley(0,x),lty="dotted",col="red")

## End(Not run)
x <- c(MM=298,MN=489,NN=213)
post.dens <- HWLindley(seq(-1,1,by=0.01),x)
## Not run: 
plot(seq(-1,1,by=0.01),post.dens,type="l",xlab=expression(alpha),
     ylab=expression(pi(alpha)))
segments(0,0,0,HWLindley(0,x),lty="dotted",col="red")

## End(Not run)

Calculate a credible interval for Lindley's alpha for HWE,

Description

Function HWLindley.cri calculates a Bayesian credible interval using Lindley's posterior density for equilibrium paramater alpha.

Usage

HWLindley.cri(x, verbose = TRUE, limits = c(0.025, 0.975))
HWLindley.cri(x, verbose = TRUE, limits = c(0.025, 0.975))

Arguments

`x`	a vector of three genotype counts in order (AA,AB,BB).
`verbose`	print output (`verbose=TRUE`) or be silent
`limits`	upper and lower probability limits of the interval

Details

The limits are found by numerical integration over Lindley's density.

Value

a vector with the lower and upper limit of the credible interval

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. (2020) Statistical tests for the Hardy-Weinberg equilibrium. Wiley StatsRef: Statistics Reference Online doi:10.1002/9781118445112.stat08274.

Examples

#
# MN blood group data
#
x <- c(MM=298,MN=489,NN=213)
#
# credible interval of 95%
#
HWLindley.cri(x)
#
# credible interval of 90%
#
HWLindley.cri(x,limits=c(0.05,0.95))
#
# MN blood group data
#
x <- c(MM=298,MN=489,NN=213)
#
# credible interval of 95%
#
HWLindley.cri(x)
#
# credible interval of 90%
#
HWLindley.cri(x,limits=c(0.05,0.95))

Perform most relevant likelihood ratio test for Hardy-Weinberg equilibrium and equality of allele frequencies

Description

Function HWLRAllTests performs a set of likelihood ratio tests in relation with Hardy-Weinberg proportions (HWP) and equality of allele frequencies (EAF) for autosomal bi-allelic genetic variants.

Usage

HWLRAllTests(x, y)
HWLRAllTests(x, y)

Arguments

`x`	Male genotype counts (AA,AB,BB)
`y`	Female genotype counts (AA,AB,BB)

Details

Function HWLRAllTests calls HWLRtest and calculates the p-value of six different tests: 1) joint HWP and EAF (A-F); 2) EAF irrespective of HWP (C-F); 3) HWP irrespective of EAF (D-F); 4) HWP versus EIC (given EAF) (A-B); 5) EIC irrespective of EAF (E-F) and 6) HWP versus EIC. Letters refer to scenarios described by Graffelman & Weir (2018).

Value

A named vector with six p-values

Author(s)

Jan Graffelman [email protected]

References

Examples

males <- c(AA=11,AB=32,BB=13) 
females <- c(AA=14,AB=23,BB=11)
pvalues <- HWLRAllTests(males,females)
print(pvalues)
males <- c(AA=11,AB=32,BB=13) 
females <- c(AA=14,AB=23,BB=11)
pvalues <- HWLRAllTests(males,females)
print(pvalues)

Likelihood ratio test for Hardy Weinberg equilibrium

Description

HWLratio performs the Likelihood ratio test for Hardy Weinberg equilibrium, both for autosomal and X-chromosomal markers.

Usage

HWLratio(X, verbose = TRUE, x.linked = FALSE) 
HWLratio(X, verbose = TRUE, x.linked = FALSE)

Arguments

`X`	`X` a vector containing the genotypic counts (AA,AB,BB).
`verbose`	`verbose = TRUE` prints results, `verbose = FALSE` is silent.
`x.linked`	`x.linked=FALSE` indicates the marker is autosomal (default), and `x.linked=TRUE` indicates it resides on the X-chromosome.

Value

HWLratio returns a list with the components:

`Lambda`	the likelihood ratio
`G2`	-2*log(Lambda)
`pval`	the p-value

Author(s)

Jan Graffelman [email protected]

References

Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter 3.

Examples

x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
HW.test <- HWLratio(x,verbose=TRUE)
#
# Test for X-chromsomal SNPs.
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
HW.test <- HWLratio(rs5968922,x.linked=TRUE,verbose=TRUE)
#
#
#
y <- c(GG=48, CG=209, CC=277, G=129, C=337) 
HWLratio(y,x.linked=TRUE)
#
#
x <- c(298,489,213)
names(x) <- c("MM","MN","NN")
HW.test <- HWLratio(x,verbose=TRUE)
#
# Test for X-chromsomal SNPs.
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
HW.test <- HWLratio(rs5968922,x.linked=TRUE,verbose=TRUE)
#
#
#
y <- c(GG=48, CG=209, CC=277, G=129, C=337) 
HWLratio(y,x.linked=TRUE)
#
#

Perform likelihood ratio test comparing two nested scenarios for a bi-allelic genetic variant, distinguishing the two sexes.

Description

Program HWLRtest performs a likelihood ratio test comparing two scenarios for an autosomal bi-allelic genetic variant. The scenarios concern Hardy-Weinberg proportions (HWP) and equality of allele frequencies (EAF) in both sexes. The different scenarios are described by Graffelman & Weir (2017).

Usage

HWLRtest(x, y, scene.null = "S1", scene.alt = "S6", verbose = TRUE, tracing = 0)
HWLRtest(x, y, scene.null = "S1", scene.alt = "S6", verbose = TRUE, tracing = 0)

Arguments

`x`	Male genotype counts
`y`	Female genotype counts
`scene.null`	Scenario under the null hypothesis (E.g. "S1")
`scene.alt`	Scenario under the alternative hypothesis (E.g. "S6")
`verbose`	print output or not
`tracing`	Show tracing of the numeric likelihood maximization (1) or not (0).

Details

The different scenarios are indicated with S1, S2, S3, S4, S6 and S6. S1 refers to Hardy-Weinber proportions and equality of allele frequencies. S2 refers to equality of allele frequencies and equality of inbreeding coefficients for the two sexes. S3 refers to equality of allele frequencies irrespective of HWP. S4 refers to HWP irrespective of allele frequencies. S5 refers to equality of inbreeding coefficients irrespective of allele frequencies. S6 is unrestricted.

Value

`G2`	Likelihood ratio statistic
`df`	Degrees of freedom of the likelihood ratio statistic
`pval`	p-value

Author(s)

Jan Graffelman [email protected]

References

Examples

males <- c(AA=11,AB=32,BB=13) 
females <- c(AA=14,AB=23,BB=11)

#
# test EAF
#

lr1.out <- HWLRtest(males,females,scene.null="S3",scene.alt="S6")

#
# test EIC given EAF
#


lr2.out <- HWLRtest(males,females,scene.null="S2",scene.alt="S3")

#
# test HWP versus EIC, given EAF.
#

lr3.out <- HWLRtest(males,females,scene.null="S1",scene.alt="S2")

males <- c(AA=11,AB=32,BB=13) 
females <- c(AA=14,AB=23,BB=11)

#
# test EAF
#

lr1.out <- HWLRtest(males,females,scene.null="S3",scene.alt="S6")

#
# test EIC given EAF
#


lr2.out <- HWLRtest(males,females,scene.null="S2",scene.alt="S3")

#
# test HWP versus EIC, given EAF.
#

lr3.out <- HWLRtest(males,females,scene.null="S1",scene.alt="S2")

Test a bi-allelic marker for Hardy-Weinberg equilibrium in the presence of missing genotype information.

Description

Function HWMissing imputes missing genotype data with a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Multiple imputation algorithms implemented in the Mice package are used to obtain imputed data sets. Inference for HWE is carried out by estimating the inbreeding coefficient or exact p-values for each imputed data set, and by combining all estimates using Rubin's pooling rules.

Usage

HWMissing(X, imputecolumn = 1, m = 50, coding = c(0,1,2), verbose = FALSE, alpha = 0.05,
    varest = "oneovern", statistic = "chisquare",  alternative =
"two.sided", ...)
HWMissing(X, imputecolumn = 1, m = 50, coding = c(0,1,2), verbose = FALSE, alpha = 0.05,
    varest = "oneovern", statistic = "chisquare",  alternative =
"two.sided", ...)

Arguments

`X`	An input data frame. By default, the first column should contain the SNP with missing values.
`imputecolumn`	Indicates which column of the supplied data frame is to be imputed (by default, the first colum, `imputecolumn=1`
`m`	The number of imputations (50 by default)
`coding`	Indicates how the genotype data is coded (e.g. 0 for AA, 1 for AB, and 2 for BB).
`verbose`	`verbose = TRUE` prints results, `verbose = FALSE` is silent.
`alpha`	significance level (0.05 by default) used when computing confidence intervals
`varest`	Estimator for the variance of the inbreeding coefficient. `varest="oneovern"` is the default and sets the variance under the null (1/n). `varest="bailey"` uses an approximation (see details).
`statistic`	If `statistic = "chisquare"` then inbreeding coefficients (equivalent to chisquare statistics) will be computed for each imputed data set and then combined. If `statistic = "exact"` then one-sided exact tests will be computed for each imputed data set and the resulting p-values will be combined.
`alternative`	`two.sided` (default) will perform a two-sided test where both an excess and a dearth of heterozygotes count as evidence against HWE. `less` is a one-sided test where only dearth of heterozygotes counts a evidence against HWE, `greater` is a one-sided test where only excess of heterozygotes counts as evidence against HWE.
`...`	additional options for function `mice` of the Mice package

Details

The function HWMissing tests one genetic marker (e.g. a SNP) with missings for HWE. By default, this marker is supposed to be the first column of dataframe X. The other columns of X contain covariates to be used in the imputation model. Covariates will typically be other, correlated markers or allele intensities of the SNP to be imputed. Covariate markers should be coded as factor variables whereas allele intensities should be numerical variables. By default, a polytomous regression model will be used to impute the missings. If the covariates also contain missings, an imputation method for each column of X can be specified by using the method of mice (see example below).

If there are no covariates, missings can be imputed under the MCAR assumption. In that case, missings are imputed by taking a random sample from the observed data. This is what HWMissing will do if no covariates are supplied, X being a single factor variable.

Several estimators for the variance of the inbreeding coefficient have been described in the literature. The asymptotic variance of the inbreeding coefficient under the null hypothesis is 1/n, and is used if varest = "oneovern" is used. This is the recommended option. Alternatively, the approximation described in Weir (p. 66) can be used with varest = "bailey".

Value

`Res`	A vector with the inbreeding coefficient, a confidence interval for the inbreeding coefficient, a p-value for a HWE test and missing data statistics.
`Xmat`	A matrix with the genotypic composition of each of the `m` imputed data sets.

Author(s)

Jan Graffelman [email protected]

References

Little, R. J. A. and Rubin, D. B. (2002) Statistical analysis with missing data. Second edition, New York, John Wiley & sons.

Graffelman, J., S\'anchez, M., Cook, S. and Moreno, V. (2013) Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information. PLoS ONE 8(12): e83316. doi:10.1371/journal.pone.0083316

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

data(Markers)
## Not run: 
set.seed(123)
Results <- HWMissing(Markers[,1],m=50,verbose=TRUE)$Res # no covariates, imputation assuming MCAR.
set.seed(123)
Results <- HWMissing(Markers[,1:3],m=50,verbose=TRUE)$Res # impute with two allele intensities.
set.seed(123)
Results <- HWMissing(Markers[,c(1,4,5)],m=50,verbose=TRUE)$Res # impute with two covariate SNPs

## End(Not run)
data(Markers)
## Not run: 
set.seed(123)
Results <- HWMissing(Markers[,1],m=50,verbose=TRUE)$Res # no covariates, imputation assuming MCAR.
set.seed(123)
Results <- HWMissing(Markers[,1:3],m=50,verbose=TRUE)$Res # impute with two allele intensities.
set.seed(123)
Results <- HWMissing(Markers[,c(1,4,5)],m=50,verbose=TRUE)$Res # impute with two covariate SNPs

## End(Not run)

Autosomal and X-chromosomal exact tests for HWE via a Network algorithm

Description

Program HWNetwork implements a network algorithm for efficient calculation of exact test p-values in HWE tests with multiple alleles.

Usage

HWNetwork(a1, a2, ma = NULL, fe = NULL, gender = NULL, verbose = TRUE)
HWNetwork(a1, a2, ma = NULL, fe = NULL, gender = NULL, verbose = TRUE)

Arguments

`a1`	the first allele (expressed as a number)
`a2`	the second allele (expressed as a number; NA if the variant is X chromosomal)
`ma`	alternative format: vector of male X chromosomal allele counts.
`fe`	triangular matrix of female genotype counts
`gender`	gender of the individual (1=male; 2=female)
`verbose`	be silent (`verbose=FALSE`) or informative (`verbose=TRUE`)

Details

Function HWNetwork accepts data in two formats. Original genotype data (e.g. repeat numbers of microsatellites) can be supplied, or the data can be supplied in summarized form as a male genotype count vector and a female genotype count matrix. If one of the two male alleles is missing (NA) the variant will taken to be X-chromosomal. If all males have two alleles, the variant will taken to be autosomal.

Value

the exact p-value of the test.

Author(s)

Jan Graffelman [email protected]

References

Aoki, S. (2003) Network algorithm for the Exact Test of Hardy-Weinberg Proportion for Multiple Alleles. Biometrical Journal 45(4), pp. 471-490.

Engels, W. R. (2009) Exact Tests for Hardy-Weinberg Proportions. Genetics 183, pp. 1431-1441.

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

#
# From vectors with counts of genotypes
#
data(TSIXTriAllelics)
ma <- as.matrix(TSIXTriAllelics[1,2:4])
names(ma) <- c("A","B","C")

fe <- TSIXTriAllelics[1,5:10]
names(fe) <- c("AA","AB","AC","BB","BC","CC")

fe <- HardyWeinberg:::toTriangularfixed(fe)

HWNetwork(ma=ma,fe=fe)
#
# From vectors with counts of genotypes
#
data(TSIXTriAllelics)
ma <- as.matrix(TSIXTriAllelics[1,2:4])
names(ma) <- c("A","B","C")

fe <- TSIXTriAllelics[1,5:10]
names(fe) <- c("AA","AB","AC","BB","BC","CC")

fe <- HardyWeinberg:::toTriangularfixed(fe)

HWNetwork(ma=ma,fe=fe)

Permutation test for Hardy-Weinberg equilibrium

Description

Function HWPerm does a permutation test for Hardy-Weinberg equilibrium using a user-supplied test statistic.

Usage

HWPerm(x, nperm = 17000, verbose = TRUE, x.linked = FALSE,
FUN = ifelse(x.linked,Chisquare.x,Chisquare), eps=1e-10, ...) 
HWPerm(x, nperm = 17000, verbose = TRUE, x.linked = FALSE,
FUN = ifelse(x.linked,Chisquare.x,Chisquare), eps=1e-10, ...)

Arguments

`x`	A vector of genotype counts (AA,AB,BB)
`nperm`	The number of permutations
`verbose`	`verbose` = TRUE will print results, `verbose` = FALSE is silent.
`x.linked`	`x.linked=FALSE` indicates the marker is autosomal (default), and `x.linked=TRUE` indicates it resides on the X-chromosome.
`FUN`	An function call for calculating the test statistic for HWE (see examples below)
`eps`	Tolerance for comparison of floating point numbers (1e-10 by default)
`...`	Additional parameters for the function call argument `FUN`

Details

The set of alleles for the observed sample is permuted. Consequently, the test is conditional on allele frequency.

Value

HWPerm returns a list with the components:

`stat`	value of the chosen test statistic for the observed sample.
`pval`	p-value of the permutation test.

Author(s)

Jan Graffelman [email protected]

References

Ziegler, A. & Konig, I.R. (2006) A statistical approach to genetic epidemiology. Wiley.

Examples

x <- c(MM=298,MN=489,NN=213)
## Not run: 
#
# use a default chi-square statistic
#
HW.test <- HWPerm(x,nperm=10000,verbose=TRUE)
#
# use a chi-square statistic with continuity correction.
#
HW.test <- HWPerm(x,nperm=10000,verbose=TRUE,
FUN=function(z) HWChisq(z,verbose=FALSE)$chisq,cc=0.5) # 
#
# use a likelihood ratio statistic.
#
HW.test <- HWPerm(x,nperm=10000,verbose=TRUE,
FUN=function(y) HWLratio(y,verbose=FALSE)$G2)
# 
# use an exact test p-value
#
HWPerm(x,nperm=10000,verbose=TRUE,FUN=function(y) 1-HWExact(y,verbose=FALSE)$pval)
# 
#
# Permutation test for a marker on the X chromosome
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
HW.test <- HWPerm(rs5968922,nperm=10000,x.linked=TRUE,verbose=TRUE)

## End(Not run)
x <- c(MM=298,MN=489,NN=213)
## Not run: 
#
# use a default chi-square statistic
#
HW.test <- HWPerm(x,nperm=10000,verbose=TRUE)
#
# use a chi-square statistic with continuity correction.
#
HW.test <- HWPerm(x,nperm=10000,verbose=TRUE,
FUN=function(z) HWChisq(z,verbose=FALSE)$chisq,cc=0.5) # 
#
# use a likelihood ratio statistic.
#
HW.test <- HWPerm(x,nperm=10000,verbose=TRUE,
FUN=function(y) HWLratio(y,verbose=FALSE)$G2)
# 
# use an exact test p-value
#
HWPerm(x,nperm=10000,verbose=TRUE,FUN=function(y) 1-HWExact(y,verbose=FALSE)$pval)
# 
#
# Permutation test for a marker on the X chromosome
#
rs5968922 <-  c(A=392, B=212, AA=275, AB=296, BB=80)
HW.test <- HWPerm(rs5968922,nperm=10000,x.linked=TRUE,verbose=TRUE)

## End(Not run)

Permutation tests fo variants with multiple alleles

Description

Function HWPerm.mult implements permutation tests for Hardy-Weinberg equilibrium for autosomal and X-chromosomal variants.

Usage

HWPerm.mult(x, y = NULL, nperm = 17000, eps = 1e-10, verbose = TRUE, ...)
HWPerm.mult(x, y = NULL, nperm = 17000, eps = 1e-10, verbose = TRUE, ...)

Arguments

`x`	vector or triangular matrix with male genotype counts
`y`	vector or triangular matrix with female genotype counts
`nperm`	number of permutations (17.000 by default)
`eps`	a tolerance for the comparison of floating point numbers
`verbose`	print output or not
`...`	addtional arguments

Details

This function approximates exact test probabilities for joint tests for HWE and equality of allele frequencies for variants with multiple alleles. For purely bi-allelic variant HWPerm can be used which allows for more statistics than just probabilities.

If argument y is not specified, gender is considered irrelevant, and x contains total genotype counts. If x and y are specified, x should contain male genotype counts and y female genotype counts. x and y can be vectors if the variant is bi-allelic, but are assumed lower triangular if there are more than two alleles. x is still a vector if there are multiple alleles but the variant is X-chromosomal. See the examples given below.

Value

`pofthesample`	probability of the observed sample
`pseudodist`	probabilities of simulated samples
`pval`	p-value

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. Molecular Ecology Resources. 18(3) pp. 461–473. doi:10.1111/1755-0998.12748

Examples

#
# bi-allelic autosomal
#

x1 <- c(AA=298,AB=489,BB=213)
## Not run: 
out <- HWPerm.mult(x1)

## End(Not run)

#
# bi-allelic X-chromosomal
#

x2.m <- c(A=39, B=21)
x2.f <- toTriangular(c(AA=28, AB=30, BB=8))
## Not run: 
out <- HWPerm.mult(x2.m,x2.f)

## End(Not run)

#
# autosomal k alleles not accounting for gender
#

x3 <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0)
x3 <- toTriangular(x3)
## Not run: 
out <- HWPerm.mult(x3)

## End(Not run)

#
# X-chromosomal k alleles 
#

x4.m <- c(A=15,B=17,C=24) 
x4.f <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4))
## Not run: 
out <- HWPerm.mult(x4.m,x4.f)

## End(Not run)

#
# Autosomal k alleles accounting for gender
#

x5.m <- toTriangular(c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0))
x5.f <- toTriangular(c(AA=8,AB=12,AC=13,BB=8,BC=7,CC=0))
## Not run: 
out <- HWPerm.mult(x5.m,x5.f)

## End(Not run)

#
# Autosomal STR with multipe alleles
#

data(NistSTRs)
A1 <- NistSTRs[,1]
A2 <- NistSTRs[,2]
GenotypeCounts <- AllelesToTriangular(A1,A2)
print(GenotypeCounts)
## Not run: 
out <- HWPerm.mult(GenotypeCounts)

## End(Not run)
#
# bi-allelic autosomal
#

x1 <- c(AA=298,AB=489,BB=213)
## Not run: 
out <- HWPerm.mult(x1)

## End(Not run)

#
# bi-allelic X-chromosomal
#

x2.m <- c(A=39, B=21)
x2.f <- toTriangular(c(AA=28, AB=30, BB=8))
## Not run: 
out <- HWPerm.mult(x2.m,x2.f)

## End(Not run)

#
# autosomal k alleles not accounting for gender
#

x3 <- c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0)
x3 <- toTriangular(x3)
## Not run: 
out <- HWPerm.mult(x3)

## End(Not run)

#
# X-chromosomal k alleles 
#

x4.m <- c(A=15,B=17,C=24) 
x4.f <- toTriangular(c(AA=4,AB=2,AC=13,BB=6,BC=19,CC=4))
## Not run: 
out <- HWPerm.mult(x4.m,x4.f)

## End(Not run)

#
# Autosomal k alleles accounting for gender
#

x5.m <- toTriangular(c(AA=12,AB=19,AC=13,BB=7,BC=5,CC=0))
x5.f <- toTriangular(c(AA=8,AB=12,AC=13,BB=8,BC=7,CC=0))
## Not run: 
out <- HWPerm.mult(x5.m,x5.f)

## End(Not run)

#
# Autosomal STR with multipe alleles
#

data(NistSTRs)
A1 <- NistSTRs[,1]
A2 <- NistSTRs[,2]
GenotypeCounts <- AllelesToTriangular(A1,A2)
print(GenotypeCounts)
## Not run: 
out <- HWPerm.mult(GenotypeCounts)

## End(Not run)

Calculation of posterior probabilities and Bayes factors for Hardy-Weinberg tests at X-chromosomal variants.

Description

Function HWPosterior calculates posterior probabilities and Bayes factors for tests for Hardy-Weinberg equilibrium of autosomal and X-chromosomal variants.

Usage

HWPosterior(males, females, verbose = TRUE, prior.af = c(0.5,0.5), prior.gf =
            c(0.333,0.333,0.333), x.linked = FALSE, precision = 0.05) 
HWPosterior(males, females, verbose = TRUE, prior.af = c(0.5,0.5), prior.gf =
            c(0.333,0.333,0.333), x.linked = FALSE, precision = 0.05)

Arguments

`males`	A vector of male genotype counts. For autosomal variants, this should be a three-element named vector like (AA,AB,BB); for X-chromosomal variants it should be a two-element vector giving the counts of the hemizygous genotypes like (A,B).
`females`	A vector of female genotype counts. This should be a three-element named vector like (AA,AB,BB)
`verbose`	prints results if `verbose = TRUE`
`prior.af`	Beta prior parameters for male and female allele frequencies
`prior.gf`	Dirichlet prior parameters for female genotype frequencies
`x.linked`	logical indicating whether the variant is autosomal or X-chromosomal
`precision`	precision parameter for marginal likelihoods that require numeric integration

Details

For X-chromosomal variants, four possible models are considered, and the posterior probabilities and Bayes factors for each model are calculated.

For autosomal variants, ten possible scenarios are considered, and the posterior probabilities for all models are calculated.

In general, default Dirichlet priors are used for genotype frequencies, and beta prior are used for allele frequencies.

Value

For X-chromosomal variants, a matrix with posterior probabilities and Bayes factors will be produced. For autosomal variants, a vector of posterior probabilities is produced.

Author(s)

Xavi Puig [email protected] and Jan Graffelman [email protected]

References

Puig, X., Ginebra, J. and Graffelman, J. (2017) A Bayesian test for Hardy-Weinberg equilibrium of bi-allelic X-chromosomal markers. Heredity 119(4):226–236. doi:10.1038/hdy.2017.30.

Puig, X., Ginebra, J. and Graffelman, J. (2019) Bayesian model selection for the study of Hardy-Weinberg proportions and homogeneity of gender allele frequencies. Heredity 123(5), pp. 549-564. doi:10.1038/s41437-019-0232-0

Examples

#
# An X-chromosomal example
#
males   <- c(A=43,B=13)
females <- c(AA=26,AB=19,BB=3)
out     <- HWPosterior(males,females,verbose=TRUE,x.linked=TRUE)
#
# An autosomal example
#
data(JPTsnps)
males     <- JPTsnps[1,1:3]
females   <- JPTsnps[1,4:6]
post.prob <- HWPosterior(males,females,x.linked=FALSE)
#
# An X-chromosomal example
#
males   <- c(A=43,B=13)
females <- c(AA=26,AB=19,BB=3)
out     <- HWPosterior(males,females,verbose=TRUE,x.linked=TRUE)
#
# An autosomal example
#
data(JPTsnps)
males     <- JPTsnps[1,1:3]
females   <- JPTsnps[1,4:6]
post.prob <- HWPosterior(males,females,x.linked=FALSE)

Compute the power of a test for Hardy-Weinberg equilibrium.

Description

HWPower is a function that computes the power of a test for Hardy-Weinberg equilibrium.

Usage

HWPower(n = 100, nA = 100, pA = 0.5, y = c(AA=25,AB=50,BB=25),
alpha = 0.05, theta = 4, f = NULL, test = "exact",
alternative = "two.sided", pvaluetype = "selome", cc = 0.5)
HWPower(n = 100, nA = 100, pA = 0.5, y = c(AA=25,AB=50,BB=25),
alpha = 0.05, theta = 4, f = NULL, test = "exact",
alternative = "two.sided", pvaluetype = "selome", cc = 0.5)

Arguments

`n`	The sample size
`nA`	The minor allele count
`pA`	The minor allele frequency
`y`	A sample of genotype counts (AA,AB,BB)
`alpha`	The significance level (0.05 by default)
`theta`	The degree of disequilibrium (`theta` = 4 is equilibrium, `theta` > 4 is heterozygote excess, `theta` < 4 is heterozygote dearth)
`f`	The inbreeding coefficient. Overrules `theta` if specified.
`test`	The type of test for which power is to be computed. Can be "exact" (default) or "chisq" (chi-square)
`alternative`	The nature of the alternative hypothesis ("two.sided" (default), "greater" or "less")
`pvaluetype`	The type of p-value used in an exact test ("selome", "dost" or "midp")
`cc`	Continuity correction parameter for the chi-square test (0.5 by default)

Details

HWPower uses the Levene-Haldane distribution (distribution of the number of heterzygotes given the minor allele count) for computing power.

HWPower can be used in three different way. In principle, the power is calcuted on the basis of the sample size (n) and the minor allele count (nA). Alternatively, the user may specifiy sample size (n) and minor allele frequency (pA). Finally, power can also be calculated directly from a sample of genotype counts. In that case the calculated power is the power for a sample of the given sample size and minor allele count. The three ways to use HWPower are illustrated in the example section.

Value

if test = "exact" the power of the exact test is computed for the given significance level and minor allele count.

if test = "chisq" the power of the chi-square test is computed for the given significance level and minor allele count.

Author(s)

Jan Graffelman ([email protected])

References

Graffelman, J. and Moreno, V. (2013) The Mid p-value in exact tests for Hardy-Weinberg proportions. Statistical Applications in Genetics and Molecular Biology 12(4): 433-448.

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

pw.chisq <- HWPower(n=100,nA=100,alpha=0.05,test="chisq",theta=16)
print(pw.chisq)
pw.exact <- HWPower(n=100,nA=100,alpha=0.05,test="exact",theta=16,pvaluetype="selome")
print(pw.exact)
pw.exact <- HWPower(n=100,nA=100)
print(pw.exact)
pw.exact <- HWPower(n=100,pA=0.5)
print(pw.exact)
pw.exact <- HWPower(y=c(AA=25,AB=50,BB=25))
print(pw.exact)
pw.chisq <- HWPower(n=100,nA=100,alpha=0.05,test="chisq",theta=16)
print(pw.chisq)
pw.exact <- HWPower(n=100,nA=100,alpha=0.05,test="exact",theta=16,pvaluetype="selome")
print(pw.exact)
pw.exact <- HWPower(n=100,nA=100)
print(pw.exact)
pw.exact <- HWPower(n=100,pA=0.5)
print(pw.exact)
pw.exact <- HWPower(y=c(AA=25,AB=50,BB=25))
print(pw.exact)

A Q-Q plot for Hardy-Weinberg equilibrium

Description

HWQqplot creates a Q-Q plot for the p-values obtained in an Exact test for Hardy-Weinberg equilibrium. Empirical p-values are plotted against multiple simulated quantiles of the theoretical p-value distribution.

Usage

HWQqplot(X, nsim = 100, fit = "curve", logplot = FALSE,
main = "Q-Q plot for HWE", mm = NULL, pvaluetype = "selome", ...)
HWQqplot(X, nsim = 100, fit = "curve", logplot = FALSE,
main = "Q-Q plot for HWE", mm = NULL, pvaluetype = "selome", ...)

Arguments

`X`	Data matrix with genotype counts, one row for each sample, 3 columns
`nsim`	Number of samples drawn from the null distribution (100 by default)
`fit`	If `fit` is set to "line" straight lines will be fitted to the simulated samples, if set to "curve", ascending curves will be shown.
`logplot`	If `logplot` is set to true, then the log10 of the p-values will be used in the plot. If not, untransformed p-values will be used.
`main`	Title for the plot
`mm`	Maximal value for x and y axis in the plot
`pvaluetype`	Type of p-value to be used in an exact test. Can be "selome" (default), "midp" or "dost".
`...`	Any additional arguments for the `plot` instruction

Details

HWQqplot constructs a Q-Q plot of the p-values of an exact test for Hardy-Weinberg equilibrium. Under the null, this p-value is not uniform. HWQqplot samples from the theoretical null distribution, taking into account that markers may vary in allele frequency and in sample size (due to missing values). For each simulated sample a grey curve or line is shown. A green reference line with intercept 0 and slope 1 is also shown in the plot.

Value

NULL

Author(s)

Jan Graffelman [email protected]

References

Rohlfs, R.V. and Weir, B.S. (2008) Distributions of Hardy-Weinberg equilibrium test statistics. Genetics 180, pp. 1609-1616.

Examples

## Not run: 
set.seed(1234)
n <- 200 # sample size
m <- 100 # number of markers
X <- HWData(n,m)
HWQqplot(X,logplot=TRUE,pvaluetype="selome",main="Q-Q Plot for HWE")

## End(Not run)
## Not run: 
set.seed(1234)
n <- 200 # sample size
m <- 100 # number of markers
X <- HWData(n,m)
HWQqplot(X,logplot=TRUE,pvaluetype="selome",main="Q-Q Plot for HWE")

## End(Not run)

Testing a set of microsatellites (STRs) for Hardy-Weinberg equilibrium

Description

Function HWStr is a wrapper function arount HWPerm.mult and HWChisq in order to test a set of STRs for Hardy-Weinberg equilibrium.

Usage

HWStr(X, test = "permutation", verbose = FALSE, ...)
HWStr(X, test = "permutation", verbose = FALSE, ...)

Arguments

`X`	A data matrix with STRs in columns, with there two alleles coded in successive columns.
`test`	`permutation` for a permutation test, or `"chisq"` for a chisquare test
`verbose`	be silent if set to `FALSE` or print info for each STR if `TRUE`.
`...`	possible extra parameters to be passed on to `HWPerm.mult` such as `nperm` or `eps`.

Details

It is recommended to test by a permutation test. By default, 17.000 permutations are used. Exact testing is not implemented. Missing genotypes are excluded on a per STR basis.

Value

A data frame with the

`strnames`	STR name
`N`	sample size
`Nt`	number of alleles
`MinorAF`	minor allele frequency
`MajorAF`	major allele frequency
`Ho`	observed heterozygosity
`He`	expected heterozygosity (without bias correction)
`Hp`	Shannon index of allele frequencies
`pval`	p-value of the HWE test

Author(s)

Jan Graffelman [email protected]

Examples

data(NistSTRs)
## Not run: 
Results <- HWStr(NistSTRs,test="permutation")

## End(Not run)
data(NistSTRs)
## Not run: 
Results <- HWStr(NistSTRs,test="permutation")

## End(Not run)

Asymptotic test for HWE across strata for a single biallelic marker

Description

Function HWStrata implements Olson's asymptotic test for HWE for a stratified sample of single biallelic polymorphism.

Usage

HWStrata(X, verbose = TRUE)
HWStrata(X, verbose = TRUE)

Arguments

`X`	A three-column matrix of genotype counts (e.g. with columns AA, AB, BB)
`verbose`	print output if `verbose = TRUE`

Details

See the references for the related homogeneity assumption.

Value

`T2`	The test statistic
`pval`	The p-value

Author(s)

Jan Graffelman [email protected]

References

Olson J.M. (1993) Testing the Hardy-Weinberg law across strata. Annals of Human Genetics 57(4):291-295.

Olson, J.M. and Foley, M. (1996) Testing for homogeneity of Hardy-Weinberg disequilibrium using data sampled from several populations. Biometrics 52(3) pp. 971-979.

Examples

#
# Test across strata
#
data("Glyoxalase")
Glyoxalase <- as.matrix(Glyoxalase)
HWStrata(Glyoxalase)
#
# Stratified exact testing, testing each sample
#
HWExactStats(Glyoxalase)
#
# Test across strata
#
data("Glyoxalase")
Glyoxalase <- as.matrix(Glyoxalase)
HWStrata(Glyoxalase)
#
# Stratified exact testing, testing each sample
#
HWExactStats(Glyoxalase)

Ternary plot with the Hardy-Weinberg acceptance region

Description

HWTernaryPlot is a routine that draws a ternary plot for three-way genotypic compositions (AA,AB,BB), and represents the acceptance region for different tests for Hardy-Weinberg equilibrium (HWE) in the plot. This allows for graphical testing of a large set of markers (e.g. SNPs) for HWE. The (non) significance of the test for HWE can be inferred from the position of the marker in the ternary plot. Different statistical tests for HWE can be done graphically with this routine: the ordinary chisquare test, the chisquare test with continuity correction and the Haldane's exact test.

Usage

HWTernaryPlot(X, n = NA, addmarkers = TRUE, newframe = TRUE, hwcurve = TRUE, 
vbounds = FALSE, mafbounds = FALSE, mafvalue = 0.05, axis = 0, region = 1, 
vertexlab = colnames(X), alpha = 0.05, vertex.cex = 1, pch = 19, cc = 0.5, 
markercol = "black", markerbgcol = "black", cex = 0.75, axislab = "", 
verbose = FALSE, markerlab = NULL, markerpos = NULL, mcex = 1, connect =
FALSE, curvecols = rep("black",5), signifcolour = TRUE, patternsigsymbol = 19,
curtyp = "solid", ssf = "max", pvaluetype = "selome", grid = FALSE,
gridlabels = TRUE, patternramp = FALSE, axisticklabels = FALSE, ...)
HWTernaryPlot(X, n = NA, addmarkers = TRUE, newframe = TRUE, hwcurve = TRUE, 
vbounds = FALSE, mafbounds = FALSE, mafvalue = 0.05, axis = 0, region = 1, 
vertexlab = colnames(X), alpha = 0.05, vertex.cex = 1, pch = 19, cc = 0.5, 
markercol = "black", markerbgcol = "black", cex = 0.75, axislab = "", 
verbose = FALSE, markerlab = NULL, markerpos = NULL, mcex = 1, connect =
FALSE, curvecols = rep("black",5), signifcolour = TRUE, patternsigsymbol = 19,
curtyp = "solid", ssf = "max", pvaluetype = "selome", grid = FALSE,
gridlabels = TRUE, patternramp = FALSE, axisticklabels = FALSE, ...)

Arguments

`X`	a matrix of `n` genotypic compositions or counts. If it is a matrix of compositions, X should have (`n` rows that sum 1, and 3 columns, with the relative frequencies of AA, AB and BB respectively. Argument `n` should be supplied as well. If X is a matrix of raw genotypic counts, it should have 3 columns with the absolute counts of AA, AB and BB respectively. Argument `n` may be supplied and will be used for painting acceptance regions. If not supplied `n` is computed from the data in `X`.
`n`	the samples size (for a complete composition with no missing data).
`addmarkers`	represent markers by dots in the triangle (`addmarkers=TRUE`) or not (`addmarkers=FALSE`).
`newframe`	allows for plotting additional markers in an already existing ternary plot. Overplotting is achieved by setting `newframe` to `FALSE`. Setting `newframe = TRUE` (default) will create a new ternary plot.
`hwcurve`	draw the HW parabola in the plot (`hwcurve=TRUE)` or not (`hwcurve=FALSE`).
`vbounds`	indicate the area corresponding to expected counts > 5 (`vbounds=TRUE`) or not (`vbounds=FALSE`).
`mafbounds`	indicate the area corresponding to MAF < `mafvalue`.
`mafvalue`	a critical value for the minor allele frequency (MAF).
`axis`	draw a vertex axis 0 = no axis is drawn 1 = draw the AA axis 2 = draw the AB axis 3 = draw the BB axis 4 = draw tick on the A-B axis
`region`	the type of acceptance region to be delimited in the triangle 0 = no acceptance region is drawn 1 = draw the acceptance region corresponding to a Chi-square test 2 = draw the acceptance region corresponding to a Chi-square test with continuity correction 3 = draw the acceptance region corresponding to a Chi-square test with continuity correction for D > 0 4 = draw the acceptance region corresponding to a Chi-square test with continuity correction for D < 0 5 = draw the acceptance regions for all preceding tests simultaneously 6 = draw the acceptance region corresponding to a Chi-square test with continuity correction with the upper limit for D > 0 and the lower limit for D < 0 7 = draw the acceptance region corresponding to a two-sided exact test
`vertexlab`	labels for the three vertices of the triangle
`alpha`	significance level (0.05 by default)
`vertex.cex`	character expansion factor for the labels of the vertices of the triangle.
`pch`	the plotting character used to represent the markers.
`cc`	value for the continuity correction parameter (0.5 by default).
`markercol`	vector with colours for the marker points in the triangle.
`markerbgcol`	vector with background colours for the marker points in the triangle.
`cex`	expansion factor for the marker points in the triangle.
`axislab`	a label to be put under the horizontal axis.
`verbose`	print information on the numerically found cut-points between curves of the acceptance region and the edges of the triangle.
`markerlab`	labels for the markers in the triangle.
`markerpos`	positions for the marker labels in the triangle (1,2,3 or 4).
`mcex`	character expansion factor for the labels of the markers in the ternary plot.
`connect`	connect the represented markers by a line in the ternary plot.
`curvecols`	a vector with four colour specifications for the different curves that can be used to delimit the HW acceptance region. E.g. `curvecols=c("red", "green","blue","black","purple")` will paint the Hardy-Weinberg curve red, the limits of the acceptance region for an ordinary chi-square test for HWE green, the limits of the acceptance region for a chi-square test with continuity correction when D > 0 blue and the limits of the acceptance region for a chi-square test with continuity correction when D < 0 black, and the limits of the exact acceptance region purple.
`signifcolour`	colour the marker points automatically according to the result of a signifance test (green markers non-siginficant, red markers significant). `signifcolour` only takes effect if `region` is set to 1, 2 or 7.
`patternsigsymbol`	plotting character used to represent significant makers when `patternramp=TRUE`
`curtyp`	style of the drawn curves (`"dashed","solid","dotted",...`)
`ssf`	sample size function (`"max","min","mean","median",...`). Indicates how the sample size for drawing acceptance regions is determined from the matrix of counts.
`pvaluetype`	method to compute p-values in an exact test (`"dost"` or `"selome"`)
`grid`	draw a reference grid for genotype frequencies at (0.2,0.4,0.6,0.8)
`gridlabels`	if `grid=TRUE` plot labels at the grid
`patternramp`	paint a colour ramp for patterns of genotype frequencies. This will ignore `signifcolour`, and use colour to represent marker frequencies. The ramp is a green-red gradient for the relative frequency of the pattern.
`axisticklabels`	place numeric labels for allele frequencies on the A-B axis.
`...`	other arguments passed on to the plot function (e.g. `main` for a main title).

Details

HWTernaryPlot automatically colours significant markers in red, and non-significant markers in green if region is set to 1, 2 or 7.

Value

`minp`	minimum allele frequency above which testing for HWE is appropriate (expected counts exceeding 5).
`maxp`	maximum allele frequency below which testing for HWE is appropriate.
`inrange`	number of markers in the appropriate range.
`percinrange`	percentage of markers in the appropriate.
`nsignif`	number of significant markers (only if `region` equals 1,2 or 7.)

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. and Morales, J. (2008) Graphical Tests for Hardy-Weinberg Equilibrium Based on the Ternary Plot. Human Heredity 65(2):77-84.

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

#
# Ternary plot with 1000 SNPs and HWE curve
#

X <- HWData(nm=1000,n=100)
HWTernaryPlot(X,100,region=0,vertex.cex=2,pch=1)

#
# Genotype frequency pattern of simulated SNPs with uniform
# allele frequency distribution
#

X <- HWData(nm=1000,n=100)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of simulated SNPs with skewed 
# allele frequency distribution
#

X <- HWData(nm=1000,n=100,shape1=1,shape2=20)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of SNPs simulated under HWE with
# allele frequency of 0.50
#

X <- HWData(nm=1000,n=100,p=0.25)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of SNPs simulated under HWE with
# allele frequency of 0.25
#

X <- HWData(nm=1000,n=100,p=0.25)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of SNPs simulated under HWE with
# allele frequency of 0.25, using a triangle to mark siginficant markers
#

X <- HWData(nm=1000,n=100,p=0.25)
HWTernaryPlot(X,patternramp = TRUE, region=1, patternsigsymbol = 2)

#
# Ternary plot of 1000 SNPs simulated under HWE and uniform allele frequency, with
# acceptance region for a chi-square test and significant markers in red.
#

X <- HWData(nm=1000,n=100)
HWTernaryPlot(X)

#
# Ternary plot of 100 SNPs simulated under HWE and uniform allele frequency, with
# acceptance region for an exact test and significant markers in red.
#

X <- HWData(nm=100,n=100)
## Not run: HWTernaryPlot(X,region=7)
#
# Ternary plot with 1000 SNPs and HWE curve
#

X <- HWData(nm=1000,n=100)
HWTernaryPlot(X,100,region=0,vertex.cex=2,pch=1)

#
# Genotype frequency pattern of simulated SNPs with uniform
# allele frequency distribution
#

X <- HWData(nm=1000,n=100)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of simulated SNPs with skewed 
# allele frequency distribution
#

X <- HWData(nm=1000,n=100,shape1=1,shape2=20)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of SNPs simulated under HWE with
# allele frequency of 0.50
#

X <- HWData(nm=1000,n=100,p=0.25)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of SNPs simulated under HWE with
# allele frequency of 0.25
#

X <- HWData(nm=1000,n=100,p=0.25)
HWTernaryPlot(X,patternramp = TRUE)

#
# Genotype frequency pattern of SNPs simulated under HWE with
# allele frequency of 0.25, using a triangle to mark siginficant markers
#

X <- HWData(nm=1000,n=100,p=0.25)
HWTernaryPlot(X,patternramp = TRUE, region=1, patternsigsymbol = 2)

#
# Ternary plot of 1000 SNPs simulated under HWE and uniform allele frequency, with
# acceptance region for a chi-square test and significant markers in red.
#

X <- HWData(nm=1000,n=100)
HWTernaryPlot(X)

#
# Ternary plot of 100 SNPs simulated under HWE and uniform allele frequency, with
# acceptance region for an exact test and significant markers in red.
#

X <- HWData(nm=100,n=100)
## Not run: HWTernaryPlot(X,region=7)

Exact test for Hardy-Weinberg equilibrium and equality of allele frequencies for tri-allelic variants.

Description

Function HWTriExact does a standard exact test for Hardy-Weinberg equilibrium of a tri-allelic variant, and also does joint exact tests for equilibrium and equality of allele frequencies if the genotype counts are given separately for both sexes

Usage

HWTriExact(x, y = NULL, eps = 1e-10, nperm = 17000, verbose = TRUE)
HWTriExact(x, y = NULL, eps = 1e-10, nperm = 17000, verbose = TRUE)

Arguments

`x`	vector with 6 genotype counts (AA,AB,AC,BB,BC,CC)
`y`	vector with 6 or 3 genotype counts (AA,AB,AC,BB,BC,CC) or (A,B,C)
`eps`	a tolerance that can be set for comparing exact probabilities
`nperm`	number of permutations (only relevant for autosomal stratified by gender)
`verbose`	print output or not

Details

If only x is specified, an exact test for an autosomal variant with three alleles will be performed.

If both x and y are supplied as vectors with 6 elements, a permutation test for HWE and equality of allele frequencies (EAF) for an autosomal variant is performed, using nperm permutations. The distribution of the probabilities is returned in pseudodist. The computational cost of a completed enumeration algorithm can be prohibitive in this case.

If x is supplied as a length 6 vector, and y as a length 3 vector, the variant is assumed to be X-chromosomal, x containing female genotype counts and y containing male genotyep counts. In this case a joint exact test for HWE and EAF for an X-chromosomal tri-allelic variant is executed.

See the examples in the example section below.

Value

`pval`	The p-value of the sample
`pseudodist`	Distribution of probabilities obtained by simulation
`pofthesample`	The probability of the observed sample

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. Molecular Ecology Resources. 18(3) pp. 461–473. doi:10.1111/1755-0998.12748.

Examples

#
# Autosomal tri-allelic (not accounting for gender)
#

x <- c(AA=20,AB=31,AC=26,BB=15,BC=12,CC=0)
## Not run: out <- HWTriExact(x)

#
# Autosomal tri-allelic accounting for gender
#

males   <- c(A=1,B=21,C=34) 
females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15)
## Not run: out <- HWTriExact(females,males)

#
# X-chromosomal tri-allelic accounting for gender
#

males   <- c(A=1,B=21,C=34) 
females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15)
## Not run: out <- HWTriExact(females,males)
#
# Autosomal tri-allelic (not accounting for gender)
#

x <- c(AA=20,AB=31,AC=26,BB=15,BC=12,CC=0)
## Not run: out <- HWTriExact(x)

#
# Autosomal tri-allelic accounting for gender
#

males   <- c(A=1,B=21,C=34) 
females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15)
## Not run: out <- HWTriExact(females,males)

#
# X-chromosomal tri-allelic accounting for gender
#

males   <- c(A=1,B=21,C=34) 
females <- c(AA=0,AB=1,AC=0,BB=8,BC=24,CC=15)
## Not run: out <- HWTriExact(females,males)

Inverse Fisher z transformation

Description

Calculates the inverse of Fisher's z transformation

Usage

ifisherz(y)
ifisherz(y)

Arguments

`y`	a real number

Value

a correlation coefficient in the range (-1,1)

Author(s)

Jan Graffelman ([email protected])

Examples

r <- 0.5
print(ifisherz(fisherz(r)))
r <- 0.5
print(ifisherz(fisherz(r)))

Detects autosomal and X-chromosomal monomorphic variants

Description

Function is.mono tests if a bi-allelic variant is monomorphic or not

Usage

is.mono(x)
is.mono(x)

Arguments

`x`	a vector of three or five genotype counts ((AA,AB,BB) or (A,B,AA,AB,BB)), or a three-column or five-column matrix with genotype counts (variants in rows, columns)

Details

is.mono assumes autosomal variants are coded in three-element vectors or three-column matrices, whereas X-chromosomal variants are coded in five-element vectors or five-column matrices.

Value

A logical or vector of logicals

Author(s)

Jan Graffelman ([email protected])

Examples

#
# a polymorphic autosomal marker
#
x <- c(AA=10,AB=20,BB=10)
print(is.mono(x))
#
# a monomorphic autosomal marker
#
x <- c(AT=0,AA=100,TT=0)
print(is.mono(x))
#
# an autosomal marker with only heterozygotes
#
x <- c(AT=100,AA=0,TT=0)
print(is.mono(x))
#
# a matrix with low maf autosomal markers
#
set.seed(123)
X <- HWData(100,50,shape1=1,shape2=20)
number.monomorphics <- sum(is.mono(X))
print(number.monomorphics)
#
# a polymorphic X chromosomal marker
#
x <- c(G=24,C=26,GG=12,CC=13,GC=25)
print(is.mono(x))
#
# another polymorphic X chromosomal marker
#
x <- c(G=24,C=1,GG=25,CC=0,GC=0)
is.mono(x)
#
# a monomorphic X chromosomal marker
#
x <- c(G=24,C=0,GG=12,CC=0,GC=0)
is.mono(x)
#
# a matrix with low maf X-chromosomal markers
#
set.seed(123)
Y <- HWData(100,50,shape=1,shape2=20,x.linked = TRUE)
number.monomorphics <- sum(is.mono(Y))
print(number.monomorphics)
#
# a polymorphic autosomal marker
#
x <- c(AA=10,AB=20,BB=10)
print(is.mono(x))
#
# a monomorphic autosomal marker
#
x <- c(AT=0,AA=100,TT=0)
print(is.mono(x))
#
# an autosomal marker with only heterozygotes
#
x <- c(AT=100,AA=0,TT=0)
print(is.mono(x))
#
# a matrix with low maf autosomal markers
#
set.seed(123)
X <- HWData(100,50,shape1=1,shape2=20)
number.monomorphics <- sum(is.mono(X))
print(number.monomorphics)
#
# a polymorphic X chromosomal marker
#
x <- c(G=24,C=26,GG=12,CC=13,GC=25)
print(is.mono(x))
#
# another polymorphic X chromosomal marker
#
x <- c(G=24,C=1,GG=25,CC=0,GC=0)
is.mono(x)
#
# a monomorphic X chromosomal marker
#
x <- c(G=24,C=0,GG=12,CC=0,GC=0)
is.mono(x)
#
# a matrix with low maf X-chromosomal markers
#
set.seed(123)
Y <- HWData(100,50,shape=1,shape2=20,x.linked = TRUE)
number.monomorphics <- sum(is.mono(Y))
print(number.monomorphics)

Multi-allelic autosomal variants of the Japanese population of the 1000 genomes project

Description

JPTtriallelicsChrX contains three selected multi-allelic variants on chromosome 7 from the Japanese sample of the 1000 genomes project.

Usage

data("JPTmultiallelicsChr7")data("JPTmultiallelicsChr7")

Format

List object with fields m4,f4; m5,f5; m6,f6;

Details

The list object contains male and female genotype counts for 3 multi-allelic variants on chromosome 7 of the JPT sample of the 1000 genomes project.

Source

The The 1000 genomes project.

References

Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 10.

Examples

data(JPTmultiallelicsChr7)
str(JPTmultiallelicsChr7)
data(JPTmultiallelicsChr7)
str(JPTmultiallelicsChr7)

Multi-allelic X-chromosomal variants of the Japanese population of the 1000 genomes project

Description

JPTtriallelicsChrX contains four selected multi-allelic variants on the X chromosome from the Japanese sample of the 1000 genomes project.

Usage

data("JPTmultiallelicsChrX")data("JPTmultiallelicsChrX")

Format

List object with fields m4,f4; m5,f5; m6,f6; m7,f7

Details

The list object contains male and female genotype counts for four multi-allelic variants of the JPT sample of the 1000 genomes project.

Source

The The 1000 genomes project.

References

Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 7.

Examples

data(JPTmultiallelicsChrX)
m4 <- JPTmultiallelicsChrX$m4
f4 <- JPTmultiallelicsChrX$f4
data(JPTmultiallelicsChrX)
m4 <- JPTmultiallelicsChrX$m4
f4 <- JPTmultiallelicsChrX$f4

Bi-allelic SNPs from a Japanese population stratified by gender

Description

JPTsnps contains genotype counts for the two sexes of ten single nucleotide polymorphisms of the Japanese (JPT) sample of the 1000 Genomes project, consisting of 56 males and 48 females.

The first three columns contain the male genotype counts, and the last three columns contain the female genotype counts.

Usage

data("JPTsnps")data("JPTsnps")

Format

data frame

Source

The The 1000 genomes project.

References

Puig, X., Ginebra, J. and Graffelman, J. (2019) Bayesian model selection for the study of Hardy-Weinberg proportions and homogeneity of gender allele frequencies.

Examples

data(JPTsnps)
data(JPTsnps)

Tri-allelic variants on chromosome 7 of the Japanese (JPT) sample of the 1000 genomes project

Description

JPTtriallelics contains six selected tri-allelic variants on chromosome 7 from the Japanese sample of the 1000 genomes project.

Usage

data("JPTtriallelicsChr7")data("JPTtriallelicsChr7")

Format

A data frame with 6 observations on the following 14 variables.

id: RS identifier
pos: position in base pairs
mAA: number of AA males
mAB: number of AB males
mAC: number of AC males
mBB: number of BB males
mBC: number of BC males
mCC: number of CC males
fAA: number of AA females
fAB: number of AB females
fAC: number of AC females
fBB: number of BB females
fBC: number of BC females
fCC: number of CC females

Source

The The 1000 genomes project.

References

Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 9.

Examples

data(JPTtriallelicsChr7)
str(JPTtriallelicsChr7)
data(JPTtriallelicsChr7)
str(JPTtriallelicsChr7)

Tri-allelic variants on the X-chromosome of the Japanese (JPT) sample of the 1000 genomes project

Description

JPTtriallelicsChrX contains five selected tri-allelic variants on the X chromosome from the Japanese sample of the 1000 genomes project.

Usage

data("JPTtriallelicsChrX")data("JPTtriallelicsChrX")

Format

A data frame with 5 observations on the following 12 variables.

id: Identifier of the polymorphism
pos: Position of the polymorphism in base pairs
chr: Chromosome
A: Number of males with A genotype
B: Number of males with B genotype
C: Number of males with C genotype
AA: Number of AA females
AB: Number of AB females
AC: Number of AC females
BB: Number of BB females
BC: Number of BC females
CC: Number of CC females

Source

The The 1000 genomes project.

References

Graffelman, J. and Weir, B.S. (2017) Multi-allelic exact tests for Hardy-Weinberg equilibrium that account for gender. doi: 10.1101/172874. Table 6.

Examples

data(JPTtriallelicsChrX)
str(JPTtriallelicsChrX)
data(JPTtriallelicsChrX)
str(JPTtriallelicsChrX)

Compute the minor allele count.

Description

mac computes the smallest allele count for a given vector of genotype counts.

Usage

mac(X)
mac(X)

Arguments

`X`	a vector or matrix with genotype counts (AA, AB, BB)

Value

a vector of the minor allele counts

Author(s)

Jan Graffelman ([email protected])

Examples

   X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1)))
   names(X) <- c("AA","AB","BB")
   print(X)
   print(mac(X))
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1)))
   names(X) <- c("AA","AB","BB")
   print(X)
   print(mac(X))

Function to compute minor allele frequencies

Description

Function maf computes the minor allele frequency for a matrix or vector of genotype counts.

Usage

maf(x, option = 1, verbose = FALSE) 
maf(x, option = 1, verbose = FALSE)

Arguments

`x`	a vector or matrix of with genotype counts
`option`	determines output that is returned. `option=1` calculates the minor allele frequency; `option=2` returns all allele frequencies; `option=3` returns the allele counts.
`verbose`	be silent (`verbose=FALSE`) or not.

Value

a vector of minor allele frequencies or minor allele counts.

Author(s)

Jan Graffelman ([email protected])

Examples

#
# MAF of a single random marker
#
set.seed(123)
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1)))
names(X) <- c("AA","AB","BB")
print(X)
print(maf(X))
#
# MAF of MN bloodgroup counts
#
x <- c(MM=298,MN=489,NN=213)
maf(x)
#
# Both allele frqeuencies, ordered from minor to major
#
maf(x,2)
#
# allele counts of MN bloodgroup counts, order from minor to major
#
maf(x,3)
#
# MAF of single triallelic marker in triangular format
#
x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26)
print(x)
GT <- toTriangular(x)
print(GT)
maf(GT)
#
# extract all allele frequencies
#
maf(GT,option=2)
#
# extract all allele counts
#
maf(GT,option=3)
#
# Calculate the MAF for 10 random SNPs under HWE
#
set.seed(123)
Z <- HWData(nm=10)
print(Z)
#
# vector with minor allele frequencies, one per marker
#
maf(Z)
#
# Matrix with minor and major allele frequencies, one row per marker 
#
maf(Z,2)
#
# Matrix with minor and major allele count, one row per marker 
#
maf(Z,3)
#
# MAF of a single random marker
#
set.seed(123)
X <- as.vector(rmultinom(1,100,c(0.5,0.4,0.1)))
names(X) <- c("AA","AB","BB")
print(X)
print(maf(X))
#
# MAF of MN bloodgroup counts
#
x <- c(MM=298,MN=489,NN=213)
maf(x)
#
# Both allele frqeuencies, ordered from minor to major
#
maf(x,2)
#
# allele counts of MN bloodgroup counts, order from minor to major
#
maf(x,3)
#
# MAF of single triallelic marker in triangular format
#
x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26)
print(x)
GT <- toTriangular(x)
print(GT)
maf(GT)
#
# extract all allele frequencies
#
maf(GT,option=2)
#
# extract all allele counts
#
maf(GT,option=3)
#
# Calculate the MAF for 10 random SNPs under HWE
#
set.seed(123)
Z <- HWData(nm=10)
print(Z)
#
# vector with minor allele frequencies, one per marker
#
maf(Z)
#
# Matrix with minor and major allele frequencies, one row per marker 
#
maf(Z,2)
#
# Matrix with minor and major allele count, one row per marker 
#
maf(Z,3)

Create genotype counts from bi-allelic marker data

Description

MakeCounts creates a matrix of genotype counts, with one row for each bi-allelic marker, containing 4 columns with the counts AA, AB, BB and NA (missings) respectively

Usage

MakeCounts(X, alleles, pos1 = 1, pos2 = 3, coding = c(AA=0,AB=1,BB=2), sep = "")
MakeCounts(X, alleles, pos1 = 1, pos2 = 3, coding = c(AA=0,AB=1,BB=2), sep = "")

Arguments

`X`	A matrix or dataframe with bi-allelic genotyping information, markers in columns, individuals in rows
`alleles`	a vector of alleles for each marker (e.g. c("A/T","A/G",...)). Only relevanit if `X` is a matrix with text entries.
`pos1`	position of the first allele in the allele string (1 by default)
`pos2`	position of the second allele in the allele string (3 by default)
`coding`	indicates how homozygotes and heterozygote are coded as numbers. Only relevant if `X` is a matrix with numeric entries.
`sep`	allele separator character for genotype data in text format ("" for AA; "/" for "A/A")

Details

MakeCounts is thought for bi-allelic marker data only. Missings are should be coded by NA. It produces the right input for HWTernaryPlot.

Heterozygotes may be coded in the data as "AB" or "BA". Both entries will be counted as a heterozygote.

Value

A matrix of 4 columns

Author(s)

Jan Graffelman [email protected]

Examples

SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG")
SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA")
SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT")
X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5)
Y <- MakeCounts(X,c("A/G","C/G","A/G","A/G","C/T"))
print(Y)
W <- matrix(sample(c(0,1,2,NA),100,replace=TRUE),ncol=5)
Z <- MakeCounts(W,coding=c(0,1,2))
SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG")
SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA")
SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT")
X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5)
Y <- MakeCounts(X,c("A/G","C/G","A/G","A/G","C/T"))
print(Y)
W <- matrix(sample(c(0,1,2,NA),100,replace=TRUE),ncol=5)
Z <- MakeCounts(W,coding=c(0,1,2))

Make factors from genotyping data

Description

MakeFactor converts bi-allelic genetic marker data, whether coded numerically as (0,1,2) or as (GG,GT,TT), etc. into standard factors coded as AA, AB, BB.

Usage

MakeFactor(x, coding = c(0, 1, 2))
MakeFactor(x, coding = c(0, 1, 2))

Arguments

`x`	A vector containing genotyping results
`coding`	Describes the numerical coding of the genotype data in order AA, AB and BB. Only relevant if `x` is numerical

Details

If x is a factor, it will be coerced to a factor with levels AA, AB and BB. Important detail: the produced factors will have only those levels that are observed in the data. E.g., if genotyping results only consist of (0,1), then the resulting factor will not have level BB (which would be an empty category)

Value

A factor variable

Author(s)

Jan Graffelman [email protected]

Examples

y <- c(1,1,0,0,2,2)
data.frame(y,MakeFactor(y))

y <- c(2,2,3,3,1,1)
data.frame(y,MakeFactor(y,coding=c(1,2,3)))

data(Markers)
data.frame(Markers[,1],MakeFactor(Markers[,1],coding=c(1,2,3)))
y <- c(1,1,0,0,2,2)
data.frame(y,MakeFactor(y))

y <- c(2,2,3,3,1,1)
data.frame(y,MakeFactor(y,coding=c(1,2,3)))

data(Markers)
data.frame(Markers[,1],MakeFactor(Markers[,1],coding=c(1,2,3)))

SNP data and intensities

Description

The dataframe contains the genotypes of 3 SNPs and two allele intensities of 146 individuals. The first column is a GT polymorphism that has missing values for several individuals. The second and third column (iG and iG) are the allele intensities of this polymorphism. Column 4 and 5 are covariate SNPs (an AC and an AG polymorphism) that have no missing values.

Usage

data(Markers)data(Markers)

Format

A data frame containing 146 rows and 5 columns

References

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software, 64(3), 1-23. doi:10.18637/jss.v064.i03.

Genotype frequencies for blood group locus MN

Description

The dataframe contains the genotype frequencies MM, MN and NN for the MN blood group locus for 216 populations. The data are taken from table 2.5 in Mourant et al., using only entries with a sample size of at least 500.

Usage

data(Mourant)data(Mourant)

Format

A data frame containing 216 observations.

Source

Mourant et al, Table 2.5

References

Mourant, A. E. and Kope\'c, A. C. and Domaniewska-Sobczak, K. (1976) The Distribution of the Human Blood Groups and other Polymorphisms. Second edition. Oxford University Press, London.

Number of alleles

Description

Function n.alleles determines the number of alleles in a named genotype vector.

Usage

n.alleles(x, ...)
n.alleles(x, ...)

Arguments

`x`	A named genotype vector (e.g. c(AA=10,AB=20,BB=5))
`...`	extra arguments that are passed on to `alleles`

Value

integer

Author(s)

Jan Graffelman ([email protected])

Examples

   x <- c(AA=25,AB=50,BB=25)
   k <- n.alleles(x)
   print(k)
x <- c(AA=25,AB=50,BB=25)
   k <- n.alleles(x)
   print(k)

NIST autosomal STR data

Description

NistSTRs contains 29 autosomal microsatellites (STRs) of individuals of Caucasian ancestry. The two alleles of an individual are separated into two columns for each STR.

Usage

data("NistSTRs")data("NistSTRs")

Format

A data frame with 361 observations on the following 58 variables.

‘⁠CSF1PO-1⁠’: First allele of CSF1PO
‘⁠CSF1PO-2⁠’: Second allele of CSF1PO
‘⁠D10S1248-1⁠’: First allele of D10S1248
‘⁠D10S1248-2⁠’: Second allele of D10S1248
‘⁠D12S391-1⁠’: First allele of D12S391
‘⁠D12S391-2⁠’: Second allele of D12S391
‘⁠D13S317-1⁠’: First allele of D13S317
‘⁠D13S317-2⁠’: Second allele of D13S317
‘⁠D16S539-1⁠’: First allele of D16S539
‘⁠D16S539-2⁠’: Second allele of D16S539
‘⁠D18S51-1⁠’: First allele of D18S51
‘⁠D18S51-2⁠’: Second allele of D18S51
‘⁠D19S433-1⁠’: First allele of D19S433
‘⁠D19S433-2⁠’: Second allele of D19S433
‘⁠D1S1656-1⁠’: First allele of D1S1656
‘⁠D1S1656-2⁠’: Second allele of D1S1656
‘⁠D21S11-1⁠’: First allele of D21S11
‘⁠D21S11-2⁠’: Second allele of D21S11
‘⁠D22S1045-1⁠’: First allele of D22S1045
‘⁠D22S1045-2⁠’: Second allele of D22S1045
‘⁠D2S1338-1⁠’: First allele of D2S1338
‘⁠D2S1338-2⁠’: Second allele of D2S1338
‘⁠D2S441-1⁠’: First allele of D2S441
‘⁠D2S441-2⁠’: Second allele of D2S441
‘⁠D3S1358-1⁠’: First allele of D3S1358
‘⁠D3S1358-2⁠’: Second allele of D3S1358
‘⁠D5S818-1⁠’: First allele of D5S818
‘⁠D5S818-2⁠’: Second allele of D5S818
‘⁠D6S1043-1⁠’: First allele of D6S1043
‘⁠D6S1043-2⁠’: Second allele of D6S1043
‘⁠D7S820-1⁠’: First allele of D7S820
‘⁠D7S820-2⁠’: Second allele of D7S820
‘⁠D8S1179-1⁠’: First allele of D8S1179
‘⁠D8S1179-2⁠’: Second allele of D8S1179
‘⁠F13A01-1⁠’: First allele of F13A01
‘⁠F13A01-2⁠’: Second allele of F13A01
‘⁠F13B-1⁠’: First allele of F13B
‘⁠F13B-2⁠’: Second allele of F13B
‘⁠FESFPS-1⁠’: First allele of FESFPS
‘⁠FESFPS-2⁠’: Second allele of FESFPS
‘⁠FGA-1⁠’: First allele of FGA
‘⁠FGA-2⁠’: Second allele of FGA
‘⁠LPL-1⁠’: First allele of LPL
‘⁠LPL-2⁠’: Second allele of LPL
‘⁠Penta_C-1⁠’: First allele of Penta_C
‘⁠Penta_C-2⁠’: Second allele of Penta_C
‘⁠Penta_D-1⁠’: First allele of Penta_D
‘⁠Penta_D-2⁠’: Second allele of Penta_D
‘⁠Penta_E-1⁠’: First allele of Penta_E
‘⁠Penta_E-2⁠’: Second allele of Penta_E
‘⁠SE33-1⁠’: First allele of SE33
‘⁠SE33-2⁠’: Second allele of SE33
‘⁠TH01-1⁠’: First allele of TH01
‘⁠TH01-2⁠’: Second allele of TH01
‘⁠TPOX-1⁠’: First allele of TPOX
‘⁠TPOX-2⁠’: Second allele of TPOX
‘⁠vWA-1⁠’: First allele of vWA
‘⁠vWA-2⁠’: Second allele of vWA

Source

http://strbase.nist.gov

References

Hill, C. et al. (2013) U.S. population data for 29 autosomal STR loci. Forensic science international: Genetics 497 7(3):e82-3.

Examples

data(NistSTRs)
data(NistSTRs)

Reordering of autosomal genotype counts

Description

Function order.auto reorders a named vector of three genotype counts, such that the sequence (minor homozygote, heterozygote, major heterzygote) is establised.

Usage

order.auto(X)order.auto(X)

Arguments

`X`	a named vector of genotype counts (e.g. c(AA=25,AB=50,BB=25))

Value

a vector

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

x <- c(MN=489,MM=298,NN=213)
print(x)
y <- order.auto(x)
print(y)
x <- c(MN=489,MM=298,NN=213)
print(x)
y <- order.auto(x)
print(y)

Reordering of X-chromosomal genotype counts

Description

Function order.x reorders a named vector of three genotype counts, such that the sequence (minor hemizygote, major hemizygote, minor homozygote, heterozygote, major heterzygote) is establised.

Usage

order.x(X)order.x(X)

Arguments

`X`	a named vector of genotype counts (e.g. c(A=10,B=10,AA=25,AB=50,BB=25))

Value

a vector

Author(s)

Jan Graffelman [email protected]

References

Graffelman, J. (2015) Exploring Diallelic Genetic Markers: The HardyWeinberg Package. Journal of Statistical Software 64(3): 1-23. doi:10.18637/jss.v064.i03.

Examples

x <-  c(A=392, B=212, AA=275, AB=296, BB=80)
print(x)
y <- order.x(x)
print(y)
x <-  c(A=392, B=212, AA=275, AB=296, BB=80)
print(x)
y <- order.x(x)
print(y)

Q-Q plot for a uniform distribution.

Description

qqunif makes a Q-Q plot against a uniform distribution for the supplied data vector.

Usage

qqunif(x, logplot = FALSE, lbs = 1:length(x), texton = FALSE, 
          xylim = NULL, main = "Q-Q plot for a uniform distribution",
          plotline = 0, xlab = "Expected p-value", 
          ylab = "Observed p-value", colvec=rep("black",length(x)),
	  colline = "black", ...)
qqunif(x, logplot = FALSE, lbs = 1:length(x), texton = FALSE, 
          xylim = NULL, main = "Q-Q plot for a uniform distribution",
          plotline = 0, xlab = "Expected p-value", 
          ylab = "Observed p-value", colvec=rep("black",length(x)),
	  colline = "black", ...)

Arguments

`x`	The data vector
`logplot`	If `logplot` is set to true, then the log10 of the p-values will be used in the plot. If not, untransformed p-values will be used.
`lbs`	A vector of labels for the points in the Q-Q plot
`texton`	Logical indicating whether labels should be plotted or not
`xylim`	Shared upper limit x and y axis in the plot
`main`	Title for the plot
`plotline`	Setting `plotline=0` plots a (0,1) line; `plotline=1` plots a robust fit,
`xlab`	label for the x axis
`ylab`	label for the y axis
`colvec`	vector with colours for the points in the QQ plot
`colline`	colour for the line in the plot
`...`	Any additional arguments for the `plot` instruction

Value

`pvals`	observed probabilities
`epvals`	expected probabilities

Author(s)

Jan Graffelman ([email protected])

Examples

   x <- runif(1000)
   z <- qqunif(x)
x <- runif(1000)
   z <- qqunif(x)

Recode genotype information

Description

function recode recodes bi-allelic genetic marker information expressed as strings (e.g. "AA", "AB", "BB") into numerical form.

Usage

recode(X, alleles, values = c(0, 1, 2), pos1 = 1, pos2 = 3, minor
= FALSE, verbose = FALSE)recode(X, alleles, values = c(0, 1, 2), pos1 = 1, pos2 = 3, minor
= FALSE, verbose = FALSE)

Arguments

`X`	A matrix or dataframe of bi-allelic markers, individuals in rows, markers in columns
`alleles`	a vector with the alleles for each marker (e.g. c("A/T", "A/G", etc))
`values`	a vector of numerica values for AA, AB and BB, ((0,1,2) by default).
`pos1`	position of the first allele in the allele string (1 by default).
`pos2`	position of the second allele in the allele string (3 by default).
`minor`	coding is according to the number of copies of the minor allele. if `minor = TRUE`, the value of 2 reflects two copies of the minor allele, and the value 0 reflects no copies of the minor allele.
`verbose`	print progress on the conversion or not.

Details

recode is written for bi-allelic marker data only. Heterozygotes may be coded both as AB or BA. By default, the second allele specified (e.g. "T" in "A/T") is counted in the recoding, and homozygotes AA are coded as 0 and homozygotes TT as 2.

Value

A numerical matrix, individuals in rows, markers in columns

Author(s)

Jan Graffelman [email protected]

Examples

SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG")
SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA")
SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT")
X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5)
Y <- recode(X,c("A/G","C/G","A/G","A/G","C/T"))
print(Y)
SNP1 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP2 <- c("CG","GG","CC","GG","GG","CG","CG","CG","CG")
SNP3 <- c("AA","AA","AA","AG","AA","AG","AA","AA","AA")
SNP4 <- c("GG","GG","GG","GG","GG","GG","GG","GG","GG")
SNP5 <- c("CC","CC","CC","CC","CC","CC","CT","CT","CT")
X <- cbind(SNP1,SNP2,SNP3,SNP4,SNP5)
Y <- recode(X,c("A/G","C/G","A/G","A/G","C/T"))
print(Y)

Shannon index

Description

Function shannon calculates the Shannon index and its variance for a vector of counts.

Usage

shannon(x)
shannon(x)

Arguments

`x`	a vector of counts

Value

`Hp`	the sample Shannon index
`VHp`	the sample variance of the Shannon index

Author(s)

Jan Graffelman ([email protected])

Examples

#
# Shannon index for allele frequencies of a biallelic MN blood group polymorphism
#
x <- c(MM=298,MN=489,NN=213)
p <- af(x)
shannon(c(p,1-p))$Hp
#
# Shannon index for the allele frequencies of an STR
#
data("NistSTRs")
AlleleTable <- table(c(NistSTRs[,1],NistSTRs[,2]))
AlleleFreq  <- AlleleTable/sum(AlleleTable)
shannon(AlleleFreq)$Hp
#
# Shannon index for allele frequencies of a biallelic MN blood group polymorphism
#
x <- c(MM=298,MN=489,NN=213)
p <- af(x)
shannon(c(p,1-p))$Hp
#
# Shannon index for the allele frequencies of an STR
#
data("NistSTRs")
AlleleTable <- table(c(NistSTRs[,1],NistSTRs[,2]))
AlleleFreq  <- AlleleTable/sum(AlleleTable)
shannon(AlleleFreq)$Hp

Sort tokens of a set of strings

Description

Function strsort collapses all tokens of a vector of strings in a single string with sorted tokens

Usage

strsort(s)
strsort(s)

Arguments

`s`	a vector of character strings

Value

a string

Author(s)

Jan Graffelman [email protected]

Examples

x <- c("AA","AB","BB","AC","CC")
print(strsort(x))
x <- c("AA","AB","BB","AC","CC")
print(strsort(x))

Convert theta to an inbreeding coefficient

Description

Function ThetatoF converts disequilibrium measure theta to an inbreeding coefficient.

Usage

ThetatoF(p, theta = 4)
ThetatoF(p, theta = 4)

Arguments

`p`	the allele frequency
`theta`	the disequilibrium parameter

Value

a real number

Author(s)

Jan Graffelman ([email protected])

References

Rohlfs, R.V. and Weir, B.S. (2008) Distributions of Hardy-Weinberg Equilibrium Test Statistics, Genetics, 180(3) pp. 1609-1616.

Examples

f <- ThetatoF(0.5,4)
f <- ThetatoF(0.5,4)

Convert a vector of genotype counts to triangular format

Description

Function toTriangular converts a named vector of genotype counts into a triangular matrix format, with homozygotes on the diagonal and heterozygotes below the diagonal.

Usage

toTriangular(x)
toTriangular(x)

Arguments

`x`	A vector of genotype counts

Value

a matrix

Author(s)

Jan Graffelman [email protected]

Examples

x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26)
print(x)
X <- toTriangular(x)
print(X)
x <- c(AA=20,AB=52,AC=34,BB=17,BC=51,CC=26)
print(x)
X <- toTriangular(x)
print(X)

Tri-allelic polymorphisms on the X chromosome of the TSI population

Description

This dataframe contains genotype counts for six three-allelic polymorphisms (A,B,C) on chromosome X of a sample of individuals from the TSI population (Tuscany, Italy) of the 1,000 genomes project.

Usage

data(TSIXTriAllelics)data(TSIXTriAllelics)

Format

A data frame with 6 observations on the following 10 variables.

ID: Identifier of the polymorphism
A: Male A genotype count
B: Male B genotype count
C: Male C genotype count
AA: Female AA genotype count
AB: Female AB genotype count
AC: Female AC genotype count
BB: Female BB genotype count
BC: Female BC genotype count
CC: Female CC genotype count

Source

Data taken from the 1,000 genomes project at www.internationalgenome.org

References

Graffelman, J. and Ortoleva, L. (2020) A network algorithm for the X chromosomal exact test for Hardy-Weinberg equilibrium with multiple alleles. Under review.

Examples

data(TSIXTriAllelics)
data(TSIXTriAllelics)

Extract unique genotypic compositions from a matrix

Description

Function UniqueGenotypeCounts creates a matrix containing only the unique rows in the given matrix, together with their frequency of occurrence

Usage

UniqueGenotypeCounts(X, verbose = TRUE)
UniqueGenotypeCounts(X, verbose = TRUE)

Arguments

`X`	A n by 3 matrix with genotypic counts (AA,AB,BB)
`verbose`	If TRUE then print some statistics

Value

A matrix with 4 columns, AA, AB, BB, and frequency of occurrence

Author(s)

Jan Graffelman [email protected]

Examples

set.seed(123)
X <- HWData(n=100,nm=100)
print(nrow(X))
Y <- UniqueGenotypeCounts(X)
print(nrow(Y))
print(sum(Y$w))
set.seed(123)
X <- HWData(n=100,nm=100)
print(nrow(X))
Y <- UniqueGenotypeCounts(X)
print(nrow(Y))
print(sum(Y$w))

Computes the sample variance of the allele frequency for a biallelic marker.

Description

Function vaf computes the sample variance of the allele frequencies of a single sample or a matrix of samples.

Usage

vaf(X, hw = FALSE)
vaf(X, hw = FALSE)

Arguments

`X`	vector or matrix with genotype counts (AA,AB,BB)
`hw`	assume Hardy-Weinberg proportions (`hw=TRUE`) or not (`hw=FALSE`)

Details

For biallelic markers the variance of the minor allele frequency equals the variance of the major allele frequency.

Value

a numeric vector of variances.

Author(s)

Jan Graffelman [email protected]

References

Weir, B.S. (1996) Genetic data analysis II. Sinauer Associates, Massachusetts. See Chapter 2.

Examples

x <- c(MM=298,MN=489,NN=213)
pA <- af(x)
vA <- vaf(x)
cat("allele frequency:",pA,"\n")
cat("sample variance allele frequency:",vA,"\n")
x <- c(MM=298,MN=489,NN=213)
pA <- af(x)
vA <- vaf(x)
cat("allele frequency:",pA,"\n")
cat("sample variance allele frequency:",vA,"\n")

Package 'HardyWeinberg'

Help Index

Statistical Tests and Graphics for Hardy-Weinberg Equilibrium

Description

Details

Author(s)

References

Examples

Function to compute allele frequencies

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Exact test of equality of allele frequencies for males and females

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Function to compute X-chromosomal allele frequencies

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Calculate allele and genotype counts for X-chromosomal markers

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Extract alleles

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Calculate triangular genotype matrix for vector(s) of alleles.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Genotype frequencies for 70 SNPs related to Alzheimer's disease

Description

Usage

Format

Source

References

Biallelic polymorphisms sampled from chromosome 22 of the CEU population of the 1000 Genomes project.

Description

Usage

Format

Details

Source

References

Examples

Calculate Graffelman-Weir exact density for bi-allelic X-chromosomal variant

Description

Usage

Arguments

Value