Package 'hapsim' reference manual

Title:	Haplotype Data Simulation
Description:	Package for haplotype-based genotype simulations. Haplotypes are generated such that their allele frequencies and linkage disequilibrium coefficients match those estimated from an input data set.
Authors:	Giovanni Montana
Maintainer:	Apostolos Dimitromanolakis
License:	GPL (>= 2)
Version:	0.31
Built:	2024-11-09 06:15:35 UTC
Source:	CRAN

ACE data set

Description

ACE (angiotensin I converting enzyme) data set

Usage

data(ACEdata)

Format

A data set with 22 haplotypes and 52 SNPs.

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Estimates allele frequencies

Description

Estimates allele frequencies from a binary matrix

Usage

allelefreqs(dat)

Arguments

dat

A binary matrix, rows are haplotypes and columns are binary markers

Value

A list containing:

`freqs`	Vector of allele "0" frequencies
`all.polym`	If TRUE, all loci are polymorphic
`non.polym`	Vector of non-polymorphic loci, if any
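The three return values can be illustrated with a minimal base-R sketch. It is not the package's source: it assumes that `freqs` is the proportion of haplotypes carrying allele 0 in each column, and that a locus counts as non-polymorphic when that proportion is exactly 0 or 1. The function name `allele0_freqs_sketch` and the toy matrix are hypothetical.

```r
# Hypothetical sketch (not hapsim's implementation) of allele "0" frequencies
# and a polymorphism check for a 0/1 haplotype matrix.
allele0_freqs_sketch <- function(dat) {
  dat <- as.matrix(dat)
  # proportion of haplotypes (rows) carrying allele 0 at each locus (column)
  freqs <- colMeans(dat == 0)
  # a locus is monomorphic if every haplotype carries the same allele
  non.polym <- which(freqs == 0 | freqs == 1)
  list(freqs = freqs,
       all.polym = length(non.polym) == 0,
       non.polym = non.polym)
}

# toy data: 4 haplotypes x 3 loci; locus 1 is monomorphic
toy <- matrix(c(0, 1, 1,
                0, 0, 1,
                0, 1, 0,
                0, 1, 1), nrow = 4, byrow = TRUE)
res <- allele0_freqs_sketch(toy)
res$freqs
res$non.polym
```

Here locus 1 carries allele 0 in every haplotype, so the sketch reports it as non-polymorphic and `all.polym` is FALSE.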

Author(s)

Giovanni Montana

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Examples


data(ACEdata)
x <- allelefreqs(ACEdata)
hist(x$freqs)

Diversity score

Description

Compute a measure of genetic diversity at each locus

Usage

divlocus(dat)

Arguments

dat

A binary matrix, rows are haplotypes and columns are binary markers

Details

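This function implements the locus-specific diversity measure of Clayton (2002). Let $z_{ij}$ denote the allele at locus $j$ of haplotype $i$, for $i=1,\ldots,N$, with alleles coded as 0 and 1. The diversity measure for locus $j$ can then be written as

$D_j = 2\left(N\sum_{i=1}^N z_{ij}^2 - \left(\sum_{i=1}^N z_{ij}\right)^2\right) = \sum_{i=1}^N\sum_{k=1}^N (z_{ij} - z_{kj})^2$

that is, the sum of squared allele differences over all ordered pairs of haplotypes.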
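A minimal base-R sketch of the diversity measure, reading it as $D_j = 2(N\sum_i z_{ij}^2 - (\sum_i z_{ij})^2)$. It is an illustration of the formula rather than the package's code; the helper names `diversity_sketch` and `pairwise_diversity` and the toy matrix are hypothetical. The sketch also checks the identity with the sum of squared differences over all ordered pairs of haplotypes.

```r
# Hypothetical sketch of the per-locus diversity formula
# D_j = 2 * (N * sum(z^2) - sum(z)^2), applied to each column of a 0/1 matrix.
diversity_sketch <- function(dat) {
  dat <- as.matrix(dat)
  N <- nrow(dat)
  apply(dat, 2, function(z) 2 * (N * sum(z^2) - sum(z)^2))
}

# Same quantity computed directly: sum of (z_i - z_k)^2 over all ordered pairs
pairwise_diversity <- function(z) sum(outer(z, z, "-")^2)

toy <- matrix(c(0, 1, 1,
                0, 0, 1,
                0, 1, 0,
                0, 1, 1), nrow = 4, byrow = TRUE)
d <- diversity_sketch(toy)
d
apply(toy, 2, pairwise_diversity)
```

For 0/1 coding the formula reduces to $2 n_j (N - n_j)$, where $n_j$ is the number of haplotypes carrying allele 1, so a monomorphic locus has diversity 0 and the value is largest when both alleles are equally common.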

Value

A vector containing the diversity measure for all markers

Author(s)

Giovanni Montana

References

D. Clayton. Choosing a set of haplotype tagging SNPs from a larger set of diallelic loci. 2002. www-gene.cimr.cam.ac.uk/clayton/software/stata/htSNP/htsnp.pdf

Examples

data(ACEdata)
divlocus(ACEdata)

Haplotype object creator

Description

Creates a haplotype data object needed for simulating haplotypes with haplosim. The object also contains summary statistics of the real data.

Usage

haplodata(dat)

Arguments

dat

A binary matrix, rows are haplotypes and columns are binary markers

Value

A list containing:

`freqs`	Allele frequencies
`cor`	Correlation matrix (LD coefficients)
`div`	Locus-specific diversity measure
`cov`	Covariance matrix of the multivariate normal distribution used to simulate haplotypes

Author(s)

Giovanni Montana

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Examples


data(ACEdata)

# creates the haplotype object
x <- haplodata(ACEdata) 

# simulates 100 random haplotypes
y <- haplosim(100, x) 

Haplotype frequencies

Description

Computes the frequencies of the haplotypes spanning loci firstl to lastl

Usage

haplofreqs(dat, firstl, lastl)

Arguments

`dat`	A binary matrix, rows are haplotypes and columns are binary markers
`firstl`	Position of the first locus
`lastl`	Position of the last locus

Value

A vector of haplotype frequencies, named by haplotype configuration
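The computation can be sketched in base R by collapsing the alleles at loci firstl to lastl of each haplotype into a string and tabulating the strings. This is a hypothetical illustration, not the package's implementation; in particular, the name format (for example "0110") is an assumption, as are the helper name `haplofreqs_sketch` and the toy matrix.

```r
# Hypothetical sketch: relative frequencies of the haplotype configurations
# observed at loci firstl..lastl of a 0/1 haplotype matrix.
haplofreqs_sketch <- function(dat, firstl, lastl) {
  block <- as.matrix(dat)[, firstl:lastl, drop = FALSE]
  # collapse each row (haplotype) into a string such as "01"
  configs <- apply(block, 1, paste, collapse = "")
  counts <- table(configs)
  freqs <- as.vector(counts) / nrow(block)
  names(freqs) <- names(counts)  # configurations, sorted by table()
  freqs
}

toy <- matrix(c(0, 1, 1,
                0, 0, 1,
                0, 1, 0,
                0, 1, 1), nrow = 4, byrow = TRUE)
f <- haplofreqs_sketch(toy, 2, 3)
f
```

Naming the vector by configuration is what makes comparisons such as `intersect(names(freq1), names(freq2))` in the haplosim examples possible.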

Author(s)

Giovanni Montana

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Examples

data(ACEdata)
freqs <- haplofreqs(ACEdata, 17, 22)

Haplotype data simulator

Description

Generates a random sample of haplotypes, given a haplotype object created from a data set with haplodata

Usage

haplosim(n, hap, which.snp = NULL, seed = NULL, force.polym = TRUE, summary = TRUE)

Arguments

`n`	Number of haplotypes to generate
`hap`	Haplotype object created with `haplodata`
`which.snp`	A vector specifying which SNPs (columns) to include
`seed`	Seed for the random number generator
`force.polym`	If TRUE, all simulated loci are forced to be polymorphic
`summary`	If TRUE, additional summary statistics are returned

Value

A list containing:

`data`	Simulated sample
`freqs`	Allele frequency vector
`cor`	Correlation matrix
`div`	Locus-specific diversity scores
`mse.freqs`	Mean squared error (MSE) between real and simulated allele frequencies
`mse.cor`	Mean squared error (MSE) between real and simulated correlations
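The two MSE summaries can be illustrated with a short base-R sketch. It assumes, without confirmation from the package source, that `mse.freqs` is the mean squared difference between the real and simulated allele frequency vectors and that `mse.cor` is computed the same way over the distinct locus pairs (the upper triangle) of the two correlation matrices. The helper `mse_sketch` and all input values are made up for illustration.

```r
# Hypothetical sketch of the MSE summaries returned when summary = TRUE.
mse_sketch <- function(real, sim) mean((real - sim)^2)

# made-up allele frequencies for 3 loci
real_freqs <- c(0.50, 0.25, 0.75)
sim_freqs  <- c(0.45, 0.30, 0.75)
mse_freqs <- mse_sketch(real_freqs, sim_freqs)

# made-up symmetric correlation matrices; compare each locus pair once
real_cor <- matrix(c(1,   0.8, 0.1,
                     0.8, 1,   0.2,
                     0.1, 0.2, 1), 3)
sim_cor  <- matrix(c(1,   0.6, 0.1,
                     0.6, 1,   0.4,
                     0.1, 0.4, 1), 3)
ut <- upper.tri(real_cor)  # assumption: diagonal and duplicate pairs excluded
mse_cor <- mse_sketch(real_cor[ut], sim_cor[ut])

mse_freqs
mse_cor
```

Restricting the correlation comparison to one triangle avoids counting each locus pair twice and ignores the diagonal, which is always 1.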

Author(s)

Giovanni Montana

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Examples


#
# Example 1
#

data(ACEdata)

# create the haplotype object 
x <- haplodata(ACEdata) 

# simulates a first sample of 100 haplotypes using all markers
y1 <- haplosim(100, x) 

# compares allele frequencies in real and simulated samples
plot(x$freqs, y1$freqs, main=paste("MSE:", y1$mse.freqs))
abline(a=0, b=1)

# compares LD coefficients in real and simulated samples
ldplot(mergemats(x$cor, y1$cor), ld.type='r') 

# simulates a second sample of 1000 haplotypes using the first 20 markers only
y2 <- haplosim(1000, x, which.snp=seq(20))

#
# Example 2
#

# simulate a sample of 500 haplotypes based on the ACE data set
set.seed(100)
data(ACEdata)
n <- 500
x <- haplodata(ACEdata)
y <- haplosim(n, x)

# compute the haplotype frequencies
# each haplotype spans markers 17 to 22
freq1 <- haplofreqs(ACEdata, 17, 22)
freq2 <- haplofreqs(y$data, 17, 22)

# extract the set of haplotypic configurations that are shared
# by real and simulated data and their frequencies
commonhapls <- intersect(names(freq1),names(freq2)) 
cfreq1 <- freq1[commonhapls]
cfreq2 <- freq2[commonhapls]

# compare real vs simulated haplotype frequencies
par(mar=c(10.1, 4.1, 4.1, 2.1), xpd=TRUE)
legend.text <- names(cfreq1)
bp <- barplot(cbind(cfreq1,cfreq2), main="Haplotype Frequencies",
       names.arg=c("Real","Simulated"), col=heat.colors(length(legend.text)))
legend(mean(range(bp)), -0.3, legend.text, xjust = 0.5,
       fill=heat.colors(length(legend.text)), horiz = TRUE)
chisq.test(x=n*cfreq2, p=cfreq1, simulate.p.value = TRUE, rescale.p = TRUE)

LD plot

Description

Creates a linkage disequilibrium plot from a matrix of pair-wise LD coefficients

Usage

ldplot(ld.mat, ld.type, color = heat.colors(50), title = NULL)

Arguments

`ld.mat`	A square matrix of LD coefficients
`ld.type`	A character value specifying what coefficients are used as input: either 'r' for correlation coefficients or 'd' for D/Dprime scores
`color`	A vector of colors used for drawing. Default is `heat.colors(50)`
`title`	Character string for the title of the plot
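The two `ld.type` options correspond to standard pairwise LD measures. As an illustration (not the code ldplot uses), this base-R sketch computes D, Lewontin's signed D' and r for two 0/1-coded loci; the helper name `ld_pair_sketch` and the toy vectors are hypothetical, and both loci are assumed to be polymorphic. For binary data, r coincides with the Pearson correlation returned by `cor()`, which is why `cor(ACEdata)` can be passed with `ld.type='r'`.

```r
# Hypothetical sketch of standard pairwise LD coefficients for two 0/1 loci.
ld_pair_sketch <- function(a, b) {
  pA <- mean(a == 1)            # frequency of allele 1 at locus A
  pB <- mean(b == 1)            # frequency of allele 1 at locus B
  pAB <- mean(a == 1 & b == 1)  # frequency of the 1-1 haplotype
  D <- pAB - pA * pB
  # Lewontin's normalising constant depends on the sign of D
  Dmax <- if (D >= 0) {
    min(pA * (1 - pB), (1 - pA) * pB)
  } else {
    min(pA * pB, (1 - pA) * (1 - pB))
  }
  c(D = D,
    Dprime = D / Dmax,
    r = D / sqrt(pA * (1 - pA) * pB * (1 - pB)))
}

a <- c(1, 1, 1, 0, 0, 0, 1, 0)
b <- c(1, 1, 0, 0, 0, 0, 1, 1)
ld <- ld_pair_sketch(a, b)
ld
cor(a, b)
```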

Author(s)

Giovanni Montana

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.

Examples


data(ACEdata)

# LD plot of ACEdata using r^2 coefficients
ldplot(cor(ACEdata), ld.type='r') 

Merges two LD matrices

Description

Merges two LD matrices into a single matrix. This can be used to compare, in one plot, the LD coefficients estimated in the real and simulated data sets

Usage

mergemats(mat1, mat2)

Arguments

`mat1`	First square matrix
`mat2`	Second square matrix of same dimensions

Value

A square matrix whose upper triangle is taken from mat1 and whose lower triangle is taken from mat2
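The merge can be sketched in base R with `upper.tri()` and `lower.tri()`. This is a hypothetical illustration rather than the package's code: `mergemats_sketch` is an invented name, and keeping the diagonal from mat1 is an assumption, since the behaviour on the diagonal is not documented here.

```r
# Hypothetical sketch: upper triangle from mat1, lower triangle from mat2.
mergemats_sketch <- function(mat1, mat2) {
  stopifnot(all(dim(mat1) == dim(mat2)), nrow(mat1) == ncol(mat1))
  out <- mat1                                    # upper triangle + diagonal from mat1 (assumed)
  out[lower.tri(out)] <- mat2[lower.tri(mat2)]   # lower triangle from mat2
  out
}

m1 <- matrix(1, 3, 3)  # stands in for, e.g., real-data correlations
m2 <- matrix(2, 3, 3)  # stands in for simulated-data correlations
merged <- mergemats_sketch(m1, m2)
merged
```

Because both inputs are symmetric LD matrices, no information is lost: each triangle shows one data set, which is how `ldplot(mergemats(x$cor, y1$cor), ld.type='r')` displays real and simulated LD side by side.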

Author(s)

Giovanni Montana

References

Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
