Title: | Haplotype Data Simulation |
---|---|
Description: | Package for haplotype-based genotype simulations. Haplotypes are generated such that their allele frequencies and linkage disequilibrium coefficients match those estimated from an input data set. |
Authors: | Giovanni Montana |
Maintainer: | Apostolos Dimitromanolakis <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.31 |
Built: | 2024-11-09 06:15:35 UTC |
Source: | CRAN |
ACE (angiotensin I converting enzyme) data set
data(ACEdata)
A data set with 22 haplotypes and 52 SNPs.
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
Estimates allele frequencies from a binary matrix
allelefreqs(dat)
dat | A binary matrix, rows are haplotypes and columns are binary markers |
A list containing:
freqs | Vector of allele "0" frequencies |
all.polym | If TRUE, all loci are polymorphic |
non.polym | Vector of non-polymorphic loci, if any |
Giovanni Montana
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
data(ACEdata)
x <- allelefreqs(ACEdata)
hist(x$freqs)
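The returned quantities can be illustrated on a small matrix using base R only. This is a minimal sketch, not the package source: the toy matrix `haps` and the variable names are assumptions made for illustration.

```r
# Toy haplotype matrix: 4 haplotypes (rows) x 3 binary markers (columns).
haps <- matrix(c(0, 1, 0,
                 0, 1, 1,
                 1, 1, 0,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)

# Frequency of allele "0" at each marker: one minus the column mean.
freq0 <- 1 - colMeans(haps)

# A locus is non-polymorphic when only one allele is observed at it.
non_polym <- which(freq0 == 0 | freq0 == 1)
all_polym <- length(non_polym) == 0

print(freq0)      # values: 0.75, 0.00, 0.50
print(non_polym)  # marker 2 carries only allele "1"
print(all_polym)  # FALSE
```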
Compute a measure of genetic diversity at each locus
divlocus(dat)
dat | A binary matrix, rows are haplotypes and columns are binary markers |
This function implements a measure of diversity for a locus as in Clayton (2002). The measure is computed from the alleles that the haplotypes carry at that locus, assuming that alleles are coded as 0 and 1.
A vector containing the diversity measure for all markers
Giovanni Montana
D. Clayton. Choosing a set of haplotype tagging SNPs from a larger set of diallelic loci. 2002. www-gene.cimr.cam.ac.uk/clayton/software/stata/htSNP/htsnp.pdf
data(ACEdata)
divlocus(ACEdata)
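One common way to quantify diversity at a diallelic locus is the sum of squared allele differences over all ordered pairs of haplotypes. With 0/1 coding this equals 2 N^2 f (1 - f), where N is the number of haplotypes and f is the allele frequency. The sketch below checks that identity in base R. Whether divlocus uses exactly this definition and scaling is an assumption, and `pairwise_div` is a hypothetical helper, not a package function.

```r
# Toy haplotype matrix: 4 haplotypes (rows) x 3 binary markers (columns).
haps <- matrix(c(0, 1, 0,
                 0, 1, 1,
                 1, 1, 0,
                 0, 1, 1),
               nrow = 4, byrow = TRUE)

# Hypothetical helper: sum of squared allele differences over all
# ordered pairs of haplotypes at a single locus.
pairwise_div <- function(x) sum(outer(x, x, function(a, b) (a - b)^2))

# Brute-force diversity, one value per marker.
div_pairs <- apply(haps, 2, pairwise_div)

# Closed form under 0/1 coding: 2 * N^2 * f * (1 - f).
N <- nrow(haps)
f <- 1 - colMeans(haps)
div_closed <- 2 * N^2 * f * (1 - f)

print(div_pairs)   # values: 6, 0, 8
print(div_closed)  # same values
```

A monomorphic marker (the second column here) has zero diversity under this definition.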
Creates a haplotype data object needed for simulating haplotypes with haplosim. This object also contains some summary statistics about the real data.
haplodata(dat)
dat | A binary matrix, rows are haplotypes and columns are binary markers |
A list containing:
freqs | Allele frequencies |
cor | Correlation matrix (LD coefficients) |
div | Locus-specific diversity measure |
cov | Covariance matrix for the normal distribution |
Giovanni Montana
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
See also haplosim
data(ACEdata)

# creates the haplotype object
x <- haplodata(ACEdata)

# simulates 100 random haplotypes
y <- haplosim(100, x)
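For two binary markers, the Pearson correlation coincides with the LD coefficient r = D / sqrt(pA (1 - pA) pB (1 - pB)), where D = pAB - pA pB. The base R sketch below verifies this on a toy two-marker data set. The matrix `haps` and the variable names are illustrative assumptions, and the sketch does not claim to reproduce how haplodata computes its `cor` component.

```r
# Toy data: 8 haplotypes (rows) x 2 binary markers (columns).
haps <- matrix(c(0, 0,
                 0, 0,
                 0, 1,
                 1, 1,
                 1, 1,
                 1, 0,
                 1, 1,
                 0, 0),
               ncol = 2, byrow = TRUE)

# Marginal and joint frequencies of allele "1".
pA  <- mean(haps[, 1])
pB  <- mean(haps[, 2])
pAB <- mean(haps[, 1] * haps[, 2])

# Disequilibrium coefficient D and the derived correlation r.
D <- pAB - pA * pB
r_from_D <- D / sqrt(pA * (1 - pA) * pB * (1 - pB))

# Pearson correlation of the two 0/1 columns.
r_pearson <- cor(haps[, 1], haps[, 2])

print(c(D = D, r_from_D = r_from_D, r_pearson = r_pearson))
# D = 0.125, both r values = 0.5
```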
Compute haplotype frequencies
haplofreqs(dat, firstl, lastl)
dat | A binary matrix, rows are haplotypes and columns are binary markers |
firstl | Position of the first locus |
lastl | Position of the last locus |
A vector of haplotype frequencies
Giovanni Montana
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
data(ACEdata)
freqs <- haplofreqs(ACEdata, 17, 22)
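The idea of counting haplotype configurations over a window of loci can be sketched in base R. This is an illustration only: the toy matrix `haps` and the label format (alleles pasted into a string) are assumptions, and the names used by haplofreqs may differ.

```r
# Toy haplotype matrix: 5 haplotypes (rows) x 4 binary markers (columns).
haps <- matrix(c(0, 1, 0, 1,
                 0, 1, 0, 0,
                 1, 1, 0, 1,
                 0, 1, 0, 1,
                 1, 0, 1, 1),
               nrow = 5, byrow = TRUE)

firstl <- 2  # position of the first locus in the window
lastl  <- 3  # position of the last locus in the window

# Restrict to the window and label each haplotype by its allele string.
window <- haps[, firstl:lastl, drop = FALSE]
labels <- apply(window, 1, paste, collapse = "")

# Relative frequency of each observed configuration.
hap_freqs <- table(labels) / nrow(haps)

print(hap_freqs)  # "01" has frequency 0.2, "10" has frequency 0.8
```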
Generates a random sample of haplotypes, given a haplotype object created from a data set
haplosim(n, hap, which.snp = NULL, seed = NULL, force.polym = TRUE, summary = TRUE)
n | Number of haplotypes to generate |
hap | Haplotype object created with haplodata |
which.snp | A vector specifying which SNPs to include |
seed | Seed for the random number generator |
force.polym | If TRUE, all simulated loci are forced to be polymorphic |
summary | If TRUE, additional summary statistics are returned |
A list containing:
data | Simulated sample |
freqs | Allele frequency vector |
cor | Correlation matrix |
div | Locus-specific diversity scores |
mse.freqs | MSE of allele frequencies |
mse.cor | MSE of correlations |
Giovanni Montana
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
See also haplodata
#
# Example 1
#
data(ACEdata)

# create the haplotype object
x <- haplodata(ACEdata)

# simulate a first sample of 100 haplotypes using all markers
y1 <- haplosim(100, x)

# compare allele frequencies in real and simulated samples
plot(x$freqs, y1$freqs, main = paste("MSE:", y1$mse.freqs)); abline(a = 0, b = 1)

# compare LD coefficients in real and simulated samples
ldplot(mergemats(x$cor, y1$cor), ld.type = 'r')

# simulate a second sample of 1000 haplotypes using the first 20 markers only
y2 <- haplosim(1000, x, which.snp = seq(20))

#
# Example 2
#
# simulate a sample of 500 haplotypes based on the ACE data set
set.seed(100)
data(ACEdata)
n <- 500
x <- haplodata(ACEdata)
y <- haplosim(n, x)

# compute the haplotype frequencies
# a haplotype starts at marker 17 and ends at marker 22
freq1 <- haplofreqs(ACEdata, 17, 22)
freq2 <- haplofreqs(y$data, 17, 22)

# extract the set of haplotypic configurations that are shared
# by real and simulated data and their frequencies
commonhapls <- intersect(names(freq1), names(freq2))
cfreq1 <- freq1[commonhapls]
cfreq2 <- freq2[commonhapls]

# compare real vs simulated haplotype frequencies
par(mar = c(10.1, 4.1, 4.1, 2.1), xpd = TRUE)
legend.text <- names(cfreq1)
bp <- barplot(cbind(cfreq1, cfreq2), main = "Haplotype Frequencies",
              names.arg = c("Real", "Simulated"),
              col = heat.colors(length(legend.text)))
legend(mean(range(bp)), -0.3, legend.text, xjust = 0.5,
       fill = heat.colors(length(legend.text)), horiz = TRUE)
chisq.test(x = n * cfreq2, p = cfreq1, simulate.p.value = TRUE, rescale.p = TRUE)
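The `cov` component returned by haplodata is described as a covariance matrix for the normal distribution, which suggests that haplotypes are obtained by dichotomising correlated normal draws. The base R sketch below illustrates that general idea only. It is not the haplosim source: `freq0`, `Sigma`, and the cutoff rule are illustrative assumptions, and the way the package derives its latent covariance from the observed LD is not shown.

```r
set.seed(1)

# Assumed target frequencies of allele "0" at three markers.
freq0 <- c(0.7, 0.4, 0.5)

# Assumed latent covariance (unit variances, positive definite).
Sigma <- matrix(c(1.0, 0.6, 0.3,
                  0.6, 1.0, 0.5,
                  0.3, 0.5, 1.0),
                nrow = 3, byrow = TRUE)

n <- 5000  # number of haplotypes to simulate

# Draw n latent normal vectors with covariance Sigma:
# chol(Sigma) is upper triangular R with t(R) %*% R == Sigma.
Z <- matrix(rnorm(n * 3), nrow = n) %*% chol(Sigma)

# Dichotomise: allele "0" when the latent value is at or below the
# normal quantile matching the allele "0" frequency, allele "1" otherwise.
cutoffs <- qnorm(freq0)
sim <- 1 * sweep(Z, 2, cutoffs, ">")

# Compare simulated and target allele frequencies.
sim_freq0 <- 1 - colMeans(sim)
mse_freqs <- mean((sim_freq0 - freq0)^2)

print(round(sim_freq0, 3))
print(mse_freqs)
```

The marginal frequencies are matched by construction, up to sampling noise. The correlations among the simulated binary markers are generally weaker than the latent correlations in `Sigma`, so a latent covariance has to be chosen with that attenuation in mind.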
Creates a linkage disequilibrium plot from a matrix of pair-wise LD coefficients
ldplot(ld.mat, ld.type, color = heat.colors(50), title = NULL)
ld.mat | A square matrix of LD coefficients |
ld.type | A character value specifying what coefficients are used as input: either 'r' for correlation coefficients or 'd' for D/Dprime scores |
color | A range of colors to be used for drawing. Default is heat.colors(50) |
title | Character string for the title of the plot |
Giovanni Montana
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
data(ACEdata)

# LD plot of ACEdata using r^2 coefficients
ldplot(cor(ACEdata), ld.type='r')
Merges two LD matrices. It can be used to compare the LD coefficients estimated in the real and simulated data sets
mergemats(mat1, mat2)
mat1 | First square matrix |
mat2 | Second square matrix of same dimensions |
A matrix whose upper triangle is taken from mat1 and whose lower triangle is taken from mat2
Giovanni Montana
Montana, G. HapSim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. 2005.
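Combining the upper triangle of one matrix with the lower triangle of another can be sketched with base R's `lower.tri`. This is an illustration, not the mergemats source. The matrices `m1` and `m2` are toy inputs, and keeping the diagonal from the first matrix is an assumption.

```r
# Two toy 3 x 3 "LD" matrices with easily distinguishable entries.
m1 <- matrix(1, nrow = 3, ncol = 3)
m2 <- matrix(2, nrow = 3, ncol = 3)

# Start from m1 (upper triangle and, by assumption, the diagonal),
# then overwrite the strictly lower triangle with the values from m2.
merged <- m1
merged[lower.tri(merged)] <- m2[lower.tri(m2)]

print(merged)
# Upper triangle and diagonal are 1, lower triangle is 2.
```

Placing the real-data coefficients above the diagonal and the simulated ones below it lets a single plot show both sets side by side.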