Title: | Tools for Mapping of Quantitative Traits of Genetically Related Individuals and Calculating Identity Coefficients from Pedigrees |
---|---|
Description: | This software provides tools for quantitative trait mapping in populations such as advanced intercross lines where relatedness among individuals should not be ignored. It can estimate background genetic variance components, impute missing genotypes, simulate genotypes, perform a genome scan for putative quantitative trait loci (QTL), and plot mapping results. It also has functions to calculate identity coefficients from pedigrees, especially suitable for pedigrees that consist of a large number of generations, or estimate identity coefficients from genotypic data in certain circumstances. |
Authors: | Riyan Cheng [aut, cre] |
Maintainer: | Riyan Cheng <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.14 |
Built: | 2024-12-01 07:59:41 UTC |
Source: | CRAN |
Select genetic variance components via Akaike's information criterion (AIC).
aicVC(y, x, v = list(E=diag(length(y))), initpar, k = 2, init = 1, keep = 1, direction = c("forward", "backward"), nit = 25, msg = FALSE, control = list(), hessian = FALSE)
aicVC(y, x, v = list(E=diag(length(y))), initpar, k = 2, init = 1, keep = 1, direction = c("forward", "backward"), nit = 25, msg = FALSE, control = list(), hessian = FALSE)
y |
A numeric vector or a numeric matrix of one column (representing a phenotype for instance). |
x |
A data frame or matrix, representing covariates if not missing. |
v |
A list of variance components of interest. Note:
|
initpar |
Optional initial parameter values. |
k |
Penalty on a parameter. The selection criterion is the known "AIC" if |
init |
Indicates which variance components for the initial model. By default, |
keep |
Indicator of which variance components should be forced into the final model. By default, |
direction |
The mode of search. Either "forward" or "backward" with default "forward". |
nit |
Maximum number of iterations for optimization. Ignored if there are not more than two variance components. |
msg |
A logical variable. True if one wants to track the process for monitoring purpose. |
control |
A list of control parameters to be passed to |
hessian |
Logical. Should a numerically differentiated Hessian matrix be returned? |
In genome-wide association studies (GWAS), random effects are usually added to a model to account for polygenic variation. Abney et al (2000) showed that five variance components including the most interesting additive and dominance variance components are potentially induced by polygenes. The above function is intended for selecting variance components that contribute "most" to a quantitative trait.
Function estVC
is called by the above function to estimate the parameters and maximum likelihood in each model. Refer to estVC
for more information.
aic |
AIC of the final model. |
model |
Gives parameter estimates, log-likihood, and other information. |
lik |
Log-likelihood of the model selected at each intermediate step. |
trace |
Indicates which variance components were selected at each intermediate step. |
estVC
for more information.
data(miscEx) ## Not run: # forward selection # any variance component will be selected # if AIC improve by 1e-5 or larger pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) o<- aicVC(y=pheno$bwt, x=pheno$sex, k=0, v=v, msg=TRUE) o # forward selection of<- aicVC(y=pheno$bwt, x=pheno$sex, v=v, k=1/2, direction="for", msg=TRUE) of # backward elimination ob<- aicVC(y=pheno$bwt, x=pheno$sex, v=v, k=1/2, init=1:2, direction="back", msg=TRUE) ob ## End(Not run)
data(miscEx) ## Not run: # forward selection # any variance component will be selected # if AIC improve by 1e-5 or larger pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) o<- aicVC(y=pheno$bwt, x=pheno$sex, k=0, v=v, msg=TRUE) o # forward selection of<- aicVC(y=pheno$bwt, x=pheno$sex, v=v, k=1/2, direction="for", msg=TRUE) of # backward elimination ob<- aicVC(y=pheno$bwt, x=pheno$sex, v=v, k=1/2, init=1:2, direction="back", msg=TRUE) ob ## End(Not run)
Estimate the best linear unbiased prediction (BLUP) for various effects in the model.
blup(object)
blup(object)
object |
fixed |
BLUP for fixed effects. |
R , etc.
|
BLUP for random effects. |
data(miscEx) ## Not run: # only consider additive genetic variance component pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii],D=gmF8$DD[ii,ii]) vc<- estVC(y=pheno$bwt, x=pheno$sex, v=v) b<- blup(vc) ## End(Not run)
data(miscEx) ## Not run: # only consider additive genetic variance component pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii],D=gmF8$DD[ii,ii]) vc<- estVC(y=pheno$bwt, x=pheno$sex, v=v) b<- blup(vc) ## End(Not run)
Calculate Jacquard condensed identity coefficients from a pedigree.
cic(ped, ids, inter, df=3, ask = FALSE, msg = FALSE)
cic(ped, ids, inter, df=3, ask = FALSE, msg = FALSE)
ped |
A pedigree, which is a data frame (id, father/sire, mother/dam, ...). If given, "generation" can be numeric 0, 1, 2, ... or non-numeric "F0", "F1", "F2", ... If "sex" is included, male should be "M", "Male" or 1, and female should be "F", "Female" or 2 (other than 0 and 1). If a founder is inbred, its ID should be tagged by character 'i' (e.g. 1i, 2i, etc.). Note: 0 is reserved for unknown |
ids |
IDs of the individuals for which to calculate the Jacquard condensed identity coefficients. If missing, all individuals in the pedigree |
inter |
Intermediate generations, if given, where coefficients are calculated bottom-up. |
df |
If |
ask |
If true, users will be asked whether to proceed. |
msg |
If true, will print out some messages. |
The coefficients will be calculated for individuals with IDs specified by ids
. All individuals will be considered if ids
is missing. This is not recommended if the total number of individuals in the pedigree is large. Instead, it is recommended that ids
is specified for interested individuals only
df
is a tuning parameter. It should not be 0 (or smaller than 1) if the pedigree is large in depth (many generations) but the number of individuals is not small; otherwise, it can take forever to finish. It should not be Inf
(or a large number) if the number of individuals in certain intermediate generation is very large.
Any individual without parent information is regarded as diallelic with two independent alleles. Users can add to their pedigree (e.g. 50 generations of selfing) if founders are inbred.
A matrix G with G[,j] being the j-th Jacquard identity coefficients.
You may need the administrative privilege to run this function on systems such as Windows 7. It may require your operating system support "long long" integer type in C++. If you run this function in a windows system, make sure the working directory is under system volume C and you have the write privilege.
It is better to remove the working directory if the program is interrupted by external forces (e.g. killed by users).
Warning: you may need to run this program on a 64-bit machine in case of seeing such a message!
Abney, M., M. S. McPeek, and C. Ober (2000). Estimation of variance components of quantitative traits in inbred populations. Am. J. Hum. Genet. 141, 629-650.
pedRecode
for more information.
data(miscEx) ids<- sample(pedF8$id[300:500],20) ## Not run: # run 'cic' for the sampled individuals # top-down oo<- cic(pedF8, ids=ids, df=Inf, msg=TRUE) # bottom-up o1<- cic(pedF8, ids=ids, df=0, msg=TRUE) # hybrid of top-down and bottom-up o2<- cic(pedF8, ids=ids, ask=TRUE, msg=TRUE) # same results c(sum(abs(oo-o1) >1e-7),sum(abs(o2-o1) >1e-7)) ## End(Not run)
data(miscEx) ids<- sample(pedF8$id[300:500],20) ## Not run: # run 'cic' for the sampled individuals # top-down oo<- cic(pedF8, ids=ids, df=Inf, msg=TRUE) # bottom-up o1<- cic(pedF8, ids=ids, df=0, msg=TRUE) # hybrid of top-down and bottom-up o2<- cic(pedF8, ids=ids, ask=TRUE, msg=TRUE) # same results c(sum(abs(oo-o1) >1e-7),sum(abs(o2-o1) >1e-7)) ## End(Not run)
Computes eigenvalues and eigenvectors of real symmetric matrices.
eigen.sym(x)
eigen.sym(x)
x |
A real symmetric matrix. |
This is to use the LAPACK routine 'DSYEVR' to perform spectral decomposition.
values |
a vector containing the eigenvalues of |
vectors |
a matrix whose columns contain the eigenvectors of |
Warning: symmetry is not checked by the program!
eigen
for more information.
Estimate model parameters for covariates, genetic variance components and residual effect.
estVC(y, x, v = list(E=diag(length(y))), initpar, nit = 25, method = c("ML", "REML"), control = list(), hessian = FALSE)
estVC(y, x, v = list(E=diag(length(y))), initpar, nit = 25, method = c("ML", "REML"), control = list(), hessian = FALSE)
y |
A numeric vector or a numeric matrix of one column (representing a phenotype for instance). |
x |
A data frame or matrix, representing covariates if not missing. |
v |
A list of matrices representing variance components of interest. Note:
|
initpar |
Optional initial parameter values. When provided, |
nit |
Maximum number of iterations for optimization. Ignored if there are not more than two variance components. |
method |
Either maximum likelihood (ML) or restricted maximum likelihood (REML). |
control |
A list of control parameters to be passed to |
hessian |
Logical. Should a numerically differentiated Hessian matrix be returned? |
The optimization function optim
is adopted in the above function to estimate the parameters and maximum likelihood. Several optimization methods are available for the optimization algorithm in optim
, but we recommend "Nelder-Mead" for the sake of stability. Alternatively, one may choose other options, e.g., "BFGS" to initialize and speed up the estimation procedure and then the procedure will automatically turn to "Nelder-Mead" for final results. If there is only one variance component (other than E
), optimize
will be used for optimization unless initpar
is provided.
Normality is assumed for the random effects. Input data should be free of missing values.
par |
estimates of the model parameters. |
value |
log-likelihood of the model. |
y |
y used. |
x |
associated with x used. |
v |
variance component matrices v used. |
... |
other information. |
Hessian matrix, if requested, pertains to -log-likelihood function.
data(miscEx) ## Not run: # no sex effect pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) o<- estVC(y=pheno$bwt, v=v) o # sex as fixed effect fo<- estVC(y=pheno$bwt, x=pheno$sex, v=v) fo 2*(fo$value-o$value) # log-likelihood test statistic # sex as random effect SM<- rem(~sex, data=pheno) ro<- estVC(y=pheno$bwt, v=c(v,list(Sex=SM$sex))) ro 2*(ro$value-o$value) # log-likelihood test statistic ## End(Not run)
data(miscEx) ## Not run: # no sex effect pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) o<- estVC(y=pheno$bwt, v=v) o # sex as fixed effect fo<- estVC(y=pheno$bwt, x=pheno$sex, v=v) fo 2*(fo$value-o$value) # log-likelihood test statistic # sex as random effect SM<- rem(~sex, data=pheno) ro<- estVC(y=pheno$bwt, v=c(v,list(Sex=SM$sex))) ro 2*(ro$value-o$value) # log-likelihood test statistic ## End(Not run)
Derive genetic matrices from Jacquard condensed identity coefficients or genotypic data.
genMatrix(x)
genMatrix(x)
x |
An object of |
AA |
Additive genetic matrix. |
DD |
Dominance genetic matrix. |
AD , HH , MH
|
Other three genetic matrices (see Abney et. al. 2000). |
ib |
Inbreeding coefficients. |
Abney, M., M. S. McPeek, and C. Ober (2000). Estimation of variance components of quantitative traits in inbred populations. Am. J. Hum. Genet. 141, 629-650.
data(miscEx) ids<- sample(pedF8$id[300:500],20) ## Not run: # get condensed identity coefficients oo<- cic(pedF8, ids=ids, df=0) ksp<- kinship(pedF8, ids=ids) # kinship coefficients only # extract genetic matrices gm<- genMatrix(oo) sum((gm$AA-2*ksp)>1e-7) # same results ## End(Not run)
data(miscEx) ids<- sample(pedF8$id[300:500],20) ## Not run: # get condensed identity coefficients oo<- cic(pedF8, ids=ids, df=0) ksp<- kinship(pedF8, ids=ids) # kinship coefficients only # extract genetic matrices gm<- genMatrix(oo) sum((gm$AA-2*ksp)>1e-7) # same results ## End(Not run)
Impute missing genotypic data in advance intercross lines (AIL).
genoImpute(gdat, gmap, step, prd = NULL, gr = 2, pos = NULL, method = c("Haldane", "Kosambi"), na.str = "NA", msg = FALSE)
genoImpute(gdat, gmap, step, prd = NULL, gr = 2, pos = NULL, method = c("Haldane", "Kosambi"), na.str = "NA", msg = FALSE)
gdat |
Genotype data. Should be a matrix or a data frame, with each row representing an observation and each column a marker locus. The column names should be marker names. Genotypes can be 1, 2 and 3, or "AA", "AB" and "BB". Optional if an object |
gmap |
A genetic map. Should be data frame (snp, chr, dist,...), where "snp" is the SNP (marker) name, "chr" is the chromosome where the "snp" is, and "dist" is the genetic distance in centi-Morgan (cM) from the left of the chromosome. |
step |
Optional. If specified, it is the maximum distance (in cM) between two adjacent loci for which the probabilities are calculated. The distance corresponds to the "cumulative" recombination rate at |
prd |
An object from |
gr |
The generation under consideration. |
pos |
Data frame (chr, dist, snp, ...). If given, |
method |
Whether "Haldane" or "Kosambi" mapping function should be used. |
na.str |
String for missing values. |
msg |
A logical variable. If TRUE, certain information will be printed out during calculation. |
The missing genotypic value is randomly assigned with a probability conditional on the genotypes of the flanking SNPs (makers).
An object, prd
, from genoProb
alone can be used for the purpose of imputation. Then, the output (especially the putative loci) will be determined by prd
. Optionally, it can be used together with gdat
so that missing values in gdat
will be imputed if possible, depending on whether loci in the columns of gdat
can be identified in the third dimension of prd
; this won't change the original genotypic data. See examples.
A matrix with the number of rows being the same as gdat
and with the number of columns depending on the SNP set in both gdat
and gmap
and the step
length.
Currently only suitable for advanced intercross lines.
data(miscEx) # briefly look at genotype data sum(is.na(gdatF8)) gdatF8[1:5,1:5] ## Not run: # run 'genoProb' gdtmp<- gdatF8 gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # imputation based on 'genoProb' object tmp<- genoImpute(prd=prDat) sum(is.na(tmp)) tmp[1:5,1:5] # imputation based on both genotype data and 'genoProb' object tmp<- genoImpute(gdatF8, prd=prDat) sum(is.na(tmp)) tmp[1:5,1:5] # imputation based on genotype data tmp<- genoImpute(gdatF8, gmap=gmapF8, gr=8, na.str=NA) sum(is.na(tmp)) tmp[1:5, 1:5] # set "msg=TRUE" for more information tmp<- genoImpute(gdatF8, gmap=gmapF8, gr=8, na.str=NA, msg=TRUE) sum(is.na(tmp)) tmp[1:5, 1:5] ## End(Not run)
data(miscEx) # briefly look at genotype data sum(is.na(gdatF8)) gdatF8[1:5,1:5] ## Not run: # run 'genoProb' gdtmp<- gdatF8 gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # imputation based on 'genoProb' object tmp<- genoImpute(prd=prDat) sum(is.na(tmp)) tmp[1:5,1:5] # imputation based on both genotype data and 'genoProb' object tmp<- genoImpute(gdatF8, prd=prDat) sum(is.na(tmp)) tmp[1:5,1:5] # imputation based on genotype data tmp<- genoImpute(gdatF8, gmap=gmapF8, gr=8, na.str=NA) sum(is.na(tmp)) tmp[1:5, 1:5] # set "msg=TRUE" for more information tmp<- genoImpute(gdatF8, gmap=gmapF8, gr=8, na.str=NA, msg=TRUE) sum(is.na(tmp)) tmp[1:5, 1:5] ## End(Not run)
Calculate the probability of a genotype at a locus conditional on the genotypes of its flanking markers in advance intercross lines (AIL).
genoProb(gdat, gmap, step, gr = 2, pos = NULL, method=c("Haldane", "Kosambi"), msg = FALSE)
genoProb(gdat, gmap, step, gr = 2, pos = NULL, method=c("Haldane", "Kosambi"), msg = FALSE)
gdat |
Genotype data. Should be a matrix or a data frame, with each row representing an observation and each column a marker locus. The column names should be marker names. Each entry should be 1, 2, 3 or 0, corresponding to "AA", "AB", "BB" or missing genotype. |
gmap |
A genetic map. Should be data frame (snp, chr, dist,...), where "snp" is the SNP (marker) name, "chr" is the chromosome where the "snp" is, and "dist" is the genetic distance in centi-Morgan (cM) from the left of the chromosome. |
step |
Optional. If specified, it is the maximum "cumulative" genetic distance (in cM) between two adjacent loci for which the probabilities are calculated. The genetic distance corresponds to the "cumulative" recombination rate at |
gr |
The generation under consideration. |
pos |
Data frame (chr, dist, snp, ...). If given, |
method |
Whether "Haldane" or "Kosambi" mapping function should be used. |
msg |
A logical variable. If TRUE, certain information will be printed out during calculation. |
The "cumulative" genetic distance between any two adjacent loci for which probabilities are calculated is not larger than step
. If step
is missing or step = Inf
, probabilities will only be calculated at loci in both the columns of gdat
and the rows of gmap
. If step
is small, a large set of putative loci will be considered, including all loci defined by the columns of gdat
and the rows of gmap
.
Probabilities for genotypes as well as genetic map information (snp,chr,dist)
pr |
A 3-D array with the first dimension corresponding to that of |
Currently only suitable for advanced intercross lines.
data(miscEx) ## Not run: # briefly look at genotype data sum(is.na(gdatF8)) gdatF8[1:5,1:5] gdtmp<- gdatF8 gdtmp<- replace(gdtmp,is.na(gdtmp),0) # In case an individual is not imputable, then # one needs to assign genotypes manually prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) prDat$pr[1:5,,1:5] ## End(Not run)
data(miscEx) ## Not run: # briefly look at genotype data sum(is.na(gdatF8)) gdatF8[1:5,1:5] gdtmp<- gdatF8 gdtmp<- replace(gdtmp,is.na(gdtmp),0) # In case an individual is not imputable, then # one needs to assign genotypes manually prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) prDat$pr[1:5,,1:5] ## End(Not run)
Simulate genotypic data from a pedigree in advanced intercross lines (AIL).
genoSim(ped, gmap, ids, hap, method = c("Haldane", "Kosambi"))
genoSim(ped, gmap, ids, hap, method = c("Haldane", "Kosambi"))
ped |
A pedigree, which is a data frame (id, sex, father/sire, mother/dam, ...). In "sex", male should be "M", "Male" or 1, and female should be "F", "Female" or 2 (other than 0 and 1). If given, "generation" can be numeric 0, 1, 2, ... or non-numeric "F0", "F1", "F2", ..., which should be in an increasing order. Note that 0 is reserved for missing values. If a father/mother is an inbred founder, its ID should be tagged by character 'i' (e.g. 1i, 2i, etc.). See |
gmap |
A genetic map. Should be data frame (snp, chr, dist, ...), where "snp" is the SNP (marker) name, "chr" is the chromosome where the "snp" is, and "dist" is the genetic distance in centi-Morgan (cM) from the left of the chromosome. If |
ids |
Genotypic data are extracted only for individuals with IDs specified by |
hap |
Founders' haplotype data if not missing. Rows correspond to founders as specified by row names, and columns correspond to loci in the genetic map |
method |
Whether "Haldane" or "Kosambi" mapping function should be used. This will be ignored if the recombination rate |
The pedigree should be in the same format as an output of pedRecode
. Sex chromosome should be marked
by 'x' or 'X'. Founders mean those whose parents have 0 or negative IDs after the pedigree is recoded by
pedRecode
. In addition, it is assumed that there are not more than two founders; otherwise,
you may run hapSim
and then extract genotypes manually.
A matrix, with entry value s-1 where s is the summation of the numbers representing two alleles at a locus. For instance, 1, 2, and 3 representing genotypes "AA", "AB" and "BB" respectively if hap
is not specified. Each row represent an observation, and each column corresponds to SNP in gmap
.
Sex may be used as a covariate if significance on x-chromosome is assessed by gene dropping through this function.
pedRecode
for more information.
data(miscEx) ## Not run: # simulate genotypes for F8 individuals ids<- sapply(pedF8$id[pedF8$gen == "F8" & pedF8$sire != "32089"], as.character) gdt<- genoSim(pedF8, gmapF8, ids=ids) dim(gdt) gdt[1:5,1:5] ## End(Not run)
data(miscEx) ## Not run: # simulate genotypes for F8 individuals ids<- sapply(pedF8$id[pedF8$gen == "F8" & pedF8$sire != "32089"], as.character) gdt<- genoSim(pedF8, gmapF8, ids=ids) dim(gdt) gdt[1:5,1:5] ## End(Not run)
Obtain estimates using generalized least squares (gls).
gls(formula, data, vc = NULL, test=c("none","F"))
gls(formula, data, vc = NULL, test=c("none","F"))
formula |
An object of class "formula": a symbolic description of the model to be fitted. |
data |
An data frame containing the variables in the model. |
vc |
An object from |
test |
Wheter F-test is performed. |
A matrix with columns: "Estimate", "Std. Error", "t value" and "Pr(>|t|)", or an ANOVA table if F-test is requested.
lm
.
Simulate gametic data from a pedigree.
hapSim(ped, gmap, ids, hap, method = c("Haldane", "Kosambi"))
hapSim(ped, gmap, ids, hap, method = c("Haldane", "Kosambi"))
ped |
A pedigree, which is a data frame (id, sex, father/sire, mother/dam, ...). In "sex", male should be "M", "Male" or 1, and female should be "F", "Female" or 2 (other than 0 and 1). If given, "generation" can be numeric 0, 1, 2, ... or non-numeric "F0", "F1", "F2", ..., which should be in an increasing order. Note that 0 is reserved for missing values. If a father/mother is an inbred founder, its ID should be tagged by character 'i' (e.g. 1i, 2i, etc.). See |
gmap |
A genetic map. Should be data frame (snp, chr, dist, ...), where "snp" is the SNP (marker) name, "chr" is the chromosome where the "snp" is, and "dist" is the genetic distance in centi-Morgan (cM) from the left of the chromosome. If |
ids |
Genotypic data are extracted only for individuals with IDs specified by |
hap |
Founders' haplotype data if not missing. Rows correspond to founders as specified by row names, and columns correspond to loci in the genetic map |
method |
Whether "Haldane" or "Kosambi" mapping function should be used. This will be ignored if the recombination rate |
The pedigree should be in the same format as an output of pedRecode
. Founders mean those whose parents
have 0 or negative IDs after the pedigree is recoded by pedRecode
.
A matrix giving haplotypes.
pedRecode
for more information.
data(miscEx) ## Not run: # prepare pedigree in desired format pedR<- pedRecode(pedF8) pedR[1:5,] # check to find out three founders # fake founder haplotypes hapDat<- rbind(rep(1:2,nrow(gmapF8)),rep(3:4,nrow(gmapF8)),rep(5:6,nrow(gmapF8))) rownames(hapDat)<- c("32089","1","2") # simulate hyplotypes for F8 individuals hd<- hapSim(pedF8, gmapF8, ids=pedF8$id[pedF8$gen=="F8"], hap=hapDat) dim(hd) hd[1:5,1:10] ## End(Not run)
data(miscEx) ## Not run: # prepare pedigree in desired format pedR<- pedRecode(pedF8) pedR[1:5,] # check to find out three founders # fake founder haplotypes hapDat<- rbind(rep(1:2,nrow(gmapF8)),rep(3:4,nrow(gmapF8)),rep(5:6,nrow(gmapF8))) rownames(hapDat)<- c("32089","1","2") # simulate hyplotypes for F8 individuals hd<- hapSim(pedF8, gmapF8, ids=pedF8$id[pedF8$gen=="F8"], hap=hapDat) dim(hd) hd[1:5,1:10] ## End(Not run)
Estimate Jacquard condensed identity coefficients by identity-by-state (IBS) from genotypic data.
ibs(x)
ibs(x)
x |
Genotype data with genotypes ("AA", "AB", "BB", or, 1, 2, 3) and without missing data, or probabilities for these genotypes (e.g., obtained by using |
A matrix G with G[,j] being the j-th Jacquard identity coefficients.
Currently only support the two-allele data.
Calculate kinship coefficients from a pedigree.
kinship(ped, ids, all = TRUE, msg = TRUE)
kinship(ped, ids, all = TRUE, msg = TRUE)
ped |
A pedigree, which a data frame (id, sire, dam, ...). If given, "generation" can be numeric 0, 1, 2, ... or non-numeric "F0", "F1", "F2", ... If "sex" is included, male should be "M", "Male" or 1, and female should be "F", "Female" or 2 (other than 0 and 1). If a founder is inbred, its ID should be tagged by character 'i' (e.g. 1i, 2i, etc.). Note that 0 is reserved for missing values. |
ids |
IDs of the individuals. If given, kinship coefficients are extracted for individuals with ID |
all |
If false, sires and dams with no parents are treated as unknown. |
msg |
If false, messages are suppressed. |
A matrix giving kinship coefficients.
data(miscEx) ids<- sample(pedF8$id,10) ## Not run: ksp<- kinship(pedF8,ids=ids) ## End(Not run)
data(miscEx) ids<- sample(pedF8$id,10) ## Not run: ksp<- kinship(pedF8,ids=ids) ## End(Not run)
Estimate LOD support intervals.
lodci(llk, cv = 0, lod = 1.5, drop = 3)
lodci(llk, cv = 0, lod = 1.5, drop = 3)
llk |
A data frame with components (chr, dist, y, ...), where "chr" is the chromosome on which the scanning locus is located, "dist" is the genetic or physical position of the scanning locus, and "y" is the test statistic. |
cv |
Threshold. Reported support intervals cover at least one scanning locus where |
lod |
LOD (specified by |
drop |
3 by default. See "details". |
In case of multiple peaks on a chromosome, a peak has to satisfy: a) above the threshold cv
; b) drops, e.g., 3 LOD on both sides except chromosome ends. So if two peaks close to each other but LOD between them doesn't drop, e.g., 3 LOD, only one of them is considered.
A data frame with the following components:
chr |
The chromosome |
lower |
The lower bound |
upper |
The upper bound |
index |
Indicates which scanning loci |
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) # run 'genoProb' prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # genome scan llk.hk<- scanOne(y=pheno$bwt, x=pheno$sex, vc=o, prdat=prDat) # extract LOD support intervals tmp<- data.frame(y=llk.hk$LRT, chr=llk.hk$chr, dist=llk.hk$dist) lodci(tmp, cv=10, lod=1.5, drop=3) ## End(Not run)
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) # run 'genoProb' prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # genome scan llk.hk<- scanOne(y=pheno$bwt, x=pheno$sex, vc=o, prdat=prDat) # extract LOD support intervals tmp<- data.frame(y=llk.hk$LRT, chr=llk.hk$chr, dist=llk.hk$dist) lodci(tmp, cv=10, lod=1.5, drop=3) ## End(Not run)
Multiple QTL model selection by AIC criterion.
mAIC(y, x, gdat, prdat = NULL, vc = NULL, chrIdx, xin, k = 2, direction = c("both","backward","forward"), ext = FALSE, msg = FALSE)
mAIC(y, x, gdat, prdat = NULL, vc = NULL, chrIdx, xin, k = 2, direction = c("both","backward","forward"), ext = FALSE, msg = FALSE)
y |
A numeric vector or a numeric matrix of one column (representing a phenotype for instance). |
x |
A data frame or matrix, representing covariates if not missing. |
gdat |
Genotype data. Should be a matrix or a data frame, with each row representing an observation and each column a marker locus. The column names should be marker names. Numeric coding of genotype is treated as numeric. Ignored if |
vc |
An object from |
prdat |
An object from |
chrIdx |
Chromsome index of markers in columns of |
xin |
Vector indicating whether a locus is already in the model. |
k |
Penalty on a parameter. The selection criterion is the known "AIC" if |
direction |
The mode of search: "both", "forward" or "backward" with default "both". |
ext |
A logical variable. True if ones wants more exhaustive search. |
msg |
A logical variable. True if ones wants to track the process for monitoring purpose. |
Makes use of "Haley-Knott" method (Haley and Knott 1992) if prdat
is an object from genoProb
.
A list with the following components:
model
: the resulting model;
aic
: AIC of the model;
snp
: selected SNPs.
xin
: vector indicating whether a SNP is selected.
Currently only suitable for advanced intercross lines (or diallelic data).
Haley, C. S., and S. A. Knott (1992). A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdat.imp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # run 'genoProb' gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # genome scan llk.hk<- scanOne(y=pheno$bwt, x=pheno$sex, prdat=prDat, vc=o) xin<- llk.hk$LRT > 10 # run 'mAIC' based on genome scan results mg<- mAIC(y=pheno$bwt, x=pheno$sex, prdat=prDat, vc=o, xin=xin, k=5, direction="back", msg=TRUE) mg$model$value # likelihood of the final model ## End(Not run)
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdat.imp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # run 'genoProb' gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # genome scan llk.hk<- scanOne(y=pheno$bwt, x=pheno$sex, prdat=prDat, vc=o) xin<- llk.hk$LRT > 10 # run 'mAIC' based on genome scan results mg<- mAIC(y=pheno$bwt, x=pheno$sex, prdat=prDat, vc=o, xin=xin, k=5, direction="back", msg=TRUE) mg$model$value # likelihood of the final model ## End(Not run)
AIL F8 data include the following:
"gmF8": A list with elements inbreeding coefficients "ib", additive genetic matrix "AA", dominance genetic matrix "DD" and other genetic matrices.
"pedF8": Pedigree data.
"pedF8.1", "pedF8.2": Alternative versions of pedigree pedF8.
"gmapF8: Genetic map.
"gdatF8": Genotype data.
"pdatF8": Phenotype data.
data(miscEx)
data(miscEx)
A collection of other functions that are not needed by users.
Simulate the distribution of the test statistic by permutation (of genotypic data) or gene dropping.
nullSim(y, x, gdat, prdat, ped, gmap, hap, method = c("permutation","gene dropping"), vc = NULL, intc = NULL, numGeno = FALSE, test = c("None","F","LRT"), minorGenoFreq = 0.05, rmv = TRUE, gr = 2, ntimes = 1000)
nullSim(y, x, gdat, prdat, ped, gmap, hap, method = c("permutation","gene dropping"), vc = NULL, intc = NULL, numGeno = FALSE, test = c("None","F","LRT"), minorGenoFreq = 0.05, rmv = TRUE, gr = 2, ntimes = 1000)
y |
A numeric vector or a numeric matrix of one column (representing a phenotype for instance). |
x |
A data frame or matrix, representing covariates if not missing. |
gdat |
Genotype data without missing values. Should be a matrix or a data frame, with each row representing a sample and each column a marker locus. Ignored in the case of gene dropping. |
prdat |
An object from |
ped |
A pedigree, which is a data frame (id, sex, father/sire, mother/dam, ...). In "sex", male should be "M", "Male" or 1, and female should be "F", "Female" or 2 (other than 0 and 1). If given, "generation" can be numeric 0, 1, 2, ... or non-numeric "F0", "F1", "F2", ... Note that 0 is reserved for missing values. Ignored in the case of permutation. |
gmap |
A genetic map. Should be data frame (snp, chr, dist, ...), where "snp" is the SNP (marker) name, "chr" is the chromosome where the "snp" is, and "dist" is the genetic distance in centi-Morgan (cM) from the left of the chromosome. Ignored in the case of permutation. |
hap |
Founders' haplotype data if not missing. Rows correspond to all founders, which should be in the first places in the pedigree |
method |
Permutation or gene dropping. |
vc |
An object from |
intc |
Covariates that interact with QTL. |
numGeno |
Whether to treat numeric coding of genotypes as numeric. If true, |
test |
"None", "F" or "LRT". Note: the result will be on the scale of -log10(p-value) if the test is "F" or "LRT"; otherwise, the result will be the log-likelihood test statistic. Moerover, the result from each simulation is the maximum over all the SNPs/variants. Therefore, the user should make sure what is pertinent. |
minorGenoFreq |
Specify the minimum tolerable minor genotype frequency at a scanning locus if |
rmv |
A logical variable. If true, then the scanning locus will be skipped if the minor genotype frequency at the locus is smaller than |
gr |
The generation under consideration. |
ntimes |
Number of simulations. |
Two methods considered here are permutation test and gene dropping test as described as follows.
Permutation test: Depending on the genome-scan, one can provide either gdat
or prdat
respectively corresponding to single-marker analysis or interval mapping. Then only arguments in scanOne
are needed in addition to method
and ntimes
.
Gene dropping test: If prdat
is provided, then gdat
will be ignored. The procedure will first call genoSim
to generate new genotype data and then call genoProb
to generate data for Haley-Knott interval mapping. If prdat
is not provided, then gdat
should be provided. The procedure will generate new genotype data and scan the genome using these generated genotype data. Haldane mapping function is used to generate data.
A vector of length ntimes
, the n-th element of which is maximum of the test statistics (LRT or -log10(p-value)) over the n-th genome scan.
genoSim
, genoProb
and scanOne
.
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdatTmp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # scan marker loci & permutation ex1<- nullSim(y=pheno$bwt, x=pheno$sex, gdat=gdatTmp, method="permutation", vc=o, ntimes=10) # Haley-Knott method & permutation gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) ex2<- nullSim(y=pheno$bwt, x=pheno$sex, prdat=prDat, method="permutation", vc=o, ntimes=10) # remove samples whose father is troublesome "32089" # before running gene dropping # otherwise, "hap" data needs to be supplied # scan marker loci & gene dropping idx<- is.element(rownames(pdatF8), pedF8$id[pedF8$sire=="32089"]) pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex) & !idx,] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdatTmp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA) o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) ex3<- nullSim(y=pheno$bwt, x=pheno$sex, gdat=gdatTmp, ped=pedF8, gmap=gmapF8, method="gene", vc=o, ntimes=10) # Haley-Knott method & gene dropping gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) ex4<- nullSim(y=pheno$bwt, x=pheno$sex, prdat=prDat, ped=pedF8, gmap=gmapF8, method="gene", vc=o, gr=8, ntimes=10) ## End(Not run)
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdatTmp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # scan marker loci & permutation ex1<- nullSim(y=pheno$bwt, x=pheno$sex, gdat=gdatTmp, method="permutation", vc=o, ntimes=10) # Haley-Knott method & permutation gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) ex2<- nullSim(y=pheno$bwt, x=pheno$sex, prdat=prDat, method="permutation", vc=o, ntimes=10) # remove samples whose father is troublesome "32089" # before running gene dropping # otherwise, "hap" data needs to be supplied # scan marker loci & gene dropping idx<- is.element(rownames(pdatF8), pedF8$id[pedF8$sire=="32089"]) pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex) & !idx,] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdatTmp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA) o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) ex3<- nullSim(y=pheno$bwt, x=pheno$sex, gdat=gdatTmp, ped=pedF8, gmap=gmapF8, method="gene", vc=o, ntimes=10) # Haley-Knott method & gene dropping gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) ex4<- nullSim(y=pheno$bwt, x=pheno$sex, prdat=prDat, ped=pedF8, gmap=gmapF8, method="gene", vc=o, gr=8, ntimes=10) ## End(Not run)
Prepare a pedigree in a format that is suitable for certain functions
pedRecode(ped, ids, all = TRUE, msg = TRUE)
pedRecode(ped, ids, all = TRUE, msg = TRUE)
ped |
A pedigree, which is a data frame (id, father/sire, mother/dam, ...). If "sex" is a component, male should be "M", "Male" or 1, and female should be "F", "Female" or 2 (other than 0 and 1). If given, "generation" can be numeric 0, 1, 2, ... or non-numeric "F0", "F1", "F2", ..., which should be in an increasing order. Note: 0 is reserved for unknown |
ids |
If given, only individuals with |
all |
If false, fathers and mothers with no parents are treated as unknown. |
msg |
If false, messages are suppressed. |
This function is used in cic
, and it can be used for error checking with respect to sex and generation if sex and/or generation information is available. The actual values of generation can be anything but should correspond to the true order of generation; otherwise, cic
may fail or we may get incorrect results. Information except id
, father
and mother
is optional.
A recoded pedigree.
cic
.
data(miscEx) pedF8[1:10,] pedR<- pedRecode(pedF8) pedR[1:10,] dim(pedR) pedR<- pedRecode(pedF8, ids=pedF8$id[pedF8$gener=="F8"]) dim(pedR)
data(miscEx) pedF8[1:10,] pedR<- pedRecode(pedF8) pedR[1:10,] dim(pedR) pedR<- pedRecode(pedF8, ids=pedF8$id[pedF8$gener=="F8"]) dim(pedR)
Plot mapping results.
## S3 method for class 'scanOne' plot(x,...) plotit(lrt, cv, bychr = FALSE, chr.labels = TRUE, type = "p", lty = NULL, col = NULL, pch = NULL, cex = NULL, ...)
## S3 method for class 'scanOne' plot(x,...) plotit(lrt, cv, bychr = FALSE, chr.labels = TRUE, type = "p", lty = NULL, col = NULL, pch = NULL, cex = NULL, ...)
x |
|
lrt |
A data frame with (chr, dist, y,...) or (chr, dist, y, group,...), where "chr" represents chromosome, "dist" position on the chromosome, "y" the test statistic. |
cv |
Threshold to be drawn on the plot. |
cex |
See |
bychr |
A logical variable. If true, the plot will be displayed per chromosomes. |
chr.labels |
A logical variable. If true, the chromosome names will be drawn. |
type , lty , col , pch
|
See |
... |
Other options passed to R plot function. To call |
A genetic map 'gmap' may be needed to plot an object of scanOne
or scanTwo
. The color option may not give what is expected.
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdat.imp<- genoImpute(geno, gmap=gmapF8, step=Inf, gr=8, na.str=NA) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # genome scan llk<- scanOne(y=pheno$bwt, x=pheno$sex, vc=o, gdat=gdat.imp) # plotting plot(llk, gmap=gmapF8) # gmap is needed # plotting in another way idx<- match(colnames(gdat.imp), gmapF8$snp) tmp<- data.frame(chr=gmapF8$chr[idx],dist=gmapF8$dist[idx],y=llk$LRT) plotit(tmp, main="Mapping Plot", xlab="Chromosome", ylab="LRT", col=as.integer(tmp$ch)%%2+2,type="p") ## End(Not run)
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdat.imp<- genoImpute(geno, gmap=gmapF8, step=Inf, gr=8, na.str=NA) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # genome scan llk<- scanOne(y=pheno$bwt, x=pheno$sex, vc=o, gdat=gdat.imp) # plotting plot(llk, gmap=gmapF8) # gmap is needed # plotting in another way idx<- match(colnames(gdat.imp), gmapF8$snp) tmp<- data.frame(chr=gmapF8$chr[idx],dist=gmapF8$dist[idx],y=llk$LRT) plotit(tmp, main="Mapping Plot", xlab="Chromosome", ylab="LRT", col=as.integer(tmp$ch)%%2+2,type="p") ## End(Not run)
Quantile-Quantile Plots With the Ability to Draw Confidence Bands.
qqPlot(y, x = "norm", ..., type = "p", xlim = NULL, ylim = NULL, xlab = if(is.numeric(x)) deparse(substitute(x)) else x, ylab = deparse(substitute(y)),main="Q-Q Plot", col = 1, lty = 2, lwd = 1, pch = 1, cex = 0.7, plot.it = TRUE, confidence = .95, qqline = c("observed","expected","none"), add = FALSE)
qqPlot(y, x = "norm", ..., type = "p", xlim = NULL, ylim = NULL, xlab = if(is.numeric(x)) deparse(substitute(x)) else x, ylab = deparse(substitute(y)),main="Q-Q Plot", col = 1, lty = 2, lwd = 1, pch = 1, cex = 0.7, plot.it = TRUE, confidence = .95, qqline = c("observed","expected","none"), add = FALSE)
y |
A numeric vector of data values. |
x |
Either a numeric vector of data values, or a character string naming a distribution function such as "norm". |
... |
Parameters passed to the distribution specified by |
type |
1-character string giving the type of plot desired. |
xlim |
The x limits. |
ylim |
The y limits. |
xlab |
A label for the x axis. |
ylab |
A label for the y axis. |
main |
A main title for the plot. |
col |
Color for points and lines. |
lty |
Line type. |
lwd |
Line width. |
pch |
Plotting character for points. |
cex |
Factor for expanding the size of plotted symbols. |
plot.it |
Whether or not to draw a plot. if plotting, points outside the confidence bands will be indicated by different a color. |
confidence |
Confidence level for the confidence band, or |
qqline |
Whether or not to draw a reference line. if "observed", the line passes through the first and third observed quartiles; if "expected", the point (x,y) is expected to fall on the line if x and y follow the same distribution; if "none", no reference line is drawn. |
add |
Add to an existing plot if true. |
If x
is numeric, a two-sample test of the null hypothesis that x
and y
were drawn from the same continuous distribution is performed. Alternatively, x
can be a character string naming a continuous distribution function. In such a case, a one-sample test is carried out of the null that y
was draw from distribution x
with parameters specified by "...".
x |
Quantiles of |
y |
Quantiles of |
lower , upper
|
Lower and upper limits if |
George Marsaglia, Wai Wan Tsang and Jingbo Wang (2003), Evaluating Kolmogorov's distribution. Journal of Statistical Software 8 (18): 1-4.
Vijayan N. Nair (1982). Q-Q plots with confidence bands for comparing several populations.
William J. Conover (1971). Practical Nonparametric Statistics. New York: John Wiley & Sons.
## Not run: par(mfrow=c(1,2)) x<- rnorm(200, mean=0.7,sd=2); y<- rnorm(200, sd=2) qqPlot(y,x,qqline="exp") qqPlot(y=y,x="norm",sd=2) ks.test(x,y) ## End(Not run)
## Not run: par(mfrow=c(1,2)) x<- rnorm(200, mean=0.7,sd=2); y<- rnorm(200, sd=2) qqPlot(y,x,qqline="exp") qqPlot(y=y,x="norm",sd=2) ks.test(x,y) ## End(Not run)
Convert the data for a QTL mapping experiment from the R/qtl format (see http://www.rqtl.org) to that used by QTLRel.
qtl2rel(cross)
qtl2rel(cross)
cross |
An object of class |
The input cross must by an intercross (class "f2"
).
Simple pedigree information is created, assuming the data are from a standard intercross.
A list with four components: "ped"
(pedigree information),
"gdat"
(genotype data), "pdat"
(phenotype data), and
"gmap"
(genetic map), in the formats used by QTLRel.
Karl W Broman, [email protected]
library(qtl) data(listeria) listeria <- listeria[as.character(1:19),] reldat <- qtl2rel(listeria)
library(qtl) data(listeria) listeria <- listeria[as.character(1:19),] reldat <- qtl2rel(listeria)
Estimate variance in a quantitative trait induced by QTL.
qtlVar(lrt, prdat, simulation = FALSE, nsim = 25)
qtlVar(lrt, prdat, simulation = FALSE, nsim = 25)
lrt |
A data frame (a, d, ...), where 'a' and 'd' are respectively additive and dominance effects. |
prdat |
A 3-D array that provides probabilities of genotypes "AA", "AB" and "BB". If |
simulation |
Whether to use simulations to estimate the variance explained by QTL. |
nsim |
Number of simulations to perform if |
A vector displaying the estimated variance at each loci.
Correlations among observations are ignored, and this function should be used with caution.
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) # rung 'genoProb' prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # genome scan pv.hk<- scanOne(y=pheno$bwt, x=pheno$sex, prdat=prDat, vc=o) # run 'qtlVar' qef<- pv.hk$par[,c("a","d")] qef<- as.data.frame(qef) qv<- qtlVar(qef,prDat$pr) ## End(Not run)
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) gdtmp<- geno gdtmp<- replace(gdtmp,is.na(gdtmp),0) # rung 'genoProb' prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # genome scan pv.hk<- scanOne(y=pheno$bwt, x=pheno$sex, prdat=prDat, vc=o) # run 'qtlVar' qef<- pv.hk$par[,c("a","d")] qef<- as.data.frame(qef) qv<- qtlVar(qef,prDat$pr) ## End(Not run)
Convert the data for a QTL mapping experiment from the QTLRel format to that used by R/qtl (http://www.rqtl.org).
rel2qtl(gdat, pdat, gmap)
rel2qtl(gdat, pdat, gmap)
gdat |
Genotype data |
pdat |
Phenotype data |
gmap |
Genetic map |
Pedigree information is ignored, and X chromosome data is omitted.
The data are treated as an intercross.
A cross object for the R/qtl package (http://www.rqtl.org).
Karl W Broman, [email protected]
data(miscEx) f8 <- rel2qtl(gdatF8, pdatF8, gmapF8) summary(f8)
data(miscEx) f8 <- rel2qtl(gdatF8, pdatF8, gmapF8) summary(f8)
Construct matrices associated with random effects.
rem(formula,data)
rem(formula,data)
formula |
A formula of the form: ~ Z | G1/.../Gk + ..., corresponding to random effects Z*G_i + Z*G_{ij} + ... in a mixed effect model. If Z=1 as in most cases, then it can be ~ G1/.../Gk + ... |
data |
A data frame that contains all the variables in the formula. |
A list of matrices that are associated with random effects.
## Not run: # make-up example dat<- data.frame( group=c("A","A","A","A","A","A","B","B","B","B"), sex=c("F","F","F","M","M","M","F","F","M","M"), pass=c("Y","N","N","Y","Y","Y","Y","N","N","Y"), z=1:10) # random effect pass, group and sex, where sex is nested # within group: # y_{ijk} = x_{ij}*b + group_i + sex_{ij} + z*pass_{ij} # + e_{ijk} rem(~ group/sex + z|pass,data=dat) ## End(Not run)
## Not run: # make-up example dat<- data.frame( group=c("A","A","A","A","A","A","B","B","B","B"), sex=c("F","F","F","M","M","M","F","F","M","M"), pass=c("Y","N","N","Y","Y","Y","Y","N","N","Y"), z=1:10) # random effect pass, group and sex, where sex is nested # within group: # y_{ijk} = x_{ij}*b + group_i + sex_{ij} + z*pass_{ij} # + e_{ijk} rem(~ group/sex + z|pass,data=dat) ## End(Not run)
Likelihood ratio tests or F tests at scanning loci over the genome.
scanOne(y, x, gdat, prdat = NULL, vc = NULL, intc = NULL, numGeno = FALSE, test = c("None","F","LRT"), minorGenoFreq = 0, rmv = TRUE)
scanOne(y, x, gdat, prdat = NULL, vc = NULL, intc = NULL, numGeno = FALSE, test = c("None","F","LRT"), minorGenoFreq = 0, rmv = TRUE)
y |
A numeric vector or a numeric matrix of one column, representing a phenotype. |
x |
A data frame or matrix, representing covariates if not missing. |
gdat |
Genotype data. It should be a matrix or a data frame, with each row being a sample and each column a locus. The column names should be marker names. It is ignored if an object If |
prdat |
An object from |
vc |
An object from |
intc |
Covariates that interact with QTL. |
numGeno |
Whether to treat numeric coding of genotypes as numeric. If true, |
test |
"None", "F" or "LRT". |
minorGenoFreq |
Specify the minimum tolerable minor genotype frequency at a scanning locus if |
rmv |
A logical variable. If true, then the scanning locus will be skipped if the minor genotype frequency at the locus is smaller than |
The test at a scanning locus under the null hypothesis of no QTL effect is performed by conditioning on the polygenic genetic variance-covariance. Normality is assumed for the random effects.
It is possible to extend the Haley-Knott approach to multiple-allelic cases under the assumption that allelic effects are all additive. Then, prdat
should be provided and be of class "addEff".
A list with at least the following components:
F or LRT |
the F-test or likelihood ratio test (LRT) statistic at the SNP (marker) if |
pval |
P-value at the snp (marker) if |
v |
Variation explained by the SNP (marker) |
parameters |
Estimated parameters at all scanning loci, including additive effect |
Haley, C. S., and S. A. Knott (1992). A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.
genoImpute
and genoProb
.
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # impute missing genotypes gdtmp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA, msg=FALSE) # genome scan and plotting lrt<- scanOne(y=pheno$bwt, x=pheno$sex, gdat=gdtmp, vc=o) lrt plot(lrt,gmap=gmapF8) # Haley-Knott method gdtmp<- geno; unique(unlist(gdtmp)) gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) pv.hk<- scanOne(y=pheno$bwt, intc=pheno$sex, prdat=prDat, vc=o, test="F") pv.hk plot(pv.hk, gmap=gmapF8) # assume additive allelic effects class(prDat)<- c(class(prDat), "addEff") lrt.hk<- scanOne(y=pheno$bwt, intc=pheno$sex, prdat=prDat, vc=o) lrt.hk ## End(Not run)
data(miscEx) ## Not run: # impute missing genotypes pheno<- pdatF8[!is.na(pdatF8$bwt) & !is.na(pdatF8$sex),] ii<- match(rownames(pheno), rownames(gdatF8)) geno<- gdatF8[ii,] ii<- match(rownames(pheno), rownames(gmF8$AA)) v<- list(A=gmF8$AA[ii,ii], D=gmF8$DD[ii,ii]) # estimate variance components o<- estVC(y=pheno$bwt, x=pheno$sex, v=v) # impute missing genotypes gdtmp<- genoImpute(geno, gmap=gmapF8, gr=8, na.str=NA, msg=FALSE) # genome scan and plotting lrt<- scanOne(y=pheno$bwt, x=pheno$sex, gdat=gdtmp, vc=o) lrt plot(lrt,gmap=gmapF8) # Haley-Knott method gdtmp<- geno; unique(unlist(gdtmp)) gdtmp<- replace(gdtmp,is.na(gdtmp),0) prDat<- genoProb(gdat=gdtmp, gmap=gmapF8, gr=8, method="Haldane", msg=TRUE) pv.hk<- scanOne(y=pheno$bwt, intc=pheno$sex, prdat=prDat, vc=o, test="F") pv.hk plot(pv.hk, gmap=gmapF8) # assume additive allelic effects class(prDat)<- c(class(prDat), "addEff") lrt.hk<- scanOne(y=pheno$bwt, intc=pheno$sex, prdat=prDat, vc=o) lrt.hk ## End(Not run)
Evaluate log-likelihood ratio test statistic for epistasis (QTL by QTL interaction).
scanTwo(y, x, gdat, prdat = NULL, vc = NULL, numGeno = FALSE, minorGenoFreq = 0, rmv = TRUE)
scanTwo(y, x, gdat, prdat = NULL, vc = NULL, numGeno = FALSE, minorGenoFreq = 0, rmv = TRUE)
y |
A numeric vector or a numeric matrix of one column (representing a phenotype for instance). |
x |
A data frame or matrix, representing covariates if not missing. |
gdat |
Genotype data. Should be a matrix or a data frame, with each row representing an observation and each column a marker locus. The column names should be marker names. Optional if an object |
prdat |
An object from |
vc |
An object from |
numGeno |
Whether to treat numeric coding of genotypes as numeric. If true, |
minorGenoFreq |
Specify the minimum tolerable minor genotype frequency at a scanning locus if |
rmv |
A logical variable. If true, then the scanning locus will be skipped if the minor genotype frequency at the locus is smaller than |
A matrix whose entry in the upper triangle is the log-likelihood test statistic for epistatic effect.