Title: | Bayesian QTL Mapping Toolkit |
---|---|
Description: | QTL mapping toolkit for inbred crosses and recombinant inbred lines. Includes maximum likelihood and Bayesian tools. |
Authors: | Charles C. Berry [aut, cre] |
Maintainer: | Charles C. Berry <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-38 |
Built: | 2024-11-24 06:50:37 UTC |
Source: | CRAN |
Some pointers to a few key functions in BQTL
Be sure to check out all of the free documentation that comes with R.
The example
function is very
helpful in getting familiar with a new function. You type
example(fun)
and the examples in the documentation for
fun
are run, then you can read the documentaiton to get a bette
sense of what is really going on. My personal favorite is to type
par(ask=T)
, hit the 'enter' key, then example(image)
,
and 'enter' again; after each display you hit the 'enter' key to get
to the next one.
library(bqtl)
is needed to load the BQTL functions and data
sets.
\
make.map.frame
defines the map,
marker.levels
The help page describes several functions that define the coding scheme for marker levels,
make.analysis.obj
combines marker data, phenotype
data,and the map.frame
to create an object that can be
used by data analysis functions.
\
bqtl
does a host of things from marker regression
and interval mapping to full maximum likelihood. The best way to
get started is to run example(bqtl)
and take a look at
the resulting output.
locus
is very helpful in specification of runs.
\
linear.bayes
For a good starting point try example(linear.bayes)
Charles C. Berry [email protected]
The approximation provided by linear.bayes
can be improved by
performing Laplace approximations. This function is a development
version of a wrapper to do that for all of the returned by
linear.bayes
.
adjust.linear.bayes(lbo, ana.obj=lbo$call$ana.obj, ...)
adjust.linear.bayes(lbo, ana.obj=lbo$call$ana.obj, ...)
lbo |
The object returned by |
ana.obj |
The |
... |
currently unused |
A list of class "adjust.linear.bayes"
containing:
odds |
A vector, typically of length k giving the odds for models of size 1, 2, ..., k under a uniform posterior relative to a model with no genes. |
loc.posterior |
The marginal posterior probabilities by locus |
coefficients |
The marginal posterior means of the coefficients |
one.gene.adj |
Results of fits for one gene models |
n.gene.adj |
Results of fits for modles with more than one gene |
call |
the call to |
For large linear.bayes
objects invloving many gene models,
this can require a very long time to run.
Charles C. Berry [email protected]
Find maximum likelihood estimate(s) or posterior mode(s) for QTL model(s). Use Laplace approximation to determine the posterior mass associated with the model(s).
bqtl(reg.formula, ana.obj, scope = ana.obj$reg.names, expand.specials = NULL, ...)
bqtl(reg.formula, ana.obj, scope = ana.obj$reg.names, expand.specials = NULL, ...)
reg.formula |
A formula.object like |
ana.obj |
The result of |
scope |
passed to |
expand.specials |
passed to |
... |
Arguments to pass to |
This function is a wrapper for lapadj
. It does a lot
of useful packaging through the configs
terms. If there
is no configs
term, then the result is simply the output of
lapadj
with the call
attribute replaced by the
call to bqtl
The result(s) of calling lapadj
.
If configs
is used in the reg.formula
, then the
result is a list with one element for each formula. Each element is the
value returned by lapadj
Charles C. Berry [email protected]
Tierney L. and Kadane J.B. (1986) Accurate Approximations for Posterior Moments and Marginal Densities. JASA, 81,82–86.
data(little.ana.bc ) # load BC1 dataset loglik( bqtl( bc.phenotype ~ 1, little.ana.bc ) ) #null loglikelihood #on chr 1 near cM 25 loglik(bqtl(bc.phenotype~locus(chromo=1,cM=25),little.ana.bc)) little.bqtl <- # two genes with epistasis bqtl(bc.phenotype ~ m.12 * m.24, little.ana.bc) summary(little.bqtl) several.epi <- # 20 epistatic models bqtl( bc.phenotype ~ m.12 * locus(31:50), little.ana.bc) several.main <- # main effects only bqtl( bc.phenotype ~ m.12 + locus(31:50), little.ana.bc) max.loglik <- max( loglik(several.epi) - loglik(several.main) ) round( c( Chi.Square=2*max.loglik, df=1, p.value=1-pchisq(2*max.loglik,1)) ,2) five.gene <- ## a five gene model bqtl( bc.phenotype ~ locus( 12, 32, 44, 22, 76 ), little.ana.bc , return.hess=TRUE ) regr.coef.table <- summary(five.gene)$coefficients round( regr.coef.table[,"Value"] + # coefs inside 95% CI qnorm(0.025) * regr.coef.table[,"Std.Err"] %o% c("Lower CI"=1,"Estimate"=0,"Upper CI"=-1),3)
data(little.ana.bc ) # load BC1 dataset loglik( bqtl( bc.phenotype ~ 1, little.ana.bc ) ) #null loglikelihood #on chr 1 near cM 25 loglik(bqtl(bc.phenotype~locus(chromo=1,cM=25),little.ana.bc)) little.bqtl <- # two genes with epistasis bqtl(bc.phenotype ~ m.12 * m.24, little.ana.bc) summary(little.bqtl) several.epi <- # 20 epistatic models bqtl( bc.phenotype ~ m.12 * locus(31:50), little.ana.bc) several.main <- # main effects only bqtl( bc.phenotype ~ m.12 + locus(31:50), little.ana.bc) max.loglik <- max( loglik(several.epi) - loglik(several.main) ) round( c( Chi.Square=2*max.loglik, df=1, p.value=1-pchisq(2*max.loglik,1)) ,2) five.gene <- ## a five gene model bqtl( bc.phenotype ~ locus( 12, 32, 44, 22, 76 ), little.ana.bc , return.hess=TRUE ) regr.coef.table <- summary(five.gene)$coefficients round( regr.coef.table[,"Value"] + # coefs inside 95% CI qnorm(0.025) * regr.coef.table[,"Std.Err"] %o% c("Lower CI"=1,"Estimate"=0,"Upper CI"=-1),3)
Internal bqtl functions and objects
x %equiv% y map.dx(lambda, theta, min.lambda) rhs.bqtl(reg.terms, ana.obj, bqtl.specials, local.covar, scope, expand.specials = NULL, method, ...) zero.dup(x,dig=6) uniq.config(swap.obj)
x %equiv% y map.dx(lambda, theta, min.lambda) rhs.bqtl(reg.terms, ana.obj, bqtl.specials, local.covar, scope, expand.specials = NULL, method, ...) zero.dup(x,dig=6) uniq.config(swap.obj)
lambda |
(2*(recomb fraction-1/2) |
theta |
recomb fraction |
min.lambda |
smallest map distance to use |
reg.terms |
a formula |
ana.obj |
an analysis.object |
bqtl.specials |
a vector of acceptable special names |
local.covar |
a function |
scope |
vector of strings |
expand.specials |
logical,whether to use expand.grid on the loci |
method |
e'g' "F2", "BC!", etc |
... |
not sure |
swap.obj |
result of swap |
x |
numeric vector or matrix |
y |
numeric vector or matrix |
dig |
how many significant digits to use |
These are not to be called by the user.
For a single type of model, this function evaluates multiple models that
differ only in terms of the loci involved. The looping is all done by
internal C functions, so this is faster than simply using bqtl
to
do the same thing.
bqtl.fitter(setup, loc.mat, ana.obj)
bqtl.fitter(setup, loc.mat, ana.obj)
setup |
The object returned by |
loc.mat |
A matrix of locus numbers, s.t. |
ana.obj |
An |
In order to avoid the computational overhead of running large loops of
very repetitive operations in R/S, bqtl.fitter
used after the
setup=TRUE
option in bqtl
will loop through the
loci specified in loc.mat
using internal C code. This is many
times faster than running the same code via bqtl
.
For now it only returns the loglikelihood. But it would be trivial to build an option that would allow other quantities computed to be returned, and this should probably be done. However, some care is needed to keep objects from becoming unmanageably large.
Charles C. Berry [email protected]
data( little.ana.bc ) little.setup <- bqtl( bc.phenotype~locus(1)*locus(2), little.ana.bc, setup=TRUE ) combos <- t( as.matrix( expand.grid( 1:21, 44:64 ) ) ) little.update <- bqtl.fitter(little.setup, combos, little.ana.bc) little.res <- matrix( little.update, nr=21 ) image( 1:21, 44:64, little.res ) rm(little.ana.bc, little.update, little.res )
data( little.ana.bc ) little.setup <- bqtl( bc.phenotype~locus(1)*locus(2), little.ana.bc, setup=TRUE ) combos <- t( as.matrix( expand.grid( 1:21, 44:64 ) ) ) little.update <- bqtl.fitter(little.setup, combos, little.ana.bc) little.res <- matrix( little.update, nr=21 ) image( 1:21, 44:64, little.res ) rm(little.ana.bc, little.update, little.res )
Return a vector or matrix of coefficients as appropriate
## S3 method for class 'bqtl' coef(object,...)
## S3 method for class 'bqtl' coef(object,...)
object |
The object returned by |
... |
ignored |
A vector (if bqtl
returned a single fit) or matrix (if
bqtl
returned a list with more than one fit)
Charles C. Berry [email protected]
Convert numeric indexes to names of regressors for a genetic model.
One or many genetic models can be specified through the use of this
function. It is used on the right hand side of a formula in the
bqtl
function.
configs(x,...,scope, method = NULL)
configs(x,...,scope, method = NULL)
x |
Typically an integer, an integer vector, an array, or a list with a
|
... |
Optional arguments to be used when |
scope |
(Optional and) Usually not supplied by the user. Rather |
method |
(Optional and) Usually not supplied by the user. A method like "F2". Typically, this is determined by internal code. |
configs
is used in the model formula notation of
bqtl
, possibly more than once, and possibly with regressors named
in the usual manner. configs
is intended to speed up the
specification and examination of genetic models by allowing many models
to be specified in a shorthand notation in a single model formula. The
names of genetic loci can consist of marker names, names that encode
chromosome number and location, or other shorthand notations. The names
of terms in genetic models will typically include the names of the locus
and may prepend "add." or "dom." or similar abbreviations for the
'additive' and 'dominance' terms associated with the locus.
When used as in bqtl( y ~ configs(34), my.analysis.obj )
, it will
look up the term my.analysis.obj$reg.names[34]
. When
this is passed back to bqtl
, it get pasted into the formula and
is subsequently processed to yield the fit for a one gene model.
When used as in bqtl( y ~ configs(34,75,172), my.analysis.obj)
it
looks up each term and returns a result to bqtl
that results in
fitting a 3 gene model (without interaction terms).
When x
is a vector, array, or list, the processing typically
returns pieces of many model formulas. bqtl(y ~ configs(26:75),
...)
results in a list of 50 different one gene model fits from
bqtl
for the terms corresponding to the 26th through the 75th
variables. bqtl(y ~
configs(cbind(c(15,45,192),c(16,46,193))),...)
returns two four gene
models. And more generally, whenever is.array(x)
is TRUE, the
columns (or slices) specify dim(x)[1]/length(x)
different
models. When x$configs
is an array, this also happens. This turns
out to be useful when the result of running swapbc1
or
swapf2
is treated as an importance sample. In such a case,
bqtl(y ~ configs(my.swap),my.analysis.obj)
will return a list in
which element i
is the ith sample drawn when my.swap <-
swapbc1(...)
was run.
A character vector whose element(s) can be parsed as the right hand side of a model formula.
Charles C. Berry [email protected]
bqtl
and the examples there for a sense of how to use
configs
,
make.analysis.obj
for the setup that encodes the marker
map and the marker information,
swapbc1
and
swapf2
for generating samples to be screened by bqtl
.
Sometimes it is helps speed computations to linearize the likelihood
or at least a part of it w.r.t. the locus allele values. Both
'Haley-Knott regression' and 'composite interval mapping' use this
approach. covar
provides a mechanism for creating formula
objects that specify such linearizations.
covar(x)
covar(x)
x |
The name of a locus (except for F2 designs, when it is the
name of an effect like 'add.m.32') or any argument of the sort that
|
The function covar
actually only returns x
. The real
work is done by a covar
function that is hidden inside of
bqtl
, where the arguments are parsed as for locus
. Each
of the return values from locus
is prefixed by "covar(" and
suffixed by ")". If x
is a name of a locus or effect, then
paste("covar(",deparse(x),")")
is returned. Later, when
bqtl
calls lapadj
, terms like covar(PVV4.1)
are
recognized as requiring a linearization w.r.t. effect 'PVV4.1'.
a character string or vector
Charles C. Berry [email protected]
HALEY, C. S. and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
Knapp SJ, Bridges WC, and Birkes D. Mapping quantitative trait loci using molecular marker linkage maps. Theoretical and Applied Genetics 79: 583-592, 1990.
ZENG, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136:1457-1468
formula method for class bqtl
## S3 method for class 'bqtl' formula(x, ...)
## S3 method for class 'bqtl' formula(x, ...)
x |
The object returned by |
... |
unused |
a formula object
Charles C. Berry [email protected]
lapadj provides the Laplace approximation to the marginal posterior (over coefficients and dispersion parameter) for a given genetical model for a quantitative trait. A by-product is the parameter value corresponding to the maximum posterior or likelihood.
lapadj(reg.formula, ana.obj, rparm = NULL, tol = 1e-10, return.hess = FALSE, mode.names = NULL, mode.mat = NULL, maxit = 100, nem = 1,setup.only=FALSE,subset=NULL,casewt=NULL, start.parm=NULL, ...)
lapadj(reg.formula, ana.obj, rparm = NULL, tol = 1e-10, return.hess = FALSE, mode.names = NULL, mode.mat = NULL, maxit = 100, nem = 1,setup.only=FALSE,subset=NULL,casewt=NULL, start.parm=NULL, ...)
reg.formula |
A formula, like
|
ana.obj |
See |
rparm |
One of the following: A scalar that will be used as the ridge parameter for all design terms except for the intercept ridge parameter which is set to zero A vector who named elements can be matched by the design term
names returned in
A vector with Positive entries are 'ridge' parameters or variance ratios in a Bayesian prior for the regression coefficients. Larger values imply more shrinkage or a more concentrated prior for the regresion coefficients. |
tol |
Iteration control parameter |
return.hess |
Logical, include the Hessian in the output? |
mode.names |
names to use as |
mode.mat |
Not usually set by the user. A matrix which indicates the values of regressor
variables corresponding to the allele states. If |
maxit |
Maximum Number of iterations to perform |
nem |
Number of EM iterations to use in reinitializing the pseudo-Hessian |
setup.only |
If TRUE, do not run. Return an object that can be use
for a direct call to |
subset |
expression to evaluate using |
casewt |
a vector of non-negative weights |
start.parm |
Vector of starting values for the maximization |
... |
other objects needed in fitting |
The core of this function is a quasi-Newton optimizer due to Minami (1993) that has a computational burden that is only a bit more than the EM algorithm, but features fast convergence. This is used to find the mode of the posterior. Once this is in hand, one can find the Laplace approximation to the marginal likelihood. In addition, some useful quantities are provided that help in estimating the marginal posterior over groups of models.
A list with components to be used in constructing approximations to the marginal posterior or a list that can be used to call the underlying C code directly. In the former case, these are:
adj |
The ratio of the laplace approximation to the posterior for the correct likelihood to the laplace approximation to the posterior for the linearized likelihood |
logpost |
The logarithm of the posterior or likelihood at the mode |
parm |
the location of the mode |
posterior |
The laplace approximation of the marginal posterior for the exact likelihood |
hk.approx |
Laplace approximation to the linearized likelihood |
hk.exact |
Exact marginal posterior for the linearized likelihood |
reg.vec |
A vector of the variables used |
rparm |
Values of ridge parameters used in this problem. |
Charles C. Berry [email protected]
Berry C.C.(1998) Computationally Efficient Bayesian QTL Mapping in Experimental Crosses. ASA Proceedings of the Biometrics Section. 164–169.
Minami M. (1993) Variance estimation for simultaneous response growth curve models. Thesis (Ph. D.)–University of California, San Diego, Department of Mathematics.
The Bayesian QTL models via a likelihood that is linearized w.r.t. a fixed genetic model. By default, all one and two gene models (without epistasis) are fitted and a MCMC sampler is used to fit 3,4, and 5 gene and (optionally) larger models.
linear.bayes(x, ana.obj, partial=NULL, rparm, specs, scope, subset, casewt, ...)
linear.bayes(x, ana.obj, partial=NULL, rparm, specs, scope, subset, casewt, ...)
x |
a formula giving the QTL and the candidate loci or a
|
ana.obj |
An analysis.object, see |
partial |
a formula giving covariates to be controlled |
rparm |
A ridge parameter. A value of 1 is suggested, but the default is 0. |
specs |
An optional list with components
|
scope |
Not generally used. If supplied this will be passed to
|
subset |
Not generally used. If supplied this will be passed to
|
casewt |
Not generally used. If supplied this will be passed to
|
... |
optional arguments to pass to |
This function is a wrapper for
varcov
, twohk
, swap
, and
summary.swap
, and a better understanding of optional
arguments and the object generated is gained from their
documentation.
hk |
The object returned by |
swaps |
A list of objects returned by calls to
|
smry |
A list of objects returned by calls to
|
odds |
A Vector of odds (relative to a no gene setup) for each
model size evaluated. The odds are computed under a prior that
places equal weights on models of each size considered (and are,
therefore, Bayes Factors). If models of size 1 and 2 are not
evaluated or if some degenerate results were encountered, this will
be |
coefs |
A vector of posterior means of the regression
coefficients. If models of size 1 and 2 are not
evaluated or if some degenerate results were encountered, this will
be |
loc.posterior |
A vector of locus-wise posterior probabilities
that the interval covered by this locus contains a gene.If models of
size 1 and 2 are not evaluated or if some degenerate results were
encountered, this will be |
call |
The call that generated this object |
Charles C. Berry [email protected]
Berry C.C.(1998) Computationally Efficient Bayesian QTL Mapping in Experimental Crosses. ASA Proceedings of the Biometrics Section. 164–169.
data( little.ana.bc ) little.lin <- linear.bayes( bc.phenotype~locus(all), little.ana.bc, rparm=1 ) par(mfrow=c(2,3)) plot( little.ana.bc, little.lin$loc.posterior, type="h" ) little.lin$odds par(mfrow=c(1,1)) plot(fitted(little.lin), residuals(little.lin))
data( little.ana.bc ) little.lin <- linear.bayes( bc.phenotype~locus(all), little.ana.bc, rparm=1 ) par(mfrow=c(2,3)) plot( little.ana.bc, little.lin$loc.posterior, type="h" ) little.lin$odds par(mfrow=c(1,1)) plot(fitted(little.lin), residuals(little.lin))
A simulation of a BC1 cross of 150 organisms with a genome of around 500
cM consisting of 5 chromosomes. The format is that created by
make.analysis.obj
This dataset is built up from several others. The basic data are:
A vector of phenotype data
A map.frame
of marker data and
A data frame with 50 rows and 2 columns that specify the map locations of a simulated set of markers
These are used to construct
A map.frame
with 'pseudo-markers' at least
every 5 cM made from
little.mf.5 <- make.map.frame(little.map.frame,
nint=marker.fill( little.map.frame, reso=5, TRUE ))
Then phenotype, covariate, and marker data are combined
with little.mf.5
A data.frame
with the
variable bc.phenotype
A data.frame
with marker state information
The examples in make.analysis.obj
A simulation of an F2 cross of 150 organisms with a genome of around 500
cM consisting of 5 chromosomes. The format is that created by make.analysis.obj
data(little.ana.f2)
data(little.ana.f2)
The little.bc.markers
data frame has 150 rows and 50
columns with the simulated marker data from a BC1 cross of 150
organisms with a genome of around 500 cM consisting of 5
chromosomes. Some NA
's have been intentionally introduced.
data(little.bc.markers)
data(little.bc.markers)
This data frame contains the following columns:
a factor with levels
AA
Aa
a factor with levels
AA
Aa
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
row.names
The little.bc.pheno
data frame has 150 rows and 1 columns.
data(little.bc.pheno)
data(little.bc.pheno)
This data frame contains the following columns:
a numeric vector of simulated phenotype data
The little.f2.markers
data frame has 150 rows and 50 columns with
the simulated marker data from an F2 cross of 150
organisms with a genome of around 500 cM consisting of 5 chromosomes.
data(little.f2.markers)
data(little.f2.markers)
This data frame contains the following columns:
a factor with levels
AA
Aa
aa
a factor with levels
AA
Aa
aa
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
a factor with levels
A-
aa
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
ditto
a factor with levels
a-
ditto
ditto
ditto
a factor with levels
AA
Aa
aa
a factor with levels
AA
Aa
aa
row names
The little.f2.pheno
data frame has 150 rows and 1 columns.
data(little.f2.pheno)
data(little.f2.pheno)
This data frame contains the following columns:
a numeric vector of simulated phenotype data
The little.map.dx
data frame has 50 rows and 2 columns that specify
the map locations of a simulated set of markers
data(little.map.dx)
data(little.map.dx)
This data frame contains the following columns:
a factor with levels
m.1
... m.50
a numeric vector of map locations in centimorgans
The little.map.frame
data frame has 50 rows and 9 columns that
describe the marker map of little.map.dx
in the format
produced by make.map.frame
. little.map.dx
has the
minimal data needed to construct this.
data(little.map.frame)
data(little.map.frame)
This data frame contains the following columns:
a factor with levels
m.1
m.2
...
m.50
a vector of locations
weights to be used in sampling and Bayesian computations
a factor with levels
left
right
center
always TRUE
for these data
a vector of plotting positions
transformed recombination fractions
an abbreviated locus name
the chromosome number 1, 2, 3, 4, or 5.
The little.mf.5
data frame has 114 rows and 9 columns
consisting of little.map.frame
plus 64 'virtual' marker loci
data(little.mf.5)
data(little.mf.5)
This data frame contains the following columns:
The marker names taken from little.map.frame
and those created to
fill virtual markers in between actual markers.
a vector of locations
weights to be used in sampling and Bayesian computations
a factor with levels
left
right
center
TRUE
for the 50 markers, FALSE
for the 'virtual' markers
a vector of plotting positions
transformed recombination fractions
an abbreviated locus name
the chromosome number 1, 2, 3, 4, or 5.
Convert numeric indexes to names of regressors for a genetic model.
One or many genetic models can be specified through the use of this
function. It is used on the right hand side of a formula in the
bqtl
function.
locus(x, ..., scope, method, chromo, cM, ana.obj) add(x, ..., scope, method) dom(x, ..., scope, method)
locus(x, ..., scope, method, chromo, cM, ana.obj) add(x, ..., scope, method) dom(x, ..., scope, method)
x |
Typically an integer, an integer vector, or an array whose elements
are integers. These index loci described in a However, |
... |
Optional arguments (usually integers) to be used when
|
chromo |
A chromosome number or 2 ordered numbers. The loci on the
chromosome or in the range of chromosome numbers are used. If
|
cM |
(Optional) map distance or two giving a location near a locus
or range of locations from which loci will be included. If the one
chromosome number is specified in |
scope |
(Optional and)
Usually not supplied by the user. Rather |
method |
(Optional and)
Usually not supplied by the user. Like |
ana.obj |
Usually not specified by the user. This is the
|
locus
is used in the model
formula notation of bqtl
, possibly more than once, and possibly
with regressors named in the usual manner. locus
is intended to
speed up the specification and examination of genetic models by allowing
many models to be specified in a shorthand notation in a single model
formula. The names of genetic loci can consist of marker names, names
that encode chromosome number and location, or other shorthand
notations. The names of terms in genetic models will typically include
the names of the locus and may prepend "add." or "dom." or similar
abbreviations for the 'additive' and 'dominance' terms associated with
the locus.
When used as in bqtl( y ~ locus(34), my.analysis.obj )
, it will
look up the term or terms corresponding to the 34th locus. When
this is passed back to bqtl
, it is pasted into a text string that
will become a formula and is subsequently processed to yield the fit for
a one gene model.
When used as in bqtl( y ~ locus(34,75,172), my.analysis.obj)
it
looks up each term and returns a result to bqtl
that results in
fitting a 3 gene model (without interaction terms).
When x
is a vector or array, the processing typically returns
pieces character strings for many model formulas. bqtl(y ~
locus(26:75), ...)
results in a list of 50 different one gene model
fits from bqtl
for the terms corresponding to the 26th through
the 75th variables. bqtl(y ~
locus(cbind(c(15,45,192),c(16,46,193))),...)
returns two three gene
models. And more generally, whenever is.array(x)
is TRUE, the
columns (or slices) specify dim(x)[1]/length(x)
different
models.
The chromo
argument performs a lookup of loci on the chromosome
via the function map.index
. If cM
is also given,
the locus nearest that location is used. If two values are given for
cM
all loci in the range are used.
add(x)
and dom(x)
are alternatives that specify that only
the additive or dominance terms in an F2 intercross.
A character vector whose element(s) can be parsed as the right hand side of a model formula(s).
Charles C. Berry [email protected]
configs
,
bqtl
, and the examples there for a sense of how to use
locus
,
make.analysis.obj
for the setup that encodes the marker
map and the marker information.
A fitted model or a list of such generated by bqtl
has a maximum
log likelihood or log posterior and a posterior. These functions simply
extract them.
loglik(x, ...) logpost(x, ...) posterior(x, ...)
loglik(x, ...) logpost(x, ...) posterior(x, ...)
x |
The object produced by |
... |
Currently unused |
A vector of numbers whose length equals the number of fitted models in x
Charles C. Berry [email protected]
Create commonly used objects for the analysis of a backcross or intercross experiment or of recombinant inbred lines.
make.analysis.obj(data, map.frame, marker.frame, marker.levels=NULL, method="F2", casewt=NULL,varcov=FALSE,mode.mat=NULL)
make.analysis.obj(data, map.frame, marker.frame, marker.levels=NULL, method="F2", casewt=NULL,varcov=FALSE,mode.mat=NULL)
data |
A |
|||||||||||||||||||||||||||||
map.frame |
A |
|||||||||||||||||||||||||||||
marker.frame |
A |
|||||||||||||||||||||||||||||
marker.levels |
A vector of length six or
NA's are allowed in |
|||||||||||||||||||||||||||||
method |
One of "F2", "BC1", "RI.self", or "RI.sib" |
|||||||||||||||||||||||||||||
casewt |
If there are multiple observations on one genotype (such as in recombinant inbreds) this can be used to assign a weight to each observation. The wisdom of doing this is debatable. |
|||||||||||||||||||||||||||||
varcov |
If FALSE, don't create a varcov.object. Otherwise give an
index into data to select a dependent variable. See |
|||||||||||||||||||||||||||||
mode.mat |
If
For
and for RIL methods ( and the default
Other choices of |
A lot of stuff is bundled together in one object. The function is
really just a wrapper calling other make.*
functions.
A list with components
data |
|
varcov |
A varcov.object. See |
reg.names |
The names of the regressors from |
method |
The |
state.matrix |
See |
loc.right |
See |
map.frame |
See |
casewt |
The |
mode.mat |
The |
version |
A string giving the version of BQTL from qhich the objects was created |
call |
The function call |
This can be quite a LARGE object.It might be better in crosses with lots (say, thousands) of markers, or in which many 'virtual' markers are used, or on computers with limited RAM to store each component separately. Not all components are used in every type of analysis.
Charles C. Berry [email protected]
make.map.frame
for definition of the marker map, The
internally used functions are: make.loc.right
,
make.state.matrix
, make.regressor.matrix
,
make.varcov
, and make.marker.numeric
data( little.bc.pheno ) data( little.mf.5 ) data( little.bc.markers ) names(little.bc.pheno) little.ana.bc <- make.analysis.obj(little.bc.pheno$bc.phenotype, little.mf.5,little.bc.markers, method="BC1") summary( little.ana.bc )
data( little.bc.pheno ) data( little.mf.5 ) data( little.bc.markers ) names(little.bc.pheno) little.ana.bc <- make.analysis.obj(little.bc.pheno$bc.phenotype, little.mf.5,little.bc.markers, method="BC1") summary( little.ana.bc )
Helps speed computations in multigene models by allowing a quick assessment of whether two loci are independent given the marker information for the individual.
make.loc.right(marker.frame, marker.distances)
make.loc.right(marker.frame, marker.distances)
marker.frame |
A |
marker.distances |
Actually a misnomer, this is a vector with a zero in the last position of each chromosome. |
A matrix of the same dimension as marker.frame
whose elements
index the column on the next (right) fully informative marker.
Charles C. Berry [email protected]
Uses the map distances as a means of assigning a prior for chromosomal location. Basically, this function attempts to assign equal weight according to the spacing or markers and 'virtual' markers.
make.location.prior( x, add.2.end=0, normalize=TRUE )
make.location.prior( x, add.2.end=0, normalize=TRUE )
x |
|
add.2.end |
How many Morgans to extend the first and last interval on each chromosome |
normalize |
If TRUE, let the result sum to 1.0 |
A vector of length(x)
whose sum is one, if normalize==TRUE
Charles C. Berry, [email protected]
A map.frame.object describes a marker map and additional loci that may be used in a QTL study. Each row pertains to one locus. Names of markers, abbreviated names, distances, and other necessary and useful information are bundled.
make.map.frame(dx,chr.num = NULL, prior = make.location.prior(lambda), morgan = 100, nint = NULL, reso = NULL)
make.map.frame(dx,chr.num = NULL, prior = make.location.prior(lambda), morgan = 100, nint = NULL, reso = NULL)
dx |
An object of class |
chr.num |
(Optional) Vector of chromosome numbers |
prior |
(Optional) Vector of Prior probabilities for the loci |
morgan |
100 if centiMorgans, 1 if Morgans |
nint |
(Optional) Vector of one plus number of 'virtual' markers to be inserted after each locus |
reso |
Maximum distance between loci. If necessary fill in with 'pseudo-markers' |
The QTL analysis depends on information about the marker map and on specifications of the loci to be studied. The 'map.frame' contains this information.
A data frame with components:
marker.name |
The full text identifier for this marker, e.g. "HH.360L.Col" is a marker on chromosome 1 of arabidopsis thaliana, and names like this can be used for reference purposes. 'Virtual' markers have a suffix appended to the name of the previous marker. |
cM |
Location on the chromosome. If this is a marker of a locus
that was input via |
pos.type |
"left" if it is the first locus on this chromosome,"right" if it is last, or "center" otherwise. |
is.marker |
TRUE if this was actually a marker, FALSE if it is a 'virtual' marker |
pos.plot |
Plotting position for this locus. Typically the same as
|
lambda |
Twice the recombination fraction minus one. |
locus |
An abbreviation for the locus of the form
"C.< |
chr.num |
The chromosome number. |
The idea in having all of this bundled together is to make it easier for plot and summary methods to be implemented and to allow convenient references in formula based methods.
Charles C. Berry [email protected]
data( little.map.dx ) little.map.frame <- make.map.frame( little.map.dx ) plot( little.map.frame ) # there is a plot method # add 'virtual' markers to map little.mf.5 <- make.map.frame(little.map.frame,reso=5) print(little.mf.5[1:10,],digits=1) # show a few rows plot( little.mf.5 ) # notice the 'virtual' markers added
data( little.map.dx ) little.map.frame <- make.map.frame( little.map.dx ) plot( little.map.frame ) # there is a plot method # add 'virtual' markers to map little.mf.5 <- make.map.frame(little.map.frame,reso=5) print(little.mf.5[1:10,],digits=1) # show a few rows plot( little.mf.5 ) # notice the 'virtual' markers added
Not to be called directly by users. This utility function simply returns the coded numeric values corresponding to the allele states.
make.marker.numeric(marker.frame, level.names=NULL)
make.marker.numeric(marker.frame, level.names=NULL)
marker.frame |
A data.frame.object consisting of factors or character vectors that encode the allele states. |
level.names |
A vector of length 6 to translate the levels
attribute or character codes into allele states that
|
A matrix, for which column i is
match(as.character(marker.frame[,i]),level.names)
Charles C. Berry [email protected]
Create regression variables for markers and loci between or near markers by imputation conditional on known marker states.
make.regressor.matrix(state.matrix, mode.mat=NULL)
make.regressor.matrix(state.matrix, mode.mat=NULL)
state.matrix |
A state.matrix.object - see |
mode.mat |
A matrix which indicates the values of regressor variables corresponding to the allele states. If mode.mat=NULL (the default) a mode.mat is inferred from the dimensions of state.matrix. For the F2 intercross these are typically additive and dominance codes like (-1,0,1) and (1,-1,1). For BC1 backcross and RI lines, typically the values are (-1,1). |
A matrix with variables suitable for use as regressors.
Haley C.S. and Knott S.A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69,315-324.
Create a state.matrix.object to be used encode marker information in a form in which it can be used in subsequent calculations.
make.state.matrix(marker.frame, marker.distances, method="F2")
make.state.matrix(marker.frame, marker.distances, method="F2")
marker.frame |
Actually, this is a misnomer. This is NOT
a |
|||||||||||||||||||||||||||||
marker.distances |
Distances between the markers in the 'lambda' metric. -log(lambda)/2 is the Haldance map distance. Linkage groups are separated by values of 0.0. |
|||||||||||||||||||||||||||||
method |
method = "F2" is the default, and "BC1", "RI.self", and "RI.sib" are other options. The assumed setup is as follows (strains are A and a):
|
n by k by q array. q is 3 for method="F2" and 2 for others methods. Each element encodes the probability of the allele state conditional on the marker states.
It might have been better to design this array so that the third subscript moves fastest. In large problems, the current structure may involve excessive memory access.
Lander E.S. and Green P. (1987) Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences of the United States of America, 84(8), 2363–7.
Jiang C. and Zeng Z-B. (1997) Mapping quantitative trait loci with dominant and missing markers in various crosses from tow inbred lines. Genetica 101, 47-58.
Create a moment matrix of the marker variables and of the regressors by the phenotype variable. For use in regression modelling on the markers.
make.varcov(regressor.matrix, y, subset=is.finite(y), casewt=NULL)
make.varcov(regressor.matrix, y, subset=is.finite(y), casewt=NULL)
regressor.matrix |
The object produced by |
y |
A vector of phenotype information with the same number of
elements as there are rows in |
subset |
Logical vector with the same number of
elements as there are rows in |
casewt |
Optional vector of case weights. |
A list with components
var.x |
Moment matrix of the marker regressor variables |
cov.xy |
Moment matrix of the marker regressor variables versus the phenotype variable |
var.y |
The Second central moment of the phenotype variable |
df |
|
It is generally NOT a good idea to do regressions on
ill-conditioned designs using the moment matrices like this. The
excuse for doing so here is twofold. First, calculations using this
method are used to perform importance sampling, so minor numerical
inaccuracies in computing the probabilites used in sampling get
straightened out by the importance weights. Second, it will typically
be the case that a prior is set on the regression coefficients and
this results in a positive constant (aka a 'ridge' parameter) being
added to diagonal of varcov$var.x
and this reduces the
ill-conditioning. Of course the rational for using the method is to
speed the sampling, and it is very effective at doing so.
Charles C. Berry [email protected]
One way to index a locus (loci) in a genetic map is by the numerical index of
its row (their rows). map.index
performs a lookup in a specific
map.frame
given one (or two) chromosome number(s) and one (or
two) map distance(s).
map.index(x, ... )
map.index(x, ... )
x |
A |
... |
For methods that look up a location in a |
It is often convenient to refer to genetic loci or regions by the
numerical index(es) in a map.frame
. map.index
allows
lookups according to the approximate map location.
A numerical vector of one or more row numbers. If only chromo
is
specified, all row numbers on the specified chromosome are returned. If
chromo
has two elements, then all row numbers on those
chromosomes with numbers in range(chromo)
will be returned. If
one of each of chromo
and cM
are specified, then the row
number of the closest locus will be returned. For two of each, row
numbers in the range of the closest matches will be returned.
Charles C. Berry [email protected]
make.map.frame
for a description of how map
information is organized.
data(little.ana.bc) map.index(little.ana.bc,chromo=1,cM=25) # locus nearest 1,25 index.chr.1 <- map.index(little.ana.bc,chromo=1) fit.on.1 <- bqtl(bc.phenotype~locus(index.chr.1),little.ana.bc) summary( loglik( fit.on.1 ) )
data(little.ana.bc) map.index(little.ana.bc,chromo=1,cM=25) # locus nearest 1,25 index.chr.1 <- map.index(little.ana.bc,chromo=1) fit.on.1 <- bqtl(bc.phenotype~locus(index.chr.1),little.ana.bc) summary( loglik( fit.on.1 ) )
Report the chromosome number and location of loci in a genetic map.
map.location(x,... ) map.loc(x, ... )
map.location(x,... ) map.loc(x, ... )
x |
A object of class |
... |
Other arguments usage depend on the class of
|
It is often helpful to refer to genentic loci by their
locations. The methods of map.location
(alias map.loc
)
will extract the row index, chromosome number and location, and the name
for specified loci. For direct lookups of the loci in a map.frame
or analysis.object
, one must specify y
or chromo
or
map.names
. When class(x)=="bqtl"
map.location
s of
terms used in a call to bqtl
are returned. When cM
is
used, an attempt will be made to match the location; if the match fails,
the nearest locus will be used. When there are two elements in
chromo
and two in cM
, all the map locations in between the
matching loci will be returned.
An object of class map.location
which inherits from
map.frame
. It has columns:
chr.num |
The chromosome number |
cM |
The location in centiMorgans on that chromosome. |
marker.name |
The name by which that marker is known |
attr( , "row.names")
|
An index of the locations |
Charles C. Berry [email protected]
data(little.ana.bc) map.loc(little.ana.bc, c(1,15,45)) map.loc(little.ana.bc,chromo=3,cM=22) map.loc(little.ana.bc,"m.12") rm(little.ana.bc)
data(little.ana.bc) map.loc(little.ana.bc, c(1,15,45)) map.loc(little.ana.bc,chromo=3,cM=22) map.loc(little.ana.bc,"m.12") rm(little.ana.bc)
This is a generic helper function with methods that will return the names of markers or loci.
map.names(x,...)
map.names(x,...)
x |
An object that has marker names in it. Methods for objects
of the
|
... |
For |
When applied to an object of class bqtl
map.names(x, \dots, ana.obj )
can be used to specify where to find the data.
A character vector
Charles C. Berry [email protected]
data(little.ana.bc) map.names(little.ana.bc,chromo=1,cM=24) map.names(little.ana.bc,chromo=c(1,1),cM=c(40,55)) fit <- bqtl( bc.phenotype ~ locus(23,42) , little.ana.bc ) map.names( fit )
data(little.ana.bc) map.names(little.ana.bc,chromo=1,cM=24) map.names(little.ana.bc,chromo=c(1,1),cM=c(40,55)) fit <- bqtl( bc.phenotype ~ locus(23,42) , little.ana.bc ) map.names( fit )
Given a set of markers, one wants to create a finer map at a given resolution. marker fill takes a a collection of marker distances and a desired resolution and finds positions that are intermediate and at that resolution.
marker.fill(map.frame, reso, return.nint = FALSE)
marker.fill(map.frame, reso, return.nint = FALSE)
map.frame |
A map.frame.object. |
reso |
The desired interval between loci in the same metric as |
return.nint |
Whether to output a vector of number of intervals to produce in each existing interlocus interval |
If return.nint
is TRUE
, a vector of integers is
returned. It indicates how many intervals to place between this marker
and the next to achive the desired minimum distance.
If return.nint
is FALSE
, a vector of distances is
returned. The names attribute has suffixes
added to indicate positions filled to the 'right' of existing
markers. Thus if markers 'mark.01' and 'mark.02' are in succession at
a distance of 3 and reso==1, then the value associated with 'mark.01'
(which was 3) becomes 1, a value of 1 is associated with new loci
called 'mark.01.1' and 'mark.01.2' in created with values of 1
each. The returned vector is ordered by chromosome, then marker or
filled locus.
data( little.map.frame ) little.nint <- marker.fill( little.map.frame, reso=5, TRUE ) cbind(nint=little.nint,cM=little.map.frame$cM)[1:10,] rm( little.map.frame, little.nint )
data( little.map.frame ) little.nint <- marker.fill( little.map.frame, reso=5, TRUE ) cbind(nint=little.nint,cM=little.map.frame$cM)[1:10,] rm( little.map.frame, little.nint )
The coding scheme used to define marker.levels is set up by these functions. BQTL has defaults that these functions can help the user to redefine.
bc1.levels( AA="AA", Aa="Aa", miss.val="--") ri.levels( AA="AA", aa="aa", miss.val="--") f2.levels( AA="AA", Aa="Aa", aa="aa", not.aa="A-", not.AA="a-", miss.val="--")
bc1.levels( AA="AA", Aa="Aa", miss.val="--") ri.levels( AA="AA", aa="aa", miss.val="--") f2.levels( AA="AA", Aa="Aa", aa="aa", not.aa="A-", not.AA="a-", miss.val="--")
AA |
Always used: the code for the homozygous state from one parent line |
Aa |
F2 and BC1 setups: the code for the heterozygous state |
aa |
F2 and RI setups: the code for the homozygous state for the other parent line |
not.aa |
F2 only: the code for a dominant marker that rules out |
not.AA |
F2 only: the code for a dominant marker that rules out |
miss.val |
The character string for a missing (unknown) allele
state. |
It is essential that the codes intended by the user be clearly understood by BQTL. It is hoped that thees functions provide a bridge between the internals of BQTL and the user's view of the marker codes. Numeric values can be used, but they will be coerced to character values.
A vector with 6 elements corresponding to the values of
AA
, Aa
, aa
, not.aa
, not.AA
, and
miss.val
. For RI and BC1 setups, those that do not apply will
be unnamed and set to "nil"
Charles C. Berry [email protected]
### show the defaults: f2.levels() bc1.levels() ri.levels() ### suppose that 1,2,3 are codes used in F2: f2.levels(1,2,3) ### show what would happen changing "Aa" to "H" f2.levels(Aa="H") bc1.levels(Aa="H")
### show the defaults: f2.levels() bc1.levels() ri.levels() ### suppose that 1,2,3 are codes used in F2: f2.levels(1,2,3) ### show what would happen changing "Aa" to "H" f2.levels(Aa="H") bc1.levels(Aa="H")
Multiple x-y plots are formed using chromosome numbers
(chr.num
) and positions (pos.plot
) specified in a object
of the sort created by make.map.frame
## S3 method for class 'map.frame' plot(x, y, fun = if (y.type == "matrix") matlines else lines, type = "l", include.rug = TRUE, rug.lwd = 0.1, title.string = NULL, y.range = NULL, ylab = deparse(substitute(y)), xlab = "Location", ...)
## S3 method for class 'map.frame' plot(x, y, fun = if (y.type == "matrix") matlines else lines, type = "l", include.rug = TRUE, rug.lwd = 0.1, title.string = NULL, y.range = NULL, ylab = deparse(substitute(y)), xlab = "Location", ...)
x |
A |
y |
(optional) A vector with as many elements or a
matrix with as many rows as |
... |
more args |
fun |
A plotting function to be used after the plot axes and
labels have been drawn. The current default |
type |
|
include.rug |
if |
rug.lwd |
size of ticks |
title.string |
(optional) label to prepend to each title |
y.range |
range for y limits |
ylab |
plot label for y-axis, see |
xlab |
plot label for x-axis, see |
This function enables drawing graphs that depend on chromosome and
chromosome location. Typically, one will use a command like
par(mfrow=c(nrows,ncols))
first to set up a page on which
multiple plots will be drawn. However, one can draw one plot per page
on postscript devices by leaving par(mfrow=c(1,1))
NULL
- this function is called only for its side effects
Charles C. Berry [email protected]
plot
,
lines
, and
matlines
for general information on plotting functions;
par
for optional arguments to add as arguments; and
make.map.frame
for the details on the object the drives this function.
data( little.ana.bc ) null.llk <- loglik(bqtl(bc.phenotype~1,little.ana.bc)) llk <- loglik( bqtl( bc.phenotype~locus(all), little.ana.bc) ) - null.llk .old.par <- par(mfrow=c(2,3)) plot.map.frame(little.ana.bc$map.frame,llk) par(.old.par)
data( little.ana.bc ) null.llk <- loglik(bqtl(bc.phenotype~1,little.ana.bc)) llk <- loglik( bqtl( bc.phenotype~locus(all), little.ana.bc) ) - null.llk .old.par <- par(mfrow=c(2,3)) plot.map.frame(little.ana.bc$map.frame,llk) par(.old.par)
The estimated coefficients and expected locus values are used to find fitted values for the QTL model
## S3 method for class 'bqtl' predict(object, newdata, ...) ## S3 method for class 'bqtl' fitted(object, newdata, ...)
## S3 method for class 'bqtl' predict(object, newdata, ...) ## S3 method for class 'bqtl' fitted(object, newdata, ...)
object |
An object of class |
newdata |
An optional data.frame for which fitted values are to be found. If not specified, the a search for the original data frame for the fit will be made. |
... |
unused |
The estimated coefficients for a specific QTL model fit are used along
with the expected locus values (conditionally on the marker values)
are used to find fitted values for the QTL model. This is not the only
way in which such fits could be obtained; one could condition the
expect marker values on both the trait value and the marker
values. One could also define fitted values for specific genotype
combinations, e.g. for a backcross with k animals and a two gene model
4 fitted values could be determined for each animal leading to 2*2*k
values. In fact, using newdata
one can do this.
A vector with as many elements as rows in newdata (after removing missing data) or in the original model.frame.
Charles C. Berry [email protected]
data(little.ana.bc) fit.pheno <- bqtl(bc.phenotype~locus(15)+locus(42),little.ana.bc) summary(predict(fit.pheno)) genotype.grid <- expand.grid( c(-1,1), c(-1,1) ) # set up a grid names(genotype.grid) <- map.names( fit.pheno ) # use matching names fit.vals <- predict( fit.pheno, genotype.grid ) # make predictions cbind( genotype.grid, fit.vals ) # print them!
data(little.ana.bc) fit.pheno <- bqtl(bc.phenotype~locus(15)+locus(42),little.ana.bc) summary(predict(fit.pheno)) genotype.grid <- expand.grid( c(-1,1), c(-1,1) ) # set up a grid names(genotype.grid) <- map.names( fit.pheno ) # use matching names fit.vals <- predict( fit.pheno, genotype.grid ) # make predictions cbind( genotype.grid, fit.vals ) # print them!
The linear.bayes
object returns fitted coefficients. These are
used to construct predicted values. Since the fitting process for
linear.bayes
objects is based on moments of centered variables,
the 'intercept' is lost; see ‘Details’ below.
## S3 method for class 'linear.bayes' residuals(object, ...) ## S3 method for class 'linear.bayes' predict(object, newdata = lb.call$ana.obj, return.resids = FALSE, ...) ## S3 method for class 'linear.bayes' fitted(...)
## S3 method for class 'linear.bayes' residuals(object, ...) ## S3 method for class 'linear.bayes' predict(object, newdata = lb.call$ana.obj, return.resids = FALSE, ...) ## S3 method for class 'linear.bayes' fitted(...)
object |
An object returned by |
... |
possibly the following |
newdata |
Optional |
return.resids |
Not usually set by the user. |
Since the linear.bayes object
is based on a moment
matrix, some information is lost thsat must be reconstructed or
assumed. The intercept and possibly the coefficients for control
variates are aong these. Also, when the call to linear.bayes
supplied the moment matrix rather than formulae with which to create
one, then it is unclear what variable was used as the regressand and
hence which variable to use in ofrming residuals. So, in that case,
residuals
will report an error
A vector of predicted values or residuals
Charles C. Berry [email protected]
The phenotype data, estimated coefficients, and expected locus values are used to find fitted values for the QTL model
## S3 method for class 'bqtl' residuals(object,...)
## S3 method for class 'bqtl' residuals(object,...)
object |
An object of class |
... |
ignored |
The estimated coefficients for a specific QTL model fit are used along with the expected locus values (conditionally on the marker values) are used to find fitted values for the QTL model; these are subtracted from the origianl trait values to get residuals. This is not the only way in which such fits could be obtained; one could condition the expected marker values on both the trait value and the marker values. One could also define fitted values for specific genotype combinations, e.g. for a backcross with k animals and a two gene model 4 fitted values could be determined for each animal leading to 2*2*k residuals.
A vector with as many elements trait values used in the original fitted model.
Charles C. Berry [email protected]
data(little.ana.bc) fit.pheno <- bqtl(bc.phenotype~locus(15)+locus(42),little.ana.bc) summary(residuals(fit.pheno)) plot( fitted( fit.pheno ), residuals( fit.pheno) )
data(little.ana.bc) fit.pheno <- bqtl(bc.phenotype~locus(15)+locus(42),little.ana.bc) summary(residuals(fit.pheno)) plot( fitted( fit.pheno ), residuals( fit.pheno) )
The linear approximations of swap
are much improved
by the use a Laplace approximations for loci that are not
markers. This function combines the results of a call like
bqtl(y~configs(swap.obj),...)
with the data in
swap.obj
to provide improved posteriors, et cetera
## S3 method for class 'adj' summary(object, n.loc, coef.znames, mode.names=c("add", "dom"), imp.denom=NULL, swap.obj=NULL,...)
## S3 method for class 'adj' summary(object, n.loc, coef.znames, mode.names=c("add", "dom"), imp.denom=NULL, swap.obj=NULL,...)
object |
Typically, this is the result of a call like
|
n.loc |
The number of genes in this model |
coef.znames |
|
mode.names |
|
imp.denom |
Optional, and only used when some sampling scheme
other than the default MCMC generates |
swap.obj |
The result of a call to |
... |
unused |
There are a lot of details. This sections nneds to be revised to reflect them.
A list with components
adj |
This multiplier adjusts the posterior odds for k vs k-1 gene models |
var |
An estimate of the variance of |
coef |
Posterior means of coefficients |
loc |
Marginal Posterior for location for k gene model |
hk.ratio.mean |
argh! I need to look this up |
Charles C. Berry [email protected]
Berry C.C. (1998) Computationally Efficient Bayesian QTL Mapping in Experimental Crosses. ASA Proceedings of the Biometrics Section, 164-169.
Extract coefficients (and related stats), loglikelihood, and residual standard error of the trait.
## S3 method for class 'bqtl' summary(object,...)
## S3 method for class 'bqtl' summary(object,...)
object |
The result of |
... |
Currently not used |
A list containing
coefficients |
Either a vector of regression coefficents, or if
object was created via |
loglik |
the loglikelihood or log posterior |
std.res |
The residual standard deviation of the trait |
N |
The counts of all observations, the number omitted, and the number used in the fit |
Charles C. Berry [email protected]
data(little.ana.bc) fit <- bqtl( bc.phenotype~locus(4)*locus(45), little.ana.bc, return.hess=TRUE ) summary(fit)
data(little.ana.bc) fit <- bqtl( bc.phenotype~locus(4)*locus(45), little.ana.bc, return.hess=TRUE ) summary(fit)
Provide a simple report on the data structure
## S3 method for class 'map.frame' summary(object,...)
## S3 method for class 'map.frame' summary(object,...)
object |
A |
... |
ignored |
a list
Charles C. Berry [email protected]
Calculate marginal posteriors for location, posterior means for coefficients, and the Bayes Factor for k vs k-1 genes
## S3 method for class 'swap' summary(object, method=NULL, ncoef=length(object$alt.coef), nloc=object$nloc,...)
## S3 method for class 'swap' summary(object, method=NULL, ncoef=length(object$alt.coef), nloc=object$nloc,...)
object |
The result of |
method |
Optional. One of the supported methods, see |
ncoef |
Optional. The number of coefficients in the class of
models. Typically, |
nloc |
Optional. The number of loci in the sample space. |
... |
ignored |
A list with components:
loc.posterior |
A vector of (marginal) posterior odds for each locus compared to a no gene model |
coefs |
Posterior means of coefficients. |
ratio |
A list with components |
Charles C. Berry [email protected]
Given a k-gene model as a starting point, one gene is deleted and another is sampled in its place. This is done using an approximation to the posterior. Then another gene is deleted and another sampled,...
swap(varcov, invars, rparm, nreps, ana.obj, ...)
swap(varcov, invars, rparm, nreps, ana.obj, ...)
varcov |
The result of |
invars |
Vector of
numerical indexes of |
rparm |
Scalar or vector with |
nreps |
How many cycles (of |
ana.obj |
An |
... |
Additional arguments override the default choices of
candidate loci ( |
An MCMC sampler for loci using the object of make.varcov
is
executed. This sampler uses the exact posterior probability under the
assumed correctness of the regression model using expected genotypes
given marker values. This amounts to linearizing the likelihood with
respect to the (possibly unknown) locus states. For models where the
loci are fully informative markers this is the true posterior.
The chain is implemented as follows: given a set of regressor
variables to start, one variable is removed, all regressor
variables not in the model are examined to determine the effect of each
on the posterior. One variable is sampled. The process is repeated until
each variable has been removed and a new one sampled in its place
(possibly the same variable that was removed is sampled). And this whole
cycle is repeated nreps
times.
A list with components:
config |
A k by k by nreps array (or, for
|
posteriors |
A vector of length |
coefs |
A k by k matrix of the regression coefficients(or, for
|
call |
The call to |
cond |
The |
marg |
The |
alt.marginal |
A vector with |
alt.coef |
A vector with |
Charles C. Berry [email protected]
Berry C.C. (1998) Computationally Efficient Bayesian QTL Mapping in Experimental Crosses. ASA Proceedings of the Biometrics Section, 164-169.
data( little.ana.bc ) little.vc <- varcov( bc.phenotype~locus(all), little.ana.bc) little.4 <- swap( little.vc, c(1,15,55,75), rparm=1, 50, little.ana.bc ) little.4.smry <- summary( little.4 ) print(c("Bayes Factor (3 vs 4)"=little.4.smry$ratio$mean)) par(mfrow=c(3,2)) plot( little.ana.bc, little.4.smry$loc.posterior, type="h", ylab="E(genes)" ) rm(little.4,little.vc,little.ana.bc)
data( little.ana.bc ) little.vc <- varcov( bc.phenotype~locus(all), little.ana.bc) little.4 <- swap( little.vc, c(1,15,55,75), rparm=1, 50, little.ana.bc ) little.4.smry <- summary( little.4 ) print(c("Bayes Factor (3 vs 4)"=little.4.smry$ratio$mean)) par(mfrow=c(3,2)) plot( little.ana.bc, little.4.smry$loc.posterior, type="h", ylab="E(genes)" ) rm(little.4,little.vc,little.ana.bc)
An MCMC sampler for loci using precomputed dispersion matrices, various priors, and a pre-selected set of variables. For use with BC1 (backcross) designs and recombinant inbred lines.
swapbc1(varcov, invars, rparm, nreps, ana.obj, locs=NULL, locs.prior=NULL, tol=1e-10 )
swapbc1(varcov, invars, rparm, nreps, ana.obj, locs=NULL, locs.prior=NULL, tol=1e-10 )
varcov |
The result of |
rparm |
Scalar or vector with |
nreps |
How many cycles of MCMC to perform |
ana.obj |
A object produced by |
invars |
Which variables to start in the model. The first of these is
immediately removed, so it is merely a placeholder. The number of
genes in the model is therefore |
locs |
The columns of |
locs.prior |
The prior mass to associate with each variable. Typically, these sum to one, but sometimes they might each be set to one (as in computing lod scores). |
tol |
Used in forming QR decomposition. Let it be. |
An MCMC sampler for loci using the object of make.varcov
is
executed. This sampler uses the exact posterior probability under the
assumed correctness of the regression model using expected genotypes
given marker values. This amounts to linearizing the likelihood with
respect to the (possibly unknown) locus states. For models where the
loci are fully informative markers this is the true posterior.
The chain is implemented as follows: given a set of regressor
variables to start, one variable is removed, all regressor
variables not in the model are examined to determine the effect of each
on the posterior. One variable is sampled. The process is repeated until
each variable has been removed and a new one sampled in its place
(possibly the same variable that was removed is sampled). And this whole
cycle is repeated nreps
times.
A list with components:
config |
A k by k by nreps array of the locations sampled in each iteration. |
posteriors |
A vector of length |
coefs |
A k by k matrix of the regression coefficients. |
call |
The call to |
cond |
The |
marg |
The |
alt.marginal |
A vector with |
alt.coef |
A vector with |
Charles C. Berry [email protected]
Berry C.C. (1998) Computationally Efficient Bayesian QTL Mapping in Experimental Crosses. ASA Proceedings of the Biometrics Section, 164-169.
An MCMC sampler for loci using precomputed dispersion matrices, various priors, and a pre-selected set of variables. For use with F2 intercross design.
Using precomputed dispersion matrices, various priors, and a pre-selected set of variables, one locus is removed, all other loci are examined to determine the effect of each on the posterior. One locus is sampled. The process is repeated until each locus has been removed and a new one sampled in its place (possibly the same one that was removed is sampled).
swapf2(varcov, invars, rparm, nreps, ana.obj, locs, locs.prior, combo.prior, tol = 1e-10)
swapf2(varcov, invars, rparm, nreps, ana.obj, locs, locs.prior, combo.prior, tol = 1e-10)
varcov |
The result of |
rparm |
The 'ridge' parameters for the independent variables - larger values imply more shrinkage or a more concentrated prior for the regresion coefficients. |
nreps |
How many cycles of MCMC to perform |
ana.obj |
A object produced by |
invars |
A vector of variable indexes. This determines which
variables to start in the model. If both additive and
dominance terms are to be used, they should occupy adjacent
locations in |
locs |
The pairs of columns of |
locs.prior |
Vector whose elements are the prior masses to associate with each locus. Typically, these sum to one, but sometimes they might each be set to one (as in computing lod scores). The default value sets them all to 1.0. |
combo.prior |
The prior probability for each term or combination of terms for the phenotypic effect at a locus. Typically, there will be three of these - one for the 'additive' term (linear in number of alleles from one parent strain), the 'dominance' term (quadratic in allele number), or both terms. The default sets them all to 1/3. |
tol |
Used in forming QR decomposition. Let it be. |
A call to swapf2
is used to obtain the results. This function
is really just a wrapper.
A list with components:
configs |
A 2k by k by nreps array of indexes of variables sampled in
each of the nreps iterations. Models using less than 2k variables
|
posteriors |
A vector of length |
coefs |
A 2k by k by nreps matrix of the regression
coefficients. Models using less than 2k variables
|
call |
The call to |
cond |
The |
marg |
The |
alt.marginal |
A vector with |
alt.coef |
A vector with |
Charles C. Berry [email protected]
Berry C.C. (1998) Computationally Efficient Bayesian QTL Mapping in Experimental Crosses. ASA Proceedings of the Biometrics Section, 164-169.
Fits all one and two gene models (without interactions aka 'epistasis') in an intercross, backcross, or recombinant inbred line. Uses a linear approximation to the likelihood, i.e. the expected allele states are used.
twohk(varcov, ana.obj, ...)
twohk(varcov, ana.obj, ...)
varcov |
An object produced by |
ana.obj |
An |
... |
Additional arguments override the default choices of
candidate loci ( |
The marginal posterior (integrating over regression parameters and dispersion) is calculated for each one and two gene model under the assumed correctness of the regression model using expected genotypes given marker values. This amounts to linearizing the likelihood with respect to the (possibly unknown) locus states. For models where the loci are fully informative markers this is the true posterior.
A list with components:
loc.1 |
The marginal posterior for each one gene model relative to
a no gene model. For
|
loc.2 |
The marginal posterior for each locus — obtained by summing
over all two gene models that include that locus— relative to
a no gene model. For
|
coefs.1 |
The regression coefficients for the genetic effect for
each locus. For |
coefs.2 |
The marginal posterior mean of regression coefficients
for the genetic effect for each locus - obtained by averaging over
all two gene models that include that locus according to the
posterior masses. For |
Charles C. Berry [email protected]
Haley C.S. and Knott S.A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69,315-324.
data(little.ana.bc) little.vc<-make.varcov(little.ana.bc$data[,little.ana.bc$reg.names], little.ana.bc$data$bc.phenotype) little.2<- twohk(little.vc,little.ana.bc,rparm=1) print( c(odds.1=sum(little.2$loc.1),odds.2=sum(little.2$loc.2)) ) par(mfrow=c(3,2)) little.pe <- 2 * little.2$loc.2 / sum(little.2$loc.2) #locus-wise posterior expectation plot(little.ana.bc,little.pe,type="h",ylab="E(genes") rm(little.2,little.vc,little.pe,little.ana.bc)
data(little.ana.bc) little.vc<-make.varcov(little.ana.bc$data[,little.ana.bc$reg.names], little.ana.bc$data$bc.phenotype) little.2<- twohk(little.vc,little.ana.bc,rparm=1) print( c(odds.1=sum(little.2$loc.1),odds.2=sum(little.2$loc.2)) ) par(mfrow=c(3,2)) little.pe <- 2 * little.2$loc.2 / sum(little.2$loc.2) #locus-wise posterior expectation plot(little.ana.bc,little.pe,type="h",ylab="E(genes") rm(little.2,little.vc,little.pe,little.ana.bc)
Fits all one and two gene models (without interactions aka 'epistasis') in an intercross, backcross, or recombinant inbred line. Uses a linear approximation to the likelihood, i.e. the expected allele states are used.
twohkbc1(varcov, ana.obj, rparm = 0, locs = NULL, locs.prior = NULL) twohkf2(varcov, ana.obj, rparm, locs, locs.prior, combo.prior)
twohkbc1(varcov, ana.obj, rparm = 0, locs = NULL, locs.prior = NULL) twohkf2(varcov, ana.obj, rparm, locs, locs.prior, combo.prior)
varcov |
An object produced by |
ana.obj |
An object produced by |
rparm |
The 'ridge' parameters for the independent variables - larger values imply more shrinkage or a more concentrated prior for the regresion coefficients. |
locs |
The columns (or pairs of columns for |
locs.prior |
The prior mass to associate with each locus. Typically, these sum to one, but sometimes they might each be set to one (as in computing lod scores). |
combo.prior |
Only valid for |
The marginal posterior (integrating over regression parameters and dispersion) is calculated for each one and two gene model under the assumed correctness of the regression model using expected genotypes given marker values. This amounts to linearizing the likelihood with respect to the (possibly unknown) locus states. For models where the loci are fully informative markers this is the true posterior.
A list with components:
loc.1 |
The marginal posterior for each one gene model. For
|
loc.2 |
The marginal posterior for each locus - obtained by summing
over all two gene models that include that locus. For
|
coefs.1 |
The regression coefficients for the genetic effect for
each locus. For |
coefs.2 |
The marginal posterior mean of regression coefficients
for the genetic effect for each locus - obtained by averaging over
all two gene models that include that locus according to the
posterior masses. For |
Charles C. Berry [email protected]
Haley C.S. and Knott S.A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69,315-324.
Create a moment matrix of the marker variables and of the regressors by the phenotype variable. For use in regression modelling on the markers.
varcov(x, ana.obj, partial=NULL, scope,...)
varcov(x, ana.obj, partial=NULL, scope,...)
x |
A formula to specify the dependent and independent variables
to be used in subsequent calculations e.g |
ana.obj |
An |
partial |
A formula whose right hand side specifies variables to be treated as covariates. |
scope |
Usually not explicitly used. Optional vector of variable names. |
... |
ignored |
This is just a wrapper for make.varcov
.
A list with components
var.x |
Moment matrix of the marker regressor variables |
cov.xy |
Moment matrix of the marker regressor variables versus the phenotype variable |
var.y |
The Second central moment of the phenotype variable |
df |
The degrees of freedom, when no variables are specified in
|
It is generally NOT a good idea to do regressions on
ill-conditioned designs using the moment matrices. The
excuse for doing so here is twofold. First, calculations using this
method are used to perform importance sampling, so minor numerical
inaccuracies in computing the probabilites used in sampling get
straightened out by the importance weights. Second, it will typically
be the case that a prior is set on the regression coefficients and
this results in a positive constant (aka a 'ridge' parameter) being
added to diagonal of varcov()$var.x
and this reduces the
ill-conditioning. Of course the rational for using the method is to
speed the sampling, and it is very effective at doing so.
Charles C. Berry [email protected]
The examples in swap
and twohk
.