Title: | Genotype Probabilities in Intermediate Generations of Inbreeding Through Selfing |
---|---|
Description: | A probability tree makes it possible to compute probabilities of complex events, such as genotype probabilities in intermediate generations of inbreeding through recurrent self-fertilization (selfing). This package implements functionality to compute probability trees for two- and three-marker genotypes in the F2 to F7 selfing generations. The conditional probabilities are derived automatically and in symbolic form. The package also provides functionality to extract and evaluate the relevant probabilities. |
Authors: | Frank Technow [aut, cre] (Pioneer Hi-Bred International, Inc., Johnston, Iowa) |
Maintainer: | Frank Technow <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 0.2 |
Built: | 2024-12-22 06:22:23 UTC |
Source: | CRAN |
Copyright (c) 2014, Pioneer Hi-Bred International, Inc.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of Pioneer Hi-Bred International, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Package: | selfingTree |
Type: | Package |
Version: | 0.2 |
Date: | 2014-12-18 |
LazyData: | yes |
Depends: | foreach |
Function buildSelfingTree generates the probability trees for two- and three-marker genotypes. This is done by recursively calling functions genSubtree.2M (two-marker genotypes) or genSubtree.3M (three-marker genotypes). The core functionality of deriving the symbolic conditional haplotype probabilities is implemented in functions haploProb.2M and haploProb.3M. The function nodeProbabilities is used to symbolically multiply the conditional probabilities along all branches and uses function extractProbs to extract the conditional probabilities from the trees. Finally function evalProb symbolically sums the marginal probabilities of relevant nodes and evaluates them with user specified values for the recombination frequencies. The function getTargets can be used to identify relevant events given a target genotype.
Frank Technow at Pioneer Hi-Bred International, Inc., Breeding Technologies, Johnston/IA, USA.
Maintainer: Frank Technow [email protected]
This function builds the probability tree for recurrent selfing.
buildSelfingTree(genF,generation,gam1,gam2)
genF |
A function that generates a sub-tree of all possible genotypes given a parental genotype, either |
generation |
Integer giving the selfing generation up to which the tree will be built. Values can range from 2 to 7, i.e., the F2 generation is built by default and the highest possible generation is currently the F7. |
gam1, gam2 |
Three-character (three-marker genotypes) or two-character (two-marker genotypes) string with the configuration of gametes one and two of the parental F1 genotype. |
A recursive data type in the form of a nested list. Each element is a list with three elements. Element [[1]] holds the genotype configuration as "gam1-gam2" (e.g., "ABA-BAB"), element [[2]] the symbolic formula representing the probability of observing this genotype given the parental genotype and element [[3]] is again a list containing the sub-tree rooted at this genotype.
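The nested structure described above can be illustrated with a small base R sketch. It does not call the package: the helpers toy_children, build_toy_tree and leaf_probs are hypothetical names, the toy model uses a single locus instead of two or three markers, and representing the deepest level by an empty list in element [[3]] is an assumption made for the illustration.

```r
# Toy conditional probabilities for ONE locus (assumption for illustration
# only; the package itself handles two- and three-marker genotypes).
toy_children <- function(geno) {
  if (geno %in% c("A-A", "B-B")) {
    setNames(list("1"), geno)  # homozygotes breed true under selfing
  } else {
    list("A-A" = "1/4", "A-B" = "1/4", "B-A" = "1/4", "B-B" = "1/4")
  }
}

# Build a nested list in which every node is
# list(genotype, symbolic conditional probability, sub-tree).
build_toy_tree <- function(geno, generation, current = 2) {
  kids <- toy_children(geno)
  lapply(names(kids), function(child) {
    subtree <- if (current < generation) {
      build_toy_tree(child, generation, current + 1)
    } else {
      list()  # assumption: no sub-tree below the last generation
    }
    list(child, kids[[child]], subtree)
  })
}

# Walk the tree and multiply the conditional probabilities along each branch,
# keeping the product in symbolic (character) form.
leaf_probs <- function(tree, prefix = "1") {
  unlist(lapply(tree, function(node) {
    formula <- paste0(prefix, "*(", node[[2]], ")")
    if (length(node[[3]]) == 0) {
      setNames(formula, node[[1]])
    } else {
      leaf_probs(node[[3]], formula)
    }
  }))
}

tree <- build_toy_tree("A-B", generation = 3)  # F2 and F3 of a toy F1 "A-B"
f3 <- leaf_probs(tree)                          # symbolic F3 node probabilities
f3.values <- vapply(f3, function(f) eval(parse(text = f)), numeric(1))
```

In this toy model the F3 node probabilities in f3.values sum to 1, and the nodes named "A-B" or "B-A" together carry the probability of a heterozygous locus in the F3.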
Frank Technow
## F2 and F3 genotypes
F.2M <- buildSelfingTree(genSubtree.2M,3,"AA","BB")
## F2 and F3 genotypes
F.3M <- buildSelfingTree(genSubtree.3M,3,"AAA","BBB")
This function symbolically sums the marginal probabilities of relevant nodes and evaluates them with user specified values for the recombination frequencies.
evalProb(node.prob, x = 0, y = 0, z = 0, chunk.size = min(length(node.prob),75))
node.prob |
Character vector with symbolic marginal node probabilities, i.e., (a subset of) an element of the list returned by function |
x, y, z |
Recombination frequencies. For three-marker genotypes, |
chunk.size |
|
The genotype probability (numeric).
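As a rough illustration of what "symbolically sum, then evaluate" can look like, the following base R sketch pastes character formulas into one expression and evaluates it for given recombination frequencies. It is not the package's implementation: sum_and_eval is a hypothetical helper, and the formulas in gamete.probs are hand-written gamete class probabilities for a two-marker F1 with gametes "AA" and "BB", not output of the package.

```r
# Hypothetical symbolic probabilities (NOT produced by the package):
# non-recombinant gametes have probability (1-z)/2, recombinants z/2.
gamete.probs <- c(AA = "(1-z)/2", AB = "z/2", BA = "z/2", BB = "(1-z)/2")

# Join the formulas with "+" and evaluate the resulting expression with the
# supplied values for the symbols x, y and z.
sum_and_eval <- function(formulas, x = 0, y = 0, z = 0) {
  total <- paste0("(", formulas, ")", collapse = "+")
  eval(parse(text = total), list(x = x, y = y, z = z))
}

total.prob <- sum_and_eval(gamete.probs, z = 0.044)                     # all classes
recombinant.prob <- sum_and_eval(gamete.probs[c("AB", "BA")], z = 0.044) # recombinants
```

Selecting a subset of the named vector before summing mirrors the idea of evaluating only the nodes that are relevant for a target event.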
Frank Technow
evalProb(extractProbs(genSubtree.3M("BAA","AAB")),x = 0.123,y = 0.344)
This function extracts the symbolic formulas for the conditional genotype probabilities from the uppermost level of the (sub)tree.
extractProbs(F)
F |
A sub-tree in the format generated by function |
A character vector with the symbolic formulas. For three-marker genotypes, symbol x is the recombination frequency between markers 1 and 2 and y that between markers 2 and 3. For two-marker genotypes, symbol z is the recombination frequency between markers 1 and 2. The names of the elements indicate the allelic configuration of the two gametes comprising the genotype as gamete1-gamete2 (e.g., "AAB-AAA"). The elements sum to 1.
Frank Technow
probs.2M <- extractProbs(genSubtree.2M("BA","AA"))
probs.3M <- extractProbs(genSubtree.3M("BAA","AAB"))
## must sum to 1
stopifnot(all.equal(evalProb(probs.2M, z = 0.044),1))
stopifnot(all.equal(evalProb(probs.3M, x = 0.123, y = 0.344),1))
These are three-marker genotypes of one million F4 lines from a cross between parents A and B, simulated using the R package hypred.
F4
A character matrix with one million rows and three columns. Homozygosity for parents A or B is coded as "A" and "B", respectively. Heterozygosity is coded as "H".
Frank Technow (2013). hypred: Simulation of Genomic Data in Applied Genetics. R package version 0.4.
These functions generate sub-trees consisting of all genotypes (and their conditional probabilities) that can result after selfing the parental genotype.
genSubtree.2M(gam1,gam2) ## two-marker genotypes
genSubtree.3M(gam1,gam2) ## three-marker genotypes
gam1, gam2 |
Three-character (three-marker genotypes) or two-character (two-marker genotypes) string with the configuration of gametes one and two of the parental genotype. |
A list with one element per possible genotype. Each element is itself a list with two elements. Element [[1]] holds the genotype configuration as "gam1-gam2" (e.g., "ABA-BAB"), element [[2]] the symbolic formula representing the probability of observing this genotype given the parental genotype.
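The idea that selfing pairs two independently drawn gametes of the same parent can be sketched numerically in base R. This is an illustration under stated assumptions, not the package's code: the gamete probabilities are written by hand for a two-marker parent with gametes "AA" and "BB", the value of z is arbitrary, and the package's own enumeration and ordering of genotypes may differ.

```r
z <- 0.1  # assumed recombination frequency between the two markers

# Gamete probabilities of a parent with gametes "AA" and "BB".
gametes <- c(AA = (1 - z) / 2, AB = z / 2, BA = z / 2, BB = (1 - z) / 2)

# Offspring genotype probabilities are products of two gamete probabilities.
geno.probs <- outer(gametes, gametes)
geno.names <- outer(names(gametes), names(gametes), paste, sep = "-")

# Named vector in the "gam1-gam2" naming scheme.
offspring <- setNames(as.vector(geno.probs), as.vector(geno.names))
```

The 16 ordered gamete pairs in offspring sum to 1, which is the same sanity check the documented examples apply to the symbolic formulas.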
Frank Technow
genSubtree.2M("AB","AA")
genSubtree.3M("ABA","AAA")
This convenience function finds all genotypes that match a certain target configuration. It is needed only if the target configuration contains heterozygous states for which the order of the alleles (e.g., A/B or B/A) does not matter.
getTargets(target.geno)
target.geno |
Three-character (three-marker genotypes) or two-character (two-marker genotypes) string specifying the target configuration. Homozygosity for parent A allele is indicated as |
A character vector with all genotypes matching the target configuration. The format complies with the output format of branchProbabilities (gamete1-gamete2, e.g., "AAB-AAA").
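The expansion of a target configuration into ordered gamete pairs can be sketched in base R as follows. This is not the package's getTargets: expand_target is a hypothetical helper, and it assumes the "A", "B", "H" coding used for the F4 data set and in the "AHB" example, with each "H" standing for either allele order.

```r
expand_target <- function(target.geno) {
  codes <- strsplit(target.geno, "")[[1]]
  # Per marker: possible pairs (allele on gamete 1, allele on gamete 2).
  choices <- lapply(codes, function(code) {
    switch(code,
           A = "AA",
           B = "BB",
           H = c("AB", "BA"),  # heterozygous: both allele orders qualify
           stop("unknown code: ", code))
  })
  # All combinations of the per-marker choices, one row per genotype.
  combos <- expand.grid(choices, stringsAsFactors = FALSE)
  gam1 <- apply(combos, 1, function(row) paste(substr(row, 1, 1), collapse = ""))
  gam2 <- apply(combos, 1, function(row) paste(substr(row, 2, 2), collapse = ""))
  paste(gam1, gam2, sep = "-")
}

targets <- expand_target("AHB")
```

A configuration with k heterozygous markers expands into 2^k ordered genotypes under these assumptions, so "AHB" yields two and "HHH" yields eight.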
Frank Technow
getTargets("AHB")
These functions derive the symbolic formula for the probability of observing the target haplotype given the parental genotype.
haploProb.2M(gam1,gam2,target) ## two-marker genotypes
haploProb.3M(gam1,gam2,target) ## three-marker genotypes
gam1, gam2 |
Three-character (three-marker genotypes) or two-character (two-marker genotypes) string with the configuration of gametes one and two of the parental genotype. |
target |
Three-character (three-marker genotypes) or two-character (two-marker genotypes) string with the configuration of the target haplotype. |
The idea behind the algorithm is to conceptually "recode" the alleles of the parental genotype into "target" and "non-target", where "target" is relative to the target haplotype. Then the rules are determined that would rearrange the gametes of the parental genotype into a "target-target-target" haplotype. These rearrangement rules are then translated into the symbolic formula.
A character string with the symbolic formula. For three-marker genotypes, x is the recombination frequency between markers 1 and 2 and y that between markers 2 and 3. For two-marker genotypes, z is the recombination frequency between markers 1 and 2.
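For intuition, the two-marker haplotype probability can also be obtained numerically by brute-force enumeration. The sketch below is a different technique from the symbolic recoding algorithm described above and is not the package's code: haplo_prob_2m is a hypothetical helper that sums over which parental gamete supplies each marker, with z as the recombination frequency between markers 1 and 2.

```r
haplo_prob_2m <- function(gam1, gam2, target, z) {
  # Rows: parental gametes; columns: markers.
  g <- rbind(strsplit(gam1, "")[[1]], strsplit(gam2, "")[[1]])
  tgt <- strsplit(target, "")[[1]]
  prob <- 0
  for (first in 1:2) {      # parental gamete supplying marker 1 (prob. 1/2)
    for (second in 1:2) {   # parental gamete supplying marker 2
      if (g[first, 1] == tgt[1] && g[second, 2] == tgt[2]) {
        # Same gamete: no recombination (1 - z); switch: recombination (z).
        prob <- prob + 0.5 * (if (first == second) 1 - z else z)
      }
    }
  }
  prob
}

p.recomb <- haplo_prob_2m("AA", "BB", "AB", z = 0.1)
p.all <- sum(vapply(c("AA", "AB", "BA", "BB"),
                    function(h) haplo_prob_2m("AA", "BB", h, z = 0.1),
                    numeric(1)))
```

For the parent "AA"/"BB" the recombinant haplotype "AB" has probability z/2, and the four possible haplotypes together have probability 1.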
Frank Technow
haploProb.2M("AA","BB","AB")
haploProb.3M("AAA","BBB","ABA")
Genetic map of the three markers in the F4 data set. The unit is Morgan. This map can be used to compute the recombination frequencies between the markers using the inverse of the Haldane mapping function.
map
A numeric vector with three elements (c(0.00, 0.05, 0.20)).
Frank Technow (2013). hypred: Simulation of Genomic Data in Applied Genetics. R package version 0.4.
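A minimal base R sketch of the conversion from map positions to recombination frequencies, using the inverse of the Haldane mapping function, r = (1 - exp(-2d))/2 for a distance d in Morgan. The vector marker.map is a hand-typed copy of the documented values, and inv_haldane is a hypothetical helper name.

```r
marker.map <- c(0.00, 0.05, 0.20)  # copy of the documented map positions (Morgan)

# Inverse Haldane mapping function: map distance (Morgan) -> recombination frequency.
inv_haldane <- function(d) 0.5 * (1 - exp(-2 * d))

d <- diff(marker.map)   # distances between adjacent markers
x <- inv_haldane(d[1])  # markers 1 and 2
y <- inv_haldane(d[2])  # markers 2 and 3
```

The resulting x and y follow the symbol convention used for three-marker genotypes, where x refers to markers 1 and 2 and y to markers 2 and 3.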
This function generates the symbolic formulas representing the marginal node probabilities.
nodeProbabilities(F,generation)
F |
A recurrent selfing tree, as generated by function |
generation |
Integer giving the highest selfing generation contained in |
Each formula represents the marginal probability of a particular node. Summing over all nodes for a particular genotype gives the probability of observing this genotype in this generation. The sum over all marginal node probabilities within a generation is 1.
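The relations stated above can be written compactly as follows. The notation is introduced here for illustration only and is not taken from the package: g_1 denotes the parental F1 genotype, g_2, ..., g_k the genotypes along the path from the root to a node n in generation F_k, and T(G) the set of "gamete1-gamete2" configurations that match a target genotype G.

```latex
% Marginal probability of node n: product of the conditional
% probabilities along its branch.
P(n) = \prod_{t=2}^{k} P(g_t \mid g_{t-1})

% Probability of genotype G in generation F_k: sum over all matching nodes.
% The marginal probabilities of all nodes in a generation sum to 1.
P(G \text{ in } F_k) = \sum_{n \,:\, g_k(n) \in T(G)} P(n),
\qquad
\sum_{n \in F_k} P(n) = 1
```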
A list with as many elements as there are generations in F. The list elements are named "F2", "F3", etc. Each element is a vector with the symbolic formulas for the marginal probabilities of all possible nodes. The vector elements are named, and the names indicate the allelic configuration of the two gametes comprising the genotype as gamete1-gamete2 (e.g., "AAB-AAA").
Frank Technow
## F2 and F3 genotypes
node.probs <- nodeProbabilities(buildSelfingTree(genSubtree.2M,3,"AA","BB"),3)
## must sum to 1
stopifnot(all.equal(evalProb(node.probs[["F2"]],z = 0.045),1))
stopifnot(all.equal(evalProb(node.probs[["F3"]],z = 0.045),1))