Title: | Statistical Inference for Mixed Traces of DNA (Lite-Version) |
---|---|
Description: | Statistical methods for DNA mixture analysis. This package is a lite-version of the 'DNAmixtures' package to allow users without a 'HUGIN' software license to experiment with the statistical methodology. While the lite-version aims to provide the full functionality it is noticeably less efficient than the original 'DNAmixtures' package. For details on implementation and methodology see <https://dnamixtures.r-forge.r-project.org/>. |
Authors: | Therese Graversen [aut, cre] |
Maintainer: | Therese Graversen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0-1 |
Built: | 2024-11-03 07:18:55 UTC |
Source: | CRAN |
Tools for statistical inference for one or multiple DNA mixtures.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
The package implements a statistical model for analysis of one or more mixed samples of DNA in the possible presence of dropout and stutter. Details of the model can be found in Cowell et. al (2013), and details on the model checking tools and Bayesian network structure can be found in Graversen and Lauritzen (2014).
Any hypothesis involving unknown contributors relies on computations in a Bayesian network. For performing such computations, DNAmixtures package relies on Hugin (https://www.hugin.com) through the R-package RHugin. For an installation guide, see the package webpage https://dnamixtures.r-forge.r-project.org.
Although DNAmixtures can be installed with only the free version of Hugin, the size of the networks will in practice require the full licence. In theory, the implementation allows analysis with an arbitrary number of unknown contributors. However, in practice, depending on hardware and time-constraints working with up to 5 or 6 unknown contributors seems realistic.
The statistical model jointly models the observed peak heights and the set of contributors to the DNA sample(s). In the event of analysing multiple DNA mixtures, the union of the contributors is used as the contributor set for each mixture. By allowing a contribution of zero, we cover the case of a contributor not having contributed to a particular mixture.
Genotypes for unknown contributors are modelled using
allele-frequencies from a database specified by the user. The
database is also used to define the range of alleles at each
marker. A genotype for an unknown contributor is represented by a
vector of allele counts , counting for each allele
the number of alleles
that a person possesses; in
the network for a marker, the allele count
is
represented by a variable
n_i_a
. The vector of allele
counts follows a multinomial distribution with and the specified allele frequencies. It is assumed that
genotypes are independent across markers and between
contributors. If desired, the database of allele frequencies may
be corrected for F_st or sampling adjustment before use.
Peak heights are assumed mutually independent and their
distributions for a fixed set of DNA profiles are modelled using
gamma distributions. The peak height for allele in EPG
is assumed to follow a gamma distribution with scale parameter
and shape parameter
Applying a detection threshold , any peak height
falling below the threshold is considered to be 0. The peak
heights are denoted by
height1, ..., heightR
.
The model parameters are for each DNA mixture
The proportions of DNA from each contributor.
Amplification parameter, which will be larger for larger amounts of DNA amplified.
Scale parameter for the gamma distribution.
Mean stutter percentage. Allele uses stutter parameter
if the allele
is included in the model, and
otherwise
An alternative parametrisation uses and
, which can be interpreted as the mean
peak height and the coefficient of variation respectively. Besides
being interpretable, an advantage of this reparametrisation is
that the parameters are fairly orthogonal.
The model assumes the model parameters to be the same across markers. Relaxations of these assumptions are not implemented here.
The computational approach of the implementation of this package is discussed in Graversen and Lauritzen (2014).
The Bayesian networks include three types of auxiliary variables
O
, D
, and Q
; these can be thought of as
representing the observed peak heights, the absence/presence of
peaks, and the peak height distribution function. Note that if
invalid tables are set – for instance if very extreme parameter
values are used, or if the vector of mixture proportions is
mis-labeled – then any subsequent propagation will fail. No
roll-back functionality has so far been implemented to fix this,
and the easiest solution is to re-fit the mixture model.
The workhorses of this package are the functions
setCPT.O
, setCPT.D
and
setCPT.Q
for setting the conditional probability
tables for the three types of auxiliary variables according to
specified peak heights and model parameters.
As an experiment, it is possible to add the marker Amelogenin, provided that the marker is named "AMEL" and that the coding of alleles X and Y is of a particular form. One example of a suitable form is the coding X = 0 and Y = 1. The allele frequencies used should then also contain a marker "AMEL", and here frequencies have a slightly different interpretation than for the rest of the markers; as all people possess one X, the frequencies of X and Y denote the presence of an additional X or Y respectively, and thus the frequencies correspond directly to the proportions of the two genders.
Therese Graversen [email protected]
Details on the implemented model may be found in
Cowell, R. G., Graversen, T., Lauritzen, S., and Mortera, J. (2015). Analysis of Forensic DNA Mixtures with Artefacts. With supplementary material documenting the analyses using DNAmixtures. Journal of the Royal Statistical Society: Series C (Applied Statistics). Volume 64, Issue 1, pages 1-48.
Graversen, T. (2014) Statistical and Computational Methodology for the Analysis of Forensic DNA Mixtures with Artefacts. DPhil. University of Oxford. https://ora.ox.ac.uk/objects/uuid:4c3bfc88-25e7-4c5b-968f-10a35f5b82b0.
Graversen, T. and Lauritzen, S. (2014). Computational aspects of DNA mixture analysis. Statistics and Computing, DOI: 10.1007/s11222-014-9451-7.
## Analysing trace MC18, using USCaucasian allele-frequencies. data(MC18, USCaucasian, SGMplusDyes) ## The prosecution hypothesis could be that K1, K2, and K3 have ## contributed to the trace. Setting detection threshold to 50rfu. ## For layout as an EPG we follow the layout of SGMplus. mixHp <- DNAmixture(list(MC18), k = 3, K = c("K1", "K2", "K3"), C = list(50), database = USCaucasian, dyes = list(SGMplusDyes)) plot(mixHp, epg = TRUE) ## The yellow is a bit difficult to see; we can ## change the colors representing the dyes plot(mixHp, epg = TRUE, dyecol = list(c("blue", "green", "black"))) ## Fit by maximum likelihood using p as the initial value for optimisation p <- mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, K2 = 0.19))) mlHp <- mixML(mixHp, pars = p) mlHp$mle ## Find the estimated covariance matrix of the MLE V.Hp <- varEst(mixHp, mlHp$mle, npars = list(rho=1,eta=1,xi=1,phi=1)) V.Hp$cov ## using (rho, eta) ## The summary is a table containing the MLE and their standard errors summary(V.Hp) ## Assess the fit of the distribution of peak heights qqpeak(mixHp, pars = mlHp$mle, dist = "conditional") ## Assess the prediction of allele presence pr <- prequential.score(mixHp, pars = mlHp$mle) plot(pr) ## Compare observed peak heights to peak heights simulated under the model sims <- rPeakHeight(mixHp, mlHp$mle, nsim = 100, dist = "conditional") oldpar <- par(mfrow = c(2,5), mar = c(2,2,2,0)) boxplot(mixHp, sims) par(oldpar) data(MC18, USCaucasian, SGMplusDyes) ## A defence hypothesis could be that one unknown person has ## contributed along with K1 and K3. mixHd <- DNAmixture(list(MC18), k = 3, K = c("K1", "K3"), C = list(50), database = USCaucasian, dyes = list(SGMplusDyes)) ## Fit by ML mlHd <- mixML(mixHd, pars = mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, U1 = 0.19)))) mlHd$mle ## Find the estimated covariance matrix of the MLE V <- varEst(mixHd, mlHd$mle, npars = list(rho=1,eta=1,xi=1,phi=1)) summary(V) ## Assess the peak height distribution qq <- qqpeak(mixHd, pars = mlHd$mle, dist = "conditional") ## Assess the prediction of allele presence pr <- prequential.score(mixHd, pars = mlHd$mle) plot(pr, col = pr$trace) ## Compare observed peak heights to peak heights simulated under the model sims <- rPeakHeight(mixHd, mlHd$mle, nsim = 100, dist = "conditional") par(mfrow = c(2,5), mar = c(2,2,2,0)) boxplot(mixHd, sims) ## The log10 of likelihood-ratio can be found as (mlHp$lik - mlHd$lik)/log(10) ## We can find the most probable genotypes for U1 under the hypothesis ## Hd : K1, K2, and U1. ## Include the peak heights setPeakInfo(mixHd, mlHd$mle) ## Jointly best for all unknown contributors mp <- map.genotypes(mixHd, pmin = 0.01, type = "seen") ## See the genotypes rather than allelecounts print(summary(mp), marker = "D2S1338") ## Include only information on presence and absence of alleles setPeakInfo(mixHd, mlHd$mle, presence.only = TRUE) ## Jointly best for all unknown contributors mp2 <- map.genotypes(mixHd, pmin = 0.01, type = "seen") ## See the genotypes rather than allelecounts print(summary(mp2), marker = "D2S1338")
## Analysing trace MC18, using USCaucasian allele-frequencies. data(MC18, USCaucasian, SGMplusDyes) ## The prosecution hypothesis could be that K1, K2, and K3 have ## contributed to the trace. Setting detection threshold to 50rfu. ## For layout as an EPG we follow the layout of SGMplus. mixHp <- DNAmixture(list(MC18), k = 3, K = c("K1", "K2", "K3"), C = list(50), database = USCaucasian, dyes = list(SGMplusDyes)) plot(mixHp, epg = TRUE) ## The yellow is a bit difficult to see; we can ## change the colors representing the dyes plot(mixHp, epg = TRUE, dyecol = list(c("blue", "green", "black"))) ## Fit by maximum likelihood using p as the initial value for optimisation p <- mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, K2 = 0.19))) mlHp <- mixML(mixHp, pars = p) mlHp$mle ## Find the estimated covariance matrix of the MLE V.Hp <- varEst(mixHp, mlHp$mle, npars = list(rho=1,eta=1,xi=1,phi=1)) V.Hp$cov ## using (rho, eta) ## The summary is a table containing the MLE and their standard errors summary(V.Hp) ## Assess the fit of the distribution of peak heights qqpeak(mixHp, pars = mlHp$mle, dist = "conditional") ## Assess the prediction of allele presence pr <- prequential.score(mixHp, pars = mlHp$mle) plot(pr) ## Compare observed peak heights to peak heights simulated under the model sims <- rPeakHeight(mixHp, mlHp$mle, nsim = 100, dist = "conditional") oldpar <- par(mfrow = c(2,5), mar = c(2,2,2,0)) boxplot(mixHp, sims) par(oldpar) data(MC18, USCaucasian, SGMplusDyes) ## A defence hypothesis could be that one unknown person has ## contributed along with K1 and K3. mixHd <- DNAmixture(list(MC18), k = 3, K = c("K1", "K3"), C = list(50), database = USCaucasian, dyes = list(SGMplusDyes)) ## Fit by ML mlHd <- mixML(mixHd, pars = mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, U1 = 0.19)))) mlHd$mle ## Find the estimated covariance matrix of the MLE V <- varEst(mixHd, mlHd$mle, npars = list(rho=1,eta=1,xi=1,phi=1)) summary(V) ## Assess the peak height distribution qq <- qqpeak(mixHd, pars = mlHd$mle, dist = "conditional") ## Assess the prediction of allele presence pr <- prequential.score(mixHd, pars = mlHd$mle) plot(pr, col = pr$trace) ## Compare observed peak heights to peak heights simulated under the model sims <- rPeakHeight(mixHd, mlHd$mle, nsim = 100, dist = "conditional") par(mfrow = c(2,5), mar = c(2,2,2,0)) boxplot(mixHd, sims) ## The log10 of likelihood-ratio can be found as (mlHp$lik - mlHd$lik)/log(10) ## We can find the most probable genotypes for U1 under the hypothesis ## Hd : K1, K2, and U1. ## Include the peak heights setPeakInfo(mixHd, mlHd$mle) ## Jointly best for all unknown contributors mp <- map.genotypes(mixHd, pmin = 0.01, type = "seen") ## See the genotypes rather than allelecounts print(summary(mp), marker = "D2S1338") ## Include only information on presence and absence of alleles setPeakInfo(mixHd, mlHd$mle, presence.only = TRUE) ## Jointly best for all unknown contributors mp2 <- map.genotypes(mixHd, pmin = 0.01, type = "seen") ## See the genotypes rather than allelecounts print(summary(mp2), marker = "D2S1338")
A plot will be made for each combination of samples and markers
specified, and it is up to the user to specify a layout for the
plots (e.g. via calls to par
)
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
## S3 method for class 'DNAmixture' boxplot( x, sims, traces = 1:x$ntraces, markers = x$markers, pw = 0.4, ylim = NULL, border = "grey", ... )
## S3 method for class 'DNAmixture' boxplot( x, sims, traces = 1:x$ntraces, markers = x$markers, pw = 0.4, ylim = NULL, border = "grey", ... )
x |
A |
sims |
A set of simulated peak heights. See also |
traces |
Selected traces to plot. |
markers |
Selected markers to plot. |
pw |
Peaks are |
ylim |
Range of the y-axis. |
border |
Color of the boxes. |
... |
Arguments passed on to |
Invisibly the peak heights as used for the boxplots.
Therese Graversen
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = c(0.08,0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7), c(K1 = 0.9, K3 = 0.05, U1 = 0.05))) rpm <- rPeakHeight(mix, p, nsim = 1000, dist = "joint") oldpar <- par("mfrow") par(mfrow = c(1,2)) boxplot(mix, rpm, traces = 1:2, markers = "VWA") par(mfrow = oldpar) data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = "K3", database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = c(0.08,0.08), phi = list(c(U2 = 0.1, K3 = 0.2, U1 = 0.7), c(U1 = 0.9, K3 = 0.05, U2 = 0.05))) rpm <- rPeakHeight(mix, p, nsim = 50, dist = "joint") rpc <- rPeakHeight(mix, p, nsim = 50, dist = "conditional") oldpar <- par("mfrow") par(mfrow = c(2,2)) boxplot(mix, rpm, traces = 1:2, markers = "VWA") ## First row of plots boxplot(mix, rpc, traces = 1:2, markers = "VWA") ## Second row of plots par(mfrow = oldpar) mixK <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K2", "K3"), database = USCaucasian) pK <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) rpmK <- rPeakHeight(mixK, pK, nsim = 1000, markers = "D2S1338") oldpar <- par(mfrow = c(1,2)) boxplot(mixK, rpmK, markers = "D2S1338") par(oldpar)
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = c(0.08,0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7), c(K1 = 0.9, K3 = 0.05, U1 = 0.05))) rpm <- rPeakHeight(mix, p, nsim = 1000, dist = "joint") oldpar <- par("mfrow") par(mfrow = c(1,2)) boxplot(mix, rpm, traces = 1:2, markers = "VWA") par(mfrow = oldpar) data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = "K3", database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = c(0.08,0.08), phi = list(c(U2 = 0.1, K3 = 0.2, U1 = 0.7), c(U1 = 0.9, K3 = 0.05, U2 = 0.05))) rpm <- rPeakHeight(mix, p, nsim = 50, dist = "joint") rpc <- rPeakHeight(mix, p, nsim = 50, dist = "conditional") oldpar <- par("mfrow") par(mfrow = c(2,2)) boxplot(mix, rpm, traces = 1:2, markers = "VWA") ## First row of plots boxplot(mix, rpc, traces = 1:2, markers = "VWA") ## Second row of plots par(mfrow = oldpar) mixK <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K2", "K3"), database = USCaucasian) pK <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) rpmK <- rPeakHeight(mixK, pK, nsim = 1000, markers = "D2S1338") oldpar <- par(mfrow = c(1,2)) boxplot(mixK, rpmK, markers = "D2S1338") par(oldpar)
The function is intended for internal use in
DNAmixture
, and it creates one network per
marker.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
buildMixtureDomains( n.unknown, data, domainnamelist, write, dir = NULL, ntraces = 1, triangulate = TRUE, compile = TRUE, compress = TRUE, use.order = TRUE )
buildMixtureDomains( n.unknown, data, domainnamelist, write, dir = NULL, ntraces = 1, triangulate = TRUE, compile = TRUE, compress = TRUE, use.order = TRUE )
n.unknown |
Number of unknown contributors |
data |
A list of |
domainnamelist |
List of names for networkfiles. |
write |
[not available in lite-version] Save networkfiles? Defaults to |
dir |
Path to the networkfiles. |
ntraces |
Number of traces (EPG's) |
triangulate |
Triangulate the networks? Default is to triangulate the network using a good elimination order. |
compile |
Compile the networks? Defaults to |
compress |
Compress the network? Defaults to |
use.order |
Should the default elimination order be used for triangulation? Otherwise the "total.weight" heuristic for triangulation in HUGIN is used. |
For one marker, the network contains variables
n_i_a
The number of alleles that unknown contributor
Ui
possesses.
S_i_a
Cumulative allele counts, summarising how many alleles amongst types 1, ..., that contributor
possesses.
O_r_a
Binary variable representing observed peak height for allele in EPG
. See
setCPT.O
for details.
D_r_a
Binary variable representing the event of a peak for allele falling below threshold in EPG
.
Q_r_a
Binary variable representing the event of a peak for allele being smaller than the peak observed in EPG
.
The network is by default triangulated, compiled,
compressed, and, optionally, saved as a hkb-file. If the networks
are large – if there are many unknown contributors – it is worth
considering saving the networks rather than re-building them every time
DNAmixture
is called.
A list containing a list of pointers for RHugin domains. Additionally a list of the total size of the compressed junction tree in percent of the uncompressed size.
If amelogenin is included as a marker, use integer codes for the alleles with X preceding Y, e.g. X=0, Y=1.
Therese Graversen
A model object for analysis of one or more DNA mixtures. For a
brief overview of the package functionality, see
in particular DNAmixtures
.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
DNAmixture( data, k, C, database, K = character(0), reference.profiles = NULL, dir = character(0), domainnamelist = NULL, load = FALSE, write = FALSE, dyes = NULL, triangulate = TRUE, compile = TRUE, compress = TRUE, use.order = TRUE ) ## S3 method for class 'DNAmixture' print(x, ...)
DNAmixture( data, k, C, database, K = character(0), reference.profiles = NULL, dir = character(0), domainnamelist = NULL, load = FALSE, write = FALSE, dyes = NULL, triangulate = TRUE, compile = TRUE, compress = TRUE, use.order = TRUE ) ## S3 method for class 'DNAmixture' print(x, ...)
data |
A list containing one |
k |
Number of contributors. |
C |
A list of thresholds, one for each mixture. |
database |
A data.frame containing at least variables |
K |
Names of reference profiles; these can be chosen freely, but should match (possibly only a subset of) the names specified by the reference profiles. |
reference.profiles |
A data.frame containing allele counts for each reference profile, if not specified in |
dir |
Location of network files if loading or saving the networks. |
domainnamelist |
Names of marker-wise network files (without hkb-extension). Default is the set of markers. |
load |
Read networks from disk? |
write |
Save networks as hkb files? |
dyes |
A list containing a list of dyes indexed by markers |
triangulate |
Triangulate the networks? Default is to triangulate the network using a good elimination order. |
compile |
Compile the networks? |
compress |
Compress the network? Defaults to |
use.order |
Should the default elimination order be used for triangulation? Otherwise the "total.weight" heuristic for triangulation in Hugin is used. |
x |
An object of class |
... |
not used. |
The names for known contributors can be chosen freely, whereas
unknown contributors are always termed U1,U2, ...
.
We allow for stutter to an allele one repeat number shorter. The
range of alleles at a marker is defined by the union of alleles
specified though the peak heights, the reference profiles, and the
allele frequencies. Any alleles that are included, but not found
in the database, will be assigned frequency NA
, and it is then up
to the user to decide on further actions. If a particular mixture
has no observations at a marker, the heights are set to NA
, but if
the mixture has some peaks at that marker, then missing heights
are all set to 0. Note that we hereby cover the possibility that
mixtures are analysed with different kits, and so are observed at
different markers. We do not (readily) allow kits to have
different ranges of possible alleles at one marker.
If amelogenin is included in the analysis, the marker should be
named "AMEL"
and an integer coding such as X=0, Y=1, where X is
assigned a lower number than Y, should be used. Note that in terms
of amelogenin, the allele frequencies have a slighly different
interpretation to that for other markers, in that they denote the
probability of having an additional X or Y to the X that
all people have. Thus, a natural choice will be ,
denoting equal probability of a male or female contributor.
An object of class DNAmixture
. This contains amongst other things
markers |
The joint set of markers used for the mixtures specified. |
domains |
For models involving unknown contributors,
a list containing one Bayesian network ( |
data |
A list containing for each marker the combined allele frequencies,
peak heights, and reference profiles as produced by |
data(MC15, MC18, USCaucasian) DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K2"), database = USCaucasian) DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K3", "K1", "K2"), database = USCaucasian)
data(MC15, MC18, USCaucasian) DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K2"), database = USCaucasian) DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K3", "K1", "K2"), database = USCaucasian)
The function is intended for internal use in
DNAmixture
and its purpose is to combine peak height
data, any reference profiles, and allele frequencies.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
DNAmixtureData(data, database, K = character(0), reference.profiles = NULL)
DNAmixtureData(data, database, K = character(0), reference.profiles = NULL)
data |
Either one |
database |
A |
K |
Names for reference profiles. |
reference.profiles |
Optionally, a |
list of data.frame
's indexed by markers and
containing variables
marker |
The STR marker |
allele |
Repeat number of the allele |
frequency |
Allele frequency |
height1 , ... , heightN
|
Peak heights for each of the N mixtures analysed;
their order follows the order in which the mixtures are specified in |
stutter.from |
For internal use. Where do the alleles get stutter from?
A value of |
stutter.to |
As above. |
as well as known DNA profiles as labelled by K
.
Therese Graversen
## Create a dataset for two markers with each 3 observed alleles epgdf <- data.frame(marker = rep(c("FGA", "TH01"), each = 3), allele = c(18, 23, 27, 7, 8, 9.3), ## Observed alleles height = c(100, 100, 200, 200, 100, 100), ## Peak heights Anna = c(0,0,2,1,0,1), ## Anna's profile Peter = c(1,1,0,1,1,0)) ## Peter's profile data(USCaucasian) dat <- DNAmixtureData(epgdf, K = c("Anna", "Peter"), database = USCaucasian)
## Create a dataset for two markers with each 3 observed alleles epgdf <- data.frame(marker = rep(c("FGA", "TH01"), each = 3), allele = c(18, 23, 27, 7, 8, 9.3), ## Observed alleles height = c(100, 100, 200, 200, 100, 100), ## Peak heights Anna = c(0,0,2,1,0,1), ## Anna's profile Peter = c(1,1,0,1,1,0)) ## Peter's profile data(USCaucasian) dat <- DNAmixtureData(epgdf, K = c("Anna", "Peter"), database = USCaucasian)
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
dyes(mixture) dyes(mixture) <- value
dyes(mixture) dyes(mixture) <- value
mixture |
A DNA mixture |
value |
A list with one item per dye, each containing a vector of marker names. |
The dyes used for the DNA mixture
The function is mainly for internal use. It calculates the shape parameters, or contribution to the shape parameter, that correspond to the known contributors in the mixture.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
getShapes(mixture, pars, allelecounts = NULL)
getShapes(mixture, pars, allelecounts = NULL)
mixture |
A |
pars |
Model parameters. |
allelecounts |
This argument is possibly redundant. |
For each marker a matrix of shape-parameters.
Therese Graversen
predict
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3", "K2"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) sh <- getShapes(mix, p) sh$VWA
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3", "K2"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) sh <- getShapes(mix, p) sh$VWA
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
The function logL
is used to produce a log likelihood
function for one or more traces of DNA. logL
calls one of
four other likelihood functions according to whether peak heights
or only peak presence/absence is used as observations, and also
whether there are any unknown contributors in the model.
logL(mixture, presence.only = FALSE, initialize = TRUE) logL.UK(mixture, initialize = TRUE) logL.K(mixture) logLpres.UK(mixture, initialize = TRUE) logLpres.K(mixture)
logL(mixture, presence.only = FALSE, initialize = TRUE) logL.UK(mixture, initialize = TRUE) logL.K(mixture) logLpres.UK(mixture, initialize = TRUE) logLpres.K(mixture)
mixture |
A |
presence.only |
Set to TRUE to ignore peak heights and evaluate the likelihood function considering peak presence and absence (heights above and below threshold) only. Defaults to FALSE. |
initialize |
By default all entered
evidence is removed from the networks in |
A function, which takes a mixpar
model parameter as argument.
In the presence of unknown contributors, the returned
likelihood function relies on the Bayesian networks in the
DNAmixture
object. If any evidence is entered or propagated
in these networks after creating the likelihood function, the
function will no longer compute the correct likelihood. In
particular, be ware of calling other functions in between calls to
the likelihood function, as they may change the state of the
networks.
Therese Graversen
data(MC18, USCaucasian) mixHp <- DNAmixture(list(MC18), k = 3, K = c("K1", "K2", "K3"), C = list(50), database = USCaucasian) p <- mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, K2 = 0.19))) l <- logL(mixHp) l(p)
data(MC18, USCaucasian) mixHp <- DNAmixture(list(MC18), k = 3, K = c("K1", "K2", "K3"), C = list(50), database = USCaucasian) p <- mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, K2 = 0.19))) l <- logL(mixHp) l(p)
For each marker, a ranked list of configurations of genotypes for some
or all unknown contributors is returned. The list contains all
configurations with posterior probability higher than some
specified pmin
.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
map.genotypes( mixture, pmin, U = seq_along(mixture$U), markers = mixture$markers, type = c("seen", "all", "unseen") )
map.genotypes( mixture, pmin, U = seq_along(mixture$U), markers = mixture$markers, type = c("seen", "all", "unseen") )
mixture |
a DNA mixture |
pmin |
A list of the minimum probability to consider for each marker. |
U |
Optionally the indices of the unknown contributors of interest, specified as an integer vector. |
markers |
Optionally, a subset of markers. |
type |
It may be of interest to consider only the prediction of alleles in some subset of alleles. We allow
|
Note that an error occurs if there are no configurations
with probability higher than pmin
. In this case, try a
smaller pmin
.
The function makes use of
map.configurations
, which localises the
configurations of highest high posterior probability by simulating
from the Bayesian networks until enough (at least mass
1-pmin
) of the state space has been explored – the
computation time is thus dependent on how flat the posterior is,
and how small pmin
is. The simulation is used only to locate
the relevant configurations; the computed probabilities are exact.
A list, which for each marker contains the maximum
posterior configurations of allele counts (genotypes) above the specified probabilities pmin
.
data(MC18, USCaucasian, SGMplusDyes) mix <- DNAmixture(list(MC18), k = 3, K = "K1", C = list(50), database = USCaucasian) p <- mixpar(rho = list(30), eta = list(30), xi = list(0.07), phi = list(c(K1 = 0.7, U1 =0.2, U2 = 0.1))) ## Inlude the peak height information setPeakInfo(mix, p) ## Marginally best genotypes for contributor U1 mpU1 <- map.genotypes(mix, pmin = 0.01, U = 1, type = "seen", markers = "D16S539") summary(mpU1) ## Jointly best genotypes for all unknown contributors mp <- map.genotypes(mix, pmin = 0.01, type = "seen") summary(mp) ## Profiles as genotypes rather than allelecounts
data(MC18, USCaucasian, SGMplusDyes) mix <- DNAmixture(list(MC18), k = 3, K = "K1", C = list(50), database = USCaucasian) p <- mixpar(rho = list(30), eta = list(30), xi = list(0.07), phi = list(c(K1 = 0.7, U1 =0.2, U2 = 0.1))) ## Inlude the peak height information setPeakInfo(mix, p) ## Marginally best genotypes for contributor U1 mpU1 <- map.genotypes(mix, pmin = 0.01, U = 1, type = "seen", markers = "D16S539") summary(mpU1) ## Jointly best genotypes for all unknown contributors mp <- map.genotypes(mix, pmin = 0.01, type = "seen") summary(mp) ## Profiles as genotypes rather than allelecounts
Peak heights from analysis of a bloodstain with DNA
from an unknown number of contributors. There are three reference
profiles K1
, K2
, and K3
for individuals
related to the case.
A data.frame
.
Peter Gill et. al. (2008) Interpretation of complex DNA profiles using empirical models and a method to measure their robustness. Forensic Science International: Genetics, 2(2):91–103.
Peak heights from analysis of a bloodstain with DNA
from an unknown number of contributors. There are three reference
profiles K1
, K2
, and K3
for individuals
related to the case.
A data.frame
.
Peter Gill et. al. (2008) Interpretation of complex DNA profiles using empirical models and a method to measure their robustness. Forensic Science International: Genetics, 2(2):91–103.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
mixML( mixture, pars, constraints = NULL, phi.eq = FALSE, val = NULL, trace = FALSE, order.unknowns = TRUE, ... )
mixML( mixture, pars, constraints = NULL, phi.eq = FALSE, val = NULL, trace = FALSE, order.unknowns = TRUE, ... )
mixture |
A |
pars |
A |
constraints |
Equality constraint function as function of an array of parameters. |
phi.eq |
Should the mixture proportions be the same for all traces? Defaults to FALSE. |
val |
Vector of values to be satisfied for the equality constraints. |
trace |
Print the evaluations of the likelihood-function during optimisation? |
order.unknowns |
Should unknown contributors be ordered according to decreasing contributions? Defaults to TRUE. |
... |
Further arguments to be passed on to |
The pre-specified constraints for the model parameter pars
are for each mixture r
0 <= pars[[r, "rho"]] < Inf
0 <= pars[[r, "eta"]] < Inf
0 <= pars[[r, "xi"]] <= 1
0 <= pars[[r, "phi"]][i] <= 1
sum(pars[[r, "phi"]]) == 1
.
If there are 2 or more unknown contributors, then the mixture proportions for the unknown contributors are ordered decreasingly with the first DNA mixture determining the order.
Further constraints can be specified by the user; for this see examples below.
A list containing
mle |
The (suggested) MLE. |
lik |
The log of the likelihood (log e). |
as well as the output from the optimisation.
Therese Graversen
data(MC15, USCaucasian) mix <- DNAmixture(list(MC15), C = list(50), k = 3, K = c("K1", "K2", "K3"), database = USCaucasian) startpar <- mixpar(rho = list(24), eta = list(37), xi = list(0.08), phi = list(c(K3 = 0.15, K1 = 0.8, K2 = 0.05))) ml <- mixML(mix, startpar) ml$mle data(MC15, USCaucasian) mix <- DNAmixture(list(MC15), C = list(50), k = 3, K = "K1", database = USCaucasian) startpar <- mixpar(rho = list(24), eta = list(37), xi = list(0.08), phi = list(c(U1 = 0.05, K1 = 0.8, U2 = 0.15))) ml <- mixML(mix, startpar) ml$mle ## The following advanced example has a model with two DNA samples ## and various parameter restrictions. ## Be aware that the computation is rather demanding and takes ## a long time to run with this lite-version of DNAmixtures. ## With DNAmixtures (based on HUGIN), it takes only about one minute. data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = "K1", database = USCaucasian) startpar <- mixpar(rho = list(24, 25), eta = list(37, 40), xi = list(0.08, 0.1), phi = list(c(U1 = 0.05, K1 = 0.7, U2 = 0.25), c(K1 = 0.7, U2 = 0.1, U1 = 0.2))) eqxis <- function(x){ diff(unlist(x[,"xi"])) } ## Note that for these two traces, we do not expect phi to be equal. ## Here we set stutter equal for all traces ml.diff <- mixML(mix, startpar, eqxis, val = 0, phi.eq = FALSE) ## Equal mixture proportions across traces ml.eqphi <- mixML(mix, startpar, eqxis, val = 0, phi.eq = TRUE) ## Likelihood ratio test for equal mixture proportions pchisq(-2*(ml.eqphi$lik - ml.diff$lik), df = 1, lower.tail = FALSE)
data(MC15, USCaucasian) mix <- DNAmixture(list(MC15), C = list(50), k = 3, K = c("K1", "K2", "K3"), database = USCaucasian) startpar <- mixpar(rho = list(24), eta = list(37), xi = list(0.08), phi = list(c(K3 = 0.15, K1 = 0.8, K2 = 0.05))) ml <- mixML(mix, startpar) ml$mle data(MC15, USCaucasian) mix <- DNAmixture(list(MC15), C = list(50), k = 3, K = "K1", database = USCaucasian) startpar <- mixpar(rho = list(24), eta = list(37), xi = list(0.08), phi = list(c(U1 = 0.05, K1 = 0.8, U2 = 0.15))) ml <- mixML(mix, startpar) ml$mle ## The following advanced example has a model with two DNA samples ## and various parameter restrictions. ## Be aware that the computation is rather demanding and takes ## a long time to run with this lite-version of DNAmixtures. ## With DNAmixtures (based on HUGIN), it takes only about one minute. data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = "K1", database = USCaucasian) startpar <- mixpar(rho = list(24, 25), eta = list(37, 40), xi = list(0.08, 0.1), phi = list(c(U1 = 0.05, K1 = 0.7, U2 = 0.25), c(K1 = 0.7, U2 = 0.1, U1 = 0.2))) eqxis <- function(x){ diff(unlist(x[,"xi"])) } ## Note that for these two traces, we do not expect phi to be equal. ## Here we set stutter equal for all traces ml.diff <- mixML(mix, startpar, eqxis, val = 0, phi.eq = FALSE) ## Equal mixture proportions across traces ml.eqphi <- mixML(mix, startpar, eqxis, val = 0, phi.eq = TRUE) ## Likelihood ratio test for equal mixture proportions pchisq(-2*(ml.eqphi$lik - ml.diff$lik), df = 1, lower.tail = FALSE)
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
mixpar(rho = NULL, eta = NULL, xi = NULL, phi = NULL, parlist = NULL) ## S3 method for class 'mixpar' print(x, digits = max(3L, getOption("digits") - 3L), scientific = FALSE, ...)
mixpar(rho = NULL, eta = NULL, xi = NULL, phi = NULL, parlist = NULL) ## S3 method for class 'mixpar' print(x, digits = max(3L, getOption("digits") - 3L), scientific = FALSE, ...)
rho |
Amplification factor. |
eta |
Scale parameter in the gamma distribution. |
xi |
Stutter parameter. |
phi |
Named vector of the fraction of DNA contributed by each contributor. |
parlist |
A list of parameters of class |
x |
An object of class |
digits |
The number of digits to print |
scientific |
Should scientific notation be used? |
... |
arguments passed to print |
The mixture parameter is a two-way array of lists;
columns correspond to the four model parameters rho
,
eta
, xi
, and phi
, and rows correspond to the
mixtures included in the model.
The print method is currently somewhat specialised, in that it
assumes that rho
, eta
, and xi
are merely real
numbers. phi
is assumed to consist of a named vector per
mixture; the names, or order of names, can differ between mixtures.
An object of class "mixpar".
Therese Graversen
## A parameter for two mixtures q <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08, 0.08), phi = list(c(Anna = 0.5, Peter = 0.2, U1 = 0.3), c(U1 = 0.5, Anna = 0.2, Peter = 0.3))) ## Equivalent to specifying the parameter for each mixture and then combining. p1 <- mixpar(rho = list(30), eta = list(30), xi = list(0.08), phi = list(c(Anna = 0.5, Peter = 0.2, U1 = 0.3))) p2 <- mixpar(rho = list(30), eta = list(30), xi = list(0.08), phi = list(c(U1 = 0.5, Anna = 0.2, Peter = 0.3))) p <- mixpar(parlist = list(p1, p2))
## A parameter for two mixtures q <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08, 0.08), phi = list(c(Anna = 0.5, Peter = 0.2, U1 = 0.3), c(U1 = 0.5, Anna = 0.2, Peter = 0.3))) ## Equivalent to specifying the parameter for each mixture and then combining. p1 <- mixpar(rho = list(30), eta = list(30), xi = list(0.08), phi = list(c(Anna = 0.5, Peter = 0.2, U1 = 0.3))) p2 <- mixpar(rho = list(30), eta = list(30), xi = list(0.08), phi = list(c(U1 = 0.5, Anna = 0.2, Peter = 0.3))) p <- mixpar(parlist = list(p1, p2))
NGM allele frequencies.
A list containing
USCaucasian
A data.frame
with, for each marker,
the set of possible alleles as well as their corresponding frequency.
Budowle et. al (2011) Population Genetic Analyses of the NGM STR loci. International Journal of Legal Medicine, 125(1):101-109.
Dyes used for the AmpFlSTR NGM PCR Amplification Kit
A list with one item per dye, each containing a vector of marker-names specifying the order in which they occur in an EPG.
Applied Biosystems
Plot of peak heights against repeat numbers for each marker.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
## S3 method for class 'DNAmixture' plot( x, traces = seq_len(x$ntraces), markers = x$markers, epg = FALSE, dyecol = lapply(dyes(x), names), pw = 0.4, threshold = TRUE, panel = NULL, add = FALSE, ask = NULL, ... )
## S3 method for class 'DNAmixture' plot( x, traces = seq_len(x$ntraces), markers = x$markers, epg = FALSE, dyecol = lapply(dyes(x), names), pw = 0.4, threshold = TRUE, panel = NULL, add = FALSE, ask = NULL, ... )
x |
A |
traces |
Indices giving the mixtures, for which plots should be made. |
markers |
Vector of names giving the markers, for which plots should be made. |
epg |
Should a stylized EPG be produced? This requires a list
of dyes to be specified for the mixtures, possibly through
|
dyecol |
List containing for each mixture a vector of dye
names. The default is to use the names in |
pw |
Peaks are |
threshold |
Should the detection thresholds be indicated on the plot? |
panel |
Alternative function for drawing the peaks. For
instance |
add |
Add to existing plot? (not meaningful if |
ask |
Should the user be prompted for page changes? The
default is |
... |
Other parameters to be passed on to |
A data.frame
containing the plotted data points.
data(MC15, MC18, USCaucasian, SGMplusDyes) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3"), database = USCaucasian) ## Plot as a stylized EPG, using "orange" for the "yellow" dye names(SGMplusDyes) <- c("blue", "green", "orange") dyes(mix) <- list(SGMplusDyes, SGMplusDyes) plot(mix, epg = TRUE) ## We can also supress the dye colors plot(mix, traces = 1, epg = TRUE, dyecol = NULL) ## Create a user specified layout op <- par(mfrow = c(2,3)) plot(mix, markers = c("VWA", "D19S433", "TH01"), col = "red") par(mfrow = op) plot(mix, traces = 1, markers = "D19S433", col = "orange", main = "A single marker", cex.main = 2, ylim = c(0,200))
data(MC15, MC18, USCaucasian, SGMplusDyes) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3"), database = USCaucasian) ## Plot as a stylized EPG, using "orange" for the "yellow" dye names(SGMplusDyes) <- c("blue", "green", "orange") dyes(mix) <- list(SGMplusDyes, SGMplusDyes) plot(mix, epg = TRUE) ## We can also supress the dye colors plot(mix, traces = 1, epg = TRUE, dyecol = NULL) ## Create a user specified layout op <- par(mfrow = c(2,3)) plot(mix, markers = c("VWA", "D19S433", "TH01"), col = "red") par(mfrow = op) plot(mix, traces = 1, markers = "D19S433", col = "orange", main = "A single marker", cex.main = 2, ylim = c(0,200))
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
## S3 method for class 'DNAmixture' predict( object, pars, dist = c("joint", "conditional", "prequential"), markers = object$markers, by.allele = TRUE, initialize = TRUE, ... )
## S3 method for class 'DNAmixture' predict( object, pars, dist = c("joint", "conditional", "prequential"), markers = object$markers, by.allele = TRUE, initialize = TRUE, ... )
object |
A |
pars |
Array of model parameters |
dist |
One of "joint", "conditional", and "prequential". If there are only known contributors, these are all the same since, under the model, peak heights are condtionally independent given profiles of the contributors. |
markers |
The set of markers of interest |
by.allele |
If |
initialize |
By default |
... |
Not used |
For a mixture with unknown contributors, the
probabilities are computed with respect to one of three
distributions. Let height
be the matrix of peak heights
with columns height1, ..., heightR
. For a peak at allele a
in the mixture r
, the three choices of distributions are
"joint"
Default. No conditioning on observed peak heights.
"conditional"
Conditional on height[-a, -r]
, i.e. on heights for all peaks, except the one under consideration.
"prequential"
Conditional on height[1:(a-1), 1:(r-1)]
, i.e. on heights for all peaks "before" the peak under consideration (see argument by.allele
for details).
If all contributors are known, the three distributions are the same due to independence of the peak heights.
A list with one data.frame per marker containing various probabilities for diagnostics
unseen |
The probability of not seeing a peak, i.e. no peak or a peak falling below the threshold |
seen |
The probability of seeing the allele |
smaller |
The probability of seeing a smaller peak than the one observed |
larger |
The probability of seeing a larger peak than the one observed |
Therese Graversen
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3", "K2"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) pred <- predict(mix, p) pred$VWA
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3", "K2"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) pred <- predict(mix, p) pred$VWA
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
prequential.score(mixture, pars, markers = mixture$markers, by.allele = TRUE) ## S3 method for class 'prequential.score' plot(x, normalise = FALSE, ylab = NULL, ylim = NULL, ...)
prequential.score(mixture, pars, markers = mixture$markers, by.allele = TRUE) ## S3 method for class 'prequential.score' plot(x, normalise = FALSE, ylab = NULL, ylim = NULL, ...)
mixture |
A |
pars |
A |
markers |
An ordering of the markers, possibly a subset of the markers only. |
by.allele |
Should conditioning be done allele-wise ( |
x |
A data.frame containing at least variables |
normalise |
Should the prequential score be normalised? Defaults to |
ylab |
Label for the y-axis. |
ylim |
Range for the y-axis. |
... |
Additional arguments to be passed on to |
A data.frame, which contains the output from
predict
as well as columns Y
, EY
, VY
, corresponding
to the log-score and its mean and variance. Finally the variable
score
is added, which is the normalised cumulative log-score.
Therese Graversen
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7), c(U1 = 0.1, K3 = 0.2, K1 = 0.7))) preq <- prequential.score(mix, pars = p) plot(preq, col = preq$trace) ## Annotate using repeat numbers text(preq$score, labels = preq$allele, pos = c(1,3), col = preq$trace, cex = 0.6)
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50,50), k = 3, K = c("K1", "K3"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7), c(U1 = 0.1, K3 = 0.2, K1 = 0.7))) preq <- prequential.score(mix, pars = p) plot(preq, col = preq$trace) ## Annotate using repeat numbers text(preq$score, labels = preq$allele, pos = c(1,3), col = preq$trace, cex = 0.6)
Dyes used for the AmpFlSTR Profiler Plus PCR Amplification Kit
A list with one item per dye, each containing a vector of marker-names specifying the order in which they occur in an EPG.
Applied Biosystems
Given , the peak height
follows a
continuous distribution. The probability transform
thus follows a uniform distribution. The function
qqpeak
computes the quantiles of the probability transform
and produces a quantile-quantile plot. Information about the other
peak heights in the EPG may be taken into account in the
distribution of a peak.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
qqpeak( x, pars, dist = c("joint", "conditional", "prequential"), by.allele = TRUE, plot = TRUE, xlab = "Uniform quantiles", ylab = NULL, xlim = NULL, ylim = xlim, ... )
qqpeak( x, pars, dist = c("joint", "conditional", "prequential"), by.allele = TRUE, plot = TRUE, xlab = "Uniform quantiles", ylab = NULL, xlim = NULL, ylim = xlim, ... )
x |
A |
pars |
A |
dist |
|
by.allele |
Order of conditioning when |
plot |
Should a plot be produced? Defaults to |
xlab |
Legend for the x-axis. |
ylab |
Legend for y-axis. |
xlim |
Range of x-axis. Default is a plot with equal ranges on the x- and y-axis. |
ylim |
Range of y-axis. |
... |
Other arguments to be passed on to |
A data.frame
containg quantiles q
corresponding to in the specified
distribution, together with other quantities computed by
predict.DNAmixture
. The data are ordered according
to q
.
Therese Graversen
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7))) qqpeak(mix, pars = p, dist = "conditional") ## If desired, we can make a customised plot -- here we color according to the two mixtures qq <- qqpeak(mix, pars = p, dist = "conditional", plot = FALSE) plot(ppoints(qq$q), qq$q, xlab = "Uniform quantiles", ylab = "Probability transform", col = qq$trace, pch = qq$trace)
data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3"), database = USCaucasian) p <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08,0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7))) qqpeak(mix, pars = p, dist = "conditional") ## If desired, we can make a customised plot -- here we color according to the two mixtures qq <- qqpeak(mix, pars = p, dist = "conditional", plot = FALSE) plot(ppoints(qq$q), qq$q, xlab = "Uniform quantiles", ylab = "Probability transform", col = qq$trace, pch = qq$trace)
A set of peak heights is sampled for each mixture in the
model. Note that if the mixtures cover different sets of markers,
so will the simulated peak heights. If there are unknown
contributors, their genotypes are sampled from the networks in
mixture$domains
, but these genotypes are not stored.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
rPeakHeight( mixture, pars, nsim = 1, markers = mixture$markers, dist = c("joint", "conditional"), use.threshold = TRUE, initialize = TRUE, ... )
rPeakHeight( mixture, pars, nsim = 1, markers = mixture$markers, dist = c("joint", "conditional"), use.threshold = TRUE, initialize = TRUE, ... )
mixture |
A |
pars |
A |
nsim |
Number of simulations for each allele. |
markers |
Subset of markers to simulate |
dist |
How should observed peak heights be taken into account? Possible choices are
|
use.threshold |
Should peak heights under the detection
threshold C be set to 0? Defaults to |
initialize |
Should all propagated evidence be removed from
the network? Defaults to |
... |
Not used. |
A list of one three-way array per marker; the three margins correspond to traces, alleles, and simulations.
Therese Graversen
data(MC15, MC18, USCaucasian) mixP <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3", "K2"), database = USCaucasian) pP <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08, 0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) rpk <- rPeakHeight(mixP, pP, nsim = 5) mixD <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3"), database = USCaucasian) pD <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08, 0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7), c(U1 = 0.1, K3 = 0.2, K1 = 0.7))) rpj <- rPeakHeight(mixD, pD, nsim = 5, dist = "joint") rpc <- rPeakHeight(mixD, pD, nsim = 5, dist = "conditional")
data(MC15, MC18, USCaucasian) mixP <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3", "K2"), database = USCaucasian) pP <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08, 0.08), phi = list(c(K2 = 0.1, K3 = 0.2, K1 = 0.7), c(K2 = 0.1, K3 = 0.2, K1 = 0.7))) rpk <- rPeakHeight(mixP, pP, nsim = 5) mixD <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K3"), database = USCaucasian) pD <- mixpar(rho = list(30, 30), eta = list(30, 30), xi = list(0.08, 0.08), phi = list(c(U1 = 0.1, K3 = 0.2, K1 = 0.7), c(U1 = 0.1, K3 = 0.2, K1 = 0.7))) rpj <- rPeakHeight(mixD, pD, nsim = 5, dist = "joint") rpc <- rPeakHeight(mixD, pD, nsim = 5, dist = "conditional")
A wrapper-function for the case where the conditional probability tables are to be set in a DNA mixture model for all the auxiliary variables (O, D and Q, at all markers and all alleles).
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
setCPT(mixture, pars, markers = mixture$markers)
setCPT(mixture, pars, markers = mixture$markers)
mixture |
A DNA mixture |
pars |
A list of parameters |
markers |
Optionally, a list of markers for which to set the tables |
The conditional probability tables are set according to a list of specified parameters and the observed peak heights. The function returns evidence for future use on the nodes O.
List containing evidence for conditioning on peak heights.
For further details, see in particular setCPT.O
.
D
Function for setting the tables on all nodes D_r_a
for
mixture r
in the network corresponding to one marker, using
peak heights and a set of parameters.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
setCPT.D( domain, rho, xi, eta, phi, C, n.unknown, n_K, D, gets_stutter, can_stutter, stutter.from )
setCPT.D( domain, rho, xi, eta, phi, C, n.unknown, n_K, D, gets_stutter, can_stutter, stutter.from )
domain |
A |
rho |
rho |
xi |
xi |
eta |
eta |
phi |
phi |
C |
Detection threshold |
n.unknown |
Number of unknown contributors |
n_K |
Allelecounts for known contributors |
D |
Names for binary nodes |
gets_stutter |
Does the allele receive stutter? |
can_stutter |
Can the allele stutter? |
stutter.from |
From which allele does the stutter come? |
The function sets conditional probabilities for auxiliary variables D_r_a
,
which are used for obtaining probabilities of a peak
falling above resp. below the detection threshold under the current
model.
The conditional probability tables are defined using the gamma c.d.f. as
invisibly NULL
For further details see setCPT.O
.
O
.Function for setting the tables on all nodes O_r_a
for
mixture r
in the network corresponding to one marker, using
peak heights and a set of parameters. The returned evidence can be
used for conditioning on observed peak heights.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
setCPT.O( domain, rho, xi, eta, phi, heights, C, n.unknown, n_K, O, gets_stutter, can_stutter, stutter.from )
setCPT.O( domain, rho, xi, eta, phi, heights, C, n.unknown, n_K, O, gets_stutter, can_stutter, stutter.from )
domain |
A |
rho |
rho |
xi |
xi |
eta |
eta |
phi |
phi |
heights |
Peak heights |
C |
Detection threshold |
n.unknown |
Number of unknown contributors |
n_K |
Allelecounts for known contributors |
O |
Vector of names for binary nodes |
gets_stutter |
Does the allele receive stutter? A boolean vector. |
can_stutter |
Can the allele stutter? |
stutter.from |
From which allele does the stutter come? |
The function sets probabilities in CPT according to
whether a peak is observed or not. If a peak is seen at allele
, the conditional distribution is defined using the
p.d.f.
for the gamma distribution presented in
DNAmixtures
as
and likelihood-evidence for the states (
FALSE
, TRUE
) is returned.
If the allele is not seen, the CPT is set using the c.d.f. for the gamma distribution as
in which case likelihood-evidence (1,0) is returned for this allele.
A matrix with each column containing the evidence to be set on O[a]
when conditioning on the peak heights.
For the sake of speed and saving
memory, only probabilities for the CPT of O[a]
are
calculated, and not the entire table. The parent nodes are
n_i_a
and possibly n_i_(a+1)
for all unknown
contributors i; the for known contributors are
specified through the argument
n_K
. The layout of the
table relies in particular on the fact that all nodes have a
statespace 0,1,2, which this is *not* checked anywhere. Thus, if
other types of allele-count-nodes n_i_a
are introduced
(such as could be relevant for Amelogenin), the function should be
changed accordingly.
If invalid tables are set then any subsequent propagation will fail. No roll-back functionality has so far been implemented to fix this, and the easiest solution is to re-fit the mixture model.
Note that the detection threshold and model parameters can be
specified as vectors, and so it is in theory possible to use allele-specific
values, if desired. This also holds for the functions
setCPT.D
and setCPT.Q
.
Q
Function for setting the tables on all nodes Q_r_a
for
mixture r
in the network corresponding to one marker, using
peak heights and a set of parameters.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
setCPT.Q( domain, rho, xi, eta, phi, heights, n.unknown, n_K, Q, gets_stutter, can_stutter, stutter.from )
setCPT.Q( domain, rho, xi, eta, phi, heights, n.unknown, n_K, Q, gets_stutter, can_stutter, stutter.from )
domain |
A |
rho |
rho |
xi |
xi |
eta |
eta |
phi |
phi ordered as |
heights |
Observed peak heights |
n.unknown |
Number of unknown contributors |
n_K |
Allelecounts for known contributors |
Q |
Names for binary nodes |
gets_stutter |
Does the allele receive stutter? |
can_stutter |
Can the allele stutter? |
stutter.from |
From which allele does the stutter come? |
The function sets conditional probabilities for auxiliary variables Q_r_a
,
which are used for finding probabilities of observing a smaller or
larger peak than the one observed for the DNA mixture.
The conditional probability tables are defined using the gamma c.d.f. as
Scaling factors to be set as evidence when conditioning on the peak heights
Therese Graversen
For further details see setCPT.O
.
The function setPeakInfo
is used for including in the model
either the full peak height information or only information about
presence and absence of peaks. After a call to setPeakInfo
,
the Bayesian networks in mixture$domains
will represent
the conditional distribution given the specified peak
information. For the reverse functionality, i.e. exclusion of any
such peak information, use removePeakInfo
.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
setPeakInfo(mixture, pars, presence.only = FALSE) removePeakInfo(mixture)
setPeakInfo(mixture, pars, presence.only = FALSE) removePeakInfo(mixture)
mixture |
|
pars |
A |
presence.only |
Default is FALSE, which means that the full peak height information is taken into consideration. Set to TRUE, which will include only information on the presence and absence of peaks. |
The function setPeakInfo
sets conditional
probability tables using the specified model parameters, and
propagates suitable evidence using either observed peak heights or
the discrete presence/absence observations of peaks. Any
previously entered or propagated evidence on nodes O
and
D
will be retracted in this process.
The function removePeakInfo
retracts all evidence on nodes
O
and D
; the conditional probability tables are left
unchanged.
invisibly NULL
setCPT
. For use of the Bayesian networks, see map.genotypes
.
Dyes used for the AmpFlSTR SGM Plus PCR Amplification Kit
A list with one item per dye, each containing a vector of marker-names specifying the order in which they occur in an EPG.
Applied Biosystems
The maximum posterior configurations of genotypes as returned by
map.genotypes
, but in the format of pairs of alleles
rather than raw allelecounts. When a subset of alleles are
considered, then the value NA
is used to denote an allele outside
this subset; in particular, if type="seen"
is used in
map.genotypes
, then NA
corresponds to the
allele having dropped out in all of the mixtures included in the
model.
## S3 method for class 'map.genotypes' summary(object, ...) ## S3 method for class 'summary.map.genotypes' print(x, markers = names(x), ...)
## S3 method for class 'map.genotypes' summary(object, ...) ## S3 method for class 'summary.map.genotypes' print(x, markers = names(x), ...)
object |
An object returned by |
... |
arguments passed on to other methods. |
x |
An object of class |
markers |
Optionally, a subset of markers to print |
A data.frame
with two columns per unknown
contributor under consideration and a column Prob
containing the probability of the configuration.
Therese Graversen
Database of allele frequencies for UK Caucasians.
A data.frame
The frequencies are produced from the allele counts for one (the UK Caucasian) of three British sub-populations as found on David Balding's homepage under his likeLTD software for mixture analysis in R.
https://sites.google.com/site/baldingstatisticalgenetics/software/likeltd-r-forensic-dna-r-code
Allele frequencies for US Caucasians.
A data.frame
.
There are a few errata found after publication of the allele frequencies, but we have not updated the allele frequencies accordingly. The frequencies also did not add up to 1, and this is simply corrected by normalising within each marker.
http://www.cstl.nist.gov/strbase/NISTpop.htm
Provided that the user specifies the MLE as well as the constraints used in the maximisation, this function computes an estimate of the variance of the MLE based on the observed information. The observed information is obtained by numerically deriving the Hessian of the log-likelihood function.
IMPORTANT: This is the DNAmixturesLite package, which is intended as a service to enable users to try DNAmixtures without purchasing a commercial licence for Hugin. When at all possible, we strongly recommend the use of DNAmixtures rather than this lite-version. See https://dnamixtures.r-forge.r-project.org/ for details on both packages.
While the lite-version seeks to provide the full functionality of DNAmixtures, note that computations are much less efficient and that there are some differences in available functionality. Be aware that the present documentation is copied from DNAmixtures and thus may not accurately describe the implementation of this lite-version.
varEst(mixture, mle, npars, method = "Richardson", ...) ## S3 method for class 'mixVarEst' summary(object, transform = FALSE, ...) ## S3 method for class 'summary.mixVarEst' print( x, digits = max(3, getOption("digits") - 3), scientific = FALSE, print.gap = 3L, ... )
varEst(mixture, mle, npars, method = "Richardson", ...) ## S3 method for class 'mixVarEst' summary(object, transform = FALSE, ...) ## S3 method for class 'summary.mixVarEst' print( x, digits = max(3, getOption("digits") - 3), scientific = FALSE, print.gap = 3L, ... )
mixture |
A |
mle |
A |
npars |
A list of integers specifying the number of each of the four
parameters
|
method |
Method for numeric differentiation used in |
... |
Arguments to be passed on to other methods. |
object |
An object of class |
transform |
Should the parameterisation |
x |
An object of class |
digits |
Number of significant digits to print |
scientific |
Should scientific notation be used? |
print.gap |
Distance between columns in the printing of the summary. |
As the user can apply highly customized constraints to the model
parameters when maximising with mixML
, it is a
complicated matter to write a generic function for computing the
asymptotic variance matrix. We have thus restricted attention to
the case where each of the (multi-dimensional) parameters rho
,
eta
, xi
and phi
can be either
fixed at known values
unknown, but common across traces
unconstrained
The mle and the estimated covariance matrix in different parametrisations
cov and mle |
|
cov.trans and mle.trans |
|
cov.res |
A non-singular covariance matrix for a
reparametrisation of |
An integer suffix is used to indicate which mixture the parameter
is associated with. In the restricted covariance matrix, all fixed
parameters are left out. If parameters are equal accross mixtures,
the suffix for this parameter will be .1
. If parameters are
unconstrained and there are N
mixtures in the model,
suffixes are .1, ..., .N
data(MC18, USCaucasian) mixHp <- DNAmixture(list(MC18), k = 3, K = c("K1", "K2", "K3"), C = list(50), database = USCaucasian) p <- mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, K2 = 0.19))) mlHp <- mixML(mixHp, pars = p) ## Find the estimated covariance matrix of the MLE V.Hp <- varEst(mixHp, mlHp$mle, npars = list(rho=1,eta=1,xi=1,phi=1)) V.Hp$cov ## using (rho, eta) V.Hp$cov.trans ## using (mu, sigma) ## The summary is a table containing the MLE and their standard errors summary(V.Hp) data(MC18, USCaucasian) mixmult <- DNAmixture(list(MC18), C = list(50), k = 3, K = c("K1", "K2"), database = USCaucasian) startpar <- mixpar(rho = list(30), eta = list(28), xi = list(0.08), phi = list(c(U1 = 0.2, K1 = 0.7, K2 = 0.1))) ml.mult <- mixML(mixmult, startpar) Vmult <- varEst(mixmult, ml.mult$mle, list(rho=1,eta=1,xi=1,phi=1)) summary(Vmult) ## Be aware that the following two advanced examples are computationally demanding and ## typically have a runtime of several minutes with the lite-version of DNAmixtures. data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K2"), database = USCaucasian) startpar <- mixpar(rho = list(30, 30), eta = list(28, 35), xi = list(0.08, 0.1), phi = list(c(U1 = 0.05, K1 = 0.7, K2 = 0.25), c(K1 = 0.7, K2 = 0.1, U1 = 0.2))) eqxis <- function(x){ diff(unlist(x[,"xi"])) } ## Here we set stutter equal for all traces ml.diff <- mixML(mix, startpar, eqxis, val = 0, phi.eq = FALSE) V.diff <- varEst(mix, ml.diff$mle, list(rho=2,eta=2,xi=1,phi=2)) summary(V.diff) ## Fixing stutter to 0.07 xival <- function(x){unlist(x[,"xi"])} ml.eq <- mixML(mix, startpar, xival, val = c(0.07, 0.07), phi.eq = FALSE) V.eq <- varEst(mix, ml.eq$mle, list(rho=2,eta=2,xi=0,phi=2)) summary(V.eq)
data(MC18, USCaucasian) mixHp <- DNAmixture(list(MC18), k = 3, K = c("K1", "K2", "K3"), C = list(50), database = USCaucasian) p <- mixpar(rho = list(30), eta = list(34), xi = list(0.08), phi = list(c(K1 = 0.71, K3 = 0.1, K2 = 0.19))) mlHp <- mixML(mixHp, pars = p) ## Find the estimated covariance matrix of the MLE V.Hp <- varEst(mixHp, mlHp$mle, npars = list(rho=1,eta=1,xi=1,phi=1)) V.Hp$cov ## using (rho, eta) V.Hp$cov.trans ## using (mu, sigma) ## The summary is a table containing the MLE and their standard errors summary(V.Hp) data(MC18, USCaucasian) mixmult <- DNAmixture(list(MC18), C = list(50), k = 3, K = c("K1", "K2"), database = USCaucasian) startpar <- mixpar(rho = list(30), eta = list(28), xi = list(0.08), phi = list(c(U1 = 0.2, K1 = 0.7, K2 = 0.1))) ml.mult <- mixML(mixmult, startpar) Vmult <- varEst(mixmult, ml.mult$mle, list(rho=1,eta=1,xi=1,phi=1)) summary(Vmult) ## Be aware that the following two advanced examples are computationally demanding and ## typically have a runtime of several minutes with the lite-version of DNAmixtures. data(MC15, MC18, USCaucasian) mix <- DNAmixture(list(MC15, MC18), C = list(50, 38), k = 3, K = c("K1", "K2"), database = USCaucasian) startpar <- mixpar(rho = list(30, 30), eta = list(28, 35), xi = list(0.08, 0.1), phi = list(c(U1 = 0.05, K1 = 0.7, K2 = 0.25), c(K1 = 0.7, K2 = 0.1, U1 = 0.2))) eqxis <- function(x){ diff(unlist(x[,"xi"])) } ## Here we set stutter equal for all traces ml.diff <- mixML(mix, startpar, eqxis, val = 0, phi.eq = FALSE) V.diff <- varEst(mix, ml.diff$mle, list(rho=2,eta=2,xi=1,phi=2)) summary(V.diff) ## Fixing stutter to 0.07 xival <- function(x){unlist(x[,"xi"])} ml.eq <- mixML(mix, startpar, xival, val = c(0.07, 0.07), phi.eq = FALSE) V.eq <- varEst(mix, ml.eq$mle, list(rho=2,eta=2,xi=0,phi=2)) summary(V.eq)