Title: | Comparative 'Phylogenetic' Analyses of Diversification |
---|---|
Description: | Contains a number of comparative 'phylogenetic' methods, mostly focusing on analysing diversification and character evolution. Contains implementations of 'BiSSE' (Binary State 'Speciation' and Extinction) and its unresolved tree extensions, 'MuSSE' (Multiple State 'Speciation' and Extinction), 'QuaSSE', 'GeoSSE', and 'BiSSE-ness' Other included methods include Markov models of discrete and continuous trait evolution and constant rate 'speciation' and extinction. |
Authors: | Richard G. FitzJohn [aut, cre], Emma Goldberg [aut], Karen Magnuson-Ford [aut], Roger Sidje [aut] |
Maintainer: | Richard G. FitzJohn <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.10-1 |
Built: | 2024-11-02 06:43:16 UTC |
Source: | CRAN |
Contains a number of comparative 'phylogenetic' methods, mostly focusing on analysing diversification and character evolution. Contains implementations of 'BiSSE' (Binary State 'Speciation' and Extinction) and its unresolved tree extensions, 'MuSSE' (Multiple State 'Speciation' and Extinction), 'QuaSSE', 'GeoSSE', and 'BiSSE-ness' Other included methods include Markov models of discrete and continuous trait evolution and constant rate 'speciation' and extinction.
Richard G. FitzJohn [aut, cre], Emma Goldberg [aut], Karen Magnuson-Ford [aut], Roger Sidje [aut]
Maintainer: Richard G. FitzJohn <[email protected]>
Diversitree contains methods described in the following papers (all of which aside from Maddison et al. 2007 were originally published as a diversitree implementation).
FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. systematic biology 58:595-611. Systematic Biology 58:595-611.
FitzJohn R.G. 2010. Quantitative traits and diversification. Systematic Biology 59:619-633.
Goldberg E.E., Lancaster L.T., Ree R.H. 2011. Phylogenetic inference of reciprocal effects between geographic range evolution and diversification. Systematic Biology 60: 451-465.
Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Systematic Biology 56: 701-710.
Magnuson-Ford K. and Otto S.P. 2012. Linking the investigations of character evolution and species diversification. The American Naturalist 180: 225-245.
Functions to get and set “argument names” for
functions that take vectorised arguments. For example, the likelihood
function returned by make.bisse
takes a vector of six these
functions can be used to get the canonical names for these six
parameters, and also to set them to something more memorable. These
names are used by the constrain
function to specify
submodels.
argnames(x, ...) argnames(x) <- value ## S3 method for class 'constrained' argnames(x, ...) ## S3 replacement method for class 'constrained' argnames(x) <- value
argnames(x, ...) argnames(x) <- value ## S3 method for class 'constrained' argnames(x, ...) ## S3 replacement method for class 'constrained' argnames(x) <- value
x |
A function taking a vector of parameters as its first argument. |
value |
Vector of names to set the argument names to. |
... |
Ignored arguments to future methods. |
Methods exist for all models: bisse
, geosse
, bd
,
yule
, mk2
, and mkn
. These are particulary useful
for mkn
as the number of parameters for the Q matrix can be
very large.
Richard G. FitzJohn
## Same example likelihood function as for \link{make.bisse}: pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(4) phy <- tree.bisse(pars, max.t=30, x0=0) f <- make.bisse(phy, phy$tip.state) argnames(f) # Canonical argument names (set by default) ## Names that might be more informative for a tall/short state argnames(f) <- c("l.tall", "l.short", "m.tall", "m.short", "q.tall.short", "q.short.tall") argnames(f)
## Same example likelihood function as for \link{make.bisse}: pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(4) phy <- tree.bisse(pars, max.t=30, x0=0) f <- make.bisse(phy, phy$tip.state) argnames(f) # Canonical argument names (set by default) ## Names that might be more informative for a tall/short state argnames(f) <- c("l.tall", "l.short", "m.tall", "m.short", "q.tall.short", "q.short.tall") argnames(f)
Perform ancestral state reconstruction. These functions are all generic and will dispatch on the class of the given likelihood functions. Currently methods exist for all generics for Mk2, and marginal ancestral state reconstructions are supported for BiSSE.
asr.marginal(lik, pars, nodes=NULL, ...) asr.joint(lik, pars, n=1, ...) asr.stoch(lik, pars, n=1, ...) make.asr.marginal(lik, ...) make.asr.joint(lik, ...) make.asr.stoch(lik, ...)
asr.marginal(lik, pars, nodes=NULL, ...) asr.joint(lik, pars, n=1, ...) asr.stoch(lik, pars, n=1, ...) make.asr.marginal(lik, ...) make.asr.joint(lik, ...) make.asr.stoch(lik, ...)
lik |
A likelihood function. |
pars |
A vector of parameters, suitable for |
nodes |
For |
n |
The number of samples to draw from the joint distribution, or number of stochastic reconstructions to make. |
... |
Additional arguments passed through to future methods |
These three functions compute marginal, joint, and stochastic
ancestral reconstructions. The make
versions return functions
that can efficiently be used many times over.
The return values of the functions are likely to change in the near future. Watch out!
Richard G. FitzJohn
asr.mkn and asr.bisse for methods specific to particular classes, with examples of use.
Perform ancestral state reconstruction under BiSSE and
other constant rate Markov models. Marginal reconstructions are
supported (c.f. asr
). Documentation is still in an
early stage, and mostly in terms of examples.
## S3 method for class 'bisse' make.asr.marginal(lik, ...) ## S3 method for class 'musse' make.asr.marginal(lik, ...)
## S3 method for class 'bisse' make.asr.marginal(lik, ...) ## S3 method for class 'musse' make.asr.marginal(lik, ...)
lik |
A likelihood function, returned by |
... |
Additional arguments passed through to future methods. Currently unused. |
Richard G. FitzJohn
## Start with a simple tree evolved under a BiSSE with all rates ## asymmetric: pars <- c(.1, .2, .03, .06, .01, .02) set.seed(3) phy <- trees(pars, "bisse", max.taxa=50, max.t=Inf, x0=0)[[1]] ## Here is the true history h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, main="True history") ## Not run: ## BiSSE ancestral state reconstructions under the ML model lik <- make.bisse(phy, phy$tip.state) fit <- find.mle(lik, pars, method="subplex") st <- asr.marginal(lik, coef(fit)) nodelabels(thermo=t(st), piecol=1:2, cex=.5) ## Mk2 ancestral state reconstructions, ignoring the shifts in ## diversification rates: lik.m <- make.mk2(phy, phy$tip.state) fit.m <- find.mle(lik.m, pars[5:6], method="subplex") st.m <- asr.marginal(lik.m, coef(fit.m)) ## The Mk2 results have more uncertainty at the root, but both are ## similar. nodelabels(thermo=t(st.m), piecol=1:2, cex=.5, adj=-.5) ## (This section will take 10 or so minutes to run.) ## Try integrating over parameter uncertainty and comparing the BiSSE ## with Mk2 output: prior <- make.prior.exponential(2) samples <- mcmc(lik, coef(fit), 1000, w=1, prior=prior, print.every=100) st.b <- apply(samples[2:7], 1, function(x) asr.marginal(lik, x)[2,]) st.b.avg <- rowMeans(st.b) samples.m <- mcmc(lik.m, coef(fit.m), 1000, w=1, prior=prior, print.every=100) st.m <- apply(samples.m[2:3], 1, function(x) asr.marginal(lik.m, x)[2,]) st.m.avg <- rowMeans(st.m) ## These end up being more striking in their similarity than their ## differences, except for the root node, where BiSSE remains more sure ## that is in state 0 (there is about 0.05 red there). plot(h, phy, main="Marginal ASR, BiSSE (left), Mk2 (right)", show.node.state=FALSE) nodelabels(thermo=1-st.b.avg, piecol=1:2, cex=.5) nodelabels(thermo=1-st.m.avg, piecol=1:2, cex=.5, adj=-.5) ## Equivalency of Mk2 and BiSSE where diversification is state ## independent. For any values of lambda/mu (here .1 and .03) where ## these do not vary across character states, these two methods will ## give essentially identical marginal ancestral state reconstructions. st.id <- asr.marginal(lik, c(.1, .1, .03, .03, coef(fit.m))) st.id.m <- asr.marginal(lik.m, coef(fit.m)) ## Reconstructions are identical to a relative tolerance of 1e-7 ## (0.0000001), which is similar to the expected tolerance of the BiSSE ## calculations. all.equal(st.id, st.id.m, tolerance=1e-7) ## Equivalency of BiSSE and MuSSE reconstructions for two states: lik.b <- make.bisse(phy, phy$tip.state) lik.m <- make.musse(phy, phy$tip.state + 1, 2) st.b <- asr.marginal(lik.b, coef(fit)) st.m <- asr.marginal(lik.m, coef(fit)) all.equal(st.b, st.m) ## End(Not run)
## Start with a simple tree evolved under a BiSSE with all rates ## asymmetric: pars <- c(.1, .2, .03, .06, .01, .02) set.seed(3) phy <- trees(pars, "bisse", max.taxa=50, max.t=Inf, x0=0)[[1]] ## Here is the true history h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, main="True history") ## Not run: ## BiSSE ancestral state reconstructions under the ML model lik <- make.bisse(phy, phy$tip.state) fit <- find.mle(lik, pars, method="subplex") st <- asr.marginal(lik, coef(fit)) nodelabels(thermo=t(st), piecol=1:2, cex=.5) ## Mk2 ancestral state reconstructions, ignoring the shifts in ## diversification rates: lik.m <- make.mk2(phy, phy$tip.state) fit.m <- find.mle(lik.m, pars[5:6], method="subplex") st.m <- asr.marginal(lik.m, coef(fit.m)) ## The Mk2 results have more uncertainty at the root, but both are ## similar. nodelabels(thermo=t(st.m), piecol=1:2, cex=.5, adj=-.5) ## (This section will take 10 or so minutes to run.) ## Try integrating over parameter uncertainty and comparing the BiSSE ## with Mk2 output: prior <- make.prior.exponential(2) samples <- mcmc(lik, coef(fit), 1000, w=1, prior=prior, print.every=100) st.b <- apply(samples[2:7], 1, function(x) asr.marginal(lik, x)[2,]) st.b.avg <- rowMeans(st.b) samples.m <- mcmc(lik.m, coef(fit.m), 1000, w=1, prior=prior, print.every=100) st.m <- apply(samples.m[2:3], 1, function(x) asr.marginal(lik.m, x)[2,]) st.m.avg <- rowMeans(st.m) ## These end up being more striking in their similarity than their ## differences, except for the root node, where BiSSE remains more sure ## that is in state 0 (there is about 0.05 red there). plot(h, phy, main="Marginal ASR, BiSSE (left), Mk2 (right)", show.node.state=FALSE) nodelabels(thermo=1-st.b.avg, piecol=1:2, cex=.5) nodelabels(thermo=1-st.m.avg, piecol=1:2, cex=.5, adj=-.5) ## Equivalency of Mk2 and BiSSE where diversification is state ## independent. For any values of lambda/mu (here .1 and .03) where ## these do not vary across character states, these two methods will ## give essentially identical marginal ancestral state reconstructions. st.id <- asr.marginal(lik, c(.1, .1, .03, .03, coef(fit.m))) st.id.m <- asr.marginal(lik.m, coef(fit.m)) ## Reconstructions are identical to a relative tolerance of 1e-7 ## (0.0000001), which is similar to the expected tolerance of the BiSSE ## calculations. all.equal(st.id, st.id.m, tolerance=1e-7) ## Equivalency of BiSSE and MuSSE reconstructions for two states: lik.b <- make.bisse(phy, phy$tip.state) lik.m <- make.musse(phy, phy$tip.state + 1, 2) st.b <- asr.marginal(lik.b, coef(fit)) st.m <- asr.marginal(lik.m, coef(fit)) all.equal(st.b, st.m) ## End(Not run)
Perform ancestral state reconstruction under Mk2 and other constant rate Markov models. Marginal, joint, and stochastic reconstructions are supported. Documentation is still in an early stage, and mostly in terms of examples.
## S3 method for class 'mkn' make.asr.marginal(lik, ...) ## S3 method for class 'mkn' make.asr.joint(lik, ...) ## S3 method for class 'mkn' make.asr.stoch(lik, slim=FALSE, ...)
## S3 method for class 'mkn' make.asr.marginal(lik, ...) ## S3 method for class 'mkn' make.asr.joint(lik, ...) ## S3 method for class 'mkn' make.asr.stoch(lik, slim=FALSE, ...)
lik |
A likelihood function, returned by |
slim |
Should the history object be slimmed down? |
... |
Additional arguments; currently ignored. |
Output will differ slightly when mk2
and mkn
models are
used as lik
, as mk2
uses states 0/1, while 2-state
mkn
uses 1/2.
This is all quite slow. Faster versions are coming eventually.
These functions all return functions that generate different types of ancestral reconstruction.
Richard G. FitzJohn
## Start with a simple tree evolved under a constant rates birth-death ## model with asymetric character evolution pars <- c(.1, .1, .03, .03, .03, .06) set.seed(1) phy <- trees(pars, "bisse", max.taxa=50, max.t=Inf, x0=0)[[1]] ## Here is the true history. The root node appears to be state 1 (red) ## at the root, despite specifying a root of state 0 (x0=0, in statement ## above). This is because the tree started with a single lineage, but ## had changed state by the time the first speciation event happened. h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, main="True history") ## All of the methods need a likelihood function; build a mk2 function: lik <- make.mk2(phy, phy$tip.state) ## Using the true parameters, compute the marginal ancestral state ## reconstructions: st.m <- asr.marginal(lik, pars[5:6]) ## There is still not a good stand-alone plotting command for nodes. ## For now, use ape's nodelabels(). plot(h, phy, main="Marginal ASR", show.node.state=FALSE) nodelabels(thermo=t(st.m), piecol=1:2, cex=.5) ## Again, with the true parameters, a sample from the joint ## distribution: st.j <- asr.joint(lik, pars[5:6]) ## Plotting this sample against the true values. plot(h, phy, main="Joint ASR", show.node.state=FALSE) nodelabels(pch=19, col=st.j + 1) ## This is just one sample, and is not very accurate in this case! Make ## 1,000 such samples and average them: st.j2 <- asr.joint(lik, pars[5:6], 1000) st.j2.mean <- colMeans(st.j2) plot(h, phy, main="Joint ASR (averaged)", show.node.state=FALSE) nodelabels(thermo=1-st.j2.mean, piecol=1:2, cex=.5) ## Check the estimates against one another: plot(st.m[2,], st.j2.mean, xlab="Marginal", ylab="Joint", las=1) abline(0, 1) ## Finally, the stochastic character mapping. This uses samples from ## the joint distribution at its core. st.s <- asr.stoch(lik, pars[5:6]) plot(st.s, phy) ## Again, multiple samples can be done at once. There is a function for ## summarising histories, but it is still in the works. ## Repeating the above with a two-state mkn model: lik2 <- make.mkn(phy, phy$tip.state + 1, 2, FALSE) ## Everything works: st2.m <- asr.marginal(lik2, pars[5:6]) st2.j <- asr.joint(lik2, pars[5:6], 100) st2.s <- asr.stoch(lik2, pars[5:6]) ## Marginal likelihoods agree: all.equal(st.m, st2.m) ## Joint reconstructions are stochastic, so just check with a ## regression: summary(lm(colMeans(st2.j) - 1 ~ colMeans(st.j2) - 1)) ## Integrate parameter uncertainty, and see how far down the tree there ## is any real information on parameter states for this tree (this takes ## about 6s) ## Not run: set.seed(1) prior <- make.prior.exponential(.5) samples <- mcmc(lik, pars[5:6], 1000, w=1, prior=prior, print.every=100) st.m.avg <- rowMeans(apply(samples[2:3], 1, asr.joint, lik=lik)) plot(h, phy, main="MCMC Averaged ASR", show.node.state=FALSE) nodelabels(thermo=1 - st.m.avg, piecol=1:2, cex=.5) ## End(Not run)
## Start with a simple tree evolved under a constant rates birth-death ## model with asymetric character evolution pars <- c(.1, .1, .03, .03, .03, .06) set.seed(1) phy <- trees(pars, "bisse", max.taxa=50, max.t=Inf, x0=0)[[1]] ## Here is the true history. The root node appears to be state 1 (red) ## at the root, despite specifying a root of state 0 (x0=0, in statement ## above). This is because the tree started with a single lineage, but ## had changed state by the time the first speciation event happened. h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, main="True history") ## All of the methods need a likelihood function; build a mk2 function: lik <- make.mk2(phy, phy$tip.state) ## Using the true parameters, compute the marginal ancestral state ## reconstructions: st.m <- asr.marginal(lik, pars[5:6]) ## There is still not a good stand-alone plotting command for nodes. ## For now, use ape's nodelabels(). plot(h, phy, main="Marginal ASR", show.node.state=FALSE) nodelabels(thermo=t(st.m), piecol=1:2, cex=.5) ## Again, with the true parameters, a sample from the joint ## distribution: st.j <- asr.joint(lik, pars[5:6]) ## Plotting this sample against the true values. plot(h, phy, main="Joint ASR", show.node.state=FALSE) nodelabels(pch=19, col=st.j + 1) ## This is just one sample, and is not very accurate in this case! Make ## 1,000 such samples and average them: st.j2 <- asr.joint(lik, pars[5:6], 1000) st.j2.mean <- colMeans(st.j2) plot(h, phy, main="Joint ASR (averaged)", show.node.state=FALSE) nodelabels(thermo=1-st.j2.mean, piecol=1:2, cex=.5) ## Check the estimates against one another: plot(st.m[2,], st.j2.mean, xlab="Marginal", ylab="Joint", las=1) abline(0, 1) ## Finally, the stochastic character mapping. This uses samples from ## the joint distribution at its core. st.s <- asr.stoch(lik, pars[5:6]) plot(st.s, phy) ## Again, multiple samples can be done at once. There is a function for ## summarising histories, but it is still in the works. ## Repeating the above with a two-state mkn model: lik2 <- make.mkn(phy, phy$tip.state + 1, 2, FALSE) ## Everything works: st2.m <- asr.marginal(lik2, pars[5:6]) st2.j <- asr.joint(lik2, pars[5:6], 100) st2.s <- asr.stoch(lik2, pars[5:6]) ## Marginal likelihoods agree: all.equal(st.m, st2.m) ## Joint reconstructions are stochastic, so just check with a ## regression: summary(lm(colMeans(st2.j) - 1 ~ colMeans(st.j2) - 1)) ## Integrate parameter uncertainty, and see how far down the tree there ## is any real information on parameter states for this tree (this takes ## about 6s) ## Not run: set.seed(1) prior <- make.prior.exponential(.5) samples <- mcmc(lik, pars[5:6], 1000, w=1, prior=prior, print.every=100) st.m.avg <- rowMeans(apply(samples[2:3], 1, asr.joint, lik=lik)) plot(h, phy, main="MCMC Averaged ASR", show.node.state=FALSE) nodelabels(thermo=1 - st.m.avg, piecol=1:2, cex=.5) ## End(Not run)
These check to see if FFTW support was included in diversitree. They rarely need to be called directly.
check.fftC(error=TRUE)
check.fftC(error=TRUE)
error |
Logical: causes an error if FFTW is not
available if |
Richard G. FitzJohn
Combine several likelihood functions, so that the new functions gives the product of all likelihoods (the sum of the log likelihoods). This assumes that all likelihoods are independent from one another!
This function is little tested. Use at your own risk!
combine(liks)
combine(liks)
liks |
A list of likelihood functions. All must be of the same type, with the same argnames, and not constrained. |
Richard G. FitzJohn
Constrain a model to make submodels with fewer parameters.
If f
is a function that takes a vector x
as its first
argument, this function returns a new function that takes a
shorter vector x
with some elements constrained in some way;
parameters can be fixed to particular values, constrained to be the
same as other parameters, or arbitrary expressions of free
parameters.
constrain(f, ..., formulae=NULL, names=argnames(f), extra=NULL) constrain.i(f, p, i.free)
constrain(f, ..., formulae=NULL, names=argnames(f), extra=NULL) constrain.i(f, p, i.free)
f |
A function to constrain. |
... |
Formulae indicating how the function should be constrained (see Details and Examples). |
formulae |
Optional list of constraints, possibly in addition to
those in |
names |
Optional Character vector of names, the same length as
the number of parameters in |
extra |
Optional vector of additional names that might appear on
the RHS of constraints but do not represent names in the function's
|
p |
A parameter vector (for |
i.free |
Indices of the parameters that are not
constrained. Other indices will get the value in |
The relationships are specified in the form target ~ rel
, where
target
is the name of a vector to be constrained, and
rel
is some relationship. For example lambda0 ~ lambda1
would have the effect of making the parameters lambda0
and
lambda1
take the same value.
The rel
term can be a constant (e.g., target ~ 0
),
another parameter (as above) or some expression of the parameters
(e.g., lambda0 ~ 2 * lambda1
or
lambda0 ~ lambda1 - mu1
).
Terms that appear on the right hand side of an expression may not be constrained in another expression, and no term may be constrained twice.
This function returns a constrained function that can be passed
through to find.mle
and mcmc
. It will behave
like any other function. However, it has a modified class
attribute so that some methods will dispatch differently
(argnames
, for example). All arguments in addition to x
will be passed through to the original function f
.
For help in designing constrained models, the returned function has
an additional argument pars.only
, when this is TRUE
the
function will return a named vector of arguments rather than evaluate
the function (see Examples).
Only a few checks are done to ensure that the resulting function makes any sense; it is possible that I have missed some cases. There is currently no way of modifying constrained functions to remove the constraints. These weaknesses will be addressed in a future version.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Same example likelihood function as for \link{find.mle} - BiSSE on a ## tree with 203 species, generated with an asymmetry in the speciation ## rates. pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(2) phy <- tree.bisse(pars, max.t=60, x0=0) lik <- make.bisse(phy, phy$tip.state) argnames(lik) # Canonical argument names ## Specify equal speciation rates lik.2 <- constrain(lik, lambda0 ~ lambda1) argnames(lik.2) # Note lambda0 now missing ## On constrained functions, use the "pars.only" argument to see what ## the full argument list would be: lik.2(c(.1, pars[3:6]), pars.only=TRUE) ## Check this works: lik(c(.1, .1, pars[3:6])) == lik.2(c(.1, pars[3:6])) ## For optimisation of these functions, see \link{find.mle}, which ## includes an example. ## More complicated; constrain lambda0 to half of lambda1, constrain mu0 ## to be the same mu1, and set q01 equal to zero. lik.3 <- constrain(lik, lambda0 ~ lambda1 / 2, mu0 ~ mu1, q01 ~ 0) argnames(lik.3) # lambda1, mu1, q10 lik(c(.1, .2, .03, .03, 0, .01)) == lik.3(c(.2, .03, .01)) ## Alternatively, coefficients can be specified using a list of ## constraints: cons <- list(lambda1 ~ lambda0, mu1 ~ mu0, q10 ~ q01) constrain(lik, formulae=cons) ## Using the "extra" argument allows recasting things to dummy ## parameters. Here both lambda0 and lambda1 are mapped to the ## parameter "lambda": lik.4 <- constrain(lik, lambda0 ~ lambda, lambda1 ~ lambda, extra="lambda") argnames(lik.4) ## constrain.i can be useful for setting a number of values at once. ## Suppose we wanted to look at the shape of the likelihood surface with ## respect to one parameter around the ML point. For this tree, the ML ## point is approximately: p.ml <- c(0.09934, 0.19606, 0.02382, 0.03208, 0.01005, 0.00982) ## Leaving just lambda1 (which is parameter number 2) free: lik.l1 <- constrain.i(lik, p.ml, 2) ## The function now reports that five of the parameters are constrained, ## with one free (lambda1) lik.l1 ## Likewise: argnames(lik.l1) ## Looking in the neighbourhood of the ML point, the likelihood surface ## is approximately quadratic: pp <- seq(p.ml[2] - .02, p.ml[2] + .02, length.out=15) yy <- sapply(pp, lik.l1) plot(yy ~ pp, type="b", xlab="lambda 1", ylab="Log likelihood") abline(v=p.ml[2], col="red", lty=2) ## pars.only works as above, returning the full parameter vector lik.l1(p.ml[2], pars.only=TRUE) identical(p.ml, lik.l1(p.ml[2], pars.only=TRUE))
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Same example likelihood function as for \link{find.mle} - BiSSE on a ## tree with 203 species, generated with an asymmetry in the speciation ## rates. pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(2) phy <- tree.bisse(pars, max.t=60, x0=0) lik <- make.bisse(phy, phy$tip.state) argnames(lik) # Canonical argument names ## Specify equal speciation rates lik.2 <- constrain(lik, lambda0 ~ lambda1) argnames(lik.2) # Note lambda0 now missing ## On constrained functions, use the "pars.only" argument to see what ## the full argument list would be: lik.2(c(.1, pars[3:6]), pars.only=TRUE) ## Check this works: lik(c(.1, .1, pars[3:6])) == lik.2(c(.1, pars[3:6])) ## For optimisation of these functions, see \link{find.mle}, which ## includes an example. ## More complicated; constrain lambda0 to half of lambda1, constrain mu0 ## to be the same mu1, and set q01 equal to zero. lik.3 <- constrain(lik, lambda0 ~ lambda1 / 2, mu0 ~ mu1, q01 ~ 0) argnames(lik.3) # lambda1, mu1, q10 lik(c(.1, .2, .03, .03, 0, .01)) == lik.3(c(.2, .03, .01)) ## Alternatively, coefficients can be specified using a list of ## constraints: cons <- list(lambda1 ~ lambda0, mu1 ~ mu0, q10 ~ q01) constrain(lik, formulae=cons) ## Using the "extra" argument allows recasting things to dummy ## parameters. Here both lambda0 and lambda1 are mapped to the ## parameter "lambda": lik.4 <- constrain(lik, lambda0 ~ lambda, lambda1 ~ lambda, extra="lambda") argnames(lik.4) ## constrain.i can be useful for setting a number of values at once. ## Suppose we wanted to look at the shape of the likelihood surface with ## respect to one parameter around the ML point. For this tree, the ML ## point is approximately: p.ml <- c(0.09934, 0.19606, 0.02382, 0.03208, 0.01005, 0.00982) ## Leaving just lambda1 (which is parameter number 2) free: lik.l1 <- constrain.i(lik, p.ml, 2) ## The function now reports that five of the parameters are constrained, ## with one free (lambda1) lik.l1 ## Likewise: argnames(lik.l1) ## Looking in the neighbourhood of the ML point, the likelihood surface ## is approximately quadratic: pp <- seq(p.ml[2] - .02, p.ml[2] + .02, length.out=15) yy <- sapply(pp, lik.l1) plot(yy ~ pp, type="b", xlab="lambda 1", ylab="Log likelihood") abline(v=p.ml[2], col="red", lty=2) ## pars.only works as above, returning the full parameter vector lik.l1(p.ml[2], pars.only=TRUE) identical(p.ml, lik.l1(p.ml[2], pars.only=TRUE))
Find the maximum likelihood point of a model by nonlinear
optimisation. find.mle
is generic, and allows different
default behaviour for different likelihood functions.
find.mle(func, x.init, method, ...) ## S3 method for class 'fit.mle' coef(object, full=FALSE, extra=FALSE, ...) ## S3 method for class 'fit.mle' logLik(object, ...) ## S3 method for class 'fit.mle' anova(object, ..., sequential=FALSE)
find.mle(func, x.init, method, ...) ## S3 method for class 'fit.mle' coef(object, full=FALSE, extra=FALSE, ...) ## S3 method for class 'fit.mle' logLik(object, ...) ## S3 method for class 'fit.mle' anova(object, ..., sequential=FALSE)
func |
A likelihood function. This is assumed to return the log likelihood (see Details). The function must take a vector of parameters as the first argument. |
x.init |
Initial starting point for the optimisation. |
method |
Method to use for optimisation. May be one of "optim", "subplex", "nlminb", "nlm" (partial unambigious string is allowed). |
... |
For |
object |
A fitted model, returned by |
full |
When returning the coefficients for a constrained model, should be coefficients for the underlying constrained model be returned? |
extra |
When returning the coefficients for a constrained model, should dummy “extra” parameters be returned as well? |
sequential |
Should |
find.mle
starts a search for the maximum likelihood (ML)
parameters from a starting point x.init
. x.init
should
be the correct length for func
, so that func(x.init)
returns a valid likelihood. However, if func
is a constrained
function (via constrain
) and x.init
is the
correct length for the unconstrained function then an attempt will be
made to guess a valid starting point. This will often do poorly and a
warning will be given.
Different methods will be dispatched for different types of likelihood
functions. Currently all models in diversitree
are supported
(bisse
, geosse
, mk2
, mkn
, bd
, and
yule
). With the exception of the Yule pure-birth process, these
methods just specify different default arguments for the underlying
optimisation routines (the Yule model has an analytical solution, and no
optimisation step is required). Generally, it will not be necessary
to specify the method
argument to find.mle
as a sensible
method is chosen during dispatch.
The ...
argument may contain additional arguments for the
function func
. This includes things like condition.surv
for conditioning on survival in BiSSE, birth-death, and Yule models.
Specify this as
find.mle(lik, x.init, condition.surv=TRUE)
(see the Examples).
Different method
arguments take different arguments passed
through ...
to control their behaviour:
method="optim"
: Uses R's optim
function for the
optimisation. This allows access to a variety of general purpose
optimisation algorithms. The method within optim
can be
chosen via the argument optim.method
, which is set to
"L-BFGS-B" by default (box constrained quasi-Newton optimisation).
This should be suitable for most uses. See the method
argument
of optim
for other possibilities. If "L-BFGS-B"
is used, then upper and lower bounds may be specified by the arguments
lower
and upper
. The argument control
can be
used to specify other control parameters for the algorithms - see
optim
for details. Most of the optim
algorithms
require finite values be returned at every evaluated point. This is
often not possible (extreme values of parameters or particular
combinations may have zero likelihood and therefore -Inf
log-likelihood). To get around this, the argument fail.value
can be used to specify a fallback value. By default this is set to
func(x.init) - 1000
, which should work reasonably well for most
cases.
method="subplex"
: Uses the "subplex" algorithm (a variant of
the downhill simplex/Nelder-Mead algorithm that uses Nelder-Mead on a
sequence of subspaces). This algorithm generally requires more
evaluations than optim
-based optimisation, but does not require
approximation of derivatives and seems to find the global optimum more
reliably (though often less precisely). Additional arguments are
control
to control aspects of the search (see
subplex
for details). The argument fail.value
can be used as in method="optim"
, but by default -Inf
will be used on failure to evaluate, which is generally appropriate.
method="nlminb"
: Uses the function nlminb
for
optimisation, so that optimising a Mk2/Mkn likelihood function behaves
as similarly as possible to ape
's ace
function.
As for method="optim"
, lower and upper bounds on parameters may
be specified via lower
and upper
. fail.value
can
be used to control behaviour on evaluation failure, but like
method="subplex"
, -Inf
is used which should work in most
cases. Additional control parameters may be passed via control
- see link{nlminb} for details
. This function is not generally
recommended for use.
method="nlm"
: Uses the function nlm
for
optimisation, so that optimising a birth-death likelihood function
behaves as similarly as possible to ape
's
birthdeath
function. Takes the same additional
arguments as method="nlminb"
(except that fail.value
behaves as for method="optim"
). Like method="nlminb"
,
this is not recommended for general use.
code
and logLik
methods exist for fit.mle
objects
so that parameters and log-likelihoods may be extracted. This also
allows use with AIC
.
Simple model comparison by way of likelihood ratio tests can be
performed with anova
. See Examples for usage.
A list of class fit.mle
, with at least the components
par
The estimated parameters.
lnLik
The log likelihood at the ML point.
counts
The number of function evaluations performed
during the search.
code
Convergence code. See the documentation for the
underlying optimisation method for meaning, but "0" is usually good.
func
The likelihood function used in the fit.
method
The optimisation method used.
The anova
function carries out likelihood ratio tests.
There are a few possible configurations.
First, the first fit provided could be the focal fit, and all other fits are either special cases of it (every additional model is nested within the focal model) or generalisations of it (the focal model is nested within every additional model).
Second, the models could be sequential series of fits (if
sequential=TRUE
), such that models (A, B, C, D) are to be
compared A vs. B, B vs. C, C vs. D. The models can either be strictly
increasing in parameters (A nested in B, B nested in C, ...) or
strictly decreasing in parameters (D nested in C, C nested in B, ...).
In both cases, nestedness is checked. First, the "class" of the
fitted object must match. Second, the argnames
of the
likelihood function of a sub model must all appear in the
argnames
of the parent model. There are some cases where this
second condition may not be satisfied and yet the comparison is valid
(e.g., comparing a time-varying model against a non time varying
model, and some make.quasse
fits). We attempt to detect this
but it may fail on some valid comparisons and silently allow some
invalid comparisons.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(2) phy <- tree.bisse(pars, max.t=60, x0=0) ## Here is the 203 species tree with the true character history coded. ## Red is state '1', which has twice the speciation rate of black (state ## '0'). h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, cex=.5, show.node.state=FALSE) ## Make a BiSSE likelihood function lik <- make.bisse(phy, phy$tip.state) lik(pars) ## This takes ~30s to run, so is not enabled by default ## Not run: ## Fit the full six-parameter model fit <- find.mle(lik, pars) fit[1:2] coef(fit) # Named vector of six parameters logLik(fit) # -659.93 AIC(fit) # 1331.86 ## find.mle works with constrained models (see constrain). Here ## the two speciation rates are constrained to be the same as each ## other. lik.l <- constrain(lik, lambda0 ~ lambda1) fit.l <- find.mle(lik.l, pars[-2]) logLik(fit.l) # 663.41 ## Compare the models with anova - this shows that the more ## complicated model with two separate speciation rates fits ## significantly better than the simpler model with equal rates ## (p=0.008). anova(fit, equal.lambda=fit.l) ## You can return the parameters for the full six parameter model from ## the fitted five parameter model - this makes a good starting point ## for a ML search. coef(fit.l, full=TRUE) ## End(Not run)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(2) phy <- tree.bisse(pars, max.t=60, x0=0) ## Here is the 203 species tree with the true character history coded. ## Red is state '1', which has twice the speciation rate of black (state ## '0'). h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, cex=.5, show.node.state=FALSE) ## Make a BiSSE likelihood function lik <- make.bisse(phy, phy$tip.state) lik(pars) ## This takes ~30s to run, so is not enabled by default ## Not run: ## Fit the full six-parameter model fit <- find.mle(lik, pars) fit[1:2] coef(fit) # Named vector of six parameters logLik(fit) # -659.93 AIC(fit) # 1331.86 ## find.mle works with constrained models (see constrain). Here ## the two speciation rates are constrained to be the same as each ## other. lik.l <- constrain(lik, lambda0 ~ lambda1) fit.l <- find.mle(lik.l, pars[-2]) logLik(fit.l) # 663.41 ## Compare the models with anova - this shows that the more ## complicated model with two separate speciation rates fits ## significantly better than the simpler model with equal rates ## (p=0.008). anova(fit, equal.lambda=fit.l) ## You can return the parameters for the full six parameter model from ## the fitted five parameter model - this makes a good starting point ## for a ML search. coef(fit.l, full=TRUE) ## End(Not run)
This function extracts a history object from a simulated
phylogeny produced by tree.bisse
.
history.from.sim.discrete(phy, states)
history.from.sim.discrete(phy, states)
phy |
A phylogeny produced by |
states |
Possible states. For |
Richard G. FitzJohn
Prepare to run a constant rate birth-death model on a
phylogenetic tree. This fits the Nee et al. 1994 equation,
duplicating the birthdeath
function in ape. Differences with
that function include (1) the function is not constrained to positive
diversification rates (mu can exceed lambda), (2) [eventual] support
for both random taxon sampling and unresolved terminal clades (but see
bd.ext
), and (3) run both MCMC and MLE fits to birth death
trees.
make.bd(tree, sampling.f=NULL, unresolved=NULL, times=NULL, control=list()) make.yule(tree, sampling.f=NULL, unresolved=NULL, times=NULL, control=list()) starting.point.bd(tree, yule=FALSE)
make.bd(tree, sampling.f=NULL, unresolved=NULL, times=NULL, control=list()) make.yule(tree, sampling.f=NULL, unresolved=NULL, times=NULL, control=list()) starting.point.bd(tree, yule=FALSE)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
times |
Vector of branching times, as returned by
|
sampling.f |
Probability of an extant species being included in the phylogeny (sampling fraction). By default, all extant species are assumed to be included. |
unresolved |
Unresolved clade information. This is a named
vector, with the number of species as the value and names
corresponding to tip labels. Tips that represent a single species
should not be included in this vector. For example
|
yule |
Should the starting point function return a Yule model (zero extinction rate)? |
control |
List of control parameters. The element |
make.bd
returns a function of class bd
.
This function has argument list (and default values)
f(pars, prior=NULL, condition.surv=TRUE)
The arguments are interpreted as
pars
A vector of two parameters, in the order
lambda
, mu
.
prior
: a valid prior. See make.prior
for
more information.
condition.surv
(logical): should the likelihood
calculation condition on survival of two lineages and the speciation
event subtending them? This is done by default, following Nee et
al. 1994.
The function "ode" method is included for completeness, but should not be taken too seriously. It uses an alternative ODE-based approach, more similar to most diversitree models, to compute the likelihood. It exists so that other models that extend the birth-death models may be tested.
Richard G. FitzJohn
Nee S., May R.M., and Harvey P.H. 1994. The reconstructed evolutionary process. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344:305-311.
constrain
for making submodels, find.mle
for ML parameter estimation, mcmc
for MCMC integration,
and make.bisse
for state-dependent birth-death models.
## Simulate a tree under a constant rates birth-death model and look at ## the maximum likelihood speciation/extinction parameters: set.seed(1) phy <- trees(c(.1, .03), "bd", max.taxa=25)[[1]] lik <- make.bd(phy) ## By default, optimisation gives a lambda close to 0.1 and extremely ## small mu: fit <- find.mle(lik, c(.1, .03)) coef(fit) ## The above optimisation uses the algorithm \link{nlm} for ## compatibility with ape's \link{birthdeath}. This can be slightly ## improved by using \link{optim} for the optimisation, which allows ## bounds to be specified: fit.o <- find.mle(lik, c(.1, .03), method="optim", lower=0) coef(fit.o) logLik(fit.o) - logLik(fit) # slight improvement ## Special case methods are worked out for the Yule model, for which ## analytic solutions are available. Compare a direct fit of the Yule ## model with one where mu is constrained to be zero: lik.yule <- make.yule(phy) lik.mu0 <- constrain(lik, mu ~ 0) ## The same to a reasonable tolerance: fit.yule <- find.mle(lik.yule, .1) fit.mu0 <- find.mle(lik.mu0, .1) all.equal(fit.yule[1:2], fit.mu0[1:2], tolerance=1e-6) ## There is no significant improvement in the fit by including the mu ## parameter (unsurprising as the ML value was zero) anova(fit.o, yule=fit.yule) ## Optimisation can be done without conditioning on survival: fit.nosurv <- find.mle(lik, c(.1, .03), method="optim", lower=0, condition.surv=FALSE) coef(fit.nosurv) # higher lambda than before ## Look at the marginal likelihoods, computed through MCMC (see ## \link{mcmc} for details, and increase nsteps for smoother ## plots [takes longer]). samples <- mcmc(lik, fit$par, nsteps=500, lower=c(-Inf, -Inf), upper=c(Inf, Inf), w=c(.1, .1), fail.value=-Inf, print.every=100) samples$r <- with(samples, lambda - mu) ## Plot the profiles (see \link{profiles.plot}). ## The vertical lines are the simulated parameters, which match fairly ## well with the estimated ones. col <- c("red", "blue", "green3") profiles.plot(samples[c("lambda", "mu", "r")], col.line=col, las=1, legend="topright") abline(v=0, lty=2) abline(v=c(.1, .03, .07), col=col) ## Sample the phylogeny to include 20 of the species, and run the ## likelihood search assuming random sampling: set.seed(1) phy2 <- drop.tip(phy, sample(25, 5)) lik2 <- make.bd(phy2, sampling.f=20/25) fit2 <- find.mle(lik2, c(.1, .03)) ## The ODE based version gives comparable results. However, it is ## about 55x slower. lik.ode <- make.bd(phy, control=list(method="ode")) all.equal(lik.ode(coef(fit)), lik(coef(fit)), tolerance=2e-7)
## Simulate a tree under a constant rates birth-death model and look at ## the maximum likelihood speciation/extinction parameters: set.seed(1) phy <- trees(c(.1, .03), "bd", max.taxa=25)[[1]] lik <- make.bd(phy) ## By default, optimisation gives a lambda close to 0.1 and extremely ## small mu: fit <- find.mle(lik, c(.1, .03)) coef(fit) ## The above optimisation uses the algorithm \link{nlm} for ## compatibility with ape's \link{birthdeath}. This can be slightly ## improved by using \link{optim} for the optimisation, which allows ## bounds to be specified: fit.o <- find.mle(lik, c(.1, .03), method="optim", lower=0) coef(fit.o) logLik(fit.o) - logLik(fit) # slight improvement ## Special case methods are worked out for the Yule model, for which ## analytic solutions are available. Compare a direct fit of the Yule ## model with one where mu is constrained to be zero: lik.yule <- make.yule(phy) lik.mu0 <- constrain(lik, mu ~ 0) ## The same to a reasonable tolerance: fit.yule <- find.mle(lik.yule, .1) fit.mu0 <- find.mle(lik.mu0, .1) all.equal(fit.yule[1:2], fit.mu0[1:2], tolerance=1e-6) ## There is no significant improvement in the fit by including the mu ## parameter (unsurprising as the ML value was zero) anova(fit.o, yule=fit.yule) ## Optimisation can be done without conditioning on survival: fit.nosurv <- find.mle(lik, c(.1, .03), method="optim", lower=0, condition.surv=FALSE) coef(fit.nosurv) # higher lambda than before ## Look at the marginal likelihoods, computed through MCMC (see ## \link{mcmc} for details, and increase nsteps for smoother ## plots [takes longer]). samples <- mcmc(lik, fit$par, nsteps=500, lower=c(-Inf, -Inf), upper=c(Inf, Inf), w=c(.1, .1), fail.value=-Inf, print.every=100) samples$r <- with(samples, lambda - mu) ## Plot the profiles (see \link{profiles.plot}). ## The vertical lines are the simulated parameters, which match fairly ## well with the estimated ones. col <- c("red", "blue", "green3") profiles.plot(samples[c("lambda", "mu", "r")], col.line=col, las=1, legend="topright") abline(v=0, lty=2) abline(v=c(.1, .03, .07), col=col) ## Sample the phylogeny to include 20 of the species, and run the ## likelihood search assuming random sampling: set.seed(1) phy2 <- drop.tip(phy, sample(25, 5)) lik2 <- make.bd(phy2, sampling.f=20/25) fit2 <- find.mle(lik2, c(.1, .03)) ## The ODE based version gives comparable results. However, it is ## about 55x slower. lik.ode <- make.bd(phy, control=list(method="ode")) all.equal(lik.ode(coef(fit)), lik(coef(fit)), tolerance=2e-7)
Create a likelihood function for a birth-death model where the tree is partitioned into regions with different parameters.
make.bd.split(tree, nodes, split.t, sampling.f=NULL, unresolved=NULL)
make.bd.split(tree, nodes, split.t, sampling.f=NULL, unresolved=NULL)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
nodes |
Vector of nodes that will be split (see Details). |
split.t |
Vector of split times, same length as |
sampling.f |
Probability of an extant species being included in the phylogeny (sampling fraction). By default, all extant species are assumed to be included. |
unresolved |
Unresolved clade information. This is a named
vector, with the number of species as the value and names
corresponding to tip labels. Tips that represent a single species
should not be included in this vector. For example
|
Branching times can be controlled with the split.t
argument. If this is Inf
, split at the base of the branch (as in
MEDUSA). If 0
, split at the top (closest to the present, as in
the new option for MEDUSA). If 0 < split.t < Inf
then we split
at that time on the tree (zero is the present, with time growing
backwards).
This function is related to MEDUSA (Alfaro et al. 2009), but does not
include any of the code for efficiently moving between different splits
(split creation here is fairly slow). The primary use for this model is
for generating starting points for state dependent split models (e.g.,
make.bisse.split
) and testing a priori splits.
Richard G. FitzJohn
set.seed(1) pars <- c(.1, .03) phy <- trees(pars, "bd", max.taxa=30)[[1]] ## Here is the phylogeny: plot(phy, show.node.label=TRUE, label.offset=.1, font=1, cex=.75, no.margin=TRUE) ## Construct the plain likelihood function as a benchmark: lik <- make.bd(phy) lik(pars) # -21.74554 ## Split this phylogeny at three points: nd11, nd13 and nd26 nodes <- c("nd11", "nd13", "nd26") ## This is the index in ape's node indexing: nodes.i <- match(nodes, phy$node.label) + length(phy$tip.label) nodelabels(node=nodes.i, pch=19, cex=2, col="#FF000099") ## To make a split likelihood function, pass the node locations and times in: lik.s <- make.bd.split(phy, nodes) ## The parameters must be a list of the same length as the number of ## partitions. Partition '1' is the root partition, and partition i is ## the partition rooted at the node[i-1] pars4 <- rep(pars, 4) names(pars4) <- argnames(lik.s) ## Run the likelihod calculation: lik.s(pars4) # -21.74554 ## These are basically identical (to acceptable tolerance) lik.s(pars4) - lik(pars) ## You can use the labelled nodes rather than indices: lik.s2 <- make.bd.split(phy, nodes) identical(lik.s(pars4), lik.s2(pars4)) ## All the usual ML/MCMC functions work as before: fit <- find.mle(lik.s, pars4)
set.seed(1) pars <- c(.1, .03) phy <- trees(pars, "bd", max.taxa=30)[[1]] ## Here is the phylogeny: plot(phy, show.node.label=TRUE, label.offset=.1, font=1, cex=.75, no.margin=TRUE) ## Construct the plain likelihood function as a benchmark: lik <- make.bd(phy) lik(pars) # -21.74554 ## Split this phylogeny at three points: nd11, nd13 and nd26 nodes <- c("nd11", "nd13", "nd26") ## This is the index in ape's node indexing: nodes.i <- match(nodes, phy$node.label) + length(phy$tip.label) nodelabels(node=nodes.i, pch=19, cex=2, col="#FF000099") ## To make a split likelihood function, pass the node locations and times in: lik.s <- make.bd.split(phy, nodes) ## The parameters must be a list of the same length as the number of ## partitions. Partition '1' is the root partition, and partition i is ## the partition rooted at the node[i-1] pars4 <- rep(pars, 4) names(pars4) <- argnames(lik.s) ## Run the likelihod calculation: lik.s(pars4) # -21.74554 ## These are basically identical (to acceptable tolerance) lik.s(pars4) - lik(pars) ## You can use the labelled nodes rather than indices: lik.s2 <- make.bd.split(phy, nodes) identical(lik.s(pars4), lik.s2(pars4)) ## All the usual ML/MCMC functions work as before: fit <- find.mle(lik.s, pars4)
Create a likelihood function for the birth-death model, where birth and/or death rates are arbitrary functions of time.
make.bd.t(tree, functions, sampling.f=NULL, unresolved=NULL, control=list(), truncate=FALSE, spline.data=NULL)
make.bd.t(tree, functions, sampling.f=NULL, unresolved=NULL, control=list(), truncate=FALSE, spline.data=NULL)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
functions |
A named list of functions of time. See details. |
sampling.f |
Probability of an extant species being included in the phylogeny (sampling fraction). By default, all extant species are assumed to be included. |
unresolved |
Not yet included: present in the argument list for
future compatibility with |
control |
List of control parameters for the ODE solver. See
details in |
truncate |
Logical, indicating if functions should be truncated to zero when negative (rather than causing an error). May be scalar (applying to all functions) or a vector (of length 2). |
spline.data |
List of data for spline-based time functions. See details |
.
Richard G. FitzJohn
## First, show equivalence to the plain Birth-death model. This is not ## a very interesting use of the functions, but it serves as a useful ## check. ## Here is a simulated 25 species tree for testing. set.seed(1) pars <- c(.1, .03) phy <- trees(pars, "bd", max.taxa=25)[[1]] ## Next, make three different likelihood functions: a "normal" one that ## uses the direct birth-death calculation, an "ode" based one (that ## uses numerical integration to compute the likelihood, and is ## therefore not exact), and one that is time-varying, but that the ## time-dependent functions are constant.t(). lik.direct <- make.bd(phy) lik.ode <- make.bd(phy, control=list(method="ode")) lik.t <- make.bd.t(phy, c("constant.t", "constant.t")) lik.direct(pars) # -22.50267 ## ODE-based likelihood calculations are correct to about 1e-6. lik.direct(pars) - lik.ode(pars) ## The ODE calculation agrees exactly with the time-varying (but ## constant) calculation. lik.ode(pars) - lik.t(pars) ## Next, make a real case, where speciation is a linear function of ## time. lik.t2 <- make.bd.t(phy, c("linear.t", "constant.t")) ## Confirm that this agrees with the previous calculations when the ## slope is zero pars2 <- c(pars[1], 0, pars[2]) lik.t2(pars2) - lik.t(pars) ## The time penalty comes from moving to the ODE-based solution, not ## from the time dependence. system.time(lik.direct(pars)) # ~ 0.000 system.time(lik.ode(pars)) # ~ 0.003 system.time(lik.t(pars)) # ~ 0.003 system.time(lik.t2(pars2)) # ~ 0.003 ## Not run: fit <- find.mle(lik.direct, pars) fit.t2 <- find.mle(lik.t2, pars2) ## No significant improvement in model fit: anova(fit, time.varying=fit.t2) ## End(Not run)
## First, show equivalence to the plain Birth-death model. This is not ## a very interesting use of the functions, but it serves as a useful ## check. ## Here is a simulated 25 species tree for testing. set.seed(1) pars <- c(.1, .03) phy <- trees(pars, "bd", max.taxa=25)[[1]] ## Next, make three different likelihood functions: a "normal" one that ## uses the direct birth-death calculation, an "ode" based one (that ## uses numerical integration to compute the likelihood, and is ## therefore not exact), and one that is time-varying, but that the ## time-dependent functions are constant.t(). lik.direct <- make.bd(phy) lik.ode <- make.bd(phy, control=list(method="ode")) lik.t <- make.bd.t(phy, c("constant.t", "constant.t")) lik.direct(pars) # -22.50267 ## ODE-based likelihood calculations are correct to about 1e-6. lik.direct(pars) - lik.ode(pars) ## The ODE calculation agrees exactly with the time-varying (but ## constant) calculation. lik.ode(pars) - lik.t(pars) ## Next, make a real case, where speciation is a linear function of ## time. lik.t2 <- make.bd.t(phy, c("linear.t", "constant.t")) ## Confirm that this agrees with the previous calculations when the ## slope is zero pars2 <- c(pars[1], 0, pars[2]) lik.t2(pars2) - lik.t(pars) ## The time penalty comes from moving to the ODE-based solution, not ## from the time dependence. system.time(lik.direct(pars)) # ~ 0.000 system.time(lik.ode(pars)) # ~ 0.003 system.time(lik.t(pars)) # ~ 0.003 system.time(lik.t2(pars2)) # ~ 0.003 ## Not run: fit <- find.mle(lik.direct, pars) fit.t2 <- find.mle(lik.t2, pars2) ## No significant improvement in model fit: anova(fit, time.varying=fit.t2) ## End(Not run)
Prepare to run BiSSE (Binary State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.bisse(tree, states, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list()) starting.point.bisse(tree, q.div=5, yule=FALSE)
make.bisse(tree, states, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list()) starting.point.bisse(tree, q.div=5, yule=FALSE)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be 0 or
1, or |
unresolved |
Unresolved clade information: see section below for structure. |
sampling.f |
Vector of length 2 with the estimated proportion of
extant species in state 0 and 1 that are included in the phylogeny.
A value of |
nt.extra |
The number of species modelled in unresolved clades (this is in addition to the largest observed clade). |
control |
List of control parameters for the ODE solver. See details below. |
strict |
The |
q.div |
Ratio of diversification rate to character change rate. Eventually this will be changed to allow for Mk2 to be used for estimating q parameters. |
yule |
Logical: should starting parameters be Yule estimates rather than birth-death estimates? |
make.bisse
returns a function of class bisse
. This
function has argument list (and default values)
f(pars, condition.surv=TRUE, root=ROOT.OBS, root.p=NULL, intermediates=FALSE)
The arguments are interpreted as
pars
A vector of six parameters, in the order
lambda0
, lambda1
, mu0
, mu1
,
q01
, q10
.
condition.surv
(logical): should the likelihood
calculation condition on survival of two lineages and the speciation
event subtending them? This is done by default, following Nee et
al. 1994.
root
: Behaviour at the root (see Maddison et al. 2007,
FitzJohn et al. 2009). The possible options are
ROOT.FLAT
: A flat prior, weighting
and
equally.
ROOT.EQUI
: Use the equilibrium distribution
of the model, as described in Maddison et al. (2007).
ROOT.OBS
: Weight and
by their relative probability of observing the
data, following FitzJohn et al. 2009:
ROOT.GIVEN
: Root will be in state 0
with probability root.p[1]
, and in state 1 with
probability root.p[2]
.
ROOT.BOTH
: Don't do anything at the root,
and return both values. (Note that this will not give you a
likelihood!).
root.p
: Root weightings for use when
root=ROOT.GIVEN
. sum(root.p)
should equal 1.
intermediates
: Add intermediates to the returned value as
attributes:
cache
: Cached tree traversal information.
intermediates
: Mostly branch end information.
vals
: Root values.
At this point, you will have to poke about in the source for more information on these.
starting.point.bisse
produces a heuristic starting point to
start from, based on the character-independent birth-death model. You
can probably do better than this; see the vignette, for example.
bisse.starting.point
is the same code, but deprecated in favour
of starting.point.bisse
- it will be removed in a future
version.
Since 0.10.10 this is no longer supported. See the package README for more information.
This must be a data.frame
with at least the four columns
tip.label
, giving the name of the tip to which the data
applies
Nc
, giving the number of species in the clade
n0
, n1
, giving the number of species known to be
in state 0 and 1, respectively.
These columns may be in any order, and additional columns will be ignored. (Note that column names are case sensitive).
An alternative way of specifying unresolved clade information is to
use the function make.clade.tree
to construct a tree
where tips that represent clades contain information about which
species are contained within the clades. With a clade.tree
,
the unresolved
object will be automatically constructed from
the state information in states
. (In this case, states
must contain state information for the species contained within the
unresolved clades.)
The differential equations that define the
BiSSE model are solved numerically using ODE solvers from the GSL
library or deSolve's LSODA. The control
argument to
make.bisse
controls the behaviour of the integrator. This is a
list that may contain elements:
tol
: Numerical tolerance used for the calculations.
The default value of 1e-8
should be a reasonable trade-off
between speed and accuracy. Do not expect too much more than this
from the abilities of most machines!
eps
: A value that when the sum of the D values drops
below, the integration results will be discarded and the
integration will be attempted again (the second-chance integration
will divide a branch in two and try again, recursively until the
desired accuracy is reached). The default value of 0
will
only discard integration results when the parameters go negative.
However, for some problems more restrictive values (on the order
of control$tol
) will give better stability.
backend
: Select the solver. The three options here are
gslode
: (the default). Use the GSL solvers, by
default a Runge Kutta Kash Carp stepper.
deSolve
: Use the LSODA solver from the
deSolve
package. This is quite a bit slower at the
moment.
deSolve
is the only supported backend on Windows.
Richard G. FitzJohn
FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Syst. Biol. 58:595-611.
Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Syst. Biol. 56:701-710.
Nee S., May R.M., and Harvey P.H. 1994. The reconstructed evolutionary process. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344:305-311.
constrain
for making submodels, find.mle
for ML parameter estimation, mcmc
for MCMC integration,
and make.bd
for state-independent birth-death models.
The help pages for find.mle
has further examples of ML
searches on full and constrained BiSSE models.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(4) phy <- tree.bisse(pars, max.t=30, x0=0) ## Here is the 52 species tree with the true character history coded. ## Red is state '1', which has twice the speciation rate of black (state ## '0'). h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) lik <- make.bisse(phy, phy$tip.state) lik(pars) # -159.71
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(4) phy <- tree.bisse(pars, max.t=30, x0=0) ## Here is the 52 species tree with the true character history coded. ## Red is state '1', which has twice the speciation rate of black (state ## '0'). h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) lik <- make.bisse(phy, phy$tip.state) lik(pars) # -159.71
Create a likelihood function for a BiSSE model where the
tree is partitioned into regions with different parameters.
Alternatively, make.bisse.uneven
can be used where different
regions of the tree have different fractions of species known.
make.bisse.split(tree, states, nodes, split.t, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list()) make.bisse.uneven(tree, states, nodes, split.t, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list())
make.bisse.split(tree, states, nodes, split.t, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list()) make.bisse.uneven(tree, states, nodes, split.t, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list())
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be 0 or
1, or |
nodes |
Vector of nodes that will be split (see Details). |
split.t |
Vector of split times, same length as |
unresolved |
Unresolved clade information: see section below for structure. |
sampling.f |
Vector of length 2 with the estimated proportion of
extant species in state 0 and 1 that are included in the phylogeny.
A value of |
nt.extra |
The number of species modelled in unresolved clades (this is in addition to the largest observed clade). |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
Branching times can be controlled with the split.t
argument. If this is Inf
, split at the base of the branch (as in
MEDUSA). If 0
, split at the top (closest to the present, as in
the new option for MEDUSA). If 0 < split.t < Inf
then we split
at that time on the tree (zero is the present, with time growing
backwards).
TODO: Describe nodes
and split.t
here.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(546) phy <- tree.bisse(pars, max.taxa=30, x0=0) ## Here is the phylogeny: plot(phy, show.node.label=TRUE, label.offset=.1, font=1, cex=.75, no.margin=TRUE) ## Here is a plain BiSSE function for comparison: lik.b <- make.bisse(phy, phy$tip.state) lik.b(pars) # -93.62479 ## Split this phylogeny at three points: nd15, nd18 and nd26 nodes <- c("nd15", "nd18", "nd26") ## This is the index in ape's node indexing: nodes.i <- match(nodes, phy$node.label) + length(phy$tip.label) nodelabels(node=nodes.i, pch=19, cex=2, col="#FF000099") ## To make a split BiSSE function, pass the node locations and times in: lik.s <- make.bisse.split(phy, phy$tip.state, nodes.i) ## The parameters must be a list of the same length as the number of ## partitions. Partition '1' is the root partition, and partition i is ## the partition rooted at the node[i-1] pars4 <- rep(pars, 4) pars4 ## Run the likelihod calculation: lik.s(pars4) # -93.62479 ## These are basically identical (to acceptable tolerance) lik.s(pars4) - lik.b(pars) ## You can use the labelled nodes rather than indices: lik.s2 <- make.bisse.split(phy, phy$tip.state, nodes) identical(lik.s(pars4), lik.s2(pars4))
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(546) phy <- tree.bisse(pars, max.taxa=30, x0=0) ## Here is the phylogeny: plot(phy, show.node.label=TRUE, label.offset=.1, font=1, cex=.75, no.margin=TRUE) ## Here is a plain BiSSE function for comparison: lik.b <- make.bisse(phy, phy$tip.state) lik.b(pars) # -93.62479 ## Split this phylogeny at three points: nd15, nd18 and nd26 nodes <- c("nd15", "nd18", "nd26") ## This is the index in ape's node indexing: nodes.i <- match(nodes, phy$node.label) + length(phy$tip.label) nodelabels(node=nodes.i, pch=19, cex=2, col="#FF000099") ## To make a split BiSSE function, pass the node locations and times in: lik.s <- make.bisse.split(phy, phy$tip.state, nodes.i) ## The parameters must be a list of the same length as the number of ## partitions. Partition '1' is the root partition, and partition i is ## the partition rooted at the node[i-1] pars4 <- rep(pars, 4) pars4 ## Run the likelihod calculation: lik.s(pars4) # -93.62479 ## These are basically identical (to acceptable tolerance) lik.s(pars4) - lik.b(pars) ## You can use the labelled nodes rather than indices: lik.s2 <- make.bisse.split(phy, phy$tip.state, nodes) identical(lik.s(pars4), lik.s2(pars4))
Create a likelihood function for a BiSSE model where different chunks of time have different parameters. This code is experimental!
make.bisse.td(tree, states, n.epoch, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list()) make.bisse.t(tree, states, functions, unresolved=NULL, sampling.f=NULL, strict=TRUE, control=list(), truncate=FALSE, spline.data=NULL)
make.bisse.td(tree, states, n.epoch, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list()) make.bisse.t(tree, states, functions, unresolved=NULL, sampling.f=NULL, strict=TRUE, control=list(), truncate=FALSE, spline.data=NULL)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be 0 or
1, or |
n.epoch |
Number of epochs. 1 corresponds to plain BiSSE, so this will generally be an integer at least 2. |
functions |
A named character vector of functions of time. See details. |
unresolved |
Unresolved clade information: see
|
sampling.f |
Vector of length 2 with the estimated proportion of
extant species in state 0 and 1 that are included in the phylogeny.
See |
nt.extra |
The number of species modelled in unresolved clades (this is in addition to the largest observed clade). |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
truncate |
Logical, indicating if functions should be truncated to zero when negative (rather than causing an error). May be scalar (applying to all functions) or a vector (of length 6). |
spline.data |
List of data for spline-based time functions. See details |
.
This builds a BiSSE likelihood function where different regions of time (epochs) have different parameter sets. By default, all parameters are free to vary between epochs, so some constraining will probably be required to get reasonable answers.
For n
epochs, there are n-1
time points; the first
n-1
elements of the likelihood's parameter vector are these
points. These are measured from the present at time zero, with time
increasing towards the base of the tree. The rest of the parameter
vector are BiSSE parameters; the elements n:(n+6)
are for the
first epoch (closest to the present), elements (n+7):(n+13)
are
for the second epoch, and so on.
For make.bisse.t
, the funtions
is a vector of names of
functions of time.
For example, to have speciation rates be linear functions of
time, while the extinction and character change rates be constant with
respect to time, one can do
functions=rep(c("linear.t", "constant.t"), c(2, 4))
The functions here must have t
as their first argument,
interpreted as time back from the present. Other possible functions
are "sigmoid.t", "stepf.t", "spline.t", "exp.t", and "spline.linear.t".
Unfortunately, documentation is still pending.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } set.seed(4) pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) phy <- tree.bisse(pars, max.t=30, x0=0) ## Suppose we want to see if diversification is different in the most ## recent 3 time units, compared with the rest of the tree (yes, this is ## a totally contrived example!): plot(phy) axisPhylo() abline(v=max(branching.times(phy)) - 3, col="red", lty=3) ## For comparison, make a plain BiSSE likelihood function lik.b <- make.bisse(phy, phy$tip.state) ## Create the time-dependent likelihood function. The final argument ## here is the number of 'epochs' that are allowed. Two epochs is one ## switch point. lik.t <- make.bisse.td(phy, phy$tip.state, 2) ## The switch point is the first argument. The remaining 12 parameters ## are the BiSSE parameters, with the first 6 being the most recent ## epoch. argnames(lik.t) pars.t <- c(3, pars, pars) names(pars.t) <- argnames(lik.t) ## Calculations are identical to a reasonable tolerance: lik.b(pars) - lik.t(pars.t) ## It will often be useful to constrain the time as a fixed quantity. lik.t2 <- constrain(lik.t, t.1 ~ 3) ## Parameter estimation under maximum likelihood. This is marked "don't ## run" because the time-dependent fit takes a few minutes. ## Not run: ## Fit the BiSSE ML model fit.b <- find.mle(lik.b, pars) ## And fit the BiSSE/td model fit.t <- find.mle(lik.t2, pars.t[argnames(lik.t2)], control=list(maxit=20000)) ## Compare these two fits with a likelihood ratio test (lik.t2 is nested ## within lik.b) anova(fit.b, td=fit.t) ## End(Not run) ## The time varying model (bisse.t) is more general, but substantially ## slower. Here, I will show that the two functions are equivalent for ## step function models. We'll constrain all the non-lambda parameters ## to be the same over a time-switch at t=5. This leaves 8 parameters. lik.td <- make.bisse.td(phy, phy$tip.state, 2) lik.td2 <- constrain(lik.td, t.1 ~ 5, mu0.2 ~ mu0.1, mu1.2 ~ mu1.1, q01.2 ~ q01.1, q10.2 ~ q10.1) lik.t <- make.bisse.t(phy, phy$tip.state, rep(c("stepf.t", "constant.t"), c(2, 4))) lik.t2 <- constrain(lik.t, lambda0.tc ~ 5, lambda1.tc ~ 5) ## Note that the argument names for these functions are different from ## one another. This reflects different ways that the functions will ## tend to be used, but is potentially confusing here. argnames(lik.td2) argnames(lik.t2) ## First, evaluate the functions with no time effect and check that they ## are the same as the base BiSSE model p.td <- c(pars, pars[1:2]) p.t <- pars[c(1, 1, 2, 2, 3:6)] ## All agree: lik.b(pars) # -159.7128 lik.td2(p.td) # -159.7128 lik.t2(p.t) # -159.7128 ## In fact, the time-varying BiSSE will tend to be identical to plain ## BiSSE where the functions to not change: lik.b(pars) - lik.t2(p.t) ## Slight numerical differences are typical for the time-chunk BiSSE, ## because it forces the integration to be carried out more carefully ## around the switch point. lik.b(pars) - lik.td2(p.td) ## Next, evaluate the functions with a time effect (5 time units ago, ## speciation rates were twice the contemporary rate) p.td2 <- c(pars, pars[1:2]*2) p.t2 <- c(pars[1], pars[1]*2, pars[2], pars[2]*2, pars[3:6]) ## Huge drop in the likelihood (from -159.7128 to -172.7874) lik.b(pars) lik.td2(p.td2) lik.t2(p.t2) ## The small difference remains between the two approaches, but they are ## basically the same. lik.td2(p.td2) - lik.t2(p.t2) ## There is a small time cost to both time-dependent methods, ## heavily paid for the time-chunk case: system.time(lik.b(pars)) system.time(lik.td2(p.td)) # 1.9x slower than plain BiSSE system.time(lik.td2(p.td2)) # 1.9x slower than plain BiSSE system.time(lik.t2(p.t)) # about the same speed system.time(lik.t2(p.t2)) # about the same speed
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } set.seed(4) pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) phy <- tree.bisse(pars, max.t=30, x0=0) ## Suppose we want to see if diversification is different in the most ## recent 3 time units, compared with the rest of the tree (yes, this is ## a totally contrived example!): plot(phy) axisPhylo() abline(v=max(branching.times(phy)) - 3, col="red", lty=3) ## For comparison, make a plain BiSSE likelihood function lik.b <- make.bisse(phy, phy$tip.state) ## Create the time-dependent likelihood function. The final argument ## here is the number of 'epochs' that are allowed. Two epochs is one ## switch point. lik.t <- make.bisse.td(phy, phy$tip.state, 2) ## The switch point is the first argument. The remaining 12 parameters ## are the BiSSE parameters, with the first 6 being the most recent ## epoch. argnames(lik.t) pars.t <- c(3, pars, pars) names(pars.t) <- argnames(lik.t) ## Calculations are identical to a reasonable tolerance: lik.b(pars) - lik.t(pars.t) ## It will often be useful to constrain the time as a fixed quantity. lik.t2 <- constrain(lik.t, t.1 ~ 3) ## Parameter estimation under maximum likelihood. This is marked "don't ## run" because the time-dependent fit takes a few minutes. ## Not run: ## Fit the BiSSE ML model fit.b <- find.mle(lik.b, pars) ## And fit the BiSSE/td model fit.t <- find.mle(lik.t2, pars.t[argnames(lik.t2)], control=list(maxit=20000)) ## Compare these two fits with a likelihood ratio test (lik.t2 is nested ## within lik.b) anova(fit.b, td=fit.t) ## End(Not run) ## The time varying model (bisse.t) is more general, but substantially ## slower. Here, I will show that the two functions are equivalent for ## step function models. We'll constrain all the non-lambda parameters ## to be the same over a time-switch at t=5. This leaves 8 parameters. lik.td <- make.bisse.td(phy, phy$tip.state, 2) lik.td2 <- constrain(lik.td, t.1 ~ 5, mu0.2 ~ mu0.1, mu1.2 ~ mu1.1, q01.2 ~ q01.1, q10.2 ~ q10.1) lik.t <- make.bisse.t(phy, phy$tip.state, rep(c("stepf.t", "constant.t"), c(2, 4))) lik.t2 <- constrain(lik.t, lambda0.tc ~ 5, lambda1.tc ~ 5) ## Note that the argument names for these functions are different from ## one another. This reflects different ways that the functions will ## tend to be used, but is potentially confusing here. argnames(lik.td2) argnames(lik.t2) ## First, evaluate the functions with no time effect and check that they ## are the same as the base BiSSE model p.td <- c(pars, pars[1:2]) p.t <- pars[c(1, 1, 2, 2, 3:6)] ## All agree: lik.b(pars) # -159.7128 lik.td2(p.td) # -159.7128 lik.t2(p.t) # -159.7128 ## In fact, the time-varying BiSSE will tend to be identical to plain ## BiSSE where the functions to not change: lik.b(pars) - lik.t2(p.t) ## Slight numerical differences are typical for the time-chunk BiSSE, ## because it forces the integration to be carried out more carefully ## around the switch point. lik.b(pars) - lik.td2(p.td) ## Next, evaluate the functions with a time effect (5 time units ago, ## speciation rates were twice the contemporary rate) p.td2 <- c(pars, pars[1:2]*2) p.t2 <- c(pars[1], pars[1]*2, pars[2], pars[2]*2, pars[3:6]) ## Huge drop in the likelihood (from -159.7128 to -172.7874) lik.b(pars) lik.td2(p.td2) lik.t2(p.t2) ## The small difference remains between the two approaches, but they are ## basically the same. lik.td2(p.td2) - lik.t2(p.t2) ## There is a small time cost to both time-dependent methods, ## heavily paid for the time-chunk case: system.time(lik.b(pars)) system.time(lik.td2(p.td)) # 1.9x slower than plain BiSSE system.time(lik.td2(p.td2)) # 1.9x slower than plain BiSSE system.time(lik.t2(p.t)) # about the same speed system.time(lik.t2(p.t2)) # about the same speed
Prepare to run BiSSE-ness (Binary State Speciation and Extinction (Node Enhanced State Shift)) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.bisseness(tree, states, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list())
make.bisseness(tree, states, unresolved=NULL, sampling.f=NULL, nt.extra=10, strict=TRUE, control=list())
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be 0 or
1, or |
unresolved |
Unresolved clade information: see section below for structure. |
sampling.f |
Vector of length 2 with the estimated proportion of
extant species in state 0 and 1 that are included in the phylogeny.
A value of |
nt.extra |
The number of "extra" species to include in the unresolved clade calculations. This is in addition to the largest included unresolved clade. |
control |
List of control parameters for the ODE solver. See
details in |
strict |
The |
make.bisse
returns a function of class bisse
. This
function has argument list (and default values) [RICH: Update to BiSSEness?]
f(pars, condition.surv=TRUE, root=ROOT.OBS, root.p=NULL, intermediates=FALSE)
The arguments are interpreted as
pars
A vector of 10 parameters, in the order
lambda0
, lambda1
, mu0
, mu1
,
q01
, q10
, p0c
, p0a
, p1c
, p1a
.
condition.surv
(logical): should the likelihood
calculation condition on survival of two lineages and the speciation
event subtending them? This is done by default, following Nee et
al. 1994. For BiSSE-ness, equation (A5) in Magnuson-Ford and Otto
describes how conditioning on survival alters the likelihood of
observing the data.
root
: Behaviour at the root (see Maddison et al. 2007,
FitzJohn et al. 2009). The possible options are
ROOT.FLAT
: A flat prior, weighting
and
equally.
ROOT.EQUI
: Use the equilibrium distribution
of the model, as described in Maddison et al. (2007) using
equation (A6) in Magnuson-Ford and Otto.
ROOT.OBS
: Weight and
by their relative probability of observing the
data, following FitzJohn et al. 2009:
ROOT.GIVEN
: Root will be in state 0
with probability root.p[1]
, and in state 1 with
probability root.p[2]
.
ROOT.BOTH
: Don't do anything at the root,
and return both values. (Note that this will not give you a
likelihood!).
root.p
: Root weightings for use when
root=ROOT.GIVEN
. sum(root.p)
should equal 1.
intermediates
: Add intermediates to the returned value as
attributes:
cache
: Cached tree traversal information.
intermediates
: Mostly branch end information.
vals
: Root values.
At this point, you will have to poke about in the source for more information on these.
Since 0.10.10 this is no longer supported. See the package README for more information.
This must be a data.frame
with at least the four columns
tip.label
, giving the name of the tip to which the data
applies
Nc
, giving the number of species in the clade
n0
, n1
, giving the number of species known to be
in state 0 and 1, respectively.
These columns may be in any order, and additional columns will be ignored. (Note that column names are case sensitive).
An alternative way of specifying unresolved clade information is to
use the function make.clade.tree
to construct a tree
where tips that represent clades contain information about which
species are contained within the clades. With a clade.tree
,
the unresolved
object will be automatically constructed from
the state information in states
. (In this case, states
must contain state information for the species contained within the
unresolved clades.)
Karen Magnuson-Ford
FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Syst. Biol. 58:595-611.
Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Syst. Biol. 56:701-710.
Magnuson-Ford, K., and Otto, S.P. 2012. Linking the investigations of character evolution and species diversification. American Naturalist, in press.
Nee S., May R.M., and Harvey P.H. 1994. The reconstructed evolutionary process. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344:305-311.
make.bisse
for the model with no state change at nodes.
tree.bisseness
for simulating trees under the BiSSE-ness
model.
constrain
for making submodels, find.mle
for ML parameter estimation, mcmc
for MCMC integration,
and make.bd
for state-independent birth-death models.
The help pages for find.mle
has further examples of ML
searches on full and constrained BiSSE models.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## First we simulate a 50 species tree, assuming cladogenetic shifts in ## the trait (i.e., the trait only changes at speciation). ## Red is state '1', black is state '0', and we let red lineages ## speciate at twice the rate of black lineages. ## The simulation starts in state 0. set.seed(3) pars <- c(0.1, 0.2, 0.03, 0.03, 0, 0, 0.1, 0, 0.1, 0) phy <- tree.bisseness(pars, max.taxa=50, x0=0) phy$tip.state h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) ## This builds the likelihood of the data according to BiSSEness: lik <- make.bisseness(phy, phy$tip.state) ## e.g., the likelihood of the true parameters is: lik(pars) # -174.7954 ## ML search: First we make hueristic guess at a starting point, based ## on the constant-rate birth-death model assuming anagenesis (uses ## \link{make.bd}). startp <- starting.point.bisse(phy) ## We then take the total amount of anagenetic change expected across ## the tree and assign half of this change to anagenesis and half to ## cladogenetic change at the nodes as a heuristic starting point: t <- branching.times(phy) tryq <- 1/2 * startp[["q01"]] * sum(t)/length(t) p <- c(startp[1:4], startp[5:6]/2, p0c=tryq, p0a=0.5, p1c=tryq, p1a=0.5)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## First we simulate a 50 species tree, assuming cladogenetic shifts in ## the trait (i.e., the trait only changes at speciation). ## Red is state '1', black is state '0', and we let red lineages ## speciate at twice the rate of black lineages. ## The simulation starts in state 0. set.seed(3) pars <- c(0.1, 0.2, 0.03, 0.03, 0, 0, 0.1, 0, 0.1, 0) phy <- tree.bisseness(pars, max.taxa=50, x0=0) phy$tip.state h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) ## This builds the likelihood of the data according to BiSSEness: lik <- make.bisseness(phy, phy$tip.state) ## e.g., the likelihood of the true parameters is: lik(pars) # -174.7954 ## ML search: First we make hueristic guess at a starting point, based ## on the constant-rate birth-death model assuming anagenesis (uses ## \link{make.bd}). startp <- starting.point.bisse(phy) ## We then take the total amount of anagenetic change expected across ## the tree and assign half of this change to anagenesis and half to ## cladogenetic change at the nodes as a heuristic starting point: t <- branching.times(phy) tryq <- 1/2 * startp[["q01"]] * sum(t)/length(t) p <- c(startp[1:4], startp[5:6]/2, p0c=tryq, p0a=0.5, p1c=tryq, p1a=0.5)
Create a likelihood function for models of simple Brownian Motion (BM), Ornstein-Uhlenbeck (OU), or Early Burst (EB) character evolution, or BM on a “lambda” rescaled tree. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.bm(tree, states, states.sd=0, control=list()) make.ou(tree, states, states.sd=0, with.optimum=FALSE, control=list()) make.eb(tree, states, states.sd=0, control=list()) make.lambda(tree, states, states.sd=0, control=list())
make.bm(tree, states, states.sd=0, control=list()) make.ou(tree, states, states.sd=0, with.optimum=FALSE, control=list()) make.eb(tree, states, states.sd=0, control=list()) make.lambda(tree, states, states.sd=0, control=list())
tree |
A bifurcating phylogenetic tree, in |
states |
A vector of continuous valued character states. This
vector must be named with the tip labels of |
states.sd |
An optional vector of measurement errors, as standard
deviation around the mean. If a single value is given it is used
for all tips, otherwise the vector must be named as for
|
with.optimum |
Should the optimum (often "theta") be considered
a free parameter? The default, |
control |
A list of control parameters. See details below. |
The control
argument is a named list of options.
The main option is method
. Specifying
control=list(method="vcv")
uses a variance-covariance matrix
based approach to compute the likelihood. This is similar to the
approach used by geiger, and is the default.
Two alternative approaches are available.
control=list(method="pruning")
uses the transition density
function for brownian motion along each branch, similar to how most
methods in diversitree are computed. This second approach is much
faster for very large trees. control=list(method="contrasts")
uses Freckleton (2012)'s contrasts based approach, which is also much
faster on large trees.
When method="pruning"
is specified, backend="R"
or
backend="C"
may also be provided, which switch between a slow
(and stable) R calculator and a fast (but less extensively tested) C
calculator. backend="R"
is currently the default.
The VCV-based functions are heavily based on fitContinuous
in
the geiger
package.
For non-ultrametric trees with OU models, computed likelihoods may differ because of the different root treatments. This is particularly the case for models where the optimum is estimated.
For the EB model, the parameter intepretation follows geiger; the 'a' parameter is equivalent to -log(g) in Bloomberg et al. 2003; when negative it indicates a decelerating rate of trait evolution over time. When zero, it reduces to Brownian motion.
Richard G. FitzJohn
See https://www.zoology.ubc.ca/prog/diversitree/examples/ou-nonultrametric/ for a discussion about calculations on non-ultrametric trees.
## Random data (following APE) data(bird.orders) set.seed(1) x <- structure(rnorm(length(bird.orders$tip.label)), names=bird.orders$tip.label) ## Not run: ## With the VCV approach fit1 <- find.mle(make.bm(bird.orders, x), .1) ## With the pruning calculations lik.pruning <- make.bm(bird.orders, x, control=list(method="pruning")) fit2 <- find.mle(lik.pruning, .1) ## All the same (need to drop the function from this though) all.equal(fit1[-7], fit2[-7]) ## If this is the same as the estimates from Geiger, to within the ## tolerances expected for the calculation and optimisation: fit3 <- fitContinuous(bird.orders, x) all.equal(fit3$Trait1$lnl, fit1$lnLik) all.equal(fit3$Trait1$beta, fit1$par, check.attributes=FALSE) ## End(Not run)
## Random data (following APE) data(bird.orders) set.seed(1) x <- structure(rnorm(length(bird.orders$tip.label)), names=bird.orders$tip.label) ## Not run: ## With the VCV approach fit1 <- find.mle(make.bm(bird.orders, x), .1) ## With the pruning calculations lik.pruning <- make.bm(bird.orders, x, control=list(method="pruning")) fit2 <- find.mle(lik.pruning, .1) ## All the same (need to drop the function from this though) all.equal(fit1[-7], fit2[-7]) ## If this is the same as the estimates from Geiger, to within the ## tolerances expected for the calculation and optimisation: fit3 <- fitContinuous(bird.orders, x) all.equal(fit3$Trait1$lnl, fit1$lnLik) all.equal(fit3$Trait1$beta, fit1$par, check.attributes=FALSE) ## End(Not run)
This function makes a “clade tree”, where tips
represent clades. It is designed to make working with unresolved
clade information in make.bisse
more straightforward.
clade.tree
objects have their own plotting methods.
make.clade.tree(tree, clades) clades.from.polytomies(tree) clades.from.classification(tree, class, check=TRUE)
make.clade.tree(tree, clades) clades.from.polytomies(tree) clades.from.classification(tree, class, check=TRUE)
tree |
An ultrametric phylogenetic tree, in |
clades |
A list, where the name of each element represents a tip
in |
class |
A vector along |
check |
Logical, indicating whether a (potentially slow) check will be done to ensure that all resulting clades are reciprocally monophyletic within the tree. |
The idea here is that make.bisse
takes a tree and a named
character state vector. If the phylogenetic tree contains information
about the membership of clades, then the unresolved clade information
can be constructed automatically. The names chosen should therefore
reflect the names used in the state information.
Currently, clade.tree
objects work poorly with some ape
functions.
Richard G. FitzJohn
Prepare to run ClaSSE (Cladogenetic State change Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.classe(tree, states, k, sampling.f=NULL, strict=TRUE, control=list()) starting.point.classe(tree, k, eps=0.5)
make.classe(tree, states, k, sampling.f=NULL, strict=TRUE, control=list()) starting.point.classe(tree, k, eps=0.5)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be an
integer between 1 and |
k |
The number of states. (The maximum now is 31, but that can easily be increased if necessary.) |
sampling.f |
Vector of length |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
eps |
Ratio of extinction to speciation rates to be used when choosing a starting set of parameters. The procedure used is based on Magallon & Sanderson (2001). |
The ClaSSE model with k = 2
is equivalent to but a different
parameterization than the BiSSE-ness model.
The GeoSSE model can be constructed from ClaSSE
with k = 3
; see the example below.
make.classe
returns a function of class classe
. The
arguments and default values for this function are:
f(pars, condition.surv=TRUE, root=ROOT.OBS, root.p=NULL, intermediates=FALSE)
The arguments of this function are explained in make.bisse. The speciation rate parameters are lambda_ijk, ordered with k changing fastest and insisting on j < k.
With more than 9 states, lambda_ijk and q_ij can be ambiguous (e.g. is q113 1->13 or 11->3?). To avoid this, the numbers are zero padded (so that the above would be q0113 or q1103 for 1->13 and 11->3 respectively). It might be easier to rename the arguments in practice though. More human-friendly handling of large speciation rate arrays is in the works.
starting.point.classe
produces a first-guess set of parameters,
ignoring character states.
Unresolved clade methods are not available for ClaSSE.
Tree simulation methods are not yet available for ClaSSE.
Emma E. Goldberg
FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Syst. Biol. 58:595-611.
Goldberg E.E. and Igic B. Tempo and mode in plant breeding system evolution. In review.
Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Syst. Biol. 56:701-710.
Magallon S. and Sanderson M.J. 2001. Absolute diversification rates in angiospem clades. Evol. 55:1762-1780.
Magnuson-Ford, K., and Otto, S.P. 2012. Linking the investigations of character evolution and species diversification. American Naturalist, in press.
constrain
for making submodels, find.mle
for ML parameter estimation, and mcmc
for MCMC
integration. The help page for find.mle
has further
examples of ML searches on full and constrained BiSSE models. Things
work similarly for ClaSSE, just with different speciation parameters.
make.bisse
, make.bisseness
,
make.geosse
, make.musse
for similar models
and further relevant examples.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## GeoSSE equivalence ## Same tree simulated in ?make.geosse pars <- c(1.5, 0.5, 1.0, 0.7, 0.7, 2.5, 0.5) names(pars) <- diversitree:::default.argnames.geosse() set.seed(5) phy <- tree.geosse(pars, max.t=4, x0=0) lik.g <- make.geosse(phy, phy$tip.state) pars.g <- c(1.5, 0.5, 1.0, 0.7, 0.7, 1.4, 1.3) names(pars.g) <- argnames(lik.g) lik.c <- make.classe(phy, phy$tip.state+1, 3) pars.c <- 0 * starting.point.classe(phy, 3) pars.c['lambda222'] <- pars.c['lambda112'] <- pars.g['sA'] pars.c['lambda333'] <- pars.c['lambda113'] <- pars.g['sB'] pars.c['lambda123'] <- pars.g['sAB'] pars.c['mu2'] <- pars.c['q13'] <- pars.g['xA'] pars.c['mu3'] <- pars.c['q12'] <- pars.g['xB'] pars.c['q21'] <- pars.g['dA'] pars.c['q31'] <- pars.g['dB'] lik.g(pars.g) # -175.7685 lik.c(pars.c) # -175.7685
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## GeoSSE equivalence ## Same tree simulated in ?make.geosse pars <- c(1.5, 0.5, 1.0, 0.7, 0.7, 2.5, 0.5) names(pars) <- diversitree:::default.argnames.geosse() set.seed(5) phy <- tree.geosse(pars, max.t=4, x0=0) lik.g <- make.geosse(phy, phy$tip.state) pars.g <- c(1.5, 0.5, 1.0, 0.7, 0.7, 1.4, 1.3) names(pars.g) <- argnames(lik.g) lik.c <- make.classe(phy, phy$tip.state+1, 3) pars.c <- 0 * starting.point.classe(phy, 3) pars.c['lambda222'] <- pars.c['lambda112'] <- pars.g['sA'] pars.c['lambda333'] <- pars.c['lambda113'] <- pars.g['sB'] pars.c['lambda123'] <- pars.g['sAB'] pars.c['mu2'] <- pars.c['q13'] <- pars.g['xA'] pars.c['mu3'] <- pars.c['q12'] <- pars.g['xB'] pars.c['q21'] <- pars.g['dA'] pars.c['q31'] <- pars.g['dB'] lik.g(pars.g) # -175.7685 lik.c(pars.c) # -175.7685
Prepare to run GeoSSE (Geographic State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.geosse(tree, states, sampling.f=NULL, strict=TRUE, control=list()) starting.point.geosse(tree, eps=0.5)
make.geosse(tree, states, sampling.f=NULL, strict=TRUE, control=list()) starting.point.geosse(tree, eps=0.5)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be 0
(in both regions/widespread; AB), 1 or 2 (endemic to one region; A
or B), or |
sampling.f |
Vector of length 3 with the estimated proportion of
extant species in states 0, 1 and 2 that are included in the
phylogeny. A value of |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
eps |
Ratio of extinction to speciation rates to be used when choosing a starting set of parameters. The procedure used is based on Magallon & Sanderson (2001). |
make.geosse
returns a function of class geosse
. The
arguments and default values for this function are:
f(pars, condition.surv=TRUE, root=ROOT.OBS, root.p=NULL, intermediates=FALSE)
The arguments of this function are explained in make.bisse.
The parameter vector pars
is ordered sA
, sB
,
sAB
, xA
, xB
, dA
, dB
.
Unresolved clade methods are not available for GeoSSE. With three states, it would rapidly become computationally infeasible.
Emma E. Goldberg
FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Syst. Biol. 58:595-611.
Goldberg E.E., Lancaster L.T., and Ree R.H. 2011. Phylogenetic inference of reciprocal effects between geographic range evolution and diversification. Syst. Biol. 60:451-465.
Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Syst. Biol. 56:701-710.
Magallon S. and Sanderson M.J. 2001. Absolute diversification rates in angiospem clades. Evol. 55:1762-1780.
constrain
for making submodels, find.mle
for ML parameter estimation, mcmc
for MCMC integration,
make.bisse
for further relevant examples.
The help page for find.mle
has further examples of ML
searches on full and constrained BiSSE models. Things work similarly
for GeoSSE, just with different parameters.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Parameter values pars <- c(1.5, 0.5, 1.0, 0.7, 0.7, 2.5, 0.5) names(pars) <- diversitree:::default.argnames.geosse() ## Simulate a tree set.seed(5) phy <- tree.geosse(pars, max.t=4, x0=0) ## See the data statecols <- c("AB"="purple", "A"="blue", "B"="red") plot(phy, tip.color=statecols[phy$tip.state+1], cex=0.5) ## The likelihood function lik <- make.geosse(phy, phy$tip.state) ## With "true" parameter values lik(pars) # -168.4791 ## A guess at a starting point. p <- starting.point.geosse(phy) ## Start an ML search from this point (takes a couple minutes to run). ## Not run: fit <- find.mle(lik, p, method="subplex") logLik(fit) # -165.9965 ## Compare with sim values. rbind(real=pars, estimated=round(coef(fit), 2)) ## A model with constraints on the dispersal rates. lik.d <- constrain(lik, dA ~ dB) fit.d <- find.mle(lik.d, p[-7]) logLik(fit.d) # -166.7076 ## A model with constraints on the speciation rates. lik.s <- constrain(lik, sA ~ sB, sAB ~ 0) fit.s <- find.mle(lik.s, p[-c(2,3)]) logLik(fit.s) # -169.0123 ## End(Not run) ## "Skeletal tree" sampling is supported. For example, if your tree ## includes all AB species, half of A species, and a third of B species, ## create the likelihood function like this: lik.f <- make.geosse(phy, phy$tip.state, sampling.f=c(1, 0.5, 1/3)) ## If you have external evidence that the base of your tree must have ## been in state 1, say (endemic to region A), you can fix the root ## when computing the likelihood, like this: lik(pars, root=ROOT.GIVEN, root.p=c(0,1,0))
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Parameter values pars <- c(1.5, 0.5, 1.0, 0.7, 0.7, 2.5, 0.5) names(pars) <- diversitree:::default.argnames.geosse() ## Simulate a tree set.seed(5) phy <- tree.geosse(pars, max.t=4, x0=0) ## See the data statecols <- c("AB"="purple", "A"="blue", "B"="red") plot(phy, tip.color=statecols[phy$tip.state+1], cex=0.5) ## The likelihood function lik <- make.geosse(phy, phy$tip.state) ## With "true" parameter values lik(pars) # -168.4791 ## A guess at a starting point. p <- starting.point.geosse(phy) ## Start an ML search from this point (takes a couple minutes to run). ## Not run: fit <- find.mle(lik, p, method="subplex") logLik(fit) # -165.9965 ## Compare with sim values. rbind(real=pars, estimated=round(coef(fit), 2)) ## A model with constraints on the dispersal rates. lik.d <- constrain(lik, dA ~ dB) fit.d <- find.mle(lik.d, p[-7]) logLik(fit.d) # -166.7076 ## A model with constraints on the speciation rates. lik.s <- constrain(lik, sA ~ sB, sAB ~ 0) fit.s <- find.mle(lik.s, p[-c(2,3)]) logLik(fit.s) # -169.0123 ## End(Not run) ## "Skeletal tree" sampling is supported. For example, if your tree ## includes all AB species, half of A species, and a third of B species, ## create the likelihood function like this: lik.f <- make.geosse(phy, phy$tip.state, sampling.f=c(1, 0.5, 1/3)) ## If you have external evidence that the base of your tree must have ## been in state 1, say (endemic to region A), you can fix the root ## when computing the likelihood, like this: lik(pars, root=ROOT.GIVEN, root.p=c(0,1,0))
Create a likelihood function for a GeoSSE model where the tree is partitioned into regions with different parameters.
make.geosse.split(tree, states, nodes, split.t, sampling.f=NULL, strict=TRUE, control=list()) make.geosse.uneven(tree, states, nodes, split.t, sampling.f=NULL, strict=TRUE, control=list())
make.geosse.split(tree, states, nodes, split.t, sampling.f=NULL, strict=TRUE, control=list()) make.geosse.uneven(tree, states, nodes, split.t, sampling.f=NULL, strict=TRUE, control=list())
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be an
integer between 0 and 2: see |
nodes |
Vector of nodes that will be split (see Details). |
split.t |
Vector of split times, same length as |
sampling.f |
Vector of length 3 where |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
Branching times can be controlled with the split.t
argument. If this is Inf
, split at the base of the branch (as in
MEDUSA). If 0
, split at the top (closest to the present, as in
the new option for MEDUSA). If 0 < split.t < Inf
then we split
at that time on the tree (zero is the present, with time growing
backwards).
The nodes
at the top of the split location are specified as a
vector of node names. For example, a value of c("nd10",
"nd12")
means that the splits are along the branches leading from each
of these nodes towards the root.
Emma E. Goldberg
Prepare to run time dependent GeoSSE (Geographic State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.geosse.t(tree, states, functions, sampling.f=NULL, strict=TRUE, control=list(), truncate=FALSE, spline.data=NULL)
make.geosse.t(tree, states, functions, sampling.f=NULL, strict=TRUE, control=list(), truncate=FALSE, spline.data=NULL)
tree |
A phylogenetic tree, in |
states |
A vector of character states, each of which must be 0
(in both regions/widespread; AB), 1 or 2 (endemic to one region; A or
B), or |
functions |
A named character vector of functions of time. See details. |
sampling.f |
Vector of length 3 with the estimated proportion of
extant species in states 0, 1 and 2 that are included in the
phylogeny. A value of |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
truncate |
Logical, indicating if functions should be truncated to zero when negative (rather than causing an error). May be scalar (applying to all functions) or a vector (of length 7). |
spline.data |
List of data for spline-based time functions. See
details in |
.
Please see make.bisse.t
for further details.
make.geosse.t
returns a function of class geosse.t
.
The funtions
is a vector of named functions of time. For
example, to have speciation rates be linear functions of time, while
the extinction and dispersal rates be constant with respect to time,
one can do
functions=rep(c("linear.t", "constant.t"), c(3, 4))
. The functions here must have t
as their first
argument, interpreted as time back from the present. See
make.bisse.t
for more information, and for some
potentially useful time functions.
The function has argument list (and default values):
f(pars, condition.surv=FALSE, root=ROOT.OBS, root.p=NULL, intermediates=FALSE)
The parameter vector pars is ordered sA
, sB
, sAB
,
xA
, xB
, dA
, dB
. Unresolved clade methods
are not available for GeoSSE. With three states, it would rapidly
become computationally infeasible. The arguments of this function are
also explained in make.bisse
.
starting.point.geosse
produces a first-guess set of parameters,
ignoring character states.
This computer intensive code is experimental!
Jonathan Rolland
FitzJohn R.G. 2012. Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution. 3, 1084-1092.
FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Syst. Biol. 58:595-611.
Goldberg E.E., Lancaster L.T., and Ree R.H. 2011. Phylogenetic inference of reciprocal effects between geographic range evolution and diversification. Syst. Biol. 60:451-465.
Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Syst. Biol. 56:701-710.
Nee S., May R.M., and Harvey P.H. 1994. The reconstructed evolutionary process. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344:305-311.
constrain
for making submodels and reduce number
of parameters, find.mle
for ML parameter estimation,
mcmc
for MCMC integration, make.bisse
and
make.bisse.t
for further relevant examples.
The help page for find.mle
has further examples of ML
searches on full and constrained BiSSE models. Things work similarly
for GeoSSE and GeoSSE.t, just with different parameters.
See make.geosse
for explanation of the base model.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Parameter values pars <- c(1.5, 0.5, 1.0, 0.7, 0.7, 2.5, 0.5) names(pars) <- diversitree:::default.argnames.geosse() ## Simulate a tree set.seed(5) phy <- tree.geosse(pars, max.t=4, x0=0) ## See the data statecols <- c("AB"="purple", "A"="blue", "B"="red") plot(phy, tip.color=statecols[phy$tip.state+1], cex=0.5) ## Create your list of functions. Its length corresponds to the number ## of parameters (speciation, extinction and dispersal) you want to ## estimate. ## For an unconstrained model, at least 7 parameters are estimated for ## sA, sB, sAB, xA, xB, dA, dB. ## In the case you want to define a model with linear functions of ## speciation and extinction, and constant dispersal: functions <- rep(c("linear.t", "constant.t"), c(5, 2)) ## Create the likelihood function lik <- make.geosse.t(phy, phy$tip.state, functions) ## This function will estimate a likelihood from 12 parameters. argnames(lik) ## Imagine that you want to get an estimate of the likelihood from a ## known set of parameters. pars <- c(0.01,0.001,0.01,0.001,0.01,0.001,0.02,0.002,0.02,0.002,0.1,0.1) names(pars)<-argnames(lik) lik(pars) # -640.1644 ## A guess at a starting point from character independent birth-death ## model (constant across time) . p <- starting.point.geosse(phy) #it only gives 7 parameters for time-constant model. names(p) ## it can be modified for time-dependent with a guess on the slopes of ## speciation and extinction rates. p.t<-c(p[1],0.001,p[2],0.001,p[3],0.001,p[4],0.001,p[5],0.001,p[6],p[7]) names(p.t)<-argnames(lik) ## Start an ML search from this point (takes from one minute to a very ## long time depending on your computer). ## Not run: fit <- find.mle(lik, p.t, method="subplex") fit$logLik coef(fit) ## End(Not run) ## A model with constraints on the dispersal rates. lik.d <- constrain(lik, dA ~ dB) ##Now dA and dB are the same parameter dB. argnames(lik.d) ##The parameter dA must be removed from maximum likelihood initial parameters ## Not run: fit.d <- find.mle(lik.d, p.t[-which(names(p.t)=="dA")]) fit$logLik coef(fit) ## End(Not run)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Parameter values pars <- c(1.5, 0.5, 1.0, 0.7, 0.7, 2.5, 0.5) names(pars) <- diversitree:::default.argnames.geosse() ## Simulate a tree set.seed(5) phy <- tree.geosse(pars, max.t=4, x0=0) ## See the data statecols <- c("AB"="purple", "A"="blue", "B"="red") plot(phy, tip.color=statecols[phy$tip.state+1], cex=0.5) ## Create your list of functions. Its length corresponds to the number ## of parameters (speciation, extinction and dispersal) you want to ## estimate. ## For an unconstrained model, at least 7 parameters are estimated for ## sA, sB, sAB, xA, xB, dA, dB. ## In the case you want to define a model with linear functions of ## speciation and extinction, and constant dispersal: functions <- rep(c("linear.t", "constant.t"), c(5, 2)) ## Create the likelihood function lik <- make.geosse.t(phy, phy$tip.state, functions) ## This function will estimate a likelihood from 12 parameters. argnames(lik) ## Imagine that you want to get an estimate of the likelihood from a ## known set of parameters. pars <- c(0.01,0.001,0.01,0.001,0.01,0.001,0.02,0.002,0.02,0.002,0.1,0.1) names(pars)<-argnames(lik) lik(pars) # -640.1644 ## A guess at a starting point from character independent birth-death ## model (constant across time) . p <- starting.point.geosse(phy) #it only gives 7 parameters for time-constant model. names(p) ## it can be modified for time-dependent with a guess on the slopes of ## speciation and extinction rates. p.t<-c(p[1],0.001,p[2],0.001,p[3],0.001,p[4],0.001,p[5],0.001,p[6],p[7]) names(p.t)<-argnames(lik) ## Start an ML search from this point (takes from one minute to a very ## long time depending on your computer). ## Not run: fit <- find.mle(lik, p.t, method="subplex") fit$logLik coef(fit) ## End(Not run) ## A model with constraints on the dispersal rates. lik.d <- constrain(lik, dA ~ dB) ##Now dA and dB are the same parameter dB. argnames(lik.d) ##The parameter dA must be removed from maximum likelihood initial parameters ## Not run: fit.d <- find.mle(lik.d, p.t[-which(names(p.t)=="dA")]) fit$logLik coef(fit) ## End(Not run)
Prepare to run a Mk2/Mk-n model on a phylogenetic tree and
binary/discrete trait data. This fits the Pagel 1994 model,
duplicating the ace
function in ape. Differences with that
function include (1) alternative root treatments are possible, (2)
easier to tweak parameter combinations through
constrain
, and (3) run both MCMC and MLE fits to
parameters. Rather than exponentiate the Q matrix, this
implementation solves the ODEs that this matrix defines. This may or
may not be robust on trees leading to low probabilities.
make.mk2(tree, states, strict=TRUE, control=list()) make.mkn(tree, states, k, strict=TRUE, control=list()) make.mkn.meristic(tree, states, k, control=list())
make.mk2(tree, states, strict=TRUE, control=list()) make.mkn(tree, states, k, strict=TRUE, control=list()) make.mkn.meristic(tree, states, k, control=list())
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be 0 or
1 for |
k |
Number of states to model. |
strict |
The |
control |
List of control parameters for the ODE solver. See Details below. |
make.mk2
and make.mkn
return functions of class mkn
.
These functions have argument list (and default values)
f(pars, pars, prior=NULL, root=ROOT.OBS, root.p=NULL, fail.value=NULL)
The arguments are interpreted as
pars
For make.mk2
, a vector of two parameters,
in the order q01
, q10
. For make.mkn
, a
vector of k(k-1)
parameters, in the order
q12,q13,...q1k, q21,q23,...,q2k,...qk(k-1)
, corresponding
to the off-diagonal elements of the Q
matrix in row order.
The order of parameters can be seen by running
argnames(f)
.
prior
: a valid prior. See make.prior
for
more information.
root
: Behaviour at the root (see Maddison et al. 2007,
FitzJohn et al. 2009). The possible options are
ROOT.FLAT
: A flat prior, weighting all variables
equally.
ROOT.EQUI
: Use the equilibrium distribution
of the model (not yet implemented).
ROOT.OBS
: Weight and
by their relative probability of observing the
data, following FitzJohn et al. 2009:
ROOT.GIVEN
: Root will be in state i
with probability root.p[i]
.
ROOT.BOTH
: Don't do anything at the root,
and return both values. (Note that this will not give you a
likelihood for use with ML or MCMC functions!).
root.p
: Vector of probabilities/weights to use when
ROOT.GIVEN
is specified. Must be of length k
(2 for
make.mk2
).
intermediates
: Add intermediates to the returned value as
attributes. Currently undocumented.
With more than 9 states, qij can be ambiguous (e.g. is q113 1->13 or 11->3?). To avoid this, the numbers are zero padded (so that the above would be q0113 or q1103 for 1->13 and 11->3 respectively). It might be easier to rename the arguments in practice though.
The control
argument controls how the calculations will be
carried out. It is a list, which may contain elements in
make.bisse
. In addition, the list element method
may be present, which selects between three different ways of
computing the likelihood:
method="exp"
: Uses a matrix exponentiation approach,
where all transition probabilities are computed (i.e., for a rate
matrix and time interval
, it computes
).
method="mk2"
: As for exp
, but for 2 states only.
Faster, direct, calculations are available here, rather than
numerically computing the exponentiation.
method="ode"
: Uses an ODE-based approach to compute
only the variables over time, rather than the
transition probabilities in the
exp
approach. This will be
much more efficient when k
is large.
Richard G. FitzJohn
constrain
for making submodels, find.mle
for ML parameter estimation, mcmc
for MCMC integration,
and make.bisse
for state-dependent birth-death models.
## Simulate a tree and character distribution. This is on a birth-death ## tree, with high rates of character evolution and an asymmetry in the ## character transition rates. pars <- c(.1, .1, .03, .03, .1, .2) set.seed(3) phy <- trees(pars, "bisse", max.taxa=25, max.t=Inf, x0=0)[[1]] ## Here is the 25 species tree with the true character history coded. ## Red is state '1', which has twice the character transition rate of ## black (state '0'). h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) ## Maximum likelihood parameter estimation: p <- c(.1, .1) # initial parameter guess ## Not run: lik <- make.mk2(phy, phy$tip.state) fit.mk2 <- find.mle(lik, p) coef(fit.mk2) # q10 >> q01 logLik(fit.mk2) # -10.9057 ## This can also be done using the more general Mk-n. ## This uses an approximation for the likelihood calculations. make.mkn ## assumes that states are numbered 1, 2, ..., k, so 1 needs to be added ## to the states returned by trees. lik.mkn <- make.mkn(phy, phy$tip.state + 1, 2) fit.mkn <- find.mle(lik.mkn, p) fit.mkn[1:2] ## These are the same (except for the naming of arguments) all.equal(fit.mkn[-7], fit.mk2[-7], check.attr=FALSE, tolerance=1e-7) ## Equivalence to ape's ace function: model <- matrix(c(0, 2, 1, 0), 2) fit.ape <- ace(phy$tip.state, phy, "discrete", model=model, ip=p) ## To do the comparison, we need to rerun the diversitree version with ## the same root conditions as ape. fit.mk2 <- find.mle(lik, p, root=ROOT.GIVEN, root.p=c(1,1)) ## These are the same to a reasonable degree of accuracy, too (the ## matrix exponentiation is slightly less accurate than the ODE ## solving approach. The make.mk2 version is exact) all.equal(fit.ape[c("rates", "loglik")], fit.mk2[1:2], check.attributes=FALSE, tolerance=1e-4) ## The ODE calculation method may be useful when there are a large ## number of possible states (say, over 20). lik.ode <- make.mkn(phy, phy$tip.state + 1, 2, control=list(method="ode")) fit.ode <- find.mle(lik.ode, p) fit.ode[1:2] all.equal(fit.ode[-7], fit.mkn[-7], tolerance=1e-7) ## End(Not run)
## Simulate a tree and character distribution. This is on a birth-death ## tree, with high rates of character evolution and an asymmetry in the ## character transition rates. pars <- c(.1, .1, .03, .03, .1, .2) set.seed(3) phy <- trees(pars, "bisse", max.taxa=25, max.t=Inf, x0=0)[[1]] ## Here is the 25 species tree with the true character history coded. ## Red is state '1', which has twice the character transition rate of ## black (state '0'). h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) ## Maximum likelihood parameter estimation: p <- c(.1, .1) # initial parameter guess ## Not run: lik <- make.mk2(phy, phy$tip.state) fit.mk2 <- find.mle(lik, p) coef(fit.mk2) # q10 >> q01 logLik(fit.mk2) # -10.9057 ## This can also be done using the more general Mk-n. ## This uses an approximation for the likelihood calculations. make.mkn ## assumes that states are numbered 1, 2, ..., k, so 1 needs to be added ## to the states returned by trees. lik.mkn <- make.mkn(phy, phy$tip.state + 1, 2) fit.mkn <- find.mle(lik.mkn, p) fit.mkn[1:2] ## These are the same (except for the naming of arguments) all.equal(fit.mkn[-7], fit.mk2[-7], check.attr=FALSE, tolerance=1e-7) ## Equivalence to ape's ace function: model <- matrix(c(0, 2, 1, 0), 2) fit.ape <- ace(phy$tip.state, phy, "discrete", model=model, ip=p) ## To do the comparison, we need to rerun the diversitree version with ## the same root conditions as ape. fit.mk2 <- find.mle(lik, p, root=ROOT.GIVEN, root.p=c(1,1)) ## These are the same to a reasonable degree of accuracy, too (the ## matrix exponentiation is slightly less accurate than the ODE ## solving approach. The make.mk2 version is exact) all.equal(fit.ape[c("rates", "loglik")], fit.mk2[1:2], check.attributes=FALSE, tolerance=1e-4) ## The ODE calculation method may be useful when there are a large ## number of possible states (say, over 20). lik.ode <- make.mkn(phy, phy$tip.state + 1, 2, control=list(method="ode")) fit.ode <- find.mle(lik.ode, p) fit.ode[1:2] all.equal(fit.ode[-7], fit.mkn[-7], tolerance=1e-7) ## End(Not run)
Prepare to run MuSSE (Multi-State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
MuSSE is agnostic as to whether multiple states or multiple traits are modelled (following Pagel 1994). Instead, a transition rate matrix amongst possible trait/state combinations is constructed and the analysis is conducted on this.
The helper function make.musse.multitrait
wraps the
basic MuSSE model for the case of a combination of several binary
traits; its argument handling are a little different; please see the
help page for more information.
make.musse(tree, states, k, sampling.f=NULL, strict=TRUE, control=list()) starting.point.musse(tree, k, q.div=5, yule=FALSE)
make.musse(tree, states, k, sampling.f=NULL, strict=TRUE, control=list()) starting.point.musse(tree, k, q.div=5, yule=FALSE)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be an
integer between 1 and |
k |
The number of states. |
sampling.f |
Vector of length |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
q.div |
Ratio of diversification rate to character change rate. Eventually this will be changed to allow for Mk2 to be used for estimating q parameters. |
yule |
Logical: should starting parameters be Yule estimates rather than birth-death estimates? |
With more than 9 states, qij can be ambiguous (e.g. is q113 1->13 or 11->3?). To avoid this, the numbers are zero padded (so that the above would be q0113 or q1103 for 1->13 and 11->3 respectively). It might be easier to rename the arguments in practice though.
Richard G. FitzJohn
make.bisse
for the basic binary model, and
make.musse.multitrait
for the case where the data are
really combinations of binary traits.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## 1: BiSSE equivalence pars <- c(.1, .2, .03, .04, 0.05, 0.1) set.seed(2) phy <- tree.musse(pars, 20, x0=1) ## Show that the likelihood functions give the same answers. Ignore the ## warning when creating the MuSSE function. lik.b <- make.bisse(phy, phy$tip.state-1) lik.m <- make.musse(phy, phy$tip.state, 2) all.equal(lik.b(pars), lik.m(pars), tolerance=1e-7) ## Notice that default argument names are different between BiSSE and ## MuSSE, but that the order is the same. argnames(lik.b) # BiSSE: 0/1 argnames(lik.m) # MuSSE: 1/2 ## 2: A 3-state example where movement is only allowed between ## neighbouring states (1 <-> 2 <-> 3), and where speciation and ## extinction rates increase moving from 1 -> 2 -> 3: ## You can get the expected argument order for any number of states ## this way (sorry - clunky). The help file also lists the order. diversitree:::default.argnames.musse(3) ## Here are the parameters: pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 set.seed(2) phy <- tree.musse(pars, 30, x0=1) ## Extract history from simulated tree and plot ## (colours are 1: black, 2: red, 3: blue) col <- c("blue", "orange", "red") h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, cex=.7, col=col) ## The states are numbered 1:3, rather than 0:1 in bisse. states <- phy$tip.state table(states) ## 2: Likelihood ## Making a likelihood function is basically identical to bisse. The ## third argument needs to be the number of states. In a future ## version this will probably be max(states), but there are some ## pitfalls about this that I am still worried about. lik <- make.musse(phy, states, 3) ## Here are the arguments. Even with three states, this is getting ## ridiculous. argnames(lik) ## Start with a fully constrained model, but still enforcing stepwise ## changes (disallowing 1 <-> 3 shifts) lik.base <- constrain(lik, lambda2 ~ lambda1, lambda3 ~ lambda1, mu2 ~ mu1, mu3 ~ mu1, q13 ~ 0, q21 ~ q12, q23 ~ q12, q31 ~ 0, q32 ~ q12) ## Not run: p <- starting.point.musse(phy, 3) fit.base <- find.mle(lik.base, p[argnames(lik.base)]) ## Now, allow the speciation rates to vary: lik.lambda <- constrain(lik, mu2 ~ mu1, mu3 ~ mu1, q13 ~ 0, q21 ~ q12, q23 ~ q12, q31 ~ 0, q32 ~ q12) fit.lambda <- find.mle(lik.lambda, p[argnames(lik.lambda)]) ## Very little improvement in fit (this *is* a small tree) anova(fit.base, lambda=fit.lambda) ## Run an MCMC with this set. Priors will be necessary (using the ## usual exponential with mean of 2r) prior <- make.prior.exponential(1 / (2 * (p[1] - p[4]))) samples <- mcmc(lik.lambda, coef(fit.lambda), nstep=1000, w=1, prior=prior, print.every=50) ## Posterior probability profile plots for the three speciation rates. profiles.plot(samples[2:4], col) abline(v=c(.1, .15, .2), col=col) ## End(Not run)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## 1: BiSSE equivalence pars <- c(.1, .2, .03, .04, 0.05, 0.1) set.seed(2) phy <- tree.musse(pars, 20, x0=1) ## Show that the likelihood functions give the same answers. Ignore the ## warning when creating the MuSSE function. lik.b <- make.bisse(phy, phy$tip.state-1) lik.m <- make.musse(phy, phy$tip.state, 2) all.equal(lik.b(pars), lik.m(pars), tolerance=1e-7) ## Notice that default argument names are different between BiSSE and ## MuSSE, but that the order is the same. argnames(lik.b) # BiSSE: 0/1 argnames(lik.m) # MuSSE: 1/2 ## 2: A 3-state example where movement is only allowed between ## neighbouring states (1 <-> 2 <-> 3), and where speciation and ## extinction rates increase moving from 1 -> 2 -> 3: ## You can get the expected argument order for any number of states ## this way (sorry - clunky). The help file also lists the order. diversitree:::default.argnames.musse(3) ## Here are the parameters: pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 set.seed(2) phy <- tree.musse(pars, 30, x0=1) ## Extract history from simulated tree and plot ## (colours are 1: black, 2: red, 3: blue) col <- c("blue", "orange", "red") h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, cex=.7, col=col) ## The states are numbered 1:3, rather than 0:1 in bisse. states <- phy$tip.state table(states) ## 2: Likelihood ## Making a likelihood function is basically identical to bisse. The ## third argument needs to be the number of states. In a future ## version this will probably be max(states), but there are some ## pitfalls about this that I am still worried about. lik <- make.musse(phy, states, 3) ## Here are the arguments. Even with three states, this is getting ## ridiculous. argnames(lik) ## Start with a fully constrained model, but still enforcing stepwise ## changes (disallowing 1 <-> 3 shifts) lik.base <- constrain(lik, lambda2 ~ lambda1, lambda3 ~ lambda1, mu2 ~ mu1, mu3 ~ mu1, q13 ~ 0, q21 ~ q12, q23 ~ q12, q31 ~ 0, q32 ~ q12) ## Not run: p <- starting.point.musse(phy, 3) fit.base <- find.mle(lik.base, p[argnames(lik.base)]) ## Now, allow the speciation rates to vary: lik.lambda <- constrain(lik, mu2 ~ mu1, mu3 ~ mu1, q13 ~ 0, q21 ~ q12, q23 ~ q12, q31 ~ 0, q32 ~ q12) fit.lambda <- find.mle(lik.lambda, p[argnames(lik.lambda)]) ## Very little improvement in fit (this *is* a small tree) anova(fit.base, lambda=fit.lambda) ## Run an MCMC with this set. Priors will be necessary (using the ## usual exponential with mean of 2r) prior <- make.prior.exponential(1 / (2 * (p[1] - p[4]))) samples <- mcmc(lik.lambda, coef(fit.lambda), nstep=1000, w=1, prior=prior, print.every=50) ## Posterior probability profile plots for the three speciation rates. profiles.plot(samples[2:4], col) abline(v=c(.1, .15, .2), col=col) ## End(Not run)
Prepare to run MuSSE or Mkn (Multi-State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
This is a helper function that wraps the basic MuSSE/Mkn models for the case of a combination of several binary traits; its parametrisation and argument handling are a little different to the other models in diversitree.
make.musse.multitrait(tree, states, sampling.f=NULL, depth=NULL, allow.multistep=FALSE, strict=TRUE, control=list()) make.mkn.multitrait(tree, states, depth=NULL, allow.multistep=FALSE, strict=TRUE, control=list()) musse.multitrait.translate(n.trait, depth=NULL, names=NULL, allow.multistep=FALSE) mkn.multitrait.translate(n.trait, depth=NULL, names=NULL, allow.multistep=FALSE) starting.point.musse.multitrait(tree, lik, q.div=5, yule=FALSE)
make.musse.multitrait(tree, states, sampling.f=NULL, depth=NULL, allow.multistep=FALSE, strict=TRUE, control=list()) make.mkn.multitrait(tree, states, depth=NULL, allow.multistep=FALSE, strict=TRUE, control=list()) musse.multitrait.translate(n.trait, depth=NULL, names=NULL, allow.multistep=FALSE) mkn.multitrait.translate(n.trait, depth=NULL, names=NULL, allow.multistep=FALSE) starting.point.musse.multitrait(tree, lik, q.div=5, yule=FALSE)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A |
depth |
A scalar or vector of length 3 indicating the depth of interactions to include in the model. See Details. |
allow.multistep |
Should transition rates be included that imply
simultaneous changes in more than one trait? By default this is not
allowed, but if set to |
sampling.f |
Scalar with the estimated proportion of extant
species that are included in the phylogeny. A value of |
strict |
Each column in |
control |
List of control parameters for the ODE solver. See
details in |
lik |
A likelihood function created by
|
q.div |
Ratio of diversification rate to character change rate. Eventually this will be changed to allow for Mk2 to be used for estimating q parameters. |
yule |
Logical: should starting parameters be Yule estimates rather than birth-death estimates? |
n.trait |
Number of binary traits. |
names |
Vector of names for the traits when using musse.multitrait.translate (optional). |
Suppose that you have two binary traits that may affect speciation and
extinction. In previous versions of diversitree, you had to code the
possible combinations as states 1, 2, 3, 4, which makes the
interpretation of the speciation rates (lambda1
,
lambda2
, etc) unintuitive.
Let states
is a data.frame with columns "A" and "B",
representing the two binary traits. We can write the speciation rate
as
where and
are indicator variables that take the
value of trait A and B respectively (with values 0 or 1). In this
form,
is the intercept,
and
are "main
effects" of traits A and B, and
is the
"interaction" between these. We can do a similar trick for the
extinction rates.
For character transition rates, we first consider changes only in a
single trait. For our two trait case we have four "types" of
character change allowed (A 0->1, A 1->0, B 0->1, and B 1->0), but the
rates of change for trait A might depend on the current state of trait
B (and vice versa). So we have, for the A0->1 trait change
. Note that one fewer levels of
interaction are possible for these character changes than for the
speciation/extinction parameters.
It may sometimes be desirable to have the multi-trait changes in the
model. At present, if allow.multistep
is TRUE
, all the
multiple change transitions are included at the end of the parameter
vector. For the two trait case these are labelled q00.11
,
q10.01
, q01.10
, and q11.00
, where qij.kl
represents a change from (A=i, B=j) to (C=k, D=l). The argument name,
and treatment, of these may change in future.
This approach generalises out to more than two traits. For N
traits, interactions are possible up to the N
th order for
lambda and mu, and up to the N-1
th order for q. The
depth
argument controls how many of these are returned. If
this is a scalar, then the same level is used for lambda
,
mu
and q
. If it is a vector of length 3, then different
depths are used for these three types of parameters. By default, all
possible interactions are returned and the model has the same number
of degrees of freedom as the models returned by make.musse
(except for a reduction in the possible q parameters when
allow.multistep
is FALSE
). Parameters can then be
further refined with constrain
.
Richard G. FitzJohn
make.bisse
for the basic binary model, and
make.musse
for the basic multistate model.
## The translation between these two bases is fairly straightforward; if ## we have a vector of parameters in our new basis 'p' we can convert it ## into the original MuSSE basis ('q') through this matrix: tr <- musse.multitrait.translate(2) tr ## Notice that the rows that correspond to transitions in multiple ## traits are all zero by default; this means that these q values will ## be zero regardless of the parameter vector used. tr["q00.11",] ## And here is the section of the transition matrix corresponding to the ## lambda values; every rate gets a contribution from the intercept term ## (lambda0), lambda10 and lambda11 get a contribution from lambdaA, etc. tr[1:4,1:4] ## There is currently no nice simulation support for this, so bear with ## an ugly script to generate the tree and traits. pars <- c(.10, .15, .20, .25, # lambda 00, 10, 01, 11 .03, .03, .03, .03, # mu 00, 10, 01, 11 .05, .05, .0, # q00.10, q00.01, q00.11 .05, .0, .05, # q10.00, q10.01, q10.11 .05, .0, .05, # q01.00, q01.10, q01.11 .0, .05, .05) # q11.00, q11.10, q11.01 set.seed(2) phy <- tree.musse(pars, 60, x0=1) states <- expand.grid(A=0:1, B=0:1)[phy$tip.state,] rownames(states) <- phy$tip.label ## Here, states has row names corresponding to the different taxa, and ## the states of two traits "A" and "B" are recorded in the columns. head(states) ## Note that transition from the original MuSSE basis to this basis is ## only possible in general when depth=n.trait and allow.multistep=TRUE ## (as only this generates a square matrix that is invertible). ## However, when it is possible to express the set of parameters in the ## new basis (as it is above), this can be done through a pseudoinverse ## (here, a left inverse). pars2 <- drop(solve(t(tr) %*% tr) %*% t(tr) %*% pars) ## Going from our new basis to the original MuSSE parameters is always ## straightforward. This is done automatically in the likelihood ## function. all.equal(drop(tr %*% pars2), pars, check.attributes=FALSE) ## This shows that the two traits act additively on speciation rate ## (lambdaAB is zero), that there is no effect of any trait on ## extinction (the only nonzero mu parameter is mu0) and transition ## rates for one trait are unaffected by other traits (the only nonzero ## q parameters are the qXij.0 parameters; qXij.Y parameters are all ## zero). ## Here is our new MuSSE function parametrised as a multi-trait ## function: lik <- make.musse.multitrait(phy, states) ## Here are the argument names for the likelihood function. argnames(lik) ## Basic MuSSE function for comparison lik.m <- make.musse(phy, phy$tip.state, 4) argnames(lik.m) ## Rather than fit this complicated model first, let's start with a ## simple model with no state dependent diversification. This model ## allows the forwards and backwards transition rates to vary, but the ## speciation and extinction rates do not depend on the character ## state: lik0 <- make.musse.multitrait(phy, states, depth=0) argnames(lik0) ## This can be used in analyses as usual. However, this can take a ## while to run, so is not run by default. ## Not run: p <- starting.point.musse.multitrait(phy, lik0) fit0 <- find.mle(lik0, p) ## Now, allow the speciation rates to vary additively with both ## character states (extinction and character changes are left as in the ## previous model) lik1 <- make.musse.multitrait(phy, states, depth=c(1, 0, 0)) ## Start from the previous ML point: p <- starting.point.musse.multitrait(phy, lik1) p[names(coef(fit0))] <- coef(fit0) fit1 <- find.mle(lik1, p) ## The likelihood improves, but the difference is not statistically ## significant (p = 0.35). anova(fit1, fit0) ## We can fit an interaction for the speciation rates, too: lik2 <- make.musse.multitrait(phy, states, depth=c(2, 0, 0)) p <- starting.point.musse.multitrait(phy, lik2) p[names(coef(fit1))] <- coef(fit1) fit2 <- find.mle(lik2, p) ## There is next to no support for the interaction term (which is good, ## as the original model did not have any interaction!) anova(fit2, fit1) ## Constraining also works with these models. For example, constraining ## the lambdaA parameter to zero: lik1b <- constrain(lik1, lambdaA ~ 0) argnames(lik1b) p <- starting.point.musse.multitrait(phy, lik1b) p[names(coef(fit0))] <- coef(fit0) fit1b <- find.mle(lik1b, p) anova(fit1b, fit0) ## Or constraining both main effects to take the same value: lik1c <- constrain(lik1, lambdaB ~ lambdaA) argnames(lik1c) p <- starting.point.musse.multitrait(phy, lik1c) p[names(coef(fit0))] <- coef(fit0) fit1c <- find.mle(lik1c, p) anova(fit1c, fit0) ## End(Not run)
## The translation between these two bases is fairly straightforward; if ## we have a vector of parameters in our new basis 'p' we can convert it ## into the original MuSSE basis ('q') through this matrix: tr <- musse.multitrait.translate(2) tr ## Notice that the rows that correspond to transitions in multiple ## traits are all zero by default; this means that these q values will ## be zero regardless of the parameter vector used. tr["q00.11",] ## And here is the section of the transition matrix corresponding to the ## lambda values; every rate gets a contribution from the intercept term ## (lambda0), lambda10 and lambda11 get a contribution from lambdaA, etc. tr[1:4,1:4] ## There is currently no nice simulation support for this, so bear with ## an ugly script to generate the tree and traits. pars <- c(.10, .15, .20, .25, # lambda 00, 10, 01, 11 .03, .03, .03, .03, # mu 00, 10, 01, 11 .05, .05, .0, # q00.10, q00.01, q00.11 .05, .0, .05, # q10.00, q10.01, q10.11 .05, .0, .05, # q01.00, q01.10, q01.11 .0, .05, .05) # q11.00, q11.10, q11.01 set.seed(2) phy <- tree.musse(pars, 60, x0=1) states <- expand.grid(A=0:1, B=0:1)[phy$tip.state,] rownames(states) <- phy$tip.label ## Here, states has row names corresponding to the different taxa, and ## the states of two traits "A" and "B" are recorded in the columns. head(states) ## Note that transition from the original MuSSE basis to this basis is ## only possible in general when depth=n.trait and allow.multistep=TRUE ## (as only this generates a square matrix that is invertible). ## However, when it is possible to express the set of parameters in the ## new basis (as it is above), this can be done through a pseudoinverse ## (here, a left inverse). pars2 <- drop(solve(t(tr) %*% tr) %*% t(tr) %*% pars) ## Going from our new basis to the original MuSSE parameters is always ## straightforward. This is done automatically in the likelihood ## function. all.equal(drop(tr %*% pars2), pars, check.attributes=FALSE) ## This shows that the two traits act additively on speciation rate ## (lambdaAB is zero), that there is no effect of any trait on ## extinction (the only nonzero mu parameter is mu0) and transition ## rates for one trait are unaffected by other traits (the only nonzero ## q parameters are the qXij.0 parameters; qXij.Y parameters are all ## zero). ## Here is our new MuSSE function parametrised as a multi-trait ## function: lik <- make.musse.multitrait(phy, states) ## Here are the argument names for the likelihood function. argnames(lik) ## Basic MuSSE function for comparison lik.m <- make.musse(phy, phy$tip.state, 4) argnames(lik.m) ## Rather than fit this complicated model first, let's start with a ## simple model with no state dependent diversification. This model ## allows the forwards and backwards transition rates to vary, but the ## speciation and extinction rates do not depend on the character ## state: lik0 <- make.musse.multitrait(phy, states, depth=0) argnames(lik0) ## This can be used in analyses as usual. However, this can take a ## while to run, so is not run by default. ## Not run: p <- starting.point.musse.multitrait(phy, lik0) fit0 <- find.mle(lik0, p) ## Now, allow the speciation rates to vary additively with both ## character states (extinction and character changes are left as in the ## previous model) lik1 <- make.musse.multitrait(phy, states, depth=c(1, 0, 0)) ## Start from the previous ML point: p <- starting.point.musse.multitrait(phy, lik1) p[names(coef(fit0))] <- coef(fit0) fit1 <- find.mle(lik1, p) ## The likelihood improves, but the difference is not statistically ## significant (p = 0.35). anova(fit1, fit0) ## We can fit an interaction for the speciation rates, too: lik2 <- make.musse.multitrait(phy, states, depth=c(2, 0, 0)) p <- starting.point.musse.multitrait(phy, lik2) p[names(coef(fit1))] <- coef(fit1) fit2 <- find.mle(lik2, p) ## There is next to no support for the interaction term (which is good, ## as the original model did not have any interaction!) anova(fit2, fit1) ## Constraining also works with these models. For example, constraining ## the lambdaA parameter to zero: lik1b <- constrain(lik1, lambdaA ~ 0) argnames(lik1b) p <- starting.point.musse.multitrait(phy, lik1b) p[names(coef(fit0))] <- coef(fit0) fit1b <- find.mle(lik1b, p) anova(fit1b, fit0) ## Or constraining both main effects to take the same value: lik1c <- constrain(lik1, lambdaB ~ lambdaA) argnames(lik1c) p <- starting.point.musse.multitrait(phy, lik1c) p[names(coef(fit0))] <- coef(fit0) fit1c <- find.mle(lik1c, p) anova(fit1c, fit0) ## End(Not run)
Create a likelihood function for a MuSSE model where the tree is partitioned into regions with different parameters.
make.musse.split(tree, states, k, nodes, split.t, sampling.f=NULL, strict=TRUE, control=list())
make.musse.split(tree, states, k, nodes, split.t, sampling.f=NULL, strict=TRUE, control=list())
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be an
integer between 1 and |
k |
The number of states. |
nodes |
Vector of nodes that will be split (see Details). |
split.t |
Vector of split times, same length as |
sampling.f |
Vector of length |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
Branching times can be controlled with the split.t
argument. If this is Inf
, split at the base of the branch (as in
MEDUSA). If 0
, split at the top (closest to the present, as in
the new option for MEDUSA). If 0 < split.t < Inf
then we split
at that time on the tree (zero is the present, with time growing
backwards).
Richard G. FitzJohn
## This example picks up from the tree used in the ?make.musse example. ## First, simulate the tree: set.seed(2) pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 phy <- tree.musse(pars, 30, x0=1) ## Here is the phylogeny, with true character history superposed: h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, show.node.label=TRUE, font=1, cex=.75, no.margin=TRUE) ## Here is a plain MuSSE function for later comparison: lik.m <- make.musse(phy, phy$tip.state, 3) lik.m(pars) # -110.8364 ## Split this phylogeny at three points: nd16 and nd25, splitting it ## into three chunks nodes <- c("nd16", "nd25") nodelabels(node=match(nodes, phy$node.label) + length(phy$tip.label), pch=19, cex=2, col="#FF000099") ## To make a split BiSSE function, pass the node locations and times ## in. Here, we'll use 'Inf' as the split time to mimick MEDUSA's ## behaviour of placing the split at the base of the branch subtended by ## a node. lik.s <- make.musse.split(phy, phy$tip.state, 3, nodes, split.t=Inf) ## The parameters must be a list of the same length as the number of ## partitions. Partition '1' is the root partition, and partition i is ## the partition rooted at the node[i-1]: argnames(lik.s) ## Because we have two nodes, there are three sets of parameters. ## Replicate the original list to get a starting point for the analysis: pars.s <- rep(pars, 3) names(pars.s) <- argnames(lik.s) lik.s(pars.s) # -110.8364 ## This is basically identical (to acceptable tolerance) to the plain ## MuSSE version: lik.s(pars.s) - lik.m(pars) ## The resulting likelihood function can be used in ML analyses with ## find.mle. However, because of the large number of parameters, this ## may take some time (especially with as few species as there are in ## this tree - getting convergence in a reasonable number of iterations ## is difficult). ## Not run: fit.s <- find.mle(lik.s, pars.s, control=list(maxit=20000)) ## End(Not run) ## Bayesian analysis also works, using the mcmc function. Given the ## large number of parameters, priors will be essential, as there will ## be no signal for several parameters. Here, I am using an exponential ## distribution with a mean of twice the state-independent ## diversification rate. ## Not run: prior <- make.prior.exponential(1/(-2*diff(starting.point.bd(phy)))) samples <- mcmc(lik.s, pars.s, 100, prior=prior, w=1, print.every=10) ## End(Not run)
## This example picks up from the tree used in the ?make.musse example. ## First, simulate the tree: set.seed(2) pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 phy <- tree.musse(pars, 30, x0=1) ## Here is the phylogeny, with true character history superposed: h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, show.node.label=TRUE, font=1, cex=.75, no.margin=TRUE) ## Here is a plain MuSSE function for later comparison: lik.m <- make.musse(phy, phy$tip.state, 3) lik.m(pars) # -110.8364 ## Split this phylogeny at three points: nd16 and nd25, splitting it ## into three chunks nodes <- c("nd16", "nd25") nodelabels(node=match(nodes, phy$node.label) + length(phy$tip.label), pch=19, cex=2, col="#FF000099") ## To make a split BiSSE function, pass the node locations and times ## in. Here, we'll use 'Inf' as the split time to mimick MEDUSA's ## behaviour of placing the split at the base of the branch subtended by ## a node. lik.s <- make.musse.split(phy, phy$tip.state, 3, nodes, split.t=Inf) ## The parameters must be a list of the same length as the number of ## partitions. Partition '1' is the root partition, and partition i is ## the partition rooted at the node[i-1]: argnames(lik.s) ## Because we have two nodes, there are three sets of parameters. ## Replicate the original list to get a starting point for the analysis: pars.s <- rep(pars, 3) names(pars.s) <- argnames(lik.s) lik.s(pars.s) # -110.8364 ## This is basically identical (to acceptable tolerance) to the plain ## MuSSE version: lik.s(pars.s) - lik.m(pars) ## The resulting likelihood function can be used in ML analyses with ## find.mle. However, because of the large number of parameters, this ## may take some time (especially with as few species as there are in ## this tree - getting convergence in a reasonable number of iterations ## is difficult). ## Not run: fit.s <- find.mle(lik.s, pars.s, control=list(maxit=20000)) ## End(Not run) ## Bayesian analysis also works, using the mcmc function. Given the ## large number of parameters, priors will be essential, as there will ## be no signal for several parameters. Here, I am using an exponential ## distribution with a mean of twice the state-independent ## diversification rate. ## Not run: prior <- make.prior.exponential(1/(-2*diff(starting.point.bd(phy)))) samples <- mcmc(lik.s, pars.s, 100, prior=prior, w=1, print.every=10) ## End(Not run)
Create a likelihood function for a MuSSE model where different chunks of time have different parameters. This code is experimental!
make.musse.td(tree, states, k, n.epoch, sampling.f=NULL, strict=TRUE, control=list()) make.musse.t(tree, states, k, functions, sampling.f=NULL, strict=TRUE, control=list(), truncate=FALSE, spline.data=NULL)
make.musse.td(tree, states, k, n.epoch, sampling.f=NULL, strict=TRUE, control=list()) make.musse.t(tree, states, k, functions, sampling.f=NULL, strict=TRUE, control=list(), truncate=FALSE, spline.data=NULL)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be an
integer between 1 and |
k |
The number of states. |
n.epoch |
Number of epochs. 1 corresponds to plain MuSSE, so this will generally be an integer at least 2. |
functions |
A named list of functions of time. See details. |
sampling.f |
Vector of length |
strict |
The |
control |
List of control parameters for the ODE solver. See
details in |
truncate |
Logical, indicating if functions should be truncated to zero when negative (rather than causing an error). May be scalar (applying to all functions) or a vector (of same length as the functions vector). |
spline.data |
List of data for spline-based time functions. See details |
.
Please see make.bisse.t
for further details.
Richard G. FitzJohn
## Here we will start with the tree and three-state character set from ## the make.musse example. This is a poorly contrived example. pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 set.seed(2) phy <- tree.musse(pars, 30, x0=1) ## Suppose we want to see if diversification is different in the most ## recent 3 time units, compared with the rest of the tree (yes, this is ## a totally contrived example!): plot(phy) axisPhylo() abline(v=max(branching.times(phy)) - 3, col="red", lty=3) ## For comparison, make a plain MuSSE likelihood function lik.m <- make.musse(phy, phy$tip.state, 3) ## Create the time-dependent likelihood function. The final argument ## here is the number of 'epochs' that are allowed. Two epochs is one ## switch point. lik.t <- make.musse.td(phy, phy$tip.state, 3, 2) ## The switch point is the first argument. The remaining 12 parameters ## are the MuSSE parameters, with the first 6 being the most recent ## epoch. argnames(lik.t) pars.t <- c(3, pars, pars) names(pars.t) <- argnames(lik.t) ## Calculations are identical to a reasonable tolerance: lik.m(pars) - lik.t(pars.t) ## It will often be useful to constrain the time as a fixed quantity. lik.t2 <- constrain(lik.t, t.1 ~ 3) ## Parameter estimation under maximum likelihood. This is marked "don't ## run" because the time-dependent fit takes a few minutes. ## Not run: ## Fit the MuSSE ML model fit.m <- find.mle(lik.m, pars) ## And fit the MuSSE/td model fit.t <- find.mle(lik.t2, pars.t[argnames(lik.t2)], control=list(maxit=20000)) ## Compare these two fits with a likelihood ratio test (lik.t2 is nested ## within lik.m) anova(fit.m, td=fit.t) ## End(Not run)
## Here we will start with the tree and three-state character set from ## the make.musse example. This is a poorly contrived example. pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 set.seed(2) phy <- tree.musse(pars, 30, x0=1) ## Suppose we want to see if diversification is different in the most ## recent 3 time units, compared with the rest of the tree (yes, this is ## a totally contrived example!): plot(phy) axisPhylo() abline(v=max(branching.times(phy)) - 3, col="red", lty=3) ## For comparison, make a plain MuSSE likelihood function lik.m <- make.musse(phy, phy$tip.state, 3) ## Create the time-dependent likelihood function. The final argument ## here is the number of 'epochs' that are allowed. Two epochs is one ## switch point. lik.t <- make.musse.td(phy, phy$tip.state, 3, 2) ## The switch point is the first argument. The remaining 12 parameters ## are the MuSSE parameters, with the first 6 being the most recent ## epoch. argnames(lik.t) pars.t <- c(3, pars, pars) names(pars.t) <- argnames(lik.t) ## Calculations are identical to a reasonable tolerance: lik.m(pars) - lik.t(pars.t) ## It will often be useful to constrain the time as a fixed quantity. lik.t2 <- constrain(lik.t, t.1 ~ 3) ## Parameter estimation under maximum likelihood. This is marked "don't ## run" because the time-dependent fit takes a few minutes. ## Not run: ## Fit the MuSSE ML model fit.m <- find.mle(lik.m, pars) ## And fit the MuSSE/td model fit.t <- find.mle(lik.t2, pars.t[argnames(lik.t2)], control=list(maxit=20000)) ## Compare these two fits with a likelihood ratio test (lik.t2 is nested ## within lik.m) anova(fit.m, td=fit.t) ## End(Not run)
Generate the likelihood function that underlies PGLS
(Phylogenetic Generalised Least Squares). This is a bit of a misnomer
here, as you may not be interested in least squares (e.g., if using
this with mcmc
for Bayesian inference).
make.pgls(tree, formula, data, control=list())
make.pgls(tree, formula, data, control=list())
tree |
A bifurcating phylogenetic tree, in |
formula |
A model formula; see |
data |
A data frame containing the variables in the model. If
not found in |
control |
A list of control parameters. Currently the only
option is the key “method” which can be |
Richard G. FitzJohn
Freckleton R.P. 2012. Fast likelihood calculations for comparative analyses. Methods in Ecology and Evolution 3: 940-947.
Functions for generating prior functions for use with
mcmc
, etc.
make.prior.exponential(r) make.prior.uniform(lower, upper, log=TRUE)
make.prior.exponential(r) make.prior.uniform(lower, upper, log=TRUE)
r |
Scalar or vector of rate parameters |
lower |
Lower bound of the parameter |
upper |
Upper bound of the parameter |
log |
Logical: should the prior be on a log basis? |
The exponential prior probability distribution has probability density
where the denotes the
th parameter. If
r
is a
scalar, then the same rate is used for all parameters.
These functions each return a function that may be used as the
prior
argument to mcmc()
.
Richard G. FitzJohn
Prepare to run QuaSSE (Quantitative State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.
make.quasse(tree, states, states.sd, lambda, mu, control, sampling.f=NULL) starting.point.quasse(tree, states, states.sd=NULL)
make.quasse(tree, states, states.sd, lambda, mu, control, sampling.f=NULL) starting.point.quasse(tree, states, states.sd=NULL)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be a
numeric real values. Missing values ( |
states.sd |
A scalar or vector corresponding to the standard error around the mean in states (the initial probability distribution is assumed to be normal). |
lambda |
A function to use as the speciation function. The first
argument of this must be |
mu |
A function to use as the extinction function. The first
argument of this must be |
control |
A list of parameters for tuning the performance of the integrator. A guess at reasonble values will be made here. See Details for possible entries. |
sampling.f |
Scalar with the estimated proportion of extant
species that are included in the phylogeny. A value of |
The control
list may contain the following elements:
method
: one of fftC
or fftR
to switch
between C
(fast) and R (slow) backends for the integration.
Both use non-adaptive fft-based convolutions. Eventually, an
adaptive methods-of-lines approach will be available.
dt.max
: Maximum time step to use for the integration.
By default, this will be set to 1/1000 of the tree depth. Smaller
values will slow down calculations, but improve accuracy.
nx
: The number of bins into which the character space
is divided (default=1024). Larger values will be slower and more
accurate. For the fftC
integration method, this should be an
integer power of 2 (512, 2048, etc).
r
: Scaling factor that multiplies nx
for a "high
resolution" section at the tips of the tree (default=4, giving a
high resolution character space divided into 4096 bins). This helps
improve accuracy while possibly tight initial probability
distributions flatten out as time progresses towards the root.
Larger values will be slower and more accurate. For the fftC
integration method, this should be a power of 2 (2, 4, 8, so that
nx*r
is a power of 2).
tc
: where in the tree to switch to the low-resolution
integration (zero corresponds to the present, larger numbers moving
towards the root). By default, this happens at 10% of the tree
depth. Smaller values will be faster, but less accurate.
xmid
: Mid point to center the character space. By
default this is at the mid point of the extremes of the character
states.
tips.combined
: Get a modest speed-up by simultaneously
integrating all tips? By default, this is FALSE
, but
speedups of up to 25% are possible with this set to TRUE
.
w
: Number of standard deviations of the normal
distribution induced by Brownian motion to use when doing the
convolutions (default=5). Probably best to leave this one alone.
In an attempt at being computationally efficient, a substantial amount
of information is cached in memory so that it does not have to be
created each time. However, this can interact poorly with the
multicore
package. In particular, likelihood functions should
not be made within a call to mclapply
, or they will not share
memory with the main R thread, and will not work (this will cause an
error, but should no longer crash R).
The method has less general testing than BiSSE, and is a little more
fragile. In particular, because of the way that I chose to implement
the integrator, there is a very real chance of likelihood calculation
failure when your data are a poor fit to the model; this can be
annoyingly difficult to diagnose (you will just get a -Inf
log
likelihood, but the problem is often just caused by two sister species
on short branches with quite different states). There are also a
large number of options for fine tuning the integration, but these
aren't really discussed in any great detail anywhere.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Example showing simple integration with two different backends, ## plus the splits. lambda <- function(x) sigmoid.x(x, 0.1, 0.2, 0, 2.5) mu <- function(x) constant.x(x, 0.03) char <- make.brownian.with.drift(0, 0.025) set.seed(1) phy <- tree.quasse(c(lambda, mu, char), max.taxa=15, x0=0, single.lineage=FALSE, verbose=TRUE) nodes <- c("nd13", "nd9", "nd5") split.t <- Inf pars <- c(.1, .2, 0, 2.5, .03, 0, .01) pars4 <- unlist(rep(list(pars), 4)) sd <- 1/200 control.C.1 <- list(dt.max=1/200) ## Not run: control.R.1 <- list(dt.max=1/200, method="fftR") lik.C.1 <- make.quasse(phy, phy$tip.state, sd, sigmoid.x, constant.x, control.C.1) (ll.C.1 <- lik.C.1(pars)) # -62.06409 ## slow... lik.R.1 <- make.quasse(phy, phy$tip.state, sd, sigmoid.x, constant.x, control.R.1) (ll.R.1 <- lik.R.1(pars)) # -62.06409 lik.s.C.1 <- make.quasse.split(phy, phy$tip.state, sd, sigmoid.x, constant.x, nodes, split.t, control.C.1) (ll.s.C.1 <- lik.s.C.1(pars4)) # -62.06409 ## End(Not run)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Example showing simple integration with two different backends, ## plus the splits. lambda <- function(x) sigmoid.x(x, 0.1, 0.2, 0, 2.5) mu <- function(x) constant.x(x, 0.03) char <- make.brownian.with.drift(0, 0.025) set.seed(1) phy <- tree.quasse(c(lambda, mu, char), max.taxa=15, x0=0, single.lineage=FALSE, verbose=TRUE) nodes <- c("nd13", "nd9", "nd5") split.t <- Inf pars <- c(.1, .2, 0, 2.5, .03, 0, .01) pars4 <- unlist(rep(list(pars), 4)) sd <- 1/200 control.C.1 <- list(dt.max=1/200) ## Not run: control.R.1 <- list(dt.max=1/200, method="fftR") lik.C.1 <- make.quasse(phy, phy$tip.state, sd, sigmoid.x, constant.x, control.C.1) (ll.C.1 <- lik.C.1(pars)) # -62.06409 ## slow... lik.R.1 <- make.quasse(phy, phy$tip.state, sd, sigmoid.x, constant.x, control.R.1) (ll.R.1 <- lik.R.1(pars)) # -62.06409 lik.s.C.1 <- make.quasse.split(phy, phy$tip.state, sd, sigmoid.x, constant.x, nodes, split.t, control.C.1) (ll.s.C.1 <- lik.s.C.1(pars4)) # -62.06409 ## End(Not run)
Create a likelihood function for a QuaSSE model where the tree is partitioned into regions with different parameters.
make.quasse.split(tree, states, states.sd, lambda, mu, nodes, split.t, control=NULL, sampling.f=NULL)
make.quasse.split(tree, states, states.sd, lambda, mu, nodes, split.t, control=NULL, sampling.f=NULL)
tree |
An ultrametric bifurcating phylogenetic tree, in
|
states |
A vector of character states, each of which must be a
numeric real values. Missing values ( |
states.sd |
A scalar or vector corresponding to the standard error around the mean in states (the initial probability distribution is assumed to be normal). |
lambda |
A function to use as the speciation function. The first
argument of this must be |
mu |
A function to use as the extinction function. The first
argument of this must be |
nodes |
Vector of nodes that will be split (see Details). |
split.t |
Vector of split times, same length as |
control |
A list of parameters for tuning the performance of the
integrator. A guess at reasonble values will be made here. See
Details in |
sampling.f |
Scalar with the estimated proportion of extant
species that are included in the phylogeny. A value of |
Branching times can be controlled with the split.t
argument. If this is Inf
, split at the base of the branch (as in
MEDUSA). If 0
, split at the top (closest to the present, as in
the new option for MEDUSA). If 0 < split.t < Inf
then we split
at that time on the tree (zero is the present, with time growing
backwards).
TODO: Describe nodes
and split.t
here.
Richard G. FitzJohn
Run a simple-minded MCMC using slice samples (Neal 2003) for independent updating of each variable.
mcmc(lik, x.init, nsteps, ...) ## Default S3 method: mcmc(lik, x.init, nsteps, w, prior=NULL, sampler=sampler.slice, fail.value=-Inf, lower=-Inf, upper=Inf, print.every=1, control=list(), save.file, save.every=0, save.every.dt=NULL, previous=NULL, previous.tol=1e-4, keep.func=TRUE, ...) sampler.slice(lik, x.init, y.init, w, lower, upper, control) sampler.norm(lik, x.init, y.init, w, lower, upper, control)
mcmc(lik, x.init, nsteps, ...) ## Default S3 method: mcmc(lik, x.init, nsteps, w, prior=NULL, sampler=sampler.slice, fail.value=-Inf, lower=-Inf, upper=Inf, print.every=1, control=list(), save.file, save.every=0, save.every.dt=NULL, previous=NULL, previous.tol=1e-4, keep.func=TRUE, ...) sampler.slice(lik, x.init, y.init, w, lower, upper, control) sampler.norm(lik, x.init, y.init, w, lower, upper, control)
lik |
Likelihood function to run MCMC on. This must return the log likelihood (or the log of a value proportional to the likelihood). |
x.init |
Initial parameter location (vector). |
nsteps |
Number of MCMC steps to take. |
w |
Tuning parameter for the sampler. See Details below for more information. |
prior |
An optional prior probability distribution function.
This must be a function that returns the log prior probability,
given a parameter vector. See |
sampler |
Sampler to use for the MCMC. There are currently only
two implemented; |
lower |
Lower bounds on parameter space (scalar or vector of same
length as |
upper |
Upper bounds on parameter space (scalar or vector of same
length as |
fail.value |
Value to use where function evaluation fails. The
default (negative infinity) corresponds to zero probability. Most
values that fail are invalid for a given model (negative rates, etc)
or have negligble probability, so this is reasonable. Set to
|
print.every |
The position and its probability will be printed
every |
control |
List with additional control parameters for the sampler. Not currently used. |
save.file |
Name of csv or rds file to save temporary output in. Contents will be rewritten at each save (rds is faster than csv, but is R-specific). |
save.every |
Number of steps to save progress into
|
save.every.dt |
Period of time to save after, as a
|
previous |
Output from a previous |
previous.tol |
Before continuing, the sampler re-evaluates the
last point and compares the posterior probability against the
posterior probability in the |
keep.func |
Indicates if the returned samples should include the
likelihood function, which can be accessed with
|
... |
Arguments passed to the function |
y.init |
Likelihood evaluated at |
There are two samplers implemented: a slice sampler (Neal 2003) and a basic Gaussian sampler. In general, only the slice sampler should be used; the Gaussian sampler is provided for illustration and as a starting point for future samplers.
For slice sampling (sampler.slice
), the tuning parameter w
affects how many function evaluations are required between sample
updates, but in almost all cases it does not affect how fast the
MCMC “mixes” (Neal 2003). In particular, w
is not analagous
to the step sizes used in conventional Metropolis-Hastings updaters that
use some fixed kernel for updates (see below). Ideally, w
would
be set to approximately the width of the high probability region. I
find that chosing the distance between the 5% and 95% quantiles of the
marginal distributions of each parameter works well, computed from this
preliminary set of samples (see Examples). If a single value is given,
this is shared across all parameters.
For the Gaussian updates (sampler.norm
), the tuning parameter
w
is the standard deviation of the normal distribution centred on
each parameter as it is updated.
For both samplers, if a single value is given, this is shared across all
parameters. If a vector is given, then it must be the same length as
w
, and parameter i
will use w[i]
.
If the MCMC is stopped by an interrupt (Escape on GUI versions of R, Control-C on command-line version), it will return a truncated chain with as many points as completed so far.
This is far from the most efficient MCMC function possible, as it was designed to work with likelihood functions that are relatively expensive to compute. The overhead for 10,000 slice samples is on the order of 5s on a 2008 Mac Pro (0.0005 s / sample).
The sampler function sampler.norm
and sampler.slice
should
not generally be called directly (though this is possible), but exist
only to be passed in to mcmc
.
Richard G. FitzJohn
Neal R.M. 2003. Slice sampling. Annals of Statistics 31:705-767.
make.bd
, make.bisse
,
make.geosse
, and make.mkn
, all of which
provide likelihood functions that are suitable for use with this
function. The help page for make.bd
has further
examples of using MCMC, and make.bisse
has examples of
using priors with MCMC.
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## To demonstrate, start with a simple bivariate normal. The function ## 'make.mvn' creates likelihood function for the multivariate normal ## distribution given 'mean' (a vector) and 'vcv' (the variance ## covariance matrix). This is based on mvnorm in the package ## mvtnorm, but will be faster where the vcv does not change between ## calls. make.mvn <- function(mean, vcv) { logdet <- as.numeric(determinant(vcv, TRUE)$modulus) tmp <- length(mean) * log(2 * pi) + logdet vcv.i <- solve(vcv) function(x) { dx <- x - mean -(tmp + rowSums((dx %*% vcv.i) * dx))/2 } } ## Our target distribution has mean 0, and a VCV with positive ## covariance between the two parameters. vcv <- matrix(c(1, .25, .25, .75), 2, 2) lik <- make.mvn(c(0, 0), vcv) ## Sample 500 points from the distribution, starting at c(0, 0). set.seed(1) samples <- mcmc(lik, c(0, 0), 500, 1, print.every=100) ## The marginal distribution of V1 (the first axis of the ## distribution) should be a normal distribution with mean 0 and ## variance 1: curve(dnorm, xlim=range(samples$X1), ylim=c(0, .5), col="red") hist(samples$X1, 30, add=TRUE, freq=FALSE) plot(X2 ~ X1, samples, pch=19, cex=.2, col="#00000055", asp=1) ## The estimated variance here matches nicely with the true VCV: (These ## all look much better if you increase the number of sampled points, ## say to 10,000) var(samples[2:3]) ## The above uses slice sampling. We can use simple Gaussian updates ## instead. This performs updates with standard deviation '1' in each ## direction. Unlike slice sampling, the 'w' parameter here will ## matter a great deal in determining how fast the chain will mix. samples.norm <- mcmc(lik, c(0, 0), 500, 1, print.every=100, sampler=sampler.norm) ## This *appears* to run much faster than the slice sampling based ## approach above, but the effective sample size of the second ## approach is much lower. The 'effectiveSize' function in coda says ## that for 10,000 samples using slice sampling, the effective sample ## size (equivalent number of independent samples) is about 8,500, but ## for the Gaussian updates is only 1,200. This can be seen by ## comparing the autocorrelation between samples from the two ## different runs. op <- par(oma=c(0, 0, 2, 0)) acf(samples[2:3]) title(main="Slice sampling", outer=TRUE) acf(samples.norm[2:3]) title(main="Gaussian updates", outer=TRUE) ## The autocorrelation is negligable after just 2 samples under slice ## sampling, but remains significant for about 15 with Gaussian ## updates. ## Not run: ## Next, a diversitree likelihood example. This example uses a 203 ## species phylogeny evolved under the BiSSE model. This takes a ## more substantial amount of time, so is not evaluated by default. pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(2) phy <- tree.bisse(pars, max.t=60, x0=0) ## First, create a likelihood function: lik <- make.bisse(phy, phy$tip.state) lik(pars) ## This produces about a sample a second, so takes a while. The "upper" ## limit is a hard upper limit, above which the sampler will never let ## the parameter go (in effect, putting a uniform prior on the range ## lower..upper, and returning the joint distribution conditional on the ## parameters being in this range). tmp <- mcmc(lik, pars, nsteps=100, w=.1) ## The argument 'w' works best when it is about the width of the "high ## probability" region for that parameter. This takes the with of the ## 90% quantile range. The resulting widths are only slightly faster ## than the first guess. Samples are generated about 1/s; allow 15 ## minutes to generate 1000 samples. w <- diff(sapply(tmp[2:7], quantile, c(.05, .95))) out <- mcmc(lik, pars, nsteps=1000, w=w) ## You can do several things with this. Look for candidate ML points: out[which.max(out$p),] ## Or look at the marginal distribution of parameters profiles.plot(out["lambda0"], col.line="red") ## Or look at the joint marginal distribution of pairs of parameters plot(lambda0 ~ mu0, out) ## End(Not run)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## To demonstrate, start with a simple bivariate normal. The function ## 'make.mvn' creates likelihood function for the multivariate normal ## distribution given 'mean' (a vector) and 'vcv' (the variance ## covariance matrix). This is based on mvnorm in the package ## mvtnorm, but will be faster where the vcv does not change between ## calls. make.mvn <- function(mean, vcv) { logdet <- as.numeric(determinant(vcv, TRUE)$modulus) tmp <- length(mean) * log(2 * pi) + logdet vcv.i <- solve(vcv) function(x) { dx <- x - mean -(tmp + rowSums((dx %*% vcv.i) * dx))/2 } } ## Our target distribution has mean 0, and a VCV with positive ## covariance between the two parameters. vcv <- matrix(c(1, .25, .25, .75), 2, 2) lik <- make.mvn(c(0, 0), vcv) ## Sample 500 points from the distribution, starting at c(0, 0). set.seed(1) samples <- mcmc(lik, c(0, 0), 500, 1, print.every=100) ## The marginal distribution of V1 (the first axis of the ## distribution) should be a normal distribution with mean 0 and ## variance 1: curve(dnorm, xlim=range(samples$X1), ylim=c(0, .5), col="red") hist(samples$X1, 30, add=TRUE, freq=FALSE) plot(X2 ~ X1, samples, pch=19, cex=.2, col="#00000055", asp=1) ## The estimated variance here matches nicely with the true VCV: (These ## all look much better if you increase the number of sampled points, ## say to 10,000) var(samples[2:3]) ## The above uses slice sampling. We can use simple Gaussian updates ## instead. This performs updates with standard deviation '1' in each ## direction. Unlike slice sampling, the 'w' parameter here will ## matter a great deal in determining how fast the chain will mix. samples.norm <- mcmc(lik, c(0, 0), 500, 1, print.every=100, sampler=sampler.norm) ## This *appears* to run much faster than the slice sampling based ## approach above, but the effective sample size of the second ## approach is much lower. The 'effectiveSize' function in coda says ## that for 10,000 samples using slice sampling, the effective sample ## size (equivalent number of independent samples) is about 8,500, but ## for the Gaussian updates is only 1,200. This can be seen by ## comparing the autocorrelation between samples from the two ## different runs. op <- par(oma=c(0, 0, 2, 0)) acf(samples[2:3]) title(main="Slice sampling", outer=TRUE) acf(samples.norm[2:3]) title(main="Gaussian updates", outer=TRUE) ## The autocorrelation is negligable after just 2 samples under slice ## sampling, but remains significant for about 15 with Gaussian ## updates. ## Not run: ## Next, a diversitree likelihood example. This example uses a 203 ## species phylogeny evolved under the BiSSE model. This takes a ## more substantial amount of time, so is not evaluated by default. pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(2) phy <- tree.bisse(pars, max.t=60, x0=0) ## First, create a likelihood function: lik <- make.bisse(phy, phy$tip.state) lik(pars) ## This produces about a sample a second, so takes a while. The "upper" ## limit is a hard upper limit, above which the sampler will never let ## the parameter go (in effect, putting a uniform prior on the range ## lower..upper, and returning the joint distribution conditional on the ## parameters being in this range). tmp <- mcmc(lik, pars, nsteps=100, w=.1) ## The argument 'w' works best when it is about the width of the "high ## probability" region for that parameter. This takes the with of the ## 90% quantile range. The resulting widths are only slightly faster ## than the first guess. Samples are generated about 1/s; allow 15 ## minutes to generate 1000 samples. w <- diff(sapply(tmp[2:7], quantile, c(.05, .95))) out <- mcmc(lik, pars, nsteps=1000, w=w) ## You can do several things with this. Look for candidate ML points: out[which.max(out$p),] ## Or look at the marginal distribution of parameters profiles.plot(out["lambda0"], col.line="red") ## Or look at the joint marginal distribution of pairs of parameters plot(lambda0 ~ mu0, out) ## End(Not run)
Both stochastic character mapping and simulation may create character histories. This function plots these histories
## S3 method for class 'history' plot(x, phy, cols=seq_along(states), states=x$states, xlim=NULL, ylim=NULL, show.tip.label=TRUE, show.node.label=FALSE, show.tip.state=TRUE, show.node.state=TRUE, no.margin=FALSE, cex=1, font=3, srt=0, adj=0, label.offset=NA, lwd=1, ...)
## S3 method for class 'history' plot(x, phy, cols=seq_along(states), states=x$states, xlim=NULL, ylim=NULL, show.tip.label=TRUE, show.node.label=FALSE, show.tip.state=TRUE, show.node.state=TRUE, no.margin=FALSE, cex=1, font=3, srt=0, adj=0, label.offset=NA, lwd=1, ...)
x |
An object of class |
phy |
The phylogeny used to generate the history. Few checks are made to make sure that this is really correct, and all manner of terrible things might happen if these are not compatible. This may change in future. |
cols |
A vector of colours. |
states |
The different state types. Probably best to leave alone. |
xlim |
Plot x-limits (optional). |
ylim |
Plot y-limits (optional). |
show.tip.label |
Logical: show the species tip labels? |
show.node.label |
Logical: show the species node labels? |
show.tip.state |
Logical: draw a symbol at the tips to indicate tip state? |
show.node.state |
Logical: draw a symbol at the nodes to indicate node state? |
no.margin |
Supress drawing of margins around the plot |
cex |
Font and symbol scaling factor. |
font |
Font used for tip and node labels (see
|
srt |
String rotation for tip and node labels. |
adj |
Label |
label.offset |
Horizontal offset of tip and node labels, in branch length units. |
lwd |
Line width |
... |
Additional arguments (currently ignored) |
This attempts to be as compatible with ape
's plotting functions
as possible, but currently implements only right-facing cladegrams.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Simulate a tree, but retain extinct species. pars <- c(.1, .2, .03, .04, 0.05, 0.1) # BiSSE pars set.seed(2) phy <- tree.bisse(pars, 20, x0=0, include.extinct=TRUE) ## Create a 'history' from the information produced by the simulation ## and plot this h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, cex=.7) ## Prune the extinct taxa. phy2 <- prune(phy) ## The history must be recreated for this pruned tree: h2 <- history.from.sim.discrete(phy2, 0:1) plot(h2, phy2, cex=.7)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## Simulate a tree, but retain extinct species. pars <- c(.1, .2, .03, .04, 0.05, 0.1) # BiSSE pars set.seed(2) phy <- tree.bisse(pars, 20, x0=0, include.extinct=TRUE) ## Create a 'history' from the information produced by the simulation ## and plot this h <- history.from.sim.discrete(phy, 0:1) plot(h, phy, cex=.7) ## Prune the extinct taxa. phy2 <- prune(phy) ## The history must be recreated for this pruned tree: h2 <- history.from.sim.discrete(phy2, 0:1) plot(h2, phy2, cex=.7)
Simple plotting assistance for plotting output from MCMC runs
profiles.plot(y, col.line, col.fill, xlim=NULL, ymax=NULL, n.br=50, opacity=.5, xlab="Parameter estimate", ylab="Probability density", legend.pos=NULL, with.bar=TRUE, col.bg=NA, lwd=1, lines.on.top=TRUE, ...)
profiles.plot(y, col.line, col.fill, xlim=NULL, ymax=NULL, n.br=50, opacity=.5, xlab="Parameter estimate", ylab="Probability density", legend.pos=NULL, with.bar=TRUE, col.bg=NA, lwd=1, lines.on.top=TRUE, ...)
y |
Data frame, columns of which will be plotted as separate profiles. |
col.line |
Vector of colours for the lines. |
col.fill |
Vector of colours for the fill of the 95% most
probable region of the distribution. If ommited, this will be a
semi-transparent version of |
xlim |
X-axis limits - calculated automatically if omitted. |
ymax |
Y-axis upper limit - calculated automatically if omitted. |
n.br |
Number of breaks along the range of the data. |
opacity |
Opacity of the filled region (0 is transparent, 1 is fully opaque). |
xlab , ylab
|
Axis labels for the plot. |
legend.pos |
String to pass to |
with.bar |
Should a bar be included that shows the CI ranges below the plot (in addition to the shading)? |
col.bg |
Colour to draw behind the profiles (set to "white" for nicer transparency on non-white backgrounds) |
lwd |
Width of lines around the profiles |
lines.on.top |
Draw lines around profiles on top of all profiles? |
... |
Additional arguments passed through to |
Richard G. FitzJohn
## For usage, see the example in ?make.bd
## For usage, see the example in ?make.bd
Utility functions for working with QuaSSE models. These
provide a minimal set of state-varying functions, suitable for use
with make.quasse
, and simulation assistance functions
for use with tree.quasse
.
This is currently poorly explained!
constant.x(x, c) sigmoid.x(x, y0, y1, xmid, r) stepf.x(x, y0, y1, xmid) noroptimal.x(x, y0, y1, xmid, s2) make.linear.x(x0, x1) make.brownian.with.drift(drift, diffusion)
constant.x(x, c) sigmoid.x(x, y0, y1, xmid, r) stepf.x(x, y0, y1, xmid) noroptimal.x(x, y0, y1, xmid, s2) make.linear.x(x0, x1) make.brownian.with.drift(drift, diffusion)
x |
Character state |
c |
Constant. |
y0 |
y value at very small |
y1 |
y value at very large |
xmid |
Midpoint (inflection point) of sigmoid or step function |
r |
Rate at which exponential decay occurs or sigmoid changes - higher values are steeper |
s2 |
Variance of the normal distribution specified by
|
x0 |
Lower x limit for the linear function: y will take value at x0 for all x smaller than this |
x1 |
Upper x limit for the linear function: y will take value at x1 for all x greater than this |
drift |
Rate of drift |
diffusion |
Rate of diffusion (positive) |
The linear function returned by (make.linear.x
) will go to
zero wherever negative. This may not always be desired, but is
required for valid likelihood calculations under QuaSSE.
Richard G. FitzJohn
Set the default values of formal arguments of a function.
set.defaults(f, ..., defaults)
set.defaults(f, ..., defaults)
f |
A function |
... |
Named arguments to be set |
defaults |
A named list of arguments |
The repetitive argument lists of many of diversitree's likelihood functions are the motivation for this function.
For example, the likelihood function that make.bisse
produces
takes arguments condition.surv
, root
, and root.p
,
each with default values. If you dislike the defaults, you can change
them by passing in alternative values when computing likelihoods, or
when doing an ML search. However, this can get tedious if you are
using a function a lot, and your code will get cluttered with lots of
statements like condition.surv=FALSE
, some of which you may
forget. See the example below for how to avoid this.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(4) phy <- tree.bisse(pars, max.t=30, x0=0) lik <- make.bisse(phy, phy$tip.state) ## default arguments: args(lik) lik.no.cond <- set.defaults(lik, condition.surv=FALSE) args(lik.no.cond) ## Multiple arguments at once: lik2 <- set.defaults(lik, root=ROOT.GIVEN, root.p=c(0, 1)) args(lik2) ## Equivalently (using alist, not list -- see ?alist) defaults <- alist(root=ROOT.GIVEN, root.p=c(0, 1)) lik3 <- set.defaults(lik, defaults=defaults) identical(lik2, lik3)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(4) phy <- tree.bisse(pars, max.t=30, x0=0) lik <- make.bisse(phy, phy$tip.state) ## default arguments: args(lik) lik.no.cond <- set.defaults(lik, condition.surv=FALSE) args(lik.no.cond) ## Multiple arguments at once: lik2 <- set.defaults(lik, root=ROOT.GIVEN, root.p=c(0, 1)) args(lik2) ## Equivalently (using alist, not list -- see ?alist) defaults <- alist(root=ROOT.GIVEN, root.p=c(0, 1)) lik3 <- set.defaults(lik, defaults=defaults) identical(lik2, lik3)
Simulate a character distribution (state of each species) under some simple models of trait evolution. Currently this does not return the full history (node states, and state changes) but this may be added in a future version.
sim.character(tree, pars, x0=0, model="bm", br=NULL) make.sim.character(tree, pars, model="bm", br=NULL)
sim.character(tree, pars, x0=0, model="bm", br=NULL) make.sim.character(tree, pars, model="bm", br=NULL)
tree |
A bifurcating phylogenetic tree, in |
pars |
A set of model parameters (see Details below), as the order and interpretation depends on the model. |
x0 |
Root state. The default is zero, which may not make sense in all models. |
model |
Character string specifying which model to evolve the
character under. Possible values are |
br |
For cases where none of the models specifiable through the
|
This function duplicates functionality in other packages; see
sim.char
in geiger
in particular. The main difference
here is that for continuous characters, this does not use the
variance-covariance matrix, which can make it much faster for very
large trees. I believe that this approach is similar to fastBM
in phytools
.
model="bm"
: Brownian Motion. Takes a single
parameter, representing the rate of diffusion (must be positive)
model="ou"
: Ornstein-Uhlenbeck process. Takes a vector
of three parameters, representing the rate of diffusion, strength of
restoring force, and the "optimum" value. The first two parameters
must be non-negative, and the rate of diffusion must be positive.
model="bbm"
: Bounded Brownian Motion. Takes a vector
of three parameters (s2
, c
, d
), representing
the rate of diffusion, lower and upper bound, respectively. The
rate of diffusion must be positive.
model="mk"
: Mk model (see make.mkn
). Takes a Q
matrix as its argument. The element Q[i,j]
represents the
rate of transition from state i
to state j
, and the
diagonal elements must be such that rowSums(Q)
is zero.
model="meristic"
: A special case of the Mk model, where the
trait is meristic and character transitions are only possible
between adjacent states. There are three parameters (k
,
up
, down
), representing the number of states, and rate
of character change up (from state i
to i+1
) and down.
Richard G. FitzJohn
Evolves one or more trees under the BiSSE (Binary State Speciation and Extinction), MuSSE (Multi-State Speciation and Extinction), BiSSE-ness (BiSSE-node enhanced state shift), ClaSSE (Cladogenetic State change Speciation and Extinction), or GeoSSE (Geographic State Speciation and Extinction) model, or a simple character independent birth-death model. For the SSE models, it simultaneously evolves a character that affects speciation and/or extinction, and the tree itself.
trees(pars, type=c("bisse", "bisseness", "bd", "classe", "geosse", "musse", "quasse", "yule"), n=1, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, ...) tree.bisse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.musse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.musse.multitrait(pars, n.trait, depth, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.quasse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA, single.lineage=TRUE, verbose=FALSE) tree.bisseness(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.classe(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.geosse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.bd(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE) tree.yule(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE) prune(phy, to.drop=NULL)
trees(pars, type=c("bisse", "bisseness", "bd", "classe", "geosse", "musse", "quasse", "yule"), n=1, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, ...) tree.bisse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.musse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.musse.multitrait(pars, n.trait, depth, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.quasse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA, single.lineage=TRUE, verbose=FALSE) tree.bisseness(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.classe(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.geosse(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE, x0=NA) tree.bd(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE) tree.yule(pars, max.taxa=Inf, max.t=Inf, include.extinct=FALSE) prune(phy, to.drop=NULL)
pars |
Vector of parameters. The parameters must be in the same
order as an unconstrained likelihood function returned by
|
type |
Type of tree to generate: May be "bisse" or "bd". |
n |
How many trees to generate? |
max.taxa |
Maximum number of taxa to include in the tree. If
|
max.t |
Maximum length to evolve the phylogeny over. If
|
include.extinct |
Logical: should extinct taxa be included in
the final phylogeny? And should extinct trees be returned by
|
x0 |
Initial character state at the root (state 0 or 1). A value
of |
n.trait , depth
|
For |
single.lineage |
( |
verbose |
( |
... |
Additional arguments |
phy |
A phylogeny, possibly with extinct species, produced by one of the tree evolving functions. |
to.drop |
Optional vector with the species names to drop. |
The phylogeny will begin from a single lineage in state x0
, but
the final phylogeny will include only branches above the first split.
tree.bisse
may return an extinct phylogeny, and trees
might return extinct phylogenies if include.extinct
is
TRUE
.
A phylo
phylogenetic tree (ape format), or for
bisse.trees
, a list
of phylo
trees.
The trees will have an element tip.state
that contains the
binary state information.
There are some logic problems around the creation of zero and one species trees; this will cause occasional errors when running the above functions. Things will change to fix this soon. All these functions may change in the near future.
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(3) phy <- tree.bisse(pars, max.taxa=30, x0=0) phy$tip.state h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) ## Retain extinct species: set.seed(3) phy2 <- tree.bisse(pars, max.taxa=30, x0=0, include.extinct=TRUE) h2 <- history.from.sim.discrete(phy2, 0:1) plot(h2, phy2) #### MuSSE: ## Two states pars <- c(.1, .2, .03, .04, 0.05, 0.1) set.seed(2) phy <- tree.musse(pars, 20, x0=1) h <- history.from.sim.discrete(phy, 1:2) plot(h, phy) ## A 3-state example where movement is only allowed between neighbouring ## states (1 <-> 2 <-> 3), and where speciation and extinction rates ## increase moving from 1 -> 2 -> 3: pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 set.seed(2) phy <- tree.musse(pars, 30, x0=1, include.extinct=TRUE) h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, cex=.7) ## And with extinct taxa pruned: phy2 <- prune(phy) h2 <- history.from.sim.discrete(phy2, 1:3) plot(h2, phy2, cex=.7) ## This can all be done in one step (and is by default): set.seed(2) phy <- tree.musse(pars, 30, x0=1) h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, cex=.7)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01) set.seed(3) phy <- tree.bisse(pars, max.taxa=30, x0=0) phy$tip.state h <- history.from.sim.discrete(phy, 0:1) plot(h, phy) ## Retain extinct species: set.seed(3) phy2 <- tree.bisse(pars, max.taxa=30, x0=0, include.extinct=TRUE) h2 <- history.from.sim.discrete(phy2, 0:1) plot(h2, phy2) #### MuSSE: ## Two states pars <- c(.1, .2, .03, .04, 0.05, 0.1) set.seed(2) phy <- tree.musse(pars, 20, x0=1) h <- history.from.sim.discrete(phy, 1:2) plot(h, phy) ## A 3-state example where movement is only allowed between neighbouring ## states (1 <-> 2 <-> 3), and where speciation and extinction rates ## increase moving from 1 -> 2 -> 3: pars <- c(.1, .15, .2, # lambda 1, 2, 3 .03, .045, .06, # mu 1, 2, 3 .05, 0, # q12, q13 .05, .05, # q21, q23 0, .05) # q31, q32 set.seed(2) phy <- tree.musse(pars, 30, x0=1, include.extinct=TRUE) h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, cex=.7) ## And with extinct taxa pruned: phy2 <- prune(phy) h2 <- history.from.sim.discrete(phy2, 1:3) plot(h2, phy2, cex=.7) ## This can all be done in one step (and is by default): set.seed(2) phy <- tree.musse(pars, 30, x0=1) h <- history.from.sim.discrete(phy, 1:3) plot(h, phy, cex=.7)
Plot a phylogeny and label the tips with traits. This function is experimental, and may change soon. Currently it can handle discrete-valued traits and two basic tree shapes.
trait.plot(tree, dat, cols, lab=names(cols), str=NULL, class=NULL, type="f", w=1/50, legend=length(cols) > 1, cex.lab=.5, font.lab=3, cex.legend=.75, margin=1/4, check=TRUE, quiet=FALSE, ...)
trait.plot(tree, dat, cols, lab=names(cols), str=NULL, class=NULL, type="f", w=1/50, legend=length(cols) > 1, cex.lab=.5, font.lab=3, cex.legend=.75, margin=1/4, check=TRUE, quiet=FALSE, ...)
tree |
Phylogenetic tree, in ape format. |
dat |
A |
cols |
A list with colors. Each element corresponds to a trait
and must be named so that all names appear in |
lab |
Alternative names for the legend (perhaps longer or more
informative). Must be in the same order as |
str |
Strings used for the states in the legend. If |
class |
A vector along |
type |
Plot type (same as |
w |
Width of the trait plot, as a fraction of the tree depth. |
legend |
Logical: should a legend be plotted? |
cex.lab , font.lab
|
Font size and type for the tip labels. |
cex.legend |
Font size for the legend. |
margin |
How much space, relative to the total tree depth, should be reserved when plotting a higher level classification. |
check |
When TRUE (by default), this will check that the classes
specified by |
quiet |
When TRUE (FALSE by default), this suppresses the warning
caused by |
... |
Additional arguments passed through to phylogeny plotting
code (similar to |
Richard G. FitzJohn
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## These are the parameters: they are a single speciation and extinction ## rate, then 0->1 (trait A), 1->0 (A), 0->1 (B) and 1->0 (B). colnames(musse.multitrait.translate(2, depth=0)) ## Simulate a tree where trait A changes slowly and B changes rapidly. set.seed(1) phy <- tree.musse.multitrait(c(.1, 0, .01, .01, .05, .05), n.trait=2, depth=0, max.taxa=100, x0=c(0,0)) ## Here is the matrix of tip states (each row is a species, each column ## is a trait). head(phy$tip.state) trait.plot(phy, phy$tip.state, cols=list(A=c("pink", "red"), B=c("lightblue", "blue"))) nodes <- c("nd5", "nd4", "nd7", "nd11", "nd10", "nd8") grp <- lapply(nodes, get.descendants, phy, tips.only=TRUE) class <- rep(NA, 100) for ( i in seq_along(grp) ) class[grp[[i]]] <- paste("group", LETTERS[i]) ## Now, 'class' is a vector along phy$tip.label indicating which of six ## groups each species belongs. ## Plotting the phylogeny with these groups: trait.plot(phy, phy$tip.state, cols=list(A=c("pink", "red"), B=c("lightblue", "blue")), class=class, font=1, cex.lab=1, cex.legend=1) ## Add another state, showing values 1:3, names, and trait ordering. tmp <- sim.character(phy, c(-.1, .05, .05, .05, -.1, .05, .05, 0.05, -.1), model="mkn", x0=1) phy$tip.state <- data.frame(phy$tip.state, C=tmp) trait.plot(phy, phy$tip.state, cols=list(C=c("palegreen", "green3", "darkgreen"), A=c("pink", "red"), B=c("lightblue", "blue")), lab=c("Animal", "Vegetable", "Mineral"), str=list(c("crane", "toad", "snail"), c("kale", "carrot"), c("calcite", "beryl"))) ## Rectangular/phylogram plot with groups. trait.plot(ladderize(phy, right=FALSE), phy$tip.state, type="p", cols=list(A=c("pink", "red"), B=c("lightblue", "blue"), C=c("palegreen", "green3", "darkgreen")), class=class, font=1, cex.lab=1)
## Due to a change in sample() behaviour in newer R it is necessary to ## use an older algorithm to replicate the previous examples if (getRversion() >= "3.6.0") { RNGkind(sample.kind = "Rounding") } ## These are the parameters: they are a single speciation and extinction ## rate, then 0->1 (trait A), 1->0 (A), 0->1 (B) and 1->0 (B). colnames(musse.multitrait.translate(2, depth=0)) ## Simulate a tree where trait A changes slowly and B changes rapidly. set.seed(1) phy <- tree.musse.multitrait(c(.1, 0, .01, .01, .05, .05), n.trait=2, depth=0, max.taxa=100, x0=c(0,0)) ## Here is the matrix of tip states (each row is a species, each column ## is a trait). head(phy$tip.state) trait.plot(phy, phy$tip.state, cols=list(A=c("pink", "red"), B=c("lightblue", "blue"))) nodes <- c("nd5", "nd4", "nd7", "nd11", "nd10", "nd8") grp <- lapply(nodes, get.descendants, phy, tips.only=TRUE) class <- rep(NA, 100) for ( i in seq_along(grp) ) class[grp[[i]]] <- paste("group", LETTERS[i]) ## Now, 'class' is a vector along phy$tip.label indicating which of six ## groups each species belongs. ## Plotting the phylogeny with these groups: trait.plot(phy, phy$tip.state, cols=list(A=c("pink", "red"), B=c("lightblue", "blue")), class=class, font=1, cex.lab=1, cex.legend=1) ## Add another state, showing values 1:3, names, and trait ordering. tmp <- sim.character(phy, c(-.1, .05, .05, .05, -.1, .05, .05, 0.05, -.1), model="mkn", x0=1) phy$tip.state <- data.frame(phy$tip.state, C=tmp) trait.plot(phy, phy$tip.state, cols=list(C=c("palegreen", "green3", "darkgreen"), A=c("pink", "red"), B=c("lightblue", "blue")), lab=c("Animal", "Vegetable", "Mineral"), str=list(c("crane", "toad", "snail"), c("kale", "carrot"), c("calcite", "beryl"))) ## Rectangular/phylogram plot with groups. trait.plot(ladderize(phy, right=FALSE), phy$tip.state, type="p", cols=list(A=c("pink", "red"), B=c("lightblue", "blue"), C=c("palegreen", "green3", "darkgreen")), class=class, font=1, cex.lab=1)
These are utility functions that are used internally by diversitree, but which might be more generally useful.
Currently only get.descendants
docuemnted here, which
determines which species or nodes are descended from a particular
node.
get.descendants(node, tree, tips.only=FALSE, edge.index=FALSE) run.cached(filename, expr, regenerate=FALSE) expand.parameters(p, lik.new, repl=0, target=argnames(lik.new)) get.likelihood(object) drop.likelihood(object)
get.descendants(node, tree, tips.only=FALSE, edge.index=FALSE) run.cached(filename, expr, regenerate=FALSE) expand.parameters(p, lik.new, repl=0, target=argnames(lik.new)) get.likelihood(object) drop.likelihood(object)
node |
A node, either a name in |
tree |
A phylogenetic tree, in ape's |
tips.only |
Logical: return only descendant indices of tip species? |
edge.index |
Logical: return the row indices in the edge matrix? |
filename |
Name of the file to store cached results |
expr |
Expression to evaluate |
regenerate |
Logical: force re-evaluation of expr and regeneration of filename? |
object |
For |
p , lik.new , repl , target
|
Undocumented currently |
Richard G. FitzJohn