Title: | Aster Models |
---|---|
Description: | Aster models are exponential family regression models for life history analysis. They are like generalized linear models except that elements of the response vector can have different families (e. g., some Bernoulli, some Poisson, some zero-truncated Poisson, some normal) and can be dependent, the dependence indicated by a graphical structure. Discrete time survival analysis, zero-inflated Poisson regression, and generalized linear models that are exponential family (e. g., logistic regression and Poisson regression with log link) are special cases. Main use is for data in which there is survival over discrete time periods and there is additional data about what happens conditional on survival (e. g., number of offspring). Uses the exponential family canonical parameterization (aster transform of usual parameterization). Unlike the aster package, this package does dependence groups (nodes of the graph need not be conditionally independent given their predecessor node), including multinomial and two-parameter normal as families. Thus this package also generalizes mark-capture-recapture analysis. |
Authors: | Charles J. Geyer [aut, cre] |
Maintainer: | Charles J. Geyer <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.3-2 |
Built: | 2024-12-17 06:30:22 UTC |
Source: | CRAN |
Aster models are exponential family graphical models that combine aspects of generalized linear models and survival analysis.
This package is still under development, only about half finished.
However, it does do maximum likelihood for unconditional aster models
with dependence groups, which the old package aster
does not.
The main differences between this package and the old package are as follows.
The old package had triple indices for model matrices. The first index ran over individuals, the second index over nodes of the graph for an individual, and the third index over regression coefficients. Consequently the model matrix was represented (sometimes, but not consistently) as a three-dimensional array rather than a matrix, which was very confusing, even to the package author. This package ignores individuals, one index runs over all nodes of the combined graph for all individuals. Thus model matrices are always matrices.
The old package did not implement dependence groups, although they were
described in Geyer, Wagenius and Shaw (2007). This package does.
Consequently, this package requires a data frame, a vector pred
that
indicates predecessors, a vector group
that indicates individuals in
the same dependence group, and a vector fam
that indicates families
to specify a saturated aster model (the old package required only
the data frame, pred
, and fam
).
To facilitate the old style model specification, there is a new function
asterdata
that constructs objects of class "asterdata"
given an old style data frame, pred
, and fam
. All other
functions of the package take objects of class "asterdata"
as model
specifications.
The function predict.aster
in the old package
did some parameter transformations, but not all, and the returned value,
when a list, had a component gradient
, that was undocumented but useful
in applying the delta method. The functions
transformSaturated
,
transformConditional
, and
transformUnconditional
in this package transform
between any of the following parameter vectors:
the conditional canonical parameter ,
the unconditional canonical parameter
,
the conditional mean value parameter
,
the unconditional mean value parameter
,
the canonical affine submodel canonical parameter
,
and (unconditional aster models only)
the canonical affine submodel mean value parameter
(this last parameter is new, not discussed in the cited papers below, it is
, where
is the model matrix).
The change of parameter from
to
is equivalent to maximum likelihood estimation for an unconditional
aster model when the value
is used,
where
is the response vector. All of these transformation functions
also compute derivatives, if requested. See examples.
Functions analogous to aster
, anova
, and predict
in the old package are missing, thus model fitting, hypothesis tests,
and confidence intervals are more cumbersome. In fact, since there is
no function to calculate log likelihoods (like mlogl
in the old
package), there is no way to do likelihood ratio tests (but Rao or Wald
tests could be done, for unconditional aster models, since the derivative
of the log likelihood is observed minus expected
.
Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster Models for Life History Analysis. Biometrika 94 415–426.
Shaw, R. G., Geyer, C. J., Wagenius, S., Hangelbroek, H. H. and Etterson, J. R. (2008) Unifying Life History Analyses for Inference of Fitness and Population Growth. American Naturalist, 172, E35–E47.
asterdata
, transformSaturated
,
families
## Not run: # perfectly good example but takes longer to run than CRAN allows data(echinacea) #### estimate MLE (simpler model than in Biometrika paper cited, not as good) hdct <- as.numeric(grepl("hdct", as.character(echinacea$redata$varb))) modmat <- model.matrix(resp ~ varb + nsloc + ewloc + pop * hdct - pop, data = echinacea$redata) tau.hat <- as.numeric(t(modmat) %*% echinacea$redata$resp) beta.hat <- transformUnconditional(tau.hat, modmat, echinacea, from = "tau", to = "beta") inverse.fisher <- jacobian(tau.hat, echinacea, transform = "unconditional", from = "tau", to = "beta", modmat = modmat) #### now have MLE (beta.hat) and pseudo-inverse of Fisher information #### (inverse.fisher), pseudo-inverse because modmat is not full rank foo <- cbind(beta.hat, sqrt(diag(inverse.fisher))) foo <- cbind(foo, foo[ , 1]/foo[ , 2]) foo <- cbind(foo, 2 * pnorm(- abs(foo[ , 3]))) dimnames(foo) <- list(colnames(modmat), c("Estimate", "Std. Error", "z value", "Pr(>|z|)")) printCoefmat(foo) #### coefficients constrained to be zero because parameterization is not #### identifiable have estimate zero and std. error zero (and rest NA) #### estimate fitness in populations #### generate new data with one individual in each pop at location (0, 0) pop.names <- levels(echinacea$redata$pop) pop.idx <- match(pop.names, as.character(echinacea$redata$pop)) pop.id <- echinacea$redata$id[pop.idx] newdata <- subset(echinacea, echinacea$redata$id %in% pop.id) newdata$redata[ , "nsloc"] <- 0 newdata$redata[ , "ewloc"] <- 0 hdct <- as.integer(grepl("hdct", as.character(newdata$redata$varb))) #### modmat for new data newmodmat <- model.matrix(resp ~ varb + nsloc + ewloc + pop * hdct - pop, data = newdata$redata) #### matrix that when multiplied mean value parameter vector gives fitness #### in each pop amat <- matrix(NA, nrow = length(pop.id), ncol = nrow(newmodmat)) for (i in 1:nrow(amat)) amat[i, ] <- as.numeric(grepl(paste("^", pop.id[i], ".hdct", sep = ""), rownames(newmodmat))) #### transform to expected fitness parameters efit <- transformUnconditional(beta.hat, newmodmat, newdata, from = "beta", to = "mu") efit <- as.numeric(amat %*% efit) #### jacobian matrix of this transformation jack <- jacobian(beta.hat, newdata, transform = "unconditional", from = "beta", to = "mu", modmat = newmodmat) #### delta method standard errors sefit <- sqrt(diag(amat %*% jack %*% inverse.fisher %*% t(jack) %*% t(amat))) foo <- cbind(efit, sefit) dimnames(foo) <- list(pop.names, c("Est. fitness", "Std. Error")) print(foo) ## End(Not run)
## Not run: # perfectly good example but takes longer to run than CRAN allows data(echinacea) #### estimate MLE (simpler model than in Biometrika paper cited, not as good) hdct <- as.numeric(grepl("hdct", as.character(echinacea$redata$varb))) modmat <- model.matrix(resp ~ varb + nsloc + ewloc + pop * hdct - pop, data = echinacea$redata) tau.hat <- as.numeric(t(modmat) %*% echinacea$redata$resp) beta.hat <- transformUnconditional(tau.hat, modmat, echinacea, from = "tau", to = "beta") inverse.fisher <- jacobian(tau.hat, echinacea, transform = "unconditional", from = "tau", to = "beta", modmat = modmat) #### now have MLE (beta.hat) and pseudo-inverse of Fisher information #### (inverse.fisher), pseudo-inverse because modmat is not full rank foo <- cbind(beta.hat, sqrt(diag(inverse.fisher))) foo <- cbind(foo, foo[ , 1]/foo[ , 2]) foo <- cbind(foo, 2 * pnorm(- abs(foo[ , 3]))) dimnames(foo) <- list(colnames(modmat), c("Estimate", "Std. Error", "z value", "Pr(>|z|)")) printCoefmat(foo) #### coefficients constrained to be zero because parameterization is not #### identifiable have estimate zero and std. error zero (and rest NA) #### estimate fitness in populations #### generate new data with one individual in each pop at location (0, 0) pop.names <- levels(echinacea$redata$pop) pop.idx <- match(pop.names, as.character(echinacea$redata$pop)) pop.id <- echinacea$redata$id[pop.idx] newdata <- subset(echinacea, echinacea$redata$id %in% pop.id) newdata$redata[ , "nsloc"] <- 0 newdata$redata[ , "ewloc"] <- 0 hdct <- as.integer(grepl("hdct", as.character(newdata$redata$varb))) #### modmat for new data newmodmat <- model.matrix(resp ~ varb + nsloc + ewloc + pop * hdct - pop, data = newdata$redata) #### matrix that when multiplied mean value parameter vector gives fitness #### in each pop amat <- matrix(NA, nrow = length(pop.id), ncol = nrow(newmodmat)) for (i in 1:nrow(amat)) amat[i, ] <- as.numeric(grepl(paste("^", pop.id[i], ".hdct", sep = ""), rownames(newmodmat))) #### transform to expected fitness parameters efit <- transformUnconditional(beta.hat, newmodmat, newdata, from = "beta", to = "mu") efit <- as.numeric(amat %*% efit) #### jacobian matrix of this transformation jack <- jacobian(beta.hat, newdata, transform = "unconditional", from = "beta", to = "mu", modmat = newmodmat) #### delta method standard errors sefit <- sqrt(diag(amat %*% jack %*% inverse.fisher %*% t(jack) %*% t(amat))) foo <- cbind(efit, sefit) dimnames(foo) <- list(pop.names, c("Est. fitness", "Std. Error")) print(foo) ## End(Not run)
Functions to construct and test conformance to the contract for objects
of class "asterdata"
. All other functions in this package take
model descriptions of this form.
asterdata(data, vars, pred, group, code, families, delta, response.name = "resp", varb.name = "varb", tolerance = 8 * .Machine$double.eps) validasterdata(object, tolerance = 8 * .Machine$double.eps) is.validasterdata(object, tolerance = 8 * .Machine$double.eps)
asterdata(data, vars, pred, group, code, families, delta, response.name = "resp", varb.name = "varb", tolerance = 8 * .Machine$double.eps) validasterdata(object, tolerance = 8 * .Machine$double.eps) is.validasterdata(object, tolerance = 8 * .Machine$double.eps)
data |
a data frame containing response and predictor variables for the aster model. |
vars |
a character vector containing names of variables in the data
frame |
pred |
an integer vector satisfying |
group |
an integer vector satisfying The lines indicate a transitive relation. If there is a line from
node For example, if nodes |
code |
an integer vector satisfying all(code %in% seq(along = families) Node Note that |
families |
a list of family specifications
(see |
delta |
a numeric vector satisfying |
response.name |
a character string giving the name of the response vector. |
varb.name |
a character string giving the name of the factor covariate
that says which of the variables in the data frame |
tolerance |
numeric >= 0. Relative errors smaller
than |
object |
an object of class |
Response variables in dependence groups are taken to be in the order they appear in the response vector. The first to appear in the response vector is the first canonical statistic for the dependence group distribution, the second to appear the second canonical statistic, and so forth. The number of response variables in the dependence group must match the dimension of the dependence group distribution.
This function only handles the usual case where the subgraph for every
individual is isomorphic to subgraph for every other individual
and all initial nodes (formerly
called root nodes) correspond to the constant one. Each row of data
is the data for one individual. The vectors vars
, pred
,
group
, code
, and delta
(if not missing) describe
the subgraph for one individual (which is the same for all individuals).
In other cases for which this function does not have the flexibility to
construct the appropriate object of class "asterdata"
, such an
object will have to be constructed “by hand” using R statements
not involving this function or modifying an object produced by this
function. See the following section for description of such objects.
The functions validasterdata
and is.validasterdata
can be
used to check whether objects constructed “by hand” have been
constructed correctly.
an object of class "asterdata"
is a list containing the
following components
redata |
a data frame having |
repred |
an integer vector satisfying
Note that
|
initial |
a numeric vector specifying constants associated with
initial nodes (formerly called root nodes) of the graphical model
for all individuals. If |
regroup |
an integer vector satisfying
It also requires that the dimension of the family specified by
The lines indicate a transitive relation. If there is a line from
node For example, if nodes Note that
|
recode |
an integer vector satisfying
all(recode %in% seq(along = families) Node Note that |
families |
a copy of the argument of the same name of this function
except that any character string abbreviations are converted to objects
of class |
redelta |
a numeric vector satisfying
Note that
|
response.name |
a character string giving the name of the response
variable in |
varb.name |
a character string giving the name of the “varb”
variable in |
In addition an object of class "asterdata"
may contain (and those
constructed by this function do contain) components
pred
, group
, and code
,
which are copies of the arguments of the same names of this function.
Objects of class "asterdata"
not constructed by this function need
not contain these additional components, since they may make no sense if
the graph for all individuals is not the repetition of isomorphic subgraphs,
one for each individual.
data(test1) fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1), group = c(0, 0, 2), code = c(1, 2, 2), families = list("bernoulli", "normal.location.scale")) is.validasterdata(fred)
data(test1) fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1), group = c(0, 0, 2), code = c(1, 2, 2), families = list("bernoulli", "normal.location.scale")) is.validasterdata(fred)
Produce basis for constancy space of an aster model. Test whether the difference of two canonical parameter vectors is in the constancy space (so the two parameter vectors correspond to the same probability model).
constancy(data, parm.type = c("theta", "phi")) is.same(parm1, parm2, data, parm.type = c("theta", "phi"), tolerance = sqrt(.Machine$double.eps))
constancy(data, parm.type = c("theta", "phi")) is.same(parm1, parm2, data, parm.type = c("theta", "phi"), tolerance = sqrt(.Machine$double.eps))
data |
an object of class |
parm.type |
the parametrization for which the constancy space is wanted. |
parm1 |
a parameter vector of the type specified by |
parm2 |
another parameter vector of the type specified
by |
tolerance |
numeric >= 0. Relative errors smaller
than |
There is no need for functions to test whether different mean value parameters
( or
) correspond to the same probability
distribution because these parametrizations are identifiable (different valid
parameter vectors correspond to different probability distributions).
for is.same
a logical value;
for constancy
a matrix whose rows constitute a basis for the constancy space.
This means that if is a linear combination of rows
of this matrix then for all real
the distributions having parameter
vectors
and
are the
same, where
or
depending on whether
parm.type = "theta"
or parm.type = "phi"
.
Conversely, if and
are valid parameter
vectors of the same type, then they correspond to the same probability
distribution only if
is a linear
combination of rows of this matrix.
data(test1) fred <- asterdata(test1, vars = c("m1", "m2", "m3", "n1", "n2", "b1", "p1", "z1"), pred = c(0, 0, 0, 1, 1, 2, 3, 6), group = c(0, 1, 2, 0, 4, 0, 0, 0), code = c(1, 1, 1, 2, 2, 3, 4, 5), families = list(fam.multinomial(3), "normal.location.scale", "bernoulli", "poisson", "zero.truncated.poisson")) cmat <- constancy(fred, parm.type = "phi")
data(test1) fred <- asterdata(test1, vars = c("m1", "m2", "m3", "n1", "n2", "b1", "p1", "z1"), pred = c(0, 0, 0, 1, 1, 2, 3, 6), group = c(0, 1, 2, 0, 4, 0, 0, 0), code = c(1, 1, 1, 2, 2, 3, 4, 5), families = list(fam.multinomial(3), "normal.location.scale", "bernoulli", "poisson", "zero.truncated.poisson")) cmat <- constancy(fred, parm.type = "phi")
Calculate cumulant function and up to three derivatives for families known to the package.
cumulant(theta, fam, deriv = 0, delta)
cumulant(theta, fam, deriv = 0, delta)
theta |
canonical parameter value. |
fam |
an object of class |
deriv |
the number of derivatives wanted. Must be nonnegative integer less than or equal to three. |
delta |
direction in which limit is taken. Cumulant
function is for family that is limit of family specified, limit
being for distributions with parameter
|
a list containing some of the following components:
zeroth |
the value of the cumulant function at |
first |
the value of the first derivative at |
second |
the value of the second derivative at |
third |
the value of the third derivative at |
Not intended for use by ordinary users. Provides R interface for testing to C code called by many other functions in the package.
cumulant(-0.5, fam.bernoulli(), deriv = 3) cumulant(-0.5, fam.bernoulli(), deriv = 3, delta = 1)
cumulant(-0.5, fam.bernoulli(), deriv = 3) cumulant(-0.5, fam.bernoulli(), deriv = 3, delta = 1)
Data on life history traits for the purple coneflower Echinacea angustifolia
data(echinacea)
data(echinacea)
An object of class "asterdata"
(see asterdata
)
comprising records for 570 plants observed over three years.
Nodes of the graph for one individual are associated with the variables
(levels of the factor echinacea$redata$varb
)
Indicator of being alive in 2002. Bernoulli, predecessor the constant one.
Ditto for 2003. Bernoulli, predecessor ld02
.
Ditto for 2004. Bernoulli, predecessor ld03
.
Indicator of flowering 2002. Bernoulli,
predecessor ld02
.
Ditto for 2003. Bernoulli, predecessor ld03
.
Ditto for 2004. Bernoulli, predecessor ld04
.
Count of number of flower heads in 2002.
Zero-truncated Poisson, predecessor fl02
.
Ditto for 2003.
Zero-truncated Poisson, predecessor fl03
.
Ditto for 2004.
Zero-truncated Poisson, predecessor fl04
.
Covariates are
the remnant population of origin of the plant
(all plants were grown together, pop
encodes ancestry).
east-west location in plot.
north-south location in plot.
This is the data for the example in Geyer, Wagenius, and Shaw (2007).
These data were included in the R package aster
which was the
predecessor of this package as the dataset echinacea
.
Stuart Wagenius, https://www.chicagobotanic.org/research/staff/wagenius
Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster Models for Life History Analysis. Biometrika 94 415–426.
data(echinacea) names(echinacea) names(echinacea$redata) levels(echinacea$redata$varb)
data(echinacea) names(echinacea) names(echinacea$redata) levels(echinacea$redata$varb)
Families known to the package. These functions construct simple family specifications used in specifying aster models. Statistical properties of these families are described.
fam.bernoulli() fam.poisson() fam.zero.truncated.poisson() fam.normal.location.scale() fam.multinomial(dimension)
fam.bernoulli() fam.poisson() fam.zero.truncated.poisson() fam.normal.location.scale() fam.multinomial(dimension)
dimension |
the dimension (number of categories) for the multinomial distribution. |
Currently implemented families are
"bernoulli"
Bernoulli (binomial with sample size one).
The distribution of any
zero-or-one-valued random variable , which is the canonical
statistic. The mean value parameter is
The canonical parameter is
,
also called logit of
. The cumulant function is
This distribution has degenerate limiting distributions. The lower
limit as is the
distribution concentrated at zero, having cumulant function which
is the constant function everywhere equal to zero. The upper
limit as
is the
distribution concentrated at one, having cumulant function which
is the identity function satisfying
for all
.
For predecessor (sample size) , the successor is the sum of
independent and identically distributed (IID) Bernoulli
random variables, that is,
binomial with sample size
. The mean value parameter is
times the mean value parameter for sample size one; the cumulant function
is
times the cumulant function for sample size one; the canonical
parameter is the same for all sample sizes.
"poisson"
Poisson. The mean value parameter
is the mean of the Poisson distribution.
The canonical parameter is
.
The cumulant function is
This distribution has a degenerate limiting distribution. The lower
limit as is the
distribution concentrated at zero, having cumulant function which
is the constant function everywhere equal to zero. There is no upper
limit because the canonical statistic is unbounded above.
For predecessor (sample size) , the successor is the sum of
IID Poisson random variables, that is,
Poisson with mean
. The mean value parameter is
times the mean value parameter for sample size one; the cumulant function
is
times the cumulant function for sample size one; the canonical
parameter is the same for all sample sizes.
"zero.truncated.poisson"
Poisson conditioned on being
greater than zero. Let be the mean of the corresponding
untruncated Poisson distribution. Then the canonical parameters for both
truncated and untruncated distributions are the same
.
The mean value parameter for the zero-truncated Poisson distribution is
and the cumulant function is
where is as defined above,
so
.
This distribution has a degenerate limiting distribution. The lower
limit as is the
distribution concentrated at one, having cumulant function which
is the identity function satisfying
for all
.
There is no upper
limit because the canonical statistic is unbounded above.
For predecessor (sample size) , the successor is the sum of
IID zero-truncated Poisson random variables, which is not
a brand-name distribution. The mean value parameter is
times the mean value parameter for sample size one; the cumulant function
is
times the cumulant function for sample size one; the canonical
parameter is the same for all sample sizes.
"normal.location.scale"
The distribution of a normal
random variable with unknown mean
and unknown variance
. Thought of as an exponential family, this is
a two-parameter family, hence must have a two-dimensional canonical
statistic
. The canonical parameter
vector
has components
and
The value of is unrestricted, but
must be strictly negative.
The mean value parameter vector
has components
and
The cumulant function is
This distribution has no degenerate limiting distributions, because the canonical statistic is a continuous random vector so the boundary of its support has probability zero.
For predecessor (sample size) , the successor is the sum of
IID random vectors
,
where each
is normal
with mean
and variance
, and this is not
a brand-name multivariate distribution (the first component of the sum
is normal, the second component noncentral chi-square, and the
components are not independent).
The mean value parameter vector is
times the mean value parameter vector for sample size one;
the cumulant function
is
times the cumulant function for sample size one; the canonical
parameter vector is the same for all sample sizes.
"multinomial"
Multinomial with sample size one.
The distribution of any random vector having all components zero
except for one component which is one (
is the
canonical statistic vector).
The mean value parameter is the vector
having
components
The mean value parameter vector is given as a function
of the canonical parameter vector
by
where is the dimension of
and
and
. This transformation is not one-to-one;
adding the same number
to each component of
does not change the value
of
.
The cumulant function is
This distribution is degenerate. The sum of the components of the
canonical statistic is equal to one with probability
one, which implies the nonidentifiability of the -dimensional
canonical parameter vector mentioned above. Hence one parameter
(at least) is always constrained to to be zero in
fitting an aster model with a multinomial family.
This distribution has many degenerate distributions. For any vector
the limit of distributions having canonical
parameter vectors
as
exists and is another
multinomial distribution (the limit distribution in the direction
).
Let
be the set of
such that
,
where
denotes the maximum over the
components of
.
Then the limit distribution in the direction
has components
of the canonical statistic
for
concentrated at zero.
The cumulant function of this degenerate distribution is
The canonical parameters for
are not identifiable, and one other canonical parameter is not
identifiable because of the constraint that the sum of the components
of the canonical statistic is equal to one with probability one.
For predecessor (sample size) , the successor is the sum of
IID multinomial-sample-size-one random vectors, that is,
multinomial with sample size
. The mean value parameter is
times the mean value parameter for sample size one; the cumulant function
is
times the cumulant function for sample size one; the canonical
parameter is the same for all sample sizes.
a list of class "astfam"
giving name and values of any
hyperparameters.
fam.bernoulli() fam.multinomial(4)
fam.bernoulli() fam.multinomial(4)
Data on life history traits for the tobacco hornworm Manduca sexta
data(hornworm)
data(hornworm)
An object of class "asterdata"
(see asterdata
)
comprising records for 162 insects (54 female, 68 male, and 40 for which
there was no opportunity to determine sex) observed over 40 days.
Nodes of the graph for one individual are associated with the variables
(levels of the factor hornworm$redata$varb
) in dependence groups
Bernoulli. Predecessor 1 (initial node). Indicator of pupation.
Three-dimensional multinomial dependence group.
Predecessor P
.
Indicator of death after pupation. In these data, all deaths after pupation are considered to have happened on day 33 regardless of when they occurred (because the actual day of death was not recorded in the original data).
Indicator of survival to day 33 but still pre-eclosion.
Indicator of eclosion (emergence from pupa as adult moth on day 33.
Zero-truncated Poisson. Predecessor T332
.
Count of ovarioles on day 33. Only females have this node in their
graphs.
For x
= 34, ..., 40. Two-dimensional multinomial
dependence group. Predecessor Tw1
, where w = x - 1
.
Indicator of survival to day x
but still
pre-eclosion.
Indicator of eclosion (emergence from pupa as adult moth
on day x
.
Zero-truncated Poisson. Predecessor Tx2
.
Count of ovarioles on day x
. Only females have these nodes in
their graph.
Covariates are
a factor. F
is known female, M
is known male,
U
is unknown (no opportunity to observe).
time (in weeks) to reach the 2nd instar stage. Larval instars are stages between molts (shedding of exoskeleton) of the larval form (caterpillar).
mass (in grams) at the 2nd instar stage.
mass (in grams) at eclosion.
name of an individual in the original data.
This is the data described by and analyzed by non-aster methods by Kingsolver et al. (2012) and re-analyzed using this package by Eck et al. (submitted).
For an illustration of the graph, see Figure 1 in Eck et al. (submitted).
In the description above, a concrete example of the x
and w
notation is that T351 and T352 form a two-dimensional multinomial dependence
group, the predecessor of which is T341, and B35 is a dependence group all
by itself, its predecessor being T352.
Every multinomial dependence group acts like a switch. If the predecessor
is one, the dependence group is multinomial with sample size one (exactly
one variable is one and the rest are zero). So this indicates which way
the life history goes. If the predecessor is zero, then all successors are
zero. This goes for all variables in any aster model. If Tx2
is zero,
then so is Bx
. The ovariole count is zero except for the day on
which the individual eclosed.
Joel Kingsolver https://bio.unc.edu/people/faculty/kingsolver/
Kingsolver, J. G., Diamond, S. E., Seiter, S. A., and Higgins, J. K. (2012) Direct and indirect phenotypic selection on developmental trajectories in Manduca sexta. Functional Ecology 26 598–607.
Eck, D., Shaw, R. G., Geyer, C. J., and Kingsolver, J. (submitted) An integrated analysis of phenotypic selection on insect body size and development time.
data(hornworm) names(hornworm) names(hornworm$redata) levels(hornworm$redata$varb)
data(hornworm) names(hornworm) names(hornworm$redata) levels(hornworm$redata$varb)
Calculate link function and up to one derivative for families known to the package.
link(xi, fam, deriv = 0, delta)
link(xi, fam, deriv = 0, delta)
xi |
mean value parameter value, a numeric vector. |
fam |
an object of class |
deriv |
the number of derivatives wanted. Must be either zero or one. |
delta |
direction in which limit is taken. Link
function is for family that is limit of family specified, limit
being for distributions with canonical parameter
|
a list containing some of the following components:
zeroth |
the value of the link function at |
is the dimension of .
first |
the value of the first derivative at |
Not intended for use by ordinary users. Provides R interface for testing to C code called by many other functions in the package.
link(0.3, fam.bernoulli(), deriv = 1) link(0.3, fam.bernoulli(), deriv = 1, delta = 1)
link(0.3, fam.bernoulli(), deriv = 1) link(0.3, fam.bernoulli(), deriv = 1, delta = 1)
Subset an object of class "asterdata"
,
for which see asterdata
.
## S3 method for class 'asterdata' subset(x, subset, successors = TRUE, ...)
## S3 method for class 'asterdata' subset(x, subset, successors = TRUE, ...)
x |
an object of class |
subset |
a logical vector indicating nodes of the graph to keep: missing values are taken as false. |
successors |
a logical scalar indicating whether the subgraph must be a union of connected components of the original graph, that is, if all successors of nodes in the subset must also be in the subset. |
... |
further arguments, which are ignored (this argument is required
for methods of the generic function |
Argument subset
is a logical vector of the same length as the number
of nodes in the graph specified by argument x
. It indicates the
subset of nodes in the subgraph wanted. The subgraph must be closed with
respect to predecessors (all predecessors of nodes in the subset are also
in the subset) and if successors = TRUE
with respect
to successors (all successors of nodes in the subset are also in the subset).
And similarly for dependence groups: each dependence
group in the original graph must have all or none of its elements
in the subgraph.
an object of class "asterdata"
that represents the aster model having
subgraph with nodes specified by subset
.
data(echinacea) #### select one individual from each level of pop foo <- echinacea$redata$pop bar <- match(levels(foo), as.character(foo)) baz <- is.element(echinacea$redata$id, echinacea$redata$id[bar]) out <- subset(echinacea, baz)
data(echinacea) #### select one individual from each level of pop foo <- echinacea$redata$pop bar <- match(levels(foo), as.character(foo)) baz <- is.element(echinacea$redata$id, echinacea$redata$id[bar]) out <- subset(echinacea, baz)
Test data of no biological interest. Does have all families implemented at the time the test data was created. No predictor variables.
data(test1)
data(test1)
A data frame with 100 observations on the following 8 variables.
m1
a numeric vector, part of a multinomial dependence group
(with m2
and m3
). Predecessor of this group is the
constant 1.
m2
a numeric vector.
m3
a numeric vector.
n1
a numeric vector, part of a normal location-scale
dependence group (with n2
). Predecessor of this group
is m1
.
n2
a numeric vector (actually n1^2
).
b1
a numeric vector, Bernoulli. Predecessor is m2
.
p1
a numeric vector, Poisson. Predecessor is m3
.
z1
a numeric vector, zero-truncated Poisson. Predecessor
is b1
.
created by R script test1.R
in directory makedata
of the
installation directory for this package.
data(test1) fred <- asterdata(test1, vars = c("m1", "m2", "m3", "n1", "n2", "b1", "p1", "z1"), pred = c(0, 0, 0, 1, 1, 2, 3, 6), group = c(0, 1, 2, 0, 4, 0, 0, 0), code = c(1, 1, 1, 2, 2, 3, 4, 5), families = list(fam.multinomial(3), "normal.location.scale", "bernoulli", "poisson", "zero.truncated.poisson"))
data(test1) fred <- asterdata(test1, vars = c("m1", "m2", "m3", "n1", "n2", "b1", "p1", "z1"), pred = c(0, 0, 0, 1, 1, 2, 3, 6), group = c(0, 1, 2, 0, 4, 0, 0, 0), code = c(1, 1, 1, 2, 2, 3, 4, 5), families = list(fam.multinomial(3), "normal.location.scale", "bernoulli", "poisson", "zero.truncated.poisson"))
Calculate a change-of-parameter for an aster model or the derivative of such a change-of-parameter. Validate certain parameter vectors.
transformSaturated(parm, data, from = c("theta", "phi", "xi", "mu"), to = c("theta", "phi", "xi", "mu"), differential, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) transformConditional(parm, modmat, data, from = "beta", to = c("theta", "phi", "xi", "mu"), differential, offset, tolerance = 8 * .Machine$double.eps) transformUnconditional(parm, modmat, data, from = c("beta", "tau"), to = c("beta", "theta", "phi", "xi", "mu", "tau"), differential, offset, tolerance = 8 * .Machine$double.eps) jacobian(parm, data, transform = c("saturated", "conditional", "unconditional"), from = c("beta", "theta", "phi", "xi", "mu", "tau"), to = c("beta", "theta", "phi", "xi", "mu", "tau"), modmat, offset, tolerance = 8 * .Machine$double.eps) validtheta(data, theta, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) is.validtheta(data, theta, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) validxi(data, xi, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) is.validxi(data, xi, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps)
transformSaturated(parm, data, from = c("theta", "phi", "xi", "mu"), to = c("theta", "phi", "xi", "mu"), differential, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) transformConditional(parm, modmat, data, from = "beta", to = c("theta", "phi", "xi", "mu"), differential, offset, tolerance = 8 * .Machine$double.eps) transformUnconditional(parm, modmat, data, from = c("beta", "tau"), to = c("beta", "theta", "phi", "xi", "mu", "tau"), differential, offset, tolerance = 8 * .Machine$double.eps) jacobian(parm, data, transform = c("saturated", "conditional", "unconditional"), from = c("beta", "theta", "phi", "xi", "mu", "tau"), to = c("beta", "theta", "phi", "xi", "mu", "tau"), modmat, offset, tolerance = 8 * .Machine$double.eps) validtheta(data, theta, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) is.validtheta(data, theta, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) validxi(data, xi, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps) is.validxi(data, xi, model.type = c("unconditional", "conditional"), tolerance = 8 * .Machine$double.eps)
parm |
parameter vector to transform,
a numerical vector of length |
data |
an object of class |
from |
the kind of parameter which |
to |
the kind of parameter to which |
differential |
if not missing a numeric vector of the same length
as |
modmat |
the model matrix for a canonical affine submodel, a
numerical matrix having |
offset |
the offset vector for a canonical affine submodel, a
numerical vector of length |
theta |
conditional canonical parameter vector to validate,
a numerical vector of length |
xi |
conditional canonical parameter vector to validate,
a numerical vector of length |
model.type |
which kind of model (see Details section). May be abbreviated. |
tolerance |
numeric >= 0. Relative errors smaller
than |
transform |
the “transform” function that will be called to
calculate derivatives, e. g., |
If differential
is missing, the returned value is a new parameter
vector of the specified type. If differential
is not missing,
the returned value is the derivative evaluated at parm
and differential
, that is, if is the change-of variable
function and
is the
from
parameter, then
is calculated when the differential is missing and
is calculated when the
differential
is not missing, where the latter is defined by
for small .
The kinds of parameters are "theta"
the conditional canonical parameter
for the saturated model, "phi"
the unconditional canonical parameter
for the saturated model, "xi"
the conditional mean value parameter
for the saturated model, "mu"
the unconditional mean value parameter
for the saturated model,
"beta"
the regression coefficient parameter for a canonical affine
submodel ( for a conditional
canonical affine submodel or
for an unconditional
canonical affine submodel, where
is the offset vector
and
is the model matrix),
"tau"
the mean value parameter for an unconditional canonical affine
submodel (,
where
is the model matrix).
Only the conditional canonical parameter vector and
the conditional mean value parameter vector
can be checked
directly. (To check the validity of another parameter, transform to one
of these and check that.) This means that in conversions to these parameters
the output vector is checked rather than the input vector, and conversions
(apparently) not involving these parameters (which do go through these
parameters inside the transformation function) a conversion to one of
these parameters is what is checked rather than the input vector.
There is a difference between conditional and unconditional aster models
in the way they treat zero predecessors. For a conditional aster model,
if the observed value of the predecessor is zero, then the successor is
zero almost surely and can have any parameter value for
or
. For an unconditional aster model,
if the expected value of the predecessor is zero, then the successor is
zero almost surely and can have any parameter value for
or
.
Since zero values are not allowed at initial nodes (not
considered valid by the function validasterdata
), the only
way predecessor data can be zero almost surely in an unconditional aster model
is if the delta vector (data$redelta
) is not zero so we have a limiting
model.
The function jacobian
turns the derivative considered as
a linear transformation calculated by the “transform” functions
into the matrix that represents the linear transformation (sometimes
called the Jacobian matrix of the transformation). The arguments
modmat
and offset
are only used if
transform == "conditional"
or transform == "unconditional"
,
and (as with the “transform” functions) the argument offset
may be missing, in which case the zero vector is used. Not all of the
candidate values for from
and to
arguments
for the jacobian
function are valid: the value must be valid for
the “transform” function that will be called.
a numeric vector of the same length as parm
. The new parameter if
deriv == FALSE
or the transform of the differential
if deriv = TRUE
. See details.
data(echinacea) theta <- rnorm(nrow(echinacea$redata), 0, 0.1) phi <- transformSaturated(theta, echinacea, from = "theta", to = "phi") ## rarely (if ever) want jacobian for unsaturated model transform ## result here is 5130 by 5130 matrix ## Not run: jack <- jacobian(theta, echinacea, from = "theta", to = "phi")
data(echinacea) theta <- rnorm(nrow(echinacea$redata), 0, 0.1) phi <- transformSaturated(theta, echinacea, from = "theta", to = "phi") ## rarely (if ever) want jacobian for unsaturated model transform ## result here is 5130 by 5130 matrix ## Not run: jack <- jacobian(theta, echinacea, from = "theta", to = "phi")