Title: | Maximum Likelihood Estimation for Generalized Linear Mixed Models |
---|---|
Description: | Maximum likelihood estimation for generalized linear mixed models via Monte Carlo EM. For a description of the algorithm see Brian S. Caffo, Wolfgang Jank and Galin L. Jones (2005) <DOI:10.1111/j.1467-9868.2005.00499.x>. |
Authors: | Felipe Acosta Archila |
Maintainer: | Felipe Acosta Archila <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.3 |
Built: | 2024-12-07 07:49:12 UTC |
Source: | CRAN |
mcemGLM
performs maximum likelihood estimation for logistic,
Poisson, and negative binomial regression when random effects are
present. The package uses an MCEM algorithm to estimate the model's
fixed parameters and variance components with their respective
standard errors.
A Wald test based anova
is available to test significance of
multi-leveled variables and for multiple contrast testing.
Package: | mcemGLM |
Type: | Package |
Version: | 1.1.2 |
Date: | 2023-01-12 |
License: | GPL (>= 2) |
Felipe Acosta Archila
Maintainer: Felipe Acosta Archila <[email protected]>
set.seed(123) x <- rnorm(30, 10, 1) z <- factor(rep(1:6, each = 5)) obs <- sample(0:1, 30, TRUE) fit <- mcemGLMM(obs ~ x, random = ~ 0 + z, family = "bernoulli", vcDist = "normal", controlEM = list(EMit = 15, MCit = 10000), initial = c(3.30, -0.35, 0.005)) summary(fit) anova(fit)
set.seed(123) x <- rnorm(30, 10, 1) z <- factor(rep(1:6, each = 5)) obs <- sample(0:1, 30, TRUE) fit <- mcemGLMM(obs ~ x, random = ~ 0 + z, family = "bernoulli", vcDist = "normal", controlEM = list(EMit = 15, MCit = 10000), initial = c(3.30, -0.35, 0.005)) summary(fit) anova(fit)
ANOVA table based on Wald tests for a model fitted with mcemGLMM
.
## S3 method for class 'mcemGLMM' anova(object, ...)
## S3 method for class 'mcemGLMM' anova(object, ...)
object |
a model fitted with the mcemGLMM function. |
... |
additional arguments. |
A matrix with the rows corresponding to a test for the different terms of the model and the following columns:
degrees of freedom for the term.
Wald's chi squared statistic.
p value for the test statistic.
Felipe Acosta Archila <[email protected]>
Extract the fixed effect coefficients from a model fitted with mcemGLMM
.
## S3 method for class 'mcemGLMM' coef(object, ...)
## S3 method for class 'mcemGLMM' coef(object, ...)
object |
a model fitted with the mcemGLMM function. |
... |
additional arguments. |
A vector with the fixed effect coefficients.
Felipe Acosta Archila <[email protected]>
Contrast testing for a model fitted with mcemGLMM
.
contrasts.mcemGLMM(object, ctr.mat)
contrasts.mcemGLMM(object, ctr.mat)
object |
a model fitted with the mcemGLMM function. |
ctr.mat |
contrast matrix. Each row corresponds to a linear combination of the fixed effect coefficients. |
A matrix with each row corresponding to a contrast and with the following columns:
contrasts's point estimator.
standard error for each fitted contrast.
Wald statistic for each fitted contrast.
p-value adjusted for multiple comparison (Bonferroni.)
Felipe Acosta Archila <[email protected]>
set.seed(72327) data(exdata) fit <- mcemGLMM(obs ~ z2 + x, random = ~ 0 + z1, data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(verb = FALSE, EMit = 15, MCit = 10000), initial = c(-0.13, -0.15, -0.21, 1.59, 0.002)) mat <- rbind("1 - 2" = c(0, -1, 0, 0), "1 - 3" = c(0, 0, -1, 0), "2 - 3" = c(0, 1, -1, 0)) contrasts.mcemGLMM(fit, mat)
set.seed(72327) data(exdata) fit <- mcemGLMM(obs ~ z2 + x, random = ~ 0 + z1, data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(verb = FALSE, EMit = 15, MCit = 10000), initial = c(-0.13, -0.15, -0.21, 1.59, 0.002)) mat <- rbind("1 - 2" = c(0, -1, 0, 0), "1 - 3" = c(0, 0, -1, 0), "2 - 3" = c(0, 1, -1, 0)) contrasts.mcemGLMM(fit, mat)
Extract the fixed effect's covariance matrix from a model fitted with mcemGLMM
.
covMat.mcemGLMM(object, ...)
covMat.mcemGLMM(object, ...)
object |
a model fitted with the mcemGLMM function. |
... |
additional arguments. |
A matrix corresponding to the covariance matrix of the fixed effects.
Felipe Acosta Archila <[email protected]>
Data from an experiment with $i=1,...,59$ epilepsy patients. Each of the patients was assigned to a control group or a treatment group. The experiment recorded the number of seizures experienced by each patient over four two-week periods. The experiment also recorded a baseline count of the number of seizures the patients had experienced during the previous eight weeks.
data("epilepsy")
data("epilepsy")
A data frame with 236 observations on the following 6 variables.
count
Number of seizures experienced.
id
Patient ID.
group
Treatment or control groups.
age
The logarithm of the patient's age.
base
The logarithm of baseline/4.
visit
The corresponding time period.
Thall, P. F. and Vail, S. C. (1990) Some covariance models for longitudinal count data with overdispersion In Biometrics 46, 657–671
Simulated binary response dataset to use in examples.
data("exdata")
data("exdata")
A data frame with 60 observations on the following 4 variables.
obs
a numeric binary vector
obs2
a numeric count vector
x
a numeric vector
z1
a factor with levels 1
, and 2
z2
a factor with levels 1
, 2
, and 3
The observations were generated independently with the code shown in the examples section.
set.seed(123) obs <- c(sample(0:1, 30, TRUE, prob = c(0.5, 0.5)), sample(0:1, 30, TRUE, prob = c(0.3, 0.7))) obs2 <- c(rpois(30, 5), rpois(30, 10))
set.seed(123) obs <- c(sample(0:1, 30, TRUE, prob = c(0.5, 0.5)), sample(0:1, 30, TRUE, prob = c(0.3, 0.7))) obs2 <- c(rpois(30, 5), rpois(30, 10))
Maximum likelihood estimation for logistic, Poisson, and negative binomial models with random effects using a Monte Carlo EM algorithm.
mcemGLMM(fixed, random, data, family = c("bernoulli", "poisson", "negbinom", "gamma"), vcDist = c("normal", "t"), df, controlEM = list(), controlTrust = list(), initial)
mcemGLMM(fixed, random, data, family = c("bernoulli", "poisson", "negbinom", "gamma"), vcDist = c("normal", "t"), df, controlEM = list(), controlTrust = list(), initial)
fixed |
the fixed effects model. This is specified
by a |
random |
the random effects models. This is specified
by a |
data |
an optional data frame containing the variables in the model. If missing the variables are taken from the current environment. |
family |
a string indicating the type of model to be fitted. The options are "bernoulli" for logistic regression, "Poisson" for Poisson count regression, and "negbinom" for negative binomial count regression. |
vcDist |
a string indicating the distribution of the marginal variance components. The options are "normal" and "t" for normal and t distributed random effects respectively. |
df |
a vector of degrees of freedom of the random effects when these are t distributed. The length of the vector must be equal to the number of variance components in the model. |
controlEM |
a list of options for the algorithm. See Details below. |
controlTrust |
a list of options to be passed to the |
initial |
optional initial values for the parameters. If missing the initial values for the fixed effects are taken from a generalized linear model fitted without random effects and the initial values for the variance components are set to 5. |
The function mcemGLMM
allows the fitting of generalized linear
mixed models when the random effects are normal or a t distributed.
The supported models are the logistic, Poisson and negative binomial.
The degrees of freedom for the t case must be supplied by the user
with a vector in the df
argument. The length of the vector
must be equal to the number of variance components. For normal random
effects the argument df
does not need to be included.
To fit a model with one random effect a formula must be supplied in
the random
argument. Note that it is necessary that the
variable is a factor and to specify that there is no intercept for
the random part of the model. To use more than one random effects a
list of formulas must be supplied. Each member must be formula with
in which the variables involved must be factors and it also it is
necessary to specify that there is no intercept. To fit crossed
random effects each variable must no appear in its own formula. To
fit nested random effects a formula with the highest level variable
must be specified and each subsequent variable must be specified with
an interaction of the variables above it. See examples below.
A note on the negative binomial overdispersion:
The variance for the negative binomial model is set equal to
so we have that there is no overdispersion
as
goes to infinity.
The variance for the gamma distribution is set to . The value
of
corresponds to an exponential regression.
Stopping rules and convergence criteria:
The algorithm runs for a maximum of EMit
iterations or until
the criteria
is satisfied three times in a row for pre-defined values of
and
. Once this criterion has been achieved
two times we increase the Monte Carlo sample size more rapidly to
have a better estimation of the model's information matrix. For
a detailed discussion on convergence diagnostics see Neath R.C. (2012).
After fitting a model it is recommended to plot the EM estimates
at each step to assess convergence.
Control options for EM:
maximum number of EM iterations.
initial number of Monte Carlo iterations for the MCMC step.
factor in which the MC iterations increase in each EM iteration.
logical. If TRUE at each EM iteration the function will print convergence information and a trace plot for on of the random effects. This is useful to assess the performance of the algorithm but it can impact the actual running time.
initial standard deviation for the proposal density of the MCMC step. If zero (default) an auto-tuning step will be performed.
constant for the EM error assessment.
constant for the EM error assessment.
Control options for trust, see help(trust)
for more details:
starting trust region radius. Default value set to 20.
maximum allowed trust region radius. Default value set to 200.
maximum number of iterations. Default value set to 100.
A list of class "mcemGLMM" with the following items:
a matrix with the value of the maximum likelihood estimators at the end of each EM step.
Fisher's information matrix.
value (up to a constant) of the Q function. Used to perform likelihood ratio tests.
Q function MCMC sample.
a sample from the conditional distribution of the random effects given the data and the maximum likelihood estimators.
vector of observations.
design matrix for the fixed effects.
design matrix for the random effects.
relative error at the last iteration. See details.
last value for MCMC step size.
original call.
Felipe Acosta Archila <[email protected]>
Neath, R. C. (2012) On Convergence Properties of the Monte Carlo EM Algorithm In Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton. Institute of Mathematical Statistics 43–62
# Data set for a logistic model with one binary fixed effects and two # possible random effects. # Initial values and MC iterations are given to speed up the examples # but these are not necessary in general. set.seed(0123210) data(exdata) # To fit a model with one random effect fit.1 <- mcemGLMM(obs ~ x, random = ~ 0 + z1, data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(MCit = 10000), initial = c(0.27, -0.13, 0.003)) summary(fit.1) # We can assess convergence by looking at a trace plot of the EM estimates # and the loglikelihood values ts.plot(fit.1$mcemEST) ts.plot(fit.1$QfunVal) # To fit a model with crossed random effects fit.crossed <- mcemGLMM(obs ~ x, random = list(~ 0 + z1, ~ 0 + z2), data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(EMit = 10, MCit = 10000), initial = c(0.28, -0.15, 0.001, 0.4)) summary(fit.crossed) # To fit a model with crossed random effects fit.nested <- mcemGLMM(obs ~ x, random = list(~ 0 + z2, ~ 0 + z2:z1), data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(EMit = 10, MCit = 10000), initial = c(0.31, -0.15, 0.29, 0.27)) summary(fit.nested) # Fit a Poisson model fit.pois <- mcemGLMM(obs2 ~ x, random = ~ 0 + z1, data = exdata, family = "poisson", vcDist = "normal", controlEM = list(EMit = 10, MCit = 10000), initial = c(1.95, 0.03, 0.003)) summary(fit.pois)
# Data set for a logistic model with one binary fixed effects and two # possible random effects. # Initial values and MC iterations are given to speed up the examples # but these are not necessary in general. set.seed(0123210) data(exdata) # To fit a model with one random effect fit.1 <- mcemGLMM(obs ~ x, random = ~ 0 + z1, data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(MCit = 10000), initial = c(0.27, -0.13, 0.003)) summary(fit.1) # We can assess convergence by looking at a trace plot of the EM estimates # and the loglikelihood values ts.plot(fit.1$mcemEST) ts.plot(fit.1$QfunVal) # To fit a model with crossed random effects fit.crossed <- mcemGLMM(obs ~ x, random = list(~ 0 + z1, ~ 0 + z2), data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(EMit = 10, MCit = 10000), initial = c(0.28, -0.15, 0.001, 0.4)) summary(fit.crossed) # To fit a model with crossed random effects fit.nested <- mcemGLMM(obs ~ x, random = list(~ 0 + z2, ~ 0 + z2:z1), data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(EMit = 10, MCit = 10000), initial = c(0.31, -0.15, 0.29, 0.27)) summary(fit.nested) # Fit a Poisson model fit.pois <- mcemGLMM(obs2 ~ x, random = ~ 0 + z1, data = exdata, family = "poisson", vcDist = "normal", controlEM = list(EMit = 10, MCit = 10000), initial = c(1.95, 0.03, 0.003)) summary(fit.pois)
Given a model fitted with the function mcemGLMM
this function will add iterations and update the model estimates for more accurate results.
This is recommended if the initial fitting seems to have a large Monte Carlo error. This function will use the previous maximum likelihood estimate as its initial point and will also start with a Monte Carlo sample size equal to the sample size used in the last iteration of the previous fitting.
mcemGLMMext(object, it = 20, controlEM)
mcemGLMMext(object, it = 20, controlEM)
object |
an model fitted with |
it |
the maximum number of iterations to be performed. |
controlEM |
a list. New set of options for the EM algorithm. Can be missing |
An updated object of class mcemGLMM
.
If controlEM
is supplied it is important that the value for MCit
is at least equal to number of Monte Carlo iterations used in the last EM step to fit object
since providing a lower number will increase the Monte Carlo error.
Felipe Acosta Archila <[email protected]>
set.seed(72327) data(exdata) fit1 <- mcemGLMM(obs ~ z2 + x, random = ~ 0 + z1, data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(verb = FALSE, EMit = 5, MCit = 8000), initial = c(-0.13, -0.15, -0.21, 1.59, 0.002)) # Now we extend the algorithm to do at least another 10 iterations fit2 <- mcemGLMMext(fit1, it = 10)
set.seed(72327) data(exdata) fit1 <- mcemGLMM(obs ~ z2 + x, random = ~ 0 + z1, data = exdata, family = "bernoulli", vcDist = "normal", controlEM = list(verb = FALSE, EMit = 5, MCit = 8000), initial = c(-0.13, -0.15, -0.21, 1.59, 0.002)) # Now we extend the algorithm to do at least another 10 iterations fit2 <- mcemGLMMext(fit1, it = 10)
This functions returns predicted link function of observations for a model fitted with mcemGLMM
.
## S3 method for class 'mcemGLMM' predict(object, newdata, type = c("link", "response"), se.fit = FALSE, ...)
## S3 method for class 'mcemGLMM' predict(object, newdata, type = c("link", "response"), se.fit = FALSE, ...)
object |
a model fitted with the mcemGLMM function. |
newdata |
optional data frame with new data. The variable names must match the original variables. If missing, the function will return predicted values at each observation. |
type |
character string. The type of predictions to be returned. Either "link" or "response" predictions are available. |
se.fit |
logical. If true, standard errors will be returned. |
... |
additional arguments. |
A vector with the predictions from the observed data or by using the supplied new data.
Felipe Acosta Archila <[email protected]>
Extract the random effect predictions from a model fitted with mcemGLMM
.
## S3 method for class 'mcemGLMM' ranef(object, ...)
## S3 method for class 'mcemGLMM' ranef(object, ...)
object |
a model fitted with the mcemGLMM function. |
... |
additional arguments. |
A vector with the random effect predictions.
Felipe Acosta Archila <[email protected]>
This functions returns the residuals of a model fitted with mcemGLMM
.
## S3 method for class 'mcemGLMM' residuals(object, type = c("deviance", "pearson"), ...)
## S3 method for class 'mcemGLMM' residuals(object, type = c("deviance", "pearson"), ...)
object |
a model fitted with the mcemGLMM function. |
type |
character string. The type of residuals to be returned. |
... |
additional arguments. |
A vector with the residuals of the model.
Felipe Acosta Archila <[email protected]>
Salamander mating experiment data from McCullagh and Nelder.
data("salamander")
data("salamander")
A data frame with 60 observations on the following 4 variables.
Cross
the type of salamanders involved in the crossing.
Female
the female salamander's ID.
Male
the male salamander's ID.
Mate
whether the crossing was successful or not.
McCullagh, P and Nelder J. A. (1989) Generalized Linear Models, Second Edition. Chapman and Hall, 1989
Example data for logistic, Poisson and negative binomial models.
data("simData")
data("simData")
A data frame with 200 observations on the following 8 variables.
obs
a binary vector. Used as a response for a logistic model.
x1
a numeric vector.
x2
a numeric vector.
x3
a categorical vector with levels blue
, red
, and yellow
.
z1
a categorical vector with levels D1
, D2
, D3
, D4
, and D5
.
z2
a categorical vector with levels 1
, 2
, 3
, 4
, and 5
.
z3
a categorical vector with levels A
, B
, and D
.
count
a numeric vector. Used as a response for a Poisson model.
The levels of z2
can be nested within $z1
$. The observations were generated with the code shown in the examples section.
set.seed(47819) x1 <- rnorm(200, 10, 1) x2 <- rnorm(200, 5, 1) x3 <- sample(c("red", "blue", "yellow"), size = 200, replace = TRUE) z1 <- factor(rep(c("D1", "D2", "D3", "D4", "D5"), each = 40)) z2 <- factor(rep(rep(1:4, each = 5), 10)) z3 <- factor(c(rep("A", 100), rep("B", 60), rep("D", 40))) kX <- model.matrix(~x1 + x2 + x3) kZ <- cbind(model.matrix(~ 0+z1), model.matrix(~ 0+z1:z2), model.matrix(~ 0+z3)) kBeta <- c(5, -4, 5, 0, 8) kU <- 3 * rt(28, 5) linf0 <- kX prob0 <- exp(linf0)/(1+exp(linf0)) obs <- as.numeric(runif(100) < prob0) simData <- data.frame(obs, x1, x2, x3, z1, z2, z3)
set.seed(47819) x1 <- rnorm(200, 10, 1) x2 <- rnorm(200, 5, 1) x3 <- sample(c("red", "blue", "yellow"), size = 200, replace = TRUE) z1 <- factor(rep(c("D1", "D2", "D3", "D4", "D5"), each = 40)) z2 <- factor(rep(rep(1:4, each = 5), 10)) z3 <- factor(c(rep("A", 100), rep("B", 60), rep("D", 40))) kX <- model.matrix(~x1 + x2 + x3) kZ <- cbind(model.matrix(~ 0+z1), model.matrix(~ 0+z1:z2), model.matrix(~ 0+z3)) kBeta <- c(5, -4, 5, 0, 8) kU <- 3 * rt(28, 5) linf0 <- kX prob0 <- exp(linf0)/(1+exp(linf0)) obs <- as.numeric(runif(100) < prob0) simData <- data.frame(obs, x1, x2, x3, z1, z2, z3)
Summary for an object obtained from mcemGLMM
.
## S3 method for class 'mcemGLMM' summary(object, ...)
## S3 method for class 'mcemGLMM' summary(object, ...)
object |
a model fitted with the mcemGLMM function. |
... |
additional arguments. |
The function prints a table for Wald tests for the fixed effect
coefficients and the variance estimators. For the negative binomial
and the gamma distributions the estimate of is reported
with its respective standard error.
A list with the following items:
a list with the fixed effects coefficients and the predicted random effects.
the estimated variances for each variance component.
the standard errors for the fixed effects coefficients and the variance estimates.
z test values for the fixed effects coefficients and the variance estimators.
Felipe Acosta Archila <[email protected]>