Title: | Generalized Linear Mixed Model Trees |
---|---|
Description: | Recursive partitioning based on (generalized) linear mixed models (GLMMs) combining lmer()/glmer() from 'lme4' and lmtree()/glmtree() from 'partykit'. The fitting algorithm is described in more detail in Fokkema, Smits, Zeileis, Hothorn & Kelderman (2018; <DOI:10.3758/s13428-017-0971-x>). For detecting and modeling subgroups in growth curves with GLMM trees see Fokkema & Zeileis (2024; <DOI:10.3758/s13428-024-02389-1>). |
Authors: | Marjolein Fokkema [aut, cre] , Achim Zeileis [aut] |
Maintainer: | Marjolein Fokkema <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 0.2-6 |
Built: | 2024-12-06 06:56:59 UTC |
Source: | CRAN |
Model-based recursive partitioning based on mixed-effects beta regression.
betamertree(formula, data, family = NULL, weights = NULL, cluster = NULL, ranefstart = NULL, offset = NULL, REML = TRUE, joint = TRUE, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, glmmTMB.control = glmmTMB::glmmTMBControl(), ...)
betamertree(formula, data, family = NULL, weights = NULL, cluster = NULL, ranefstart = NULL, offset = NULL, REML = TRUE, joint = TRUE, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, glmmTMB.control = glmmTMB::glmmTMBControl(), ...)
formula |
formula specifying the response variable and a three-part right-hand-side describing the regressors, random effects, and partitioning variables, respectively. For details see below. |
data |
data.frame to be used for estimating the model tree. |
family |
currently not used. The default beta distribution parameterization
of package |
weights |
numeric. An optional numeric vector of weights. Can be a
name of a column in data or a vector of length |
cluster |
currently not used. |
ranefstart |
currently not used. |
offset |
optional numeric vector to be included in the linear predictor
with a coeffcient of one. Note that |
joint |
currently not used. Fixed effects from the tree are always (re-)estimated jointly along with the random effects. |
abstol |
numeric. The convergence criterion used for estimation of the model.
When the difference in log-likelihoods of the random-effects model from two
consecutive iterations is smaller than |
maxit |
numeric. The maximum number of iterations to be performed in estimation of the model tree. |
dfsplit |
logical or numeric. |
verbose |
Should the log-likelihood value of the estimated random-effects model be printed for every iteration of the estimation? |
plot |
Should the tree be plotted at every iteration of the estimation? Note that selecting this option slows down execution of the function. |
REML |
logical scalar. Should the fixed-effects estimates be chosen to
optimize the REML criterion (as opposed to the log-likelihood)? Will be
passed to funtion |
glmmTMB.control |
list. An optional list with control
parameters to be passed to |
... |
Additional arguments to be passed to |
Function betamertree aims to learn a tree where each terminal node is associated with
different fixed-effects regression coefficients, while adjusting for global
random effects (such as a random intercept). It is a generalization of the ideas
underlying function glmertree
, to allow for detection of
subgroups with different fixed-effects parameter estimates, keeping the
random effects constant throughout the tree (i.e., random effects are
estimated globally). The estimation algorithm iterates between (1) estimation
of the tree given an offset of random effects, and (2) estimation of the
random effects given the tree structure. See Fokkema et al. (2018) for
a detailed description.
Where glmertree
uses function glmtree
from package partykit to
find the subgroups, and function glmer
from package lme4 to estimate
the mixed-effects model, betamertree
uses function betatree
from
package betareg to find the subgroups, and function glmmTMB
from
package package glmmTMB to estimate the mixed-effects model.
The code is experimental and will change in future versions.
The function returns a list with the following objects:
tree |
The final |
glmmTMB |
The final |
ranef |
The corresponding random effects of |
varcorr |
The corresponding |
variance |
The corresponding |
data |
The dataset specified with the |
loglik |
The log-likelihood value of the last iteration. |
iterations |
The number of iterations used to estimate the |
maxit |
The maximum number of iterations specified with the |
ranefstart |
The random effects used as an offset, as specified with
the |
formula |
The formula as specified with the |
randomformula |
The formula as specified with the |
abstol |
The prespecified value for the change in log-likelihood to evaluate
convergence, as specified with the |
mob.control |
A list containing control parameters passed to
|
glmmTMB.control |
A list containing control parameters passed to
|
joint |
Whether the fixed effects from the tree were (re-)estimated jointly along
with the random effects, specified with the |
Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016–2034. doi:10.3758/s13428-017-0971-x
Fokkema M, Edbrooke-Childs J, Wolpert M (2021). “Generalized Linear Mixed-Model (GLMM) Trees: A Flexible Decision-Tree Method for Multilevel and Longitudinal Data.” Psychotherapy Research, 31(3), 329–341. doi:10.1080/10503307.2020.1785037
Fokkema M, Zeileis A (2024). “Subgroup Detection in Linear Growth Curve Models with Generalized Linear Mixed Model (GLMM) Trees.” Behavior Research Methods, 56(7), 6759–6780. doi:10.3758/s13428-024-02389-1
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. doi:10.18637/jss.v048.i11
if (require("betareg") && require("glmmTMB")) { ## load example data data("ReadingSkills", package = "betareg") ## add random noise (not associated with reading scores) set.seed(1071) ReadingSkills$x1 <- rnorm(nrow(ReadingSkills)) ReadingSkills$x2 <- runif(nrow(ReadingSkills)) ReadingSkills$x3 <- factor(rnorm(nrow(ReadingSkills)) > 0) ReadingSkills$gr <- factor(rep(letters[1:5], length.out = nrow(ReadingSkills))) ## Fit beta mixed-effects regression tree betamer_form <- accuracy ~ iq | gr | dyslexia + x1 + x2 + x3 bmertree <- betamertree(betamer_form, data = ReadingSkills, minsize = 10) VarCorr(bmertree) ranef(bmertree) fixef(bmertree) coef(bmertree) plot(bmertree) predict(bmertree, newdata = ReadingSkills[1:5,]) predict(bmertree) ## see ?predict.glmmmTMB for other arguments that can be passed residuals(bmertree) ## see ?residuals.glmmmTMB for other arguments that can be passed }
if (require("betareg") && require("glmmTMB")) { ## load example data data("ReadingSkills", package = "betareg") ## add random noise (not associated with reading scores) set.seed(1071) ReadingSkills$x1 <- rnorm(nrow(ReadingSkills)) ReadingSkills$x2 <- runif(nrow(ReadingSkills)) ReadingSkills$x3 <- factor(rnorm(nrow(ReadingSkills)) > 0) ReadingSkills$gr <- factor(rep(letters[1:5], length.out = nrow(ReadingSkills))) ## Fit beta mixed-effects regression tree betamer_form <- accuracy ~ iq | gr | dyslexia + x1 + x2 + x3 bmertree <- betamertree(betamer_form, data = ReadingSkills, minsize = 10) VarCorr(bmertree) ranef(bmertree) fixef(bmertree) coef(bmertree) plot(bmertree) predict(bmertree, newdata = ReadingSkills[1:5,]) predict(bmertree) ## see ?predict.glmmmTMB for other arguments that can be passed residuals(bmertree) ## see ?residuals.glmmmTMB for other arguments that can be passed }
coef
and fixef
methods for (g)lmertree
objects.
## S3 method for class 'lmertree' coef(object, which = "tree", drop = FALSE, ...) ## S3 method for class 'lmertree' fixef(object, which = "tree", drop = FALSE, ...) ## S3 method for class 'glmertree' coef(object, which = "tree", drop = FALSE, ...) ## S3 method for class 'glmertree' fixef(object, which = "tree", drop = FALSE, ...)
## S3 method for class 'lmertree' coef(object, which = "tree", drop = FALSE, ...) ## S3 method for class 'lmertree' fixef(object, which = "tree", drop = FALSE, ...) ## S3 method for class 'glmertree' coef(object, which = "tree", drop = FALSE, ...) ## S3 method for class 'glmertree' fixef(object, which = "tree", drop = FALSE, ...)
object |
an object of class |
which |
character; |
drop |
logical. Only used when |
... |
Additional arguments, curretnly not used. |
The code is still under development and might change in future versions.
If type = "local"
, returns a matrix of estimated local fixed-effects
coefficients, with a row for every terminal node and a column for every
fixed effect. If type = "global"
, returns a numeric vector of
estimated global fixed-effects coefficients.
Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016–2034. doi:10.3758/s13428-017-0971-x
Fokkema M, Zeileis A (2024). “Subgroup Detection in Linear Growth Curve Models with Generalized Linear Mixed Model (GLMM) Trees.” Behavior Research Methods, 56(7), 6759–6780. doi:10.3758/s13428-024-02389-1
lmertree
, glmertree
,
party-plot
.
## load artificial example data data("DepressionDemo", package = "glmertree") ## fit LMM tree with local fixed effects only lt <- lmertree(depression ~ treatment + age | cluster | anxiety + duration, data = DepressionDemo) coef(lt) ## fit LMM tree including both local and global fixed effect lt <- lmertree(depression ~ treatment | (age + (1|cluster)) | anxiety + duration, data = DepressionDemo) coef(lt, which = "tree") # default behaviour coef(lt, which = "global") ## fit GLMM tree with local fixed effects only gt <- glmertree(depression_bin ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo) coef(gt) ## fit GLMM tree including both local and global fixed effect gt <- glmertree(depression_bin ~ treatment | (age + (1|cluster)) | anxiety + duration, data = DepressionDemo) coef(gt, which = "tree") # default behaviour coef(gt, which = "global")
## load artificial example data data("DepressionDemo", package = "glmertree") ## fit LMM tree with local fixed effects only lt <- lmertree(depression ~ treatment + age | cluster | anxiety + duration, data = DepressionDemo) coef(lt) ## fit LMM tree including both local and global fixed effect lt <- lmertree(depression ~ treatment | (age + (1|cluster)) | anxiety + duration, data = DepressionDemo) coef(lt, which = "tree") # default behaviour coef(lt, which = "global") ## fit GLMM tree with local fixed effects only gt <- glmertree(depression_bin ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo) coef(gt) ## fit GLMM tree including both local and global fixed effect gt <- glmertree(depression_bin ~ treatment | (age + (1|cluster)) | anxiety + duration, data = DepressionDemo) coef(gt, which = "tree") # default behaviour coef(gt, which = "global")
Performs cross-validation of a model-based recursive partition based on (generalized) linear mixed models. Using the tree or subgroup structure estimated from a training dataset, the full mixed-effects model parameters are re-estimated using a new set of test observations, providing valid computation of standard errors and valid inference. The approach is inspired by Athey & Imbens (2016), and "enables the construction of valid confidence intervals [...] whereby one sample is used to construct the partition and another to estimate [...] effects for each subpopulation."
cv.lmertree(tree, newdata, reference = NULL, omit.intercept = FALSE, ...) cv.glmertree(tree, newdata, reference = NULL, omit.intercept = FALSE, ...)
cv.lmertree(tree, newdata, reference = NULL, omit.intercept = FALSE, ...) cv.glmertree(tree, newdata, reference = NULL, omit.intercept = FALSE, ...)
tree |
An object of class |
newdata |
A |
reference |
Numeric or character scalar, indicating the number of the terminal node of
which the intercept should be taken as a reference for intercepts in all other nodes.
If |
omit.intercept |
Logical scalar, indicating whether the intercept should be omitted from the model.
The default ( |
... |
Not currently used. |
The approach is inspired by Athey & Imbens (2016), and "enables the construction of valid confidence intervals [...] whereby one sample is used to construct the partition and another to estimate [...] effects for each subpopulation."
An object of with classes lmertree
and cv.lmertree
, or glmertree
and cv.glmertree
. It is the original (g)lmertree specified by argument tree
, but the parametric model model estimated based on the data specified by argument newdata
. The default S3 methods for classes lmertree
and glmertree
can be used to inspect the results: plot
, predict
, coef
, fixef
, ranef
and VarCorr
. In addition, there is a dedicated summary
method for classes cv.lmertree
and cv.glmertree
, which prints valid parameter estimates and standard errors, resulting from summary.merMod
. For objects of clas cv.lmertree
, hypothesis tests (i.e., p-values) can be obtained by loading package lmerTest
PRIOR to loading package(s) glmertree
(and lme4
), see examples.
Athey S, Imbens G (2016). “Recursive Partitioning for Heterogeneous Causal Effects.” Proceedings of the National Academy of Sciences, 113(27), 7353–7360. doi:10.1073/pnas.1510489113
Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016–2034. doi:10.3758/s13428-017-0971-x
Fokkema M, Edbrooke-Childs J, Wolpert M (2021). “Generalized Linear Mixed-Model (GLMM) Trees: A Flexible Decision-Tree Method for Multilevel and Longitudinal Data.” Psychotherapy Research, 31(3), 329–341. doi:10.1080/10503307.2020.1785037
Fokkema M, Zeileis A (2024). “Subgroup Detection in Linear Growth Curve Models with Generalized Linear Mixed Model (GLMM) Trees.” Behavior Research Methods, 56(7), 6759–6780. doi:10.3758/s13428-024-02389-1
lmer
, glmer
,
lmertree
, glmertree
,
summary.merMod
require("lmerTest") ## load BEFORE lme4 and glmertree to obtain hypothesis tests / p-values ## Create artificial training and test datasets set.seed(42) train <- sample(1:nrow(DepressionDemo), size = 200, replace = TRUE) test <- sample(1:nrow(DepressionDemo), size = 200, replace = TRUE) ## Fit tree on training data tree1 <- lmertree(depression ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo[train, ]) ## Obtain honest estimates of parameters and standard errors using test data tree2 <- cv.lmertree(tree1, newdata = DepressionDemo[test, ]) tree3 <- cv.lmertree(tree1, newdata = DepressionDemo[test, ], reference = 7, omit.intercept = TRUE) summary(tree2) summary(tree3) coef(tree1) coef(tree2) coef(tree3) plot(tree1, which = "tree") plot(tree2, which = "tree") plot(tree3, which = "tree") predict(tree1, newdata = DepressionDemo[1:5, ]) predict(tree2, newdata = DepressionDemo[1:5, ])
require("lmerTest") ## load BEFORE lme4 and glmertree to obtain hypothesis tests / p-values ## Create artificial training and test datasets set.seed(42) train <- sample(1:nrow(DepressionDemo), size = 200, replace = TRUE) test <- sample(1:nrow(DepressionDemo), size = 200, replace = TRUE) ## Fit tree on training data tree1 <- lmertree(depression ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo[train, ]) ## Obtain honest estimates of parameters and standard errors using test data tree2 <- cv.lmertree(tree1, newdata = DepressionDemo[test, ]) tree3 <- cv.lmertree(tree1, newdata = DepressionDemo[test, ], reference = 7, omit.intercept = TRUE) summary(tree2) summary(tree3) coef(tree1) coef(tree2) coef(tree3) plot(tree1, which = "tree") plot(tree2, which = "tree") plot(tree3, which = "tree") predict(tree1, newdata = DepressionDemo[1:5, ]) predict(tree2, newdata = DepressionDemo[1:5, ])
Simulated dataset of a randomized clinical trial (N = 150) to illustrate fitting of (G)LMM trees.
data("DepressionDemo")
data("DepressionDemo")
A data frame containing 150 observations on 6 variables:
numeric. Continuous treatment outcome variable (range: 3-16, M = 9.12, SD = 2.66).
factor. Binary treatment variable.
factor. Indicator for cluster with 10 levels.
numeric. Continuous partitioning variable (range: 18-69, M = 45, SD = 9.56).
numeric. Continuous partitioning variable (range: 3-18, M = 10.26, SD = 3.05).
numeric. Continuous partitioning variable (range: 1-17, M = 6.97, SD = 2.90).
factor. Binarized treatment outcome variable (0 = recovered, 1 = not recovered).
The data were generated such that the duration and anxiety covariates
characterized three subgroups with differences in treatment effects. The
cluster
variable was used to introduce a random intercept that should
be accounted for. The treatment outcome is an index of depressive symptomatology.
data("DepressionDemo", package = "glmertree") summary(DepressionDemo) lt <- lmertree(depression ~ treatment | cluster | anxiety + duration + age, data = DepressionDemo) plot(lt) gt <- glmertree(depression_bin ~ treatment | cluster | anxiety + duration + age, data = DepressionDemo) plot(gt)
data("DepressionDemo", package = "glmertree") summary(DepressionDemo) lt <- lmertree(depression ~ treatment | cluster | anxiety + duration + age, data = DepressionDemo) plot(lt) gt <- glmertree(depression_bin ~ treatment | cluster | anxiety + duration + age, data = DepressionDemo) plot(gt)
Model-based recursive partitioning based on (generalized) linear mixed models.
lmertree(formula, data, weights = NULL, cluster = NULL, ranefstart = NULL, offset = NULL, joint = TRUE, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, REML = TRUE, lmer.control = lmerControl(), ...) glmertree(formula, data, family = "binomial", weights = NULL, cluster = NULL, ranefstart = NULL, offset = NULL, joint = TRUE, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, nAGQ = 1L, glmer.control = glmerControl(), ...)
lmertree(formula, data, weights = NULL, cluster = NULL, ranefstart = NULL, offset = NULL, joint = TRUE, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, REML = TRUE, lmer.control = lmerControl(), ...) glmertree(formula, data, family = "binomial", weights = NULL, cluster = NULL, ranefstart = NULL, offset = NULL, joint = TRUE, abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, plot = FALSE, nAGQ = 1L, glmer.control = glmerControl(), ...)
formula |
formula specifying the response variable and a three-part right-hand-side describing the regressors, random effects, and partitioning variables, respectively. For details see below. |
data |
data.frame to be used for estimating the model tree. |
family |
family specification for |
weights |
numeric. An optional numeric vector of weights. Can be a
name of a column in data or a vector of length |
cluster |
optional vector of cluster IDs to be employed for clustered
covariances in the parameter stability tests. Can be a name of a column
in |
ranefstart |
|
offset |
optional numeric vector to be included in the linear predictor
with a coeffcient of one. Note that |
joint |
logical. Should the fixed effects from the tree be (re-)estimated jointly along with the random effects? |
abstol |
numeric. The convergence criterion used for estimation of the model.
When the difference in log-likelihoods of the random-effects model from two
consecutive iterations is smaller than |
maxit |
numeric. The maximum number of iterations to be performed in estimation of the model tree. |
dfsplit |
logical or numeric. |
verbose |
Should the log-likelihood value of the estimated random-effects model be printed for every iteration of the estimation? |
plot |
Should the tree be plotted at every iteration of the estimation? Note that selecting this option slows down execution of the function. |
REML |
logical scalar. Should the fixed-effects estimates be chosen to
optimize the REML criterion (as opposed to the log-likelihood)? Will be
passed to funtion |
nAGQ |
integer scalar. Specifies the number of points per axis for evaluating
the adaptive Gauss-Hermite approximation to the log-likelihood, to be passed
to function |
lmer.control , glmer.control
|
list. An optional list with control
parameters to be passed to |
... |
Additional arguments to be passed to |
(G)LMM trees learn a tree where each terminal node is associated with different fixed-effects regression coefficients while adjusting for global random effects (such as a random intercept). This allows for detection of subgroups with different fixed-effects parameter estimates, keeping the random effects constant throughout the tree (i.e., random effects are estimated globally). The estimation algorithm iterates between (1) estimation of the tree given an offset of random effects, and (2) estimation of the random effects given the tree structure. See Fokkema et al. (2018) for a detailed introduction.
To specify all variables in the model a formula
such as
y ~ x1 + x2 | random | z1 + z2 + z3
is used, where y
is the
response, x1
and x2
are the regressors in every node of the
tree, random
is the random effects, and z1
to z3
are
the partitioning variables considered for growing the tree. If random
is only a single variable such as id
a random intercept with respect
to id
is used. Alternatively, it may be an explicit random-effects
formula such as (1 | id)
or a more complicated formula such as
((1+time) | id)
. (Note that in the latter two formulas, the brackets
are necessary to protect the pipes in the random-effects formulation.)
In the random-effects model from step (2), two strategies are available:
Either the fitted values from the tree can be supplied as an offset
(joint = FALSE
) so that only the random effects are estimated.
Or the fixed effects are (re-)estimated along with the random effects
using a nesting factor with nodes from the tree (joint = TRUE
).
In the former case, the estimation of each random-effects model is typically
faster, but more iterations are required.
The code is still under development and might change in future versions.
The function returns a list with the following objects:
tree |
The final |
lmer |
The final |
ranef |
The corresponding random effects of |
varcorr |
The corresponding |
variance |
The corresponding |
data |
The dataset specified with the |
loglik |
The log-likelihood value of the last iteration. |
iterations |
The number of iterations used to estimate the |
maxit |
The maximum number of iterations specified with the |
ranefstart |
The random effects used as an offset, as specified with
the |
formula |
The formula as specified with the |
randomformula |
The formula as specified with the |
abstol |
The prespecified value for the change in log-likelihood to evaluate
convergence, as specified with the |
mob.control |
A list containing control parameters passed to
|
lmer.control |
A list containing control parameters passed to
|
joint |
Whether the fixed effects from the tree were (re-)estimated jointly along
with the random effects, specified with the |
Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016–2034. doi:10.3758/s13428-017-0971-x
Fokkema M, Edbrooke-Childs J, Wolpert M (2021). “Generalized Linear Mixed-Model (GLMM) Trees: A Flexible Decision-Tree Method for Multilevel and Longitudinal Data.” Psychotherapy Research, 31(3), 329–341. doi:10.1080/10503307.2020.1785037
Fokkema M, Zeileis A (2024). “Subgroup Detection in Linear Growth Curve Models with Generalized Linear Mixed Model (GLMM) Trees.” Behavior Research Methods, 56(7), 6759–6780. doi:10.3758/s13428-024-02389-1
plot.lmertree
, plot.glmertree
,
cv.lmertree
, cv.glmertree
,
GrowthCurveDemo
,
lmer
, glmer
,
lmtree
, glmtree
## artificial example data data("DepressionDemo", package = "glmertree") ## fit normal linear regression LMM tree for continuous outcome lt <- lmertree(depression ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo) print(lt) plot(lt, which = "all") # default behavior, may also be "tree" or "ranef" coef(lt) ranef(lt) predict(lt, type = "response") # default behavior, may also be "node" predict(lt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod residuals(lt) VarCorr(lt) # see lme4::VarCorr ## fit logistic regression GLMM tree for binary outcome gt <- glmertree(depression_bin ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo) print(gt) plot(gt, which = "all") # default behavior, may also be "tree" or "ranef" coef(gt) ranef(gt) predict(gt, type = "response") # default behavior, may also be "node" or "link" predict(gt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod residuals(gt) VarCorr(gt) # see lme4::VarCorr ## Alternative specification for binomial family: no. of successes and failures DepressionDemo$failures <- as.numeric(DepressionDemo$depression_bin) - 1 DepressionDemo$successes <- 1 - DepressionDemo$failures gt <- glmertree(cbind(failures, successes) ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo, ytype = "matrix") ## see also ?partykit::mob_control
## artificial example data data("DepressionDemo", package = "glmertree") ## fit normal linear regression LMM tree for continuous outcome lt <- lmertree(depression ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo) print(lt) plot(lt, which = "all") # default behavior, may also be "tree" or "ranef" coef(lt) ranef(lt) predict(lt, type = "response") # default behavior, may also be "node" predict(lt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod residuals(lt) VarCorr(lt) # see lme4::VarCorr ## fit logistic regression GLMM tree for binary outcome gt <- glmertree(depression_bin ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo) print(gt) plot(gt, which = "all") # default behavior, may also be "tree" or "ranef" coef(gt) ranef(gt) predict(gt, type = "response") # default behavior, may also be "node" or "link" predict(gt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod residuals(gt) VarCorr(gt) # see lme4::VarCorr ## Alternative specification for binomial family: no. of successes and failures DepressionDemo$failures <- as.numeric(DepressionDemo$depression_bin) - 1 DepressionDemo$successes <- 1 - DepressionDemo$failures gt <- glmertree(cbind(failures, successes) ~ treatment | cluster | age + anxiety + duration, data = DepressionDemo, ytype = "matrix") ## see also ?partykit::mob_control
Artificial dataset to illustrate fitting of LMM trees with growth curve models in the terminal nodes.
data("GrowthCurveDemo")
data("GrowthCurveDemo")
A data frame containing 1250 repeated observations on 250 persons. x1 - x8 are time-invariant partitioning variables. Thus, they are measurements on the person (i.e., cluster) level, not on the individual observation level.
numeric. Indicator linking repeated measurements to persons.
factor. Indicator for timepoint.
numeric. Response variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
numeric. Potential partitioning variable.
Data were generated so that x1
, x2
and x3
are
true partitioning variables, x4
through x8
are noise
variables. The (potential) partitioning variables are time invariant.
Time-varying covariates can also be included in the model. For partitioning
growth curves these should probably not be potential partitioning variables,
as this could result in observations from the same person ending up in
different terminal nodes. Thus, time-varying covariates are probably
best included as predictors in the node-specific regression model. E.g.:
y ~ time + timevarying_cov | person | x1 + x2 + x3 + x4
.
Fokkema M & Zeileis A (2024). Subgroup detection in linear growth curve models with generalized linear mixed model (GLMM) trees. Behavior Research Methods. doi:10.3758/s13428-024-02389-1
data("GrowthCurveDemo", package = "glmertree") head(GrowthCurveDemo) ## Fit LMM tree with a random intercept w.r.t. person: form <- y ~ time | person | x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 lt.default <- lmertree(form, data = GrowthCurveDemo) plot(lt.default, which = "tree") ## yields too large tree VarCorr(lt.default) ## Account for measurement level of the partitioning variables: lt.cluster <- lmertree(form, cluster = person, data = GrowthCurveDemo) plot(lt.cluster, which = "tree") ## yields correct tree plot(lt.cluster, which = "growth") ## plot individual growth curves not datapoints coef(lt.cluster) ## node-specific fixed effects VarCorr(lt.cluster) ## with smaller trees random effects explain more variance ## Fit LMM tree with random intercept and random slope of time w.r.t. person: form.s <- y ~ time | (1 + time | person) | x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 lt.s.cluster <- lmertree(form.s, cluster = person, data = GrowthCurveDemo) plot(lt.s.cluster, which = "tree") ## same tree as before coef(lt.cluster) VarCorr(lt.s.cluster)
data("GrowthCurveDemo", package = "glmertree") head(GrowthCurveDemo) ## Fit LMM tree with a random intercept w.r.t. person: form <- y ~ time | person | x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 lt.default <- lmertree(form, data = GrowthCurveDemo) plot(lt.default, which = "tree") ## yields too large tree VarCorr(lt.default) ## Account for measurement level of the partitioning variables: lt.cluster <- lmertree(form, cluster = person, data = GrowthCurveDemo) plot(lt.cluster, which = "tree") ## yields correct tree plot(lt.cluster, which = "growth") ## plot individual growth curves not datapoints coef(lt.cluster) ## node-specific fixed effects VarCorr(lt.cluster) ## with smaller trees random effects explain more variance ## Fit LMM tree with random intercept and random slope of time w.r.t. person: form.s <- y ~ time | (1 + time | person) | x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 lt.s.cluster <- lmertree(form.s, cluster = person, data = GrowthCurveDemo) plot(lt.s.cluster, which = "tree") ## same tree as before coef(lt.cluster) VarCorr(lt.s.cluster)
Artificial dataset of treatment outcomes (N = 3739) of 13 mental-health services to illustrate fitting of (G)LMM trees with constant fits in terminal nodes.
data("MHserviceDemo")
data("MHserviceDemo")
A data frame containing 3739 observations on 8 variables:
numeric. Variable representing age in years (range: 4.8 - 23.6, M = 11.46).
numeric. Continuous variable representing severity of and impairment due to mental-health problems at baseline. Higher values indicate higher severity and impairment.
factor. Indicator for gender.
factor. Indicator for presence of emotional disorder at baseline.
factor. Indicator for presence of autistic disorder at baseline.
factor. Indicator for mental-health service provider.
factor. Binarized treatment outcome variable (0 = recovered, 1 = not recovered.
numeric. Variable representing treatment outcome as measured by a total mental-health difficulties score assessed about 6 months after baseline, corrected for the baseline assessment. Higher values indicate poorer outcome.
Dataset was modelled after Edbrooke-Childs et al. (2017), who analyzed a sample of $N = 3,739$ young people who received treatment at one of 13 mental-health service providers in the UK. Note that the data were artificially generated and do not reflect actual empirical findings.
Fokkema M, Edbrooke-Childs J & Wolpert M (2021). “Generalized linear mixed-model (GLMM) trees: A flexible decision-tree method for multilevel and longitudinal data.” Psychotherapy Research, 31(3), 329-341. doi:10.1080/10503307.2020.1785037
data("MHserviceDemo", package = "glmertree") summary(MHserviceDemo) lt <- lmertree(outcome ~ 1 | cluster_id | age + gender + emotional + autism + impact + conduct, data = MHserviceDemo) plot(lt) gt <- glmertree(factor(outcome > 0) ~ 1 | cluster_id | age + gender + emotional + autism + impact + conduct, data = MHserviceDemo, family = "binomial") plot(gt)
data("MHserviceDemo", package = "glmertree") summary(MHserviceDemo) lt <- lmertree(outcome ~ 1 | cluster_id | age + gender + emotional + autism + impact + conduct, data = MHserviceDemo) plot(lt) gt <- glmertree(factor(outcome > 0) ~ 1 | cluster_id | age + gender + emotional + autism + impact + conduct, data = MHserviceDemo, family = "binomial") plot(gt)
plot
method for (g)lmertree
objects.
## S3 method for class 'lmertree' plot(x, which = "all", nodesize_level = 1L, cluster = NULL, ask = TRUE, type = "extended", observed = TRUE, fitted = "combined", tp_args = list(), drop_terminal = TRUE, terminal_panel = NULL, dotplot_args = list(), ...) ## S3 method for class 'glmertree' plot(x, which = "all", nodesize_level = 1L, cluster = NULL, ask = TRUE, type = "extended", observed = TRUE, fitted = "combined", tp_args = list(), drop_terminal = TRUE, terminal_panel = NULL, dotplot_args = list(), ...)
## S3 method for class 'lmertree' plot(x, which = "all", nodesize_level = 1L, cluster = NULL, ask = TRUE, type = "extended", observed = TRUE, fitted = "combined", tp_args = list(), drop_terminal = TRUE, terminal_panel = NULL, dotplot_args = list(), ...) ## S3 method for class 'glmertree' plot(x, which = "all", nodesize_level = 1L, cluster = NULL, ask = TRUE, type = "extended", observed = TRUE, fitted = "combined", tp_args = list(), drop_terminal = TRUE, terminal_panel = NULL, dotplot_args = list(), ...)
x |
an object of class |
which |
character; |
nodesize_level |
numeric. At which grouping level should sample size
printed above each terminal node be computed? Defaults to 1, which
is the lowest level of observation. If a value of 2 is specified,
sample size at the cluster level will be printed above each terminal node.
This only works if |
cluster |
vector of cluster ids. Only used if |
ask |
logical. Should user be asked for input, before a new figure is drawn? |
type |
character; |
observed |
logical. Should observed datapoints be plotted in the tree?
Defaults to |
fitted |
character. |
tp_args |
list of arguments to be passed to panel generating function
|
drop_terminal |
logical. Should all terminal nodes be plotted at the bottom of the plot? |
terminal_panel |
an optional panel generating function to be passed to
|
dotplot_args |
Optional list of additional arguments to be passed to
|
... |
Additional arguments to be passed to |
If the node-specific model of the (g)lmertree
object specified by
argument x
is an intercept-only model, observed data distributions
will be plotted in the terminal nodes of the tree (using
node_barplot
(for categorical responses) or
node_boxplot
(for numerical responses). Otherwise,
fitted values will be plotted, in addition to observed datapoints, using a
function taking similar arguments as node_bivplot
.
Exceptions:
If fitted = "marginal"
, fitted values will be plotted by assuming the
mean (continuous predictors) or mode (categorical predictors) for all
predictor variables, except the variable on the x-axis of the current plot.
If which = "growth"
, individual growth curves will be plotted as thin grey
lines in the terminal nodes, while the node-specific fixed effect will be plotted
on top of that as a thicker red curve.
If which = "tree.coef"
), caterpillar plot(s) are created for the local
(node-specific) fixed effects. These depict the estimated fixed-effects
coefficients with 95% confidence intervals, but note that these CIs do not
account for the searching of the tree structure and are therefore likely
too narrow. There is currently no way to adjust CIs for searching of the tree
structure, but the CIs can be useful to obtain an indication of the variability
of the coefficient estimates, not for statistical significance testing.
If which = "ranef"
or "all"
, caterpillar plot(s) for the random
effect(s) created, depicting the predicted random effects with 95% confidence
intervals. See also ranef
for more info. Note that the CIs
do not account for the searching of the tree structure and may be too narrow.
If users want to specify custom panel generating functions, it might be best to
not use the plotting method for (g)lmertrees. Instead, extract the (g)lmtree from
the fitted (g)lmertree object (which is a list containing, amongst others, an element
$tree
). On this tree, most of the customization options from
party-plot
can then be applied.
The code is still under development and might change in future versions.
Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016–2034. doi:10.3758/s13428-017-0971-x
Fokkema M, Zeileis A (2024). “Subgroup Detection in Linear Growth Curve Models with Generalized Linear Mixed Model (GLMM) Trees.” Behavior Research Methods, 56(7), 6759–6780. doi:10.3758/s13428-024-02389-1
lmertree
, glmertree
,
party-plot
.
## load artificial example data data("DepressionDemo", package = "glmertree") ## fit linear regression LMM tree for continuous outcome lt <- lmertree(depression ~ treatment + age | cluster | anxiety + duration, data = DepressionDemo) plot(lt) plot(lt, type = "simple") plot(lt, which = "tree", fitted = "combined") plot(lt, which = "tree", fitted = "none") plot(lt, which = "tree", observed = FALSE) plot(lt, which = "tree.coef") plot(lt, which = "ranef") ## fit logistic regression GLMM tree for binary outcome gt <- glmertree(depression_bin ~ treatment + age | cluster | anxiety + duration, data = DepressionDemo) plot(gt) plot(gt, type = "simple") plot(gt, which = "tree", fitted = "combined") plot(gt, which = "tree", fitted = "none") plot(gt, which = "tree.coef") plot(gt, which = "ranef")
## load artificial example data data("DepressionDemo", package = "glmertree") ## fit linear regression LMM tree for continuous outcome lt <- lmertree(depression ~ treatment + age | cluster | anxiety + duration, data = DepressionDemo) plot(lt) plot(lt, type = "simple") plot(lt, which = "tree", fitted = "combined") plot(lt, which = "tree", fitted = "none") plot(lt, which = "tree", observed = FALSE) plot(lt, which = "tree.coef") plot(lt, which = "ranef") ## fit logistic regression GLMM tree for binary outcome gt <- glmertree(depression_bin ~ treatment + age | cluster | anxiety + duration, data = DepressionDemo) plot(gt) plot(gt, type = "simple") plot(gt, which = "tree", fitted = "combined") plot(gt, which = "tree", fitted = "none") plot(gt, which = "tree.coef") plot(gt, which = "ranef")