The package HLMdiag
was created in order to provide a
unified framework for analysts to diagnose hierarchical linear models
(HLMs). When HLMdiag
was created in 2014, it made
diagnostic procedures available for HLMs that had not yet been
implemented in statistical software. Over the past 6 years, other
packages have gradually implemented some of these procedures; however,
diagnosing a model still often requires multiple packages, each of which
has its own syntax and quirks. HLMdiag
provides diagnostic
tools targeting all aspects and levels of hierarchical linear models in
a single package. This update of HLMdiag focuses on the usability of its functions. Drawing inspiration from the
augment()
function in the broom
package, the
new functions hlm_resid()
, hlm_influence()
,
and hlm_augment()
append diagnostic information to a
model’s data frame, allowing the analyst to easily work with and see how
residuals and influence diagnostics relate to the original
variables.
This vignette introduces new functions concerning residual
diagnostics in HLMdiag
, specifically
hlm_resid()
, pull_resid()
, and
hlm_augment()
. The main focus of this vignette,
hlm_resid()
, is intended to replace HLMresid()
in functionality and works more robustly with three-level models as well as
models fit with nlme
. Additionally,
hlm_resid()
appends residuals and fitted values to a
model’s data frame in the form of a tibble. Tibbles help the analyst
diagnose models by simplifying the creation of diagnostic plots,
especially within ggplot2
, and by working seamlessly with
sorting functions in dplyr
.
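For instance, the returned tibble can be piped straight into dplyr verbs. A small sketch, assuming a fitted lmerMod or lme object named fm (a placeholder; models are fit later in this vignette):
library(dplyr)
library(HLMdiag)
# `fm` is a placeholder for any fitted lmerMod or lme object
hlm_resid(fm) %>%
  arrange(desc(abs(.resid))) %>%  # largest level-1 residuals first
  head()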
For information about influence diagnostics such as Cook’s distance
and leverage in HLMs, the reader should see the vignette for
hlm_influence()
.
The function hlm_resid()
takes a hierarchical linear
model fit as a lmerMod
or lme
object and
extracts residuals and predicted values from the model using both Least Squares (LS) and Empirical Bayes (EB) methods to estimate parameters.
Whereas the old function, HLMresid()
, would return a vector
with a single type of residual, hlm_resid()
appends
specified residuals to the data frame of the model in the form of a
tibble. The use of a tibble allows the analyst to easily plot residuals
against explanatory variables and identify possible outliers or model
deficiencies. The function draws heavy inspiration from the
augment
function of the broom
package; however, it
offers more options for the types of residuals that the analyst may want
to use.
The presence of multiple sources of variability in hierarchical
linear models results in numerous quantities defining residuals. All
functions in HLMdiag
follow the classification by
Hilden-Minton (1995) and define three types of residuals:

1. level-1 (conditional) residuals, which assess how an observation deviates from its fitted value within its group;
2. higher-level residuals, the predicted random effects of the higher-level units; and
3. marginal (composite) residuals, which assess how an observation deviates from the marginal (population-averaged) fitted value.
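In symbols (our sketch in standard notation, consistent with Loy and Hofmann (2014)): writing the model as $y = X\beta + Zb + \varepsilon$, the level-1 residuals are $\hat{\varepsilon} = y - X\hat{\beta} - Z\hat{b}$, the higher-level residuals are the predicted random effects $\hat{b}$, and the marginal residuals are $y - X\hat{\beta}$.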
The definitions of level-1 and higher-level residuals lead to
different residuals depending on how the fixed and random model
coefficients are estimated. hlm_resid()
implements two
estimation methods: Least Squares estimation and Empirical Bayes
estimation. For a comprehensive discussion of these two methods and how
they are implemented, we direct the reader to Loy and Hofmann
(2014).
LS and EB residuals each have their own advantages and disadvantages. Level-1 LS residuals are calculated by fitting separate linear models to each group and using LS to estimate the fixed and random effects. Because they rely only on the first level of the hierarchy, they are unconfounded with higher-level residuals and are a useful first step in diagnosing a model; however, for small group sample sizes, LS residuals are unreliable. Level-1 EB residuals are defined as the conditional modes of the random effects given the data and the estimated parameter values, which are calculated with (restricted) maximum likelihood. They are confounded with higher-level residuals, but are more robust for small sample sizes. For higher-level residuals, coefficients can again be estimated using either LS or EB, but EB residuals are generally preferred due to small group sizes.
To illustrate the use of hlm_resid()
, we make use of
data on exam scores of 4,059 students in 65 inner London schools. This
data set is distributed as part of the R package mlmRev
(Bates, Maechler, Bolker 2020), which makes well-known
multilevel modeling data sets available in R.
data("Exam", package = "mlmRev")
head(Exam)
#> school normexam schgend schavg vr intake standLRT sex type
#> 1 1 0.2613242 mixed 0.1661752 mid 50% bottom 25% 0.6190592 F Mxd
#> 2 1 0.1340672 mixed 0.1661752 mid 50% mid 50% 0.2058022 F Mxd
#> 3 1 -1.7238820 mixed 0.1661752 mid 50% top 25% -1.3645760 M Mxd
#> 4 1 0.9675862 mixed 0.1661752 mid 50% mid 50% 0.2058022 F Mxd
#> 5 1 0.5443412 mixed 0.1661752 mid 50% mid 50% 0.3711052 F Mxd
#> 6 1 1.7348992 mixed 0.1661752 mid 50% bottom 25% 2.1894372 M Mxd
#> student
#> 1 143
#> 2 145
#> 3 142
#> 4 141
#> 5 138
#> 6 155
For each student, the data consist of their gender (sex
)
and two standardized exam scores: an intake score on the London Reading
Test (LRT) at age 11 (standLRT
) and a score on the General
Certificate of Secondary Education (GCSE) examination at age 16
(normexam
). Additionally, the students’ LRT scores were
used to segment students into three categories (bottom 25%, middle 50%,
and top 25%) based on their verbal reasoning subscore (vr
)
and overall score (intake). At the school level, the data
contain the average intake score for the school (schavg
)
and type based on school gender (schgend
, coded as mixed,
boys, or girls).
We illustrate usage for hlm_resid()
below using the exam
data and fitting an initial two-level HLM using a student’s standardized
London Reading Test intake score (standLRT) to explain their GCSE exam
score, allowing for a random intercept for each school:
fm1 <- lme4::lmer(normexam ~ standLRT + (1 | school), Exam, REML = FALSE)
fm1
#> Linear mixed model fit by maximum likelihood ['lmerMod']
#> Formula: normexam ~ standLRT + (1 | school)
#> Data: Exam
#> AIC BIC logLik deviance df.resid
#> 9365.243 9390.478 -4678.622 9357.243 4055
#> Random effects:
#> Groups Name Std.Dev.
#> school (Intercept) 0.3035
#> Residual 0.7522
#> Number of obs: 4059, groups: school, 65
#> Fixed Effects:
#> (Intercept) standLRT
#> 0.002391 0.563371
hlm_resid()
takes 5 arguments:
object
- the model fit as a lmerMod
or
lme
object.
level
- the level at which residuals should be
extracted. By default, level = 1
, returning both EB and LS
level-1 residuals as well as marginal residuals. level
can
also be set to a grouping factor as defined in
names(object@flist)
in lme4
or in
names(object$groups)
in nlme
. When set to a
grouping factor, hlm_resid()
returns both EB and LS
higher-level residuals, otherwise known as the random effects for each
group.
standardize
- a logical indicating if the returned
residuals should be standardized. By default,
standardize = FALSE
, returning unstandardized residuals.
When standardize = TRUE
, residuals are divided by the
estimated standard error. For marginal residuals, when
standardize = TRUE
, the Cholesky residuals are
returned.
include.ls
- a logical indicating if LS residuals
should be returned. By default, include.ls = TRUE
. The LS
method of estimating residuals is more computationally intensive than
the EB method. Consequently, setting include.ls = FALSE
substantially decreases the runtime on models with many
observations.
data
- only necessary if
na.action = na.exclude
for a lmerMod
model
object. data
should specify the data frame used to fit the
model containing observations with NA
values.
To extract unstandardized level-1 residuals and marginal residuals, we use the following call.
library(HLMdiag)
hlm_resid(fm1)
#> # A tibble: 4,059 × 10
#> id normexam standLRT school .resid .fitted .ls.resid .ls.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 0.619 1 -0.464 0.725 -0.561 0.822
#> 2 2 0.134 0.206 1 -0.358 0.492 -0.395 0.529
#> 3 3 -1.72 -1.36 1 -1.33 -0.393 -1.14 -0.585
#> 4 4 0.968 0.206 1 0.475 0.492 0.438 0.529
#> 5 5 0.544 0.371 1 -0.0409 0.585 -0.102 0.647
#> 6 6 1.73 2.19 1 0.125 1.61 -0.201 1.94
#> 7 7 1.04 -1.12 1 1.29 -0.253 1.45 -0.409
#> 8 8 -0.129 -1.03 1 0.0773 -0.206 0.221 -0.350
#> 9 9 -0.939 -0.538 1 -1.01 0.0730 -0.941 0.00167
#> 10 10 -1.22 -1.45 1 -0.780 -0.439 -0.576 -0.643
#> # ℹ 4,049 more rows
#> # ℹ 2 more variables: .mar.resid <dbl>, .mar.fitted <dbl>
hlm_resid()
appends 6 columns to the model frame:
.resid - the unstandardized Empirical Bayes (EB) level-1 residuals for each observation
.fitted - the Empirical Bayes (EB) fitted values
.ls.resid - the unstandardized Least Squares (LS) level-1 residuals
.ls.fitted - the Least Squares (LS) fitted values
.mar.resid - the unstandardized marginal residuals
.mar.fitted - the marginal fitted values
Note that the columns containing the Empirical Bayes residuals are
simply named .resid
and .fitted
. This is
because both lme4::resid()
and nlme::resid()
use the EB method to estimate residuals; thus, the analyst is likely
more familiar with this type of residual.
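As a quick sanity check of this naming (an illustrative comparison, not part of the original example), the .resid column should agree with the conditional residuals that resid() returns for the same fit:
all.equal(hlm_resid(fm1)$.resid, unname(resid(fm1)))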
We can alter the output of the level-1 residuals using the
standardize
and include.ls
parameters
introduced above. The following call returns both the level-1 and
marginal residuals divided by the estimated standard error and excludes
the least squares residuals.
hlm_resid(fm1, standardize = TRUE, include.ls = FALSE)
#> # A tibble: 4,059 × 8
#> id normexam standLRT school .std.resid .fitted .chol.mar.resid .mar.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 0.619 1 -0.616 0.725 -0.111 0.351
#> 2 2 0.134 0.206 1 -0.476 0.492 0.0353 0.118
#> 3 3 -1.72 -1.36 1 -1.77 -0.393 -1.19 -0.766
#> 4 4 0.968 0.206 1 0.632 0.492 1.21 0.118
#> 5 5 0.544 0.371 1 -0.0544 0.585 0.445 0.211
#> 6 6 1.73 2.19 1 0.167 1.61 0.618 1.24
#> 7 7 1.04 -1.12 1 1.72 -0.253 2.06 -0.627
#> 8 8 -0.129 -1.03 1 0.103 -0.206 0.352 -0.580
#> 9 9 -0.939 -0.538 1 -1.35 0.0730 -1.07 -0.301
#> 10 10 -1.22 -1.45 1 -1.04 -0.439 -0.705 -0.813
#> # ℹ 4,049 more rows
hlm_resid()
appends the EB and marginal residual columns
to the model frame. The names of the columns containing residuals now
have the prefix .std
to reflect the fact that they are
standardized. Note that standardized marginal residuals are equivalent
to the Cholesky residuals of an HLM and are thus named
.chol.mar.resid
.
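For reference, a standard definition (our sketch, not spelled out above): if $\hat{V}$ is the estimated marginal covariance matrix of $y$ with Cholesky factorization $\hat{V} = \hat{C}\hat{C}^{\top}$, then the Cholesky residuals are $\hat{C}^{-1}(y - X\hat{\beta})$, i.e., the marginal residuals premultiplied by the inverse Cholesky factor, which makes them approximately uncorrelated with unit variance.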
To access higher-level residuals, we can set the level
parameter to a grouping factor as defined in
names(object@flist)
in lme4
or in
names(object$groups)
in nlme
.
To extract the school-level residuals, otherwise known as the predicted random effects for school, we use the following call.
hlm_resid(fm1, level = "school")
#> # A tibble: 65 × 3
#> school .ranef.intercept .ls.intercept
#> <chr> <dbl> <dbl>
#> 1 1 0.374 0.451
#> 2 2 0.502 0.550
#> 3 3 0.504 0.626
#> 4 4 0.0181 0.0719
#> 5 5 0.240 0.329
#> 6 6 0.541 0.671
#> 7 7 0.379 0.467
#> 8 8 -0.0262 0.0429
#> 9 9 -0.135 -0.169
#> 10 10 -0.337 -0.257
#> # ℹ 55 more rows
In the tibble returned by hlm_resid()
, each row now
represents one of the groups specified by the level
parameter. In this example, there were 65 schools from which data was
collected, and the resulting tibble has 65 rows. If we had included
school-level variables in the model, such as the average intake score
for a school (schavg
), those variables would also be
included in the output. The final two columns contain the random effects
for the intercept at each school.
.ranef.intercept - the random effects for intercept, using Empirical Bayes estimation
.ls.intercept - the random effects for intercept, using Least Squares estimation
Had we included other random effect terms in the model, the EB and LS
predictions of their random effects would also be included in the
output. Again, the EB random effects are simply named
.ranef
because it is assumed that the analyst is most
familiar with this type of random effect. The parameters
standardize
and include.ls
work the same way
with higher-level residuals and with both lmerMod
and
lme
objects. Setting include.ls = FALSE
is
often recommended with higher-level residuals, as calculating LS
residuals is computationally intensive and the resulting residuals are
often unreliable due to small group sizes.
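For example, the call below requests standardized EB school-level random effects while skipping the LS computation:
# standardized EB random effects only, skipping the slower LS step
hlm_resid(fm1, level = "school", standardize = TRUE, include.ls = FALSE)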
The final parameter, data
, becomes
required when na.action = na.exclude
for an
lmerMod
model object. Model frames in lme4
always use na.omit
. Thus, a data
parameter is
needed to retain NA
values in the model frame that
hlm_resid()
returns. To demonstrate this requirement, we
create a duplicate data set of Exam
with the standardized
LRT scores set to NA
for observations 1, 2, and 5.
# make copy of Exam data set
Exam2 <- Exam
# set standLRT to NA for 3 observations
Exam2[c(1,2,5),7] <- NA
# refit model with na.exclude
fm1.NA <- lme4::lmer(normexam ~ standLRT + (1 | school), Exam2,
na.action = na.exclude)
In the first example, when we try to call hlm_resid()
on
our model object, it throws an informative error asking for the data set
used to fit the model. We provide this using the data
parameter in the second example.
# incorrect:
hlm_resid(fm1.NA)
#> Error in hlm_resid.lmerMod(fm1.NA): Please provide the data frame used to fit the model. This is necessary when the na.action is set to na.exclude
# correct:
hlm_resid(fm1.NA, data = Exam2)
#> # A tibble: 4,059 × 10
#> id school normexam standLRT .resid .fitted .ls.resid .ls.fitted
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0.261 NA NA NA NA NA
#> 2 2 1 0.134 NA NA NA NA NA
#> 3 3 1 -1.72 -1.36 -1.34 -0.381 -1.15 -0.575
#> 4 4 1 0.968 0.206 0.464 0.504 0.423 0.545
#> 5 5 1 0.544 NA NA NA NA NA
#> 6 6 1 1.73 2.19 0.113 1.62 -0.224 1.96
#> 7 7 1 1.04 -1.12 1.28 -0.241 1.44 -0.398
#> 8 8 1 -0.129 -1.03 0.0653 -0.194 0.210 -0.339
#> 9 9 1 -0.939 -0.538 -1.02 0.0850 -0.954 0.0142
#> 10 10 1 -1.22 -1.45 -0.792 -0.427 -0.585 -0.634
#> # ℹ 4,049 more rows
#> # ℹ 2 more variables: .mar.resid <dbl>, .mar.fitted <dbl>
Note that for the same model fit in nlme
, we do not need
to include the data
parameter.
fm1.NA.lme <- nlme::lme(normexam ~ standLRT, random = ~1 | school, Exam2,
na.action = na.exclude)
hlm_resid(fm1.NA.lme)
#> # A tibble: 4,059 × 10
#> id normexam standLRT school .resid .fitted .ls.resid .ls.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 NA 1 NA NA NA NA
#> 2 2 0.134 NA 1 NA NA NA NA
#> 3 3 -1.72 -1.36 1 -1.34 -0.381 -1.15 -0.575
#> 4 4 0.968 0.206 1 0.464 0.504 0.423 0.545
#> 5 5 0.544 NA 1 NA NA NA NA
#> 6 6 1.73 2.19 1 0.113 1.62 -0.224 1.96
#> 7 7 1.04 -1.12 1 1.28 -0.241 1.44 -0.398
#> 8 8 -0.129 -1.03 1 0.0653 -0.194 0.210 -0.339
#> 9 9 -0.939 -0.538 1 -1.02 0.0850 -0.954 0.0142
#> 10 10 -1.22 -1.45 1 -0.792 -0.427 -0.585 -0.634
#> # ℹ 4,049 more rows
#> # ℹ 2 more variables: .mar.resid <dbl>, .mar.fitted <dbl>
In three-level models, how we specify middle-level residuals changes
depending on whether a model is a lmerMod
or
lme
object. This is due to differences in how
lme4
and nlme
store their grouping factors. To
demonstrate this, we use a classroom data set from West, Welch,
Galecki (2006). The data contain math testing scores from 1190 children in 312 different classrooms within 107 unique schools. All
classrooms are nested within schools, and all students are nested within
a single classroom.
data("classroom", package = "WWGbook")
head(classroom)
#> sex minority mathkind mathgain ses yearstea mathknow housepov mathprep
#> 1 1 1 448 32 0.46 1 NA 0.082 2.00
#> 2 0 1 460 109 -0.27 1 NA 0.082 2.00
#> 3 1 1 511 56 -0.03 1 NA 0.082 2.00
#> 4 0 1 449 83 -0.38 2 -0.11 0.082 3.25
#> 5 0 1 425 53 -0.03 2 -0.11 0.082 3.25
#> 6 1 1 450 65 0.76 2 -0.11 0.082 3.25
#> classid schoolid childid
#> 1 160 1 1
#> 2 160 1 2
#> 3 160 1 3
#> 4 217 1 4
#> 5 217 1 5
#> 6 217 1 6
The data set contains 12 columns; however, we are only interested in
the response variable, mathgain
, and the grouping
variables, classid
and schoolid
. We fit the
simple unconditional means model in both lme4
and
nlme
below.
# lme4
fm2.lmer <- lme4::lmer(mathgain ~ 1 + (1|schoolid/classid), data = classroom)
# nlme
fm2.lme <- nlme::lme(mathgain ~ 1, random = ~1|schoolid/classid, data = classroom)
For both models, classroom is nested within school. We can access the grouping factors for each model with the following:
# lme4
names(fm2.lmer@flist)
#> [1] "classid:schoolid" "schoolid"
# nlme
names(fm2.lme$groups)
#> [1] "schoolid" "classid"
Note how the orders of the grouping factors are opposite for
lme4
and nlme
. In lme4
,
the convention is to start at the lowest level of the hierarchy and move upward
(in this case classid:schoolid
, followed by
schoolid
). Grouping factors in nlme
are given
starting at the highest grouping and moving down (schoolid
,
followed by classid
). The order of classid
and
schoolid
in the returned tibble is based on these
conventions.
To extract the classroom-level random effects for
fm2.lmer
, we set
level = "classid:schoolid"
.
# lme4
hlm_resid(fm2.lmer, level = "classid:schoolid")
#> # A tibble: 312 × 5
#> group classid schoolid .ranef.intercept .ls.intercept
#> <chr> <int> <int> <dbl> <dbl>
#> 1 160:1 160 1 1.64 4.15
#> 2 217:1 217 1 -0.433 -4.15
#> 3 197:2 197 2 -1.69 -12.9
#> 4 211:2 211 2 2.51 6.58
#> 5 307:2 307 2 2.44 6.33
#> 6 11:3 11 3 3.87 -4.38
#> 7 137:3 137 3 5.56 16.1
#> 8 145:3 145 3 8.10 6.62
#> 9 228:3 228 3 -0.0237 -18.4
#> 10 48:4 48 4 3.77 44
#> # ℹ 302 more rows
For fm2.lme
, we set
level = "classid"
.
# nlme
hlm_resid(fm2.lme, level = "classid")
#> # A tibble: 312 × 5
#> group schoolid classid .ranef.intercept .ls.intercept
#> <chr> <int> <int> <dbl> <dbl>
#> 1 1/160 1 160 1.64 4.15
#> 2 1/217 1 217 -0.433 -4.15
#> 3 2/197 2 197 -1.69 -12.9
#> 4 2/211 2 211 2.51 6.58
#> 5 2/307 2 307 2.44 6.33
#> 6 3/11 3 11 3.87 -4.38
#> 7 3/137 3 137 5.56 16.1
#> 8 3/145 3 145 8.10 6.62
#> 9 3/228 3 228 -0.0235 -18.4
#> 10 4/48 4 48 3.77 44
#> # ℹ 302 more rows
Now that we have discussed how to extract all types of residuals for
hierarchical models using hlm_resid()
, we illustrate how to
use those residuals in context to evaluate a model. We will diagnose the
earlier model, fm1
, fit on the Exam
data set
above. We use hlm_resid()
to extract the standardized
level-1 and marginal residuals.
resid1_fm1 <- hlm_resid(fm1, level = 1, standardize = TRUE)
resid1_fm1
#> # A tibble: 4,059 × 10
#> id normexam standLRT school .std.resid .fitted .std.ls.resid .ls.fitted
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.261 0.619 1 -0.616 0.725 -0.682 0.822
#> 2 2 0.134 0.206 1 -0.476 0.492 -0.480 0.529
#> 3 3 -1.72 -1.36 1 -1.77 -0.393 -1.40 -0.585
#> 4 4 0.968 0.206 1 0.632 0.492 0.532 0.529
#> 5 5 0.544 0.371 1 -0.0544 0.585 -0.124 0.647
#> 6 6 1.73 2.19 1 0.167 1.61 -0.251 1.94
#> 7 7 1.04 -1.12 1 1.72 -0.253 1.78 -0.409
#> 8 8 -0.129 -1.03 1 0.103 -0.206 0.271 -0.350
#> 9 9 -0.939 -0.538 1 -1.35 0.0730 -1.15 0.00167
#> 10 10 -1.22 -1.45 1 -1.04 -0.439 -0.712 -0.643
#> # ℹ 4,049 more rows
#> # ℹ 2 more variables: .chol.mar.resid <dbl>, .mar.fitted <dbl>
Utilizing the tibble output of hlm_resid()
, we can
easily plot the standardized LS residuals against explanatory variables
such as the standardized LRT score. We use the level-1 LS residuals because they are unconfounded with higher-level residuals.
library(ggplot2)
ggplot(data = resid1_fm1, aes(x = standLRT, y = .std.ls.resid)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "loess", se = FALSE) +
labs(y = "LS level-1 residuals",
title = "LS residuals against standardized LRT score")
The smoother shows that standardized LRT scores might not be linearly
related to GCSE exam scores. Likelihood ratio tests (not shown) confirm
that quadratic and cubic terms for standLRT
contribute
significantly to describing GCSE exam scores, so we incorporate these
terms in the updated model, fm1b
. We then reassess the
level-1 LS residuals.
fm1b <- update(fm1, .~. + I(standLRT^2) + I(standLRT^3))
resid1_fm1b <- hlm_resid(fm1b, level = 1, standardize = TRUE)
ggplot(data = resid1_fm1b, aes(x = standLRT, y = .std.ls.resid)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "loess", se = FALSE) +
labs(y = "LS level-1 residuals", title = "LS residuals against standardized LRT score")
The addition of the quadratic and cubic terms seems to have improved
the level-1 residuals. Additionally, there don’t appear to be any large
outliers that might need to be removed. We double-check the LS level-1 residuals for fm1b
, plotting them against the LS fitted
values appended to the model frame by hlm_resid()
.
ggplot(data = resid1_fm1b, aes(x = .ls.fitted, y = .std.ls.resid)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "loess", se = FALSE) +
labs(x = "LS fitted values",
y = "LS level-1 residuals",
title = "LS residuals against LS fitted values")
The LS level-1 residuals give no indication that the model violates
any assumptions of an HLM. Because there are no glaring issues, we move
on to the higher-level residuals, again plotting the output of
hlm_resid()
.
resid2_fm1b <- hlm_resid(fm1b, level = "school", include.ls = FALSE)
resid2_fm1b
#> # A tibble: 65 × 2
#> school .ranef.intercept
#> <chr> <dbl>
#> 1 1 0.374
#> 2 2 0.498
#> 3 3 0.502
#> 4 4 0.0211
#> 5 5 0.247
#> 6 6 0.538
#> 7 7 0.390
#> 8 8 -0.0220
#> 9 9 -0.156
#> 10 10 -0.331
#> # ℹ 55 more rows
ggplot(data = resid2_fm1b, aes(x = school, y = .ranef.intercept)) +
geom_point() +
labs(y = "Random effects - intercept",
title = "Intercept random effects against school") +
coord_flip()
The school-level random effects again show no large outliers. We
would now check influence diagnostics such as Cook’s distance and
leverage to ensure our model is appropriate. For a discussion of
influence diagnostics within HLMdiag
, we direct the reader
to the vignette for hlm_influence()
.
The function pull_resid()
is an efficient way to pull a
single type of residual as a vector. Whereas hlm_resid()
spends time calculating all types of residuals and fitted values,
pull_resid()
only calculates the type of residual specified
by the user, saving time over using hlm_resid()
and
indexing for a specific column. This is especially useful when used in a
loop, such as when using simulation-based diagnostics.
pull_resid()
takes three parameters:
object
, type
, and standardize
.
The parameters object
and standardize
work exactly as they do in hlm_resid()
, and the new
parameter, type
, can be set to:
"ls"
to return LS level-1 residuals"eb"
to return EB level-1 residuals"marginal"
to return marginal residualsThe example below illustrates the output of pull_resid()
being used to extract level-1 standardized LS residuals.
head(pull_resid(fm1, type = "ls", standardize = TRUE))
#> [1] -0.6824458 -0.4800891 -1.4044892 0.5323373 -0.1242091 -0.2512477
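To illustrate the intended use inside a loop, here is a minimal sketch of a simulation-based check (our construction, using stats::simulate() and lme4::refit()):
# simulate new responses from fm1, refit, and pull one residual type each time
sim_resids <- replicate(19, {
  ystar  <- stats::simulate(fm1)[[1]]  # one simulated response vector
  fmstar <- lme4::refit(fm1, ystar)    # refit fm1 to the simulated responses
  pull_resid(fmstar, type = "ls", standardize = TRUE)
})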
The function pull_resid()
is only implemented for
level-1 residuals. If the analyst wants to extract random effects for a
model, they should use the ranef()
functions from either
lme4
or nlme
.
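For instance, the school-level random effects for fm1 can be pulled directly from lme4:
# predicted (EB) random effects from the fitting package itself
head(lme4::ranef(fm1)$school)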
The hlm_augment()
function combines the
hlm_resid()
and hlm_influence()
functions to
return a tibble containing information about the residuals and the
influence diagnostics appended to the data. hlm_augment()
has three parameters: object
, level
, and
include.ls
, all of which have the same functionality as in
hlm_resid()
. The syntax is the same for the two functions.
For example, we can extract both level-1 residual and influence
diagnostics from the model fm1
with the following call.
fm1.aug <- hlm_augment(fm1)
tibble::glimpse(fm1.aug)
#> Rows: 4,059
#> Columns: 15
#> $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
#> $ normexam <dbl> 0.2613242, 0.1340672, -1.7238820, 0.9675862, 0.544341…
#> $ standLRT <dbl> 0.6190592, 0.2058022, -1.3645760, 0.2058022, 0.371105…
#> $ school <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ .resid <dbl> -0.46358741, -0.35802733, -1.33127074, 0.47549167, -0…
#> $ .fitted <dbl> 0.72491161, 0.49209453, -0.39261126, 0.49209453, 0.58…
#> $ .ls.resid <dbl> -0.56113518, -0.39525182, -1.13926654, 0.43826718, -0…
#> $ .ls.fitted <dbl> 0.822459376, 0.529319021, -0.584615462, 0.529319021, …
#> $ .mar.resid <dbl> -0.089826659, 0.015733418, -0.957509986, 0.849252418,…
#> $ .mar.fitted <dbl> 0.35115086, 0.11833378, -0.76637201, 0.11833378, 0.21…
#> $ cooksd <dbl> 1.503345e-05, 2.075760e-06, 1.042793e-03, 3.661260e-0…
#> $ mdffits <dbl> 1.503227e-05, 2.075722e-06, 1.042108e-03, 3.661194e-0…
#> $ covtrace <dbl> 7.814095e-05, 1.809065e-05, 6.568974e-04, 1.809065e-0…
#> $ covratio <dbl> 1.000078, 1.000018, 1.000657, 1.000018, 1.000031, 1.0…
#> $ leverage.overall <dbl> 0.01271288, 0.01265360, 0.01328391, 0.01265360, 0.012…
This is useful for inspecting residual values and influence diagnostics at the same time. However, hlm_augment()
lacks some of the functionality that hlm_influence()
and
hlm_resid()
have. The delete
and
approx
parameters available for
hlm_influence()
are not available in
hlm_augment()
, so the function will always use a one-step
approximation and delete all observations or groups instead of a
selected few. The standardize
parameter from
hlm_resid()
is also not included, so unstandardized
residuals will always be returned. If additional functionality is
required, hlm_influence()
or hlm_resid()
should be used instead. hlm_augment()
is especially useful
for inspecting influence diagnostics of observations with relatively
high residuals, or vice versa. For more information about available
functionality in hlm_influence()
, see the
hlm_influence()
vignette.
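For instance, a short sketch (assuming dplyr is available) that ranks observations by Cook's distance while keeping their residuals in view:
library(dplyr)
fm1.aug %>%
  arrange(desc(cooksd)) %>%  # most influential observations first
  select(id, school, .resid, cooksd, leverage.overall)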
Adam Loy, Heike Hofmann (2014). HLMdiag: A Suite of Diagnostics for Hierarchical Linear Models in R. Journal of Statistical Software, 56(5), 1-28. DOI: 10.18637/jss.v056.i05
Brady West, Kathleen Welch, Andrzej Galecki (2006) Linear Mixed Models: A Practical Guide Using Statistical Software. First Edition. Chapman Hall / CRC Press. ISBN 1584884800
David Robinson, Alex Hayes (2020). broom: Convert Statistical Analysis Objects into Tidy Tibbles. R package version 0.5.6. https://CRAN.R-project.org/package=broom
Douglas Bates, Martin Maechler, Ben Bolker (2020). mlmRev: Examples from Multilevel Modelling Software Review. R package version 1.0-8. https://CRAN.R-project.org/package=mlmRev
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. DOI: 10.18637/jss.v067.i01
J.A. Hilden-Minton (1995). Multilevel Diagnostics for Mixed and Hierarchical Linear Models. Ph.D. thesis, University of California Los Angeles.
Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, R Core Team (2020). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-148, https://CRAN.R-project.org/package=nlme.