---
title: "Residual Diagnostics"
authors: "Jack Moran, Jaylin Lowe, Adam Loy"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Residual Diagnostics}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown_notangle}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
             "_R_CHECK_LICENSE_") %in% names(Sys.getenv()))

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.show = "hide",
  eval = !is_check
)
```

# HLMdiag: a diagnostic tool for hierarchical (multilevel) linear models

The package `HLMdiag` was created in order to provide a unified framework for analysts to diagnose hierarchical linear models (HLMs). When `HLMdiag` was created in 2014, it made diagnostic procedures available for HLMs that had not yet been implemented in statistical software. Over the past 6 years, other packages have gradually implemented some of these procedures; however, diagnosing a model still often requires multiple packages, each of which has its own syntax and quirks. `HLMdiag` provides diagnostic tools targeting all aspects and levels of hierarchical linear models in a single package. In this update, `HLMdiag` focuses on usability in its functions. Drawing inspiration from the `augment()` function in the `broom` package, the new functions `hlm_resid()`, `hlm_influence()`, and `hlm_augment()` append diagnostic information to a model's data frame, allowing the analyst to easily work with and see how residuals and influence diagnostics relate to the original variables.

This vignette introduces new functions concerning residual diagnostics in `HLMdiag`, specifically `hlm_resid()`, `pull_resid()`, and `hlm_augment()`. The main focus of this vignette, `hlm_resid()`, is intended to replace `HLMresid()` in functionality, and works more robustly with 3-level models as well as models fit with `nlme`. Additionally, `hlm_resid()` appends residuals and fitted values to a model's data frame in the form of a tibble. Tibbles help the analyst diagnose models by simplifying the creation of diagnostic plots, especially within `ggplot2`, and working seamlessly with sorting functions in `dplyr`. 

For information about influence diagnostics such as Cook's distance and leverage in HLMs, the reader should see the vignette for `hlm_influence()`. 

## hlm_resid() function

The function `hlm_resid()` takes a hierarchical linear model fit as a `lmerMod` or `lme` object and extracts residuals and predicted values from the model using both Least Squares (LS) and Empirical Bayes (EB) methods in estimating parameters. Whereas the old function, `HLMresid()`, would return a vector with a single type of residual, `hlm_resid()` appends specified residuals to the data frame of the model in the form of a tibble. The use of a tibble allows the analyst to easily plot residuals against explanatory variables and identify possible outliers or model deficiencies. The function draws heavy inspiration from the `augment` function of the `broom` package, however offers more options for the types of residuals that the analyst may want to use.

## Types of residuals in multilevel models

The presence of multiple sources of variability in hierarchical linear models results in numerous quantities defining residuals. All functions in `HLMdiag` follow the classification by **Hilden-Minton (1995)** and define three types of residuals:

1. level-1 (conditional) residuals
1. higher-level (random effects) residuals
1. marginal (composite) residuals

The definitions of level-1 and higher-level residuals lead to different residuals depending on how the fixed and random model coefficients are estimated. `hlm_resid()` implements two estimation methods: Least Squares estimation and Empirical Bayes estimation. For a comprehensive discussion of these two methods and how they are implemented, we direct the reader to **Loy and Hoffman (2014)**. 

LS and EB residuals each have their own advantages and disadvantages. Level-1 LS residuals are calculated by fitting separate linear models to each group and using LS to estimate the fixed and random effects. Because they rely only on the first level of the hierarchy, they are unconfounded with higher-level residuals and a useful first step in diagnosing a model; however, for small group sample sizes, LS residuals are unreliable. Level-1 EB residuals are defined as the conditional modes of the random effects given the data and the estimated parameter values, which are calculated with (restricted) maximum likelihood. They are confounded with higher-level residuals, but more robust with small sample sizes. For higher-level residuals, coefficients can again be estimated using either LS or EB, but EB residuals are generally preferred due to small group sizes.

## Example Data

To illustrate the use of `hlm_resid()`, we make use of data on exam scores of 4,059 students in 65 inner London schools. This data set is distributed as part of the R package `mlmRev` (**Bates, Maechler, Bolker 2013**), which makes well known multilevel modeling data sets available in R.
```{r}
data("Exam", package = "mlmRev")
head(Exam)
```

For each student, the data consist of their gender (`sex`) and two standardized exam scores— an intake score on the London Reading Test (LRT) at age 11 (`standLRT`) and a score on the General Certificate of Secondary Education (GCSE) examination at age 16 (`normexam`). Additionally, the students' LRT scores were used to segment students into three categories (bottom 25%, middle 50%, and top 25%) based on their verbal reasoning subscore (`vr`) and overall score (`intake)`. At the school level, the data contain the average intake score for the school (`schavg`) and type based on school gender (`schgend`, coded as mixed, boys, or girls).

We illustrate usage for `hlm_resid()` below using the exam data and fitting an initial two-level HLM using a student’s standardized London Reading Test intake score (standLRT) to explain their GCSE exam score, allowing for a random intercept for each school:
```{r}
fm1 <- lme4::lmer(normexam ~ standLRT + (1 | school), Exam, REML = FALSE)
fm1
```

## hlm_resid() usage

`hlm_resid()` takes 5 arguments:

* `object` - the model fit as a `lmerMod` or `lme` object.

* `level` - the level at which residuals should be extracted. By default, `level = 1`, returning both EB and LS level-1  residuals as well as marginal residuals. `level` can also be set to a grouping factor as defined in `names(object@flist)` in `lme4` or in `names(object$groups)` in `nlme`. When set to a grouping factor, `hlm_resid()` returns both EB and LS higher-level residuals, otherwise known as the random effects for each group. 

* `standardize` - a logical indicating if the returned residuals should be standardized. By default, `standardize = FALSE`, returning unstandardized residuals. When `standardize = TRUE`, residuals are divided by the estimated standard error. For marginal residuals, when `standardize = TRUE`, the Cholesky residuals are returned.

* `include.ls` - a logical indicating if LS residuals should be returned. By default, `include.ls = TRUE`. The LS method of estimating residuals is more computationally intensive than the EB method. Consequently, setting `include.ls = FALSE` substantially decreases the runtime on models with many observations.

* `data` - only necessary if `na.action = na.exclude` for a `lmerMod` model object. `data` should specify the data frame used to fit the model containing observations with `NA` values.

### General usage

To extract unstandardized level-1 residuals and marginal residuals, we use the following call.
```{r, message = FALSE} 
library(HLMdiag)
hlm_resid(fm1)
```

`hlm_resid()` appends 6 columns to the model frame:

* `.resid`  the unstandardized Empirical Bayes (EB) level-1 residuals for each observation
* `.fitted` the Empirical Bayes (EB) fitted values
* `.ls.resid`  the unstandardized Least Squares (LS) level-1 residuals
* `.ls.fitted` the Least Squares fitted values
* `.mar.resid`  the unstandardized marginal residuals
* `.mar.fitted` the marginal fitted values

Note that the columns containing the Empirical Bayes residuals are simply named `.resid` and `.fitted`. This is because both `lme4::resid()` and `nlme::resid()` use the EB method in estimating residuals, thus the analyst is likely more familiar with this type of residual.

### standardize and include.ls

We can alter the output of the level-1 residuals using the `standardize` and `include.ls` parameters introduced above. The following call returns both the level-1 and marginal residuals divided by the estimated standard error and excludes the least squares residuals.
```{r}
hlm_resid(fm1, standardize = TRUE, include.ls = FALSE)
```

`hlm_resid()` appends the EB and marginal residual columns to the mode frame. The names of the columns containing residuals now have the prefix `.std` to reflect the fact that they are standardized. Note that standardized marginal residuals are equivalent to the Cholesky residuals of a HLM and thus are named `.chol.mar.resid`.

### Higher-level residuals

To access higher-level residuals, we can set the `level` parameter to a grouping factor as defined in `names(object@flist)` in `lme4` or in `names(object$groups)` in `nlme`. 
```{r}
names(fm1@flist)
```

To extract the school-level residuals, otherwise known as the predicted random effects for school, we use the following call.
```{r}
hlm_resid(fm1, level = "school")
```

In the tibble returned by `hlm_resid()`, each row now represents one of the groups specified by the `level` parameter. In this example, there were 65 schools from which data was collected, and the resulting tibble has 65 rows. If we had included school-level variables in the model, such as the average intake score for a school (`schavg`), those variables would also be included in the output. The final two columns contain the random effects for intercept at the school.

* `.ranef.intercept` the random effects for intercept, using Empirical Bayes estimation
* `.ls.intercept` the random effects for intercept, using Least Squares estimation

Had we included other random effect terms in the model, the EB and LS predictions of their random effects would also be included in the output. Again, the EB random effects are simply named `.ranef` because it is assumed that the analyst is most familiar with this type of random effect The parameters `standardize` and `include.ls` work the same way with higher-level residuals and with both `lmerMod` and `lme` objects. Setting `include.ls = FALSE` is often recommended with higher-level residuals, as calculating LS residuals is computationally intensive and the resulting residuals are often unreliable due to small group sizes.

### na.action and data

The final parameter, `data`, is an argument that becomes required when `na.action = na.exclude` for an `lmerMod` model object. Model frames in `lme4` always use `na.omit`. Thus, a `data` parameter is needed to retain `NA` values in the model frame that `hlm_resid()` returns. To demonstrate this requirement, we create a duplicate data set of `Exam` with the standardized LRT scores set to `NA` for observations 1, 2, and 5.
```{r}
# make copy of Exam data set
Exam2 <- Exam              
# set standLRT to NA for 3 observations
Exam2[c(1,2,5),7] <- NA    
# refit model with na.exclude
fm1.NA <- lme4::lmer(normexam ~ standLRT + (1 | school), Exam2,  
               na.action = na.exclude)

```

In the first example, when we try to call `hlm_resid()` on our model object, it throws an informative error asking for the data set used to fit the model. We provide this using the `data` parameter in the second example.
```{r, error = TRUE}
# incorrect:
hlm_resid(fm1.NA) 

# correct:
hlm_resid(fm1.NA, data = Exam2) 
```

Note that for the same model fit in `nlme`, we do not need to include the `data` parameter.
```{r}
fm1.NA.lme <- nlme::lme(normexam ~ standLRT, random = ~1 | school, Exam2,  
               na.action = na.exclude)
hlm_resid(fm1.NA.lme)
```

### Three-level models

In three-level models, how we specify middle-level residuals changes depending on whether a model is a `lmerMod` or `lme` object. This is due to differences in how `lme4` and `nlme` store their grouping factors To demonstrate this, we use a classroom data set from **West, Welch, Galecki (2006)**. The data contains math testing scores from 1190 children in 312 different classroom within 107 unique schools. All classrooms are nested within schools, and all students are nested within a single classroom.
```{r}
data("classroom", package = "WWGbook")
head(classroom)
```

The data set contains 12 columns; however, we are only interested in the response variable, `mathgain`, and the grouping variables, `classid` and `schoolid`. We fit the simple unconditional means model in both `lme4` and `nlme` below.
```{r}
# lme4
fm2.lmer <- lme4::lmer(mathgain ~ 1 + (1|schoolid/classid), data = classroom)
# nlme
fm2.lme <- nlme::lme(mathgain ~ 1, random = ~1|schoolid/classid, data = classroom)
```

For both models, classroom is nested within school. We can access the grouping factors for each model with the following:
```{r}
# lme4
names(fm2.lmer@flist)
# nlme
names(fm2.lme$groups)
```

Note how the orders of the grouping factors are opposite for `lme4` and `nlme`. In `lme4`, convention is to start at the lowest level of hierarchy and move upwards (in this case `classid:schoolid`, followed by `schoolid`). Grouping factors in `nlme` are given starting at the highest grouping and move down (`schoolid`, followed by `classid`). The order of `classid` and `schoolid` in the returned tibble is based on these conventions.

To extract the classroom-level random effects for `fm2.lmer`, we set `level = "classid:schoolid"`. 
```{r}
# lme4
hlm_resid(fm2.lmer, level = "classid:schoolid")
```

For the `fm2.lme`, we set `level = "classid"`.
```{r}
# nlme
hlm_resid(fm2.lme, level = "classid")
```

## Using hlm_resid() in context

Now that we have discussed how to extract all types of residuals for hierarchical models using `hlm_resid()`, we illustrate how to use those residuals in context to evaluate a model. We will diagnose the earlier model, `fm1`, fit on the `Exam` data set above. We use `hlm_resid()` to extract the standardized level-1 and marginal residuals.
```{r}
resid1_fm1 <- hlm_resid(fm1, level = 1, standardize = TRUE)
resid1_fm1
```

Utilizing the tibble output of `hlm_resid()`, we can easily plot the standardized LS residuals against explanatory variables such as the standardized LRT score. We use the LS residuals at level-1 because they are unconfounded of higher-level residuals.
```{r, message = FALSE, warning = FALSE, fig.height = 5, fig.width = 5, fig.align = 'center'}
library(ggplot2)
ggplot(data = resid1_fm1, aes(x = standLRT, y = .std.ls.resid)) + 
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE) + 
  labs(y = "LS level-1 residuals", 
       title = "LS residuals against standardized LRT score")
```

The smoother shows that standardized LRT scores might not be linearly related to GCSE exam scores. Likelihood ratio tests (not shown) confirm that quadratic and cubic terms for `standLRT` contribute significantly in describing GCSE exam scores, so we incorporate these terms in the updated model, `fm1b`. We then reassess the level-1 LS residuals.
```{r, message = FALSE, warning = FALSE, fig.height = 5, fig.width = 5, fig.align = 'center'}
fm1b <- update(fm1, .~. + I(standLRT^2) + I(standLRT^3))

resid1_fm1b <- hlm_resid(fm1b, level = 1, standardize = TRUE)

ggplot(data = resid1_fm1b, aes(x = standLRT, y = .std.ls.resid)) + 
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE) + 
  labs(y = "LS level-1 residuals", title = "LS residuals against standardized LRT score")
```

The addition of the quadratic and cubic terms seems to have improved the level-1 residuals. Additionally, there don't appear to be any large outliers that might need to be removed. We double check the LS level-1 residuals for `fm1`, plotting them against the LS fitted values appended to the model frame by `hlm_resid()`. 
```{r, message = FALSE, warning = FALSE, fig.height = 5, fig.width = 5, fig.align = 'center'}
ggplot(data = resid1_fm1b, aes(x = .ls.fitted, y = .std.ls.resid)) + 
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", se = FALSE) + 
  labs(x = "LS fitted values",
       y = "LS level-1 residuals", 
       title = "LS residuals against LS fitted values")
```

The LS level-1 residuals give no indication that the model violates any assumptions of an HLM. Because there are no glaring issues, we move on to the higher-level residuals, again plotting the output of `hlm_resid()`.
```{r, message = FALSE, warning = FALSE, fig.height = 6.5, fig.width = 5, fig.align = 'center'}
resid2_fm1b <- hlm_resid(fm1b, level = "school", include.ls = FALSE)
resid2_fm1b

ggplot(data = resid2_fm1b, aes(x = school, y = .ranef.intercept)) + 
  geom_point() +
  labs(y = "Random effects - intercept", 
       title = "Intercept random effects against school") + 
  coord_flip()
```

The school-level random effects again show no large outliers. We would now check influence diagnostics such as Cook's distance and leverage to ensure our model is appropriate. For a discussion of influence diagnostics within `HLMdiag`, we direct the reader to the vignette for `hlm_influence()`.

## pull_resid()

The function `pull_resid()` is an efficient way to pull a single type of residual as a vector. Whereas `hlm_resid()` spends time calculating all types of residuals and fitted values, `pull_resid()` only calculates the type of residual specified by the user, saving time over using `hlm_resid()` and indexing for a specific column. This is especially useful when used in a loop, such as when using simulation-based diagnostics.

`pull_resid()` takes three parameters, `object`, `type`, and `standardize`. The parameters `object` and `standardize` work identically as they do in `hlm_resid()`, and the new parameter, `type`, can be set to:

* `"ls"` to return LS level-1 residuals
* `"eb"` to return EB level-1 residuals
* `"marginal"` to return marginal residuals

The example below illustrates the output of `pull_resid()` being used to extract level-1 standardized LS residuals.
```{r}
head(pull_resid(fm1, type = "ls", standardize = TRUE))
```

The function `pull_resid()` is only implemented for level-1 residuals. If the analyst wants to extract random effects for a model, they should use the `ranef()` functions from either `lme4` or `nlme`.

## hlm_augment()

The `hlm_augment()` function combines the `hlm_resid()` and `hlm_influence()` functions to return a tibble containing information about the residuals and the influence diagnostics appended to the data. `hlm_augment()` has three parameters, `object`, `level`, and `include.ls`, all of which have the same functionality as in `hlm_resid()`. The syntax is the same for the two functions. For example, we can extract both level-1 residual and influence diagnostics from the model `fm1` with the following call.
```{r}
fm1.aug <- hlm_augment(fm1)
tibble::glimpse(fm1.aug)
```

This is useful for inspecting residuals values and influence diagnostics values at the same time. However, `hlm_augment()` lacks some of the functionality that `hlm_influence()` and `hlm_resid()` have. The `delete` and `approx` parameters available for `hlm_influence()` are not available in `hlm_augment()`, so the function will always use a one step approximation and delete all observations or groups instead of a selected few. The `standardize` parameter from `hlm_resid()` is also not included, so unstandardized residuals will always be returned. If additional functionality is required, `hlm_influence()` or `hlm_resid()` should be used instead. `hlm_augment()` is especially useful for inspecting influence diagnostics of observations with relatively high residuals, or vice versa. For more information about available functionality in `hlm_influence()`, see the `hlm_influence()` vignette. 

## References

Adam Loy, Heike Hofmann (2014). HLMdiag: A Suite of Diagnostics for Hierarchical
  Linear Models in R. Journal of Statistical Software, 56(5), 1-28. DOI
  \doi{10.18637/jss.v056.i05}

Brady West, Kathleen Welch, Andrzej Galecki (2006) Linear Mixed Models: A Practical Guide
  Using Statistical Software. First Edition. Chapman Hall / CRC Press. ISBN
  1584884800

David Robinson, Alex Hayes (2020). broom: Convert Statistical Analysis Objects
  into Tidy Tibbles. R package version 0.5.6.
  https://CRAN.R-project.org/package=broom

Douglas Bates, Martin Maechler, Ben Bolker (2020). mlmRev: Examples from
  Multilevel Modelling Software Review. R package version 1.0-8.
  https://CRAN.R-project.org/package=mlmRev
  
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear
  Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
  DOI \doi{10.18637/jss.v067.i01}

J.A. Hilden-Minton (1995). Multilevel Diagnostics for Mixed and Hierarchical Linear 
  Models. Ph.D. thesis, University of California Los Angeles.

Jose Pinheiro, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, R Core Team (2020). 
  nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-148, 
  https://CRAN.R-project.org/package=nlme.