pwlmm

library(pwlmm)

Probability Weighted Iterative Generalised Least Squares for Two-level Multilevel Model and Two-level Multivariate Multilevel Model

1. Introduction

Hierarchical models, also known as multilevel models, have been widely applied in analyses where elementary units of a finite population are affected by the social groups or contexts to which they belong to. Longitudinal data, or repeated measurements data, can be analysed under this approach, as measurements at different time points (occasions) occur within the same individual. Thus, considering the simplest case, a two-level model could be fitted with individuals at the highest level and the series of repeated measurements at the lowest level (Hox et al., 2017). In addition, it may also be of interest to investigate how correlated are the individual measurements at the different points in time. A tool used to deal with this particularity is the multivariate hierarchical model.

Pfeffermann et al. (1998) presented a method for incorporating the sampling weights when fitting two-level random coefficients models. Veiga et al. (2014) presented an extension of these methods, for the multivariate multilevel model in a longitudinal data analysis that incorporates the sampling weights, adapting to the rotating panels of Brazilian labour force survey (BR-LFS). The objective of this package is to adapt the methodology presented in Pfeffermann et al. (1998) and in Veiga et al. (2014) under the R environment through improved performance, given the high computational demand arising from the estimation process.

2. Estimation method for multilevel complex survey data

Pfeffermann et al. (1998) presented the probability weighted iterative generalized least square (PWIGLS) estimation method base on the already established iterative generalized least square (IGLS) algorithm Goldstein, 2011) for the two-level random coefficients model. In a nutshell, the proposed approach consists of replacing each of the sums of levels 1 and 2 of the original algorithm with weighted sums, where each level weight corresponds to the the inverse of the selection probability at that stage. Veiga et al. (2014) extended this method incorporating not only the extra level arising from the longitudinal structure, but also the complex covariance error structure, therefore accommodating the estimation of a multivariate multilevel model. Different error covariance linear structures were accommodated.

2.1 PWIGLS estimation for two-level random coefficients models

Let yij, xij and zij be the observed values for the ith level 1 unit within the jth level 2 unit. Therefore, defining yj = (y1j, ..., ynjj)T as a nj × 1 vector, Xj = (x1j, ..., xnjj)T as a nj × p matrix and Zj = (z1j, ..., znjj)T as a nj × q matrix. The model is represented by yj = Xjβ + rj    rj ∼ N(0, Vj) where the composite error is rj = Zjuj + ej and Vj = ZjΣuZjT + Injσe2, where Inj is an identity matrix size nj × nj.

For the estimation procedure, Vj is expressed as a linear function of θ, which is the row vector formed with the s distinct elements of Σu and σe, such that: $$\boldsymbol{V}_{j}=\sum_{k=1}^{s} \theta_{k} \boldsymbol{G}_{kj}= \sum_{k=1}^{s} \theta_{k} ( \boldsymbol{Z}_{j} \boldsymbol{H}_{k j} \boldsymbol{Z}_{j}^{T} + \boldsymbol{I}_{n_{j}} \delta_{k s} ).$$

Note that, hereafter the notation for sample size, such as m instead of M and nj instead of Nj, are used.

2.2 Estimation of fixed effects

Fixed effects estimation, considering the general case with q ≥ 1, where q is the number of random effects at level two, is taken by β̂(r) = (r)−1(r). Given that $$\boldsymbol{V}_{jr}^{-1}=\hat{\sigma}^{-2}_e \boldsymbol{D}_{j}^{-1}-\hat{\sigma}^{-2}_e \boldsymbol{D}_{j}^{-1} \boldsymbol{Z}_{j} \hat{\boldsymbol{A}}_j \boldsymbol{Z}_{j}^{T} \boldsymbol{D}_{j}^{-1}$$ is the inverse of Vj(θ̂(r − 1)), where r is for the iteration 1, 2, , and $$\hat{\boldsymbol{A}}_j=(\boldsymbol{Z}_j^T\boldsymbol{D}_j^{-1}\boldsymbol{Z}_j +\hat{\sigma}_e^2\hat{\boldsymbol{\Sigma}}_u^{-1})^{-1}$$ we can write and where Dj−1 = diag(wi|j) is a nj × nj diagonal matrix with wi|j in the main diagonal, 1j = XjTDj−1Xj, 2j = XjTDj−1Zj, 3j = XjTDj−1yj, 4j = ZjTDj−1yj, 5j = ZjTDj−1Zj, and re-writing $\hat{\boldsymbol{A}}_j=(\boldsymbol{\hat{T}_{5j}}+\hat{\sigma}_e^2\hat{\boldsymbol{\Sigma}}_u^{-1})^{-1}$.

2.3 Estimation of Random effects

Taking the random part of the model to be represented by θ, the estimation of the random effects θ̂(r) follows the matrix multiplication: θ̂(r) = (r)−1(r).

The representation for both (r) and (r) sums are defined by expressions for each entry in the matrix/vector. We can write and where rj(r)is a s × s matrix, where s is the total number of parameters in θ, and the klth element is expressed as and sj(r) is a vector of length s and the kth element is where $\hat{\boldsymbol{e}}_j =\left(\boldsymbol{y}_j-\boldsymbol{X}_j\boldsymbol{\hat{\beta}}^{(r)}\right)$, δks = 1 if k = s (0, otherwise), $\hat{N}_j=\sum_{i=1}^{n_j}w_{i|j}$, $\hat{\boldsymbol{C}}_{kj}= -\delta_{ks}\hat{\boldsymbol{A}}_j+\hat{\boldsymbol{B}}_{kj}-\hat{\boldsymbol{B}}_{kj}\boldsymbol{\hat{T}_{5j}}\hat{\boldsymbol{A}}_j$, $\hat{\boldsymbol{B}}_{kj}=\hat{\sigma}_e^2\hat{\boldsymbol{A}}_j\hat{\boldsymbol{\Sigma}}_u^{-1}\boldsymbol{H}_{k}-\delta_{ks}\hat{\boldsymbol{A}}_j$, Hk (k = 1...s) is a known q × q matrix, and the abth element of this matrix is equal to 1 if the abth entry in $\hat{\boldsymbol{\Sigma}}_u$ corresponds to θ̂k(0, otherwise).

2.4 Initial estimates

Initial values for β̂ is given by

In order to calculate the initial estimate for θ̂, we have to split this parameter into level 2 and level 1 random effects, in order to get Σu and σe2 respectively. That enables the initial estimate to be applied separately to each part. With respect to Σu, the diagonal entries are filled by 0.5, which represents the level 2 variances, while the level 2 covariance(s) are represented by zero(s). The level 1 residual σe2 initial value is given by where $\hat{T}_{6j}^{(0)}=\boldsymbol{w}_{i|j}^T(\hat{\boldsymbol{e}}_j^{(0)}-\boldsymbol{\hat{u}_j}^{(0)})^2$, $\hat{\boldsymbol{e}}_j^{(0)}=\boldsymbol{y}_j-\boldsymbol{X}_j\hat{\boldsymbol{\beta}}^{(0)}$ and j(0)is a nj × 1 vector of repeated $\hat{u}_j^{(0)}=\frac{\boldsymbol{w}_{i|j}^T\boldsymbol{e}_j^{(0)}}{\sum_{i=1}^{n_j}w_{i|j}}$.

2.5 Variances

The variance estimators of $\hat{\boldsymbol{\beta}}$ can be expressed as where  = limr → ∞(r), $\boldsymbol{c}_j=\boldsymbol{X}_j^T\boldsymbol{D}_j^{-1}\boldsymbol{e}_j - \boldsymbol{\hat{T}}_{2j} \hat{\boldsymbol{A}}_j \boldsymbol{Z}_j^T\boldsymbol{D}_j^{-1}\boldsymbol{e}_j$ and ej = yj − Xjβ̂. The variance estimator of $\hat{\boldsymbol{\theta}}$ can be expressed as where  = limr → ∞(r), dj = sj − rjθ̂, sj = limr → ∞sj(r)and rj = limr → ∞rj(r).

2.6 Residuals

The level 2 residuals are estimated according to equations and expressions from Appendix 2.2 in Goldstein (2011) . The estimated residuals at level h (here the level 2) for the jth group is given by where $\boldsymbol{R}_{hj}=\boldsymbol{Z}_j\hat{\boldsymbol{\Sigma}}_u$ and Vj = RhjZjT + σ̂e2Inj. The variance estimator of j is

2.7 Scaled weights

In a two-stage sampling scheme, level 2 units are selected with sampling probability πj; at the second stage, level 1 units are selected within the jth level 2 unit with probability πi|j. The sampling weights used in the PWIGLS method are defined as wj = πj−1 and wi|j = πi|j−1.

In order to reduce sample bias, it is essential to scale the weights wj and wi|j. Pfeffermann et al. (1998) suggested their method called scale method 2, therefore transforming wj and wi|j respectively to and where $\tilde{w}=\sum_{j=1}^m w_j/m$ and $\tilde{w}_j=\sum_{i=1}^{n_j} w_{i|j}/n_j$. Hence, j becomes nj in equation 3, as explained by Pfeffermann et al. (1998).

The package applies this scaling method.

3. PWIGLS Estimation for multivariate multilevel models

Now, let the model be represented by Yj = Xjβ + rj    rj ∼ N(0, Vj) where rj is defined as follows,

The estimation process for the multivariate multilevel model with weights is fully described in Veiga et al. (2014). It is a similar process as the PWIGLS for the random coefficients model described above. The differences are in the definition of: therefore, changes are observed in:

Furthermore, we can write $$\hat{\boldsymbol{A}}_j =\left\{ \hat{\Sigma}_v^{-1} + \boldsymbol{Z}_j^{T} (\boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1}) \boldsymbol{Z}_j \right\}^{-1} \;$$ and $$\hat{\boldsymbol{V}}_{jr}^{-1} = \boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1} - (\boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1})\boldsymbol{Z}_j \hat{\boldsymbol{A}}_j \boldsymbol{Z}_j^{T}(\boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1}).$$ The fixed effects are then, as previously, estimated by solving the equations 1 and 2 and the random part of the model is estimated by solving the equations 3 and 4.

4. Functions

The format adopted for the estimation of two-level hierarchical linear models with weights was based in the same format used by Bates et al. (2015) package, lme4. The format for the multivariate hierarchical linear models with weights was based on the basic functions of R for the estimation of linear models (lm, glm). The main difference in format between the two is that in the first, the use of random coefficients at the cluster level is allowed.

For estimating two-level linear models with weights, the command required is

pwigls2 (formula, data = NULL, wj, wi_j)

where pwigls2 is the name of the function; formula is the formula where the response and explanatory variables are placed in such as Y = X_1+ X_2 + ... + X_p + (1 | group), so that the variables with random effects are between the parentheses to the left of the vertical bar and the group identifier variable to the right; data is a data frame (optional) containing the variables used in formula; wj is the vector of weights corresponding to level 2 units, and wi_j is the vector of weights corresponding to level 1 units, conditional to their respective level 2 unit.

For estimating multivariate multilevel linear models with weights, the command required is

wmlmm(formula, data = NULL, ID3, ID2, ID1, wj, wi_j, type, rot = NULL) 

where wmlmm is the name of the function; formula is the formula where the response and explanatory variables are placed such as Y = X_1 + X_2 + ... + X_p. There is no need to reference the k points as explanatory variables; ID3, ID2 and ID1 are the identifiers for level 2 units, level 1 units, and time-dummies for level 1 units, respectively; type is the type of error covariance structure that the user wants to estimate, that can be: toep, for Toeplitz; uns for unstructured; and genlin for the general linear with dependency on lag; rot is a vector of 0’s and 1’s that identifies the occasions to automatically create the matrices A1, …, Ak of for type = genlin (see example below); and data and wj have the same functionality here as in the function pwigls2, while wi_j, for a longitudinal data, corresponds to the longitudinal weights.

5. References

Bates, D., M ̈achler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48

Goldstein, H. (2011). Multilevel statistical models, volume 922. John Wiley & Sons.

Hox, J. J., Moerbeek, M., and Van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. Routledge.

Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., and Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society: series B (statistical methodology), 60(1):23–40.

Veiga, A., Smith, P. W., and Brown, J. J. (2014). The use of sample weights in multivariate multilevel models with an application to income data collected by using a rotating panel survey. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(1):65–84