Hierarchical models, also known as multilevel models, have been widely applied in analyses where elementary units of a finite population are affected by the social groups or contexts to which they belong to. Longitudinal data, or repeated measurements data, can be analysed under this approach, as measurements at different time points (occasions) occur within the same individual. Thus, considering the simplest case, a two-level model could be fitted with individuals at the highest level and the series of repeated measurements at the lowest level (Hox et al., 2017). In addition, it may also be of interest to investigate how correlated are the individual measurements at the different points in time. A tool used to deal with this particularity is the multivariate hierarchical model.
Pfeffermann et al. (1998) presented a method for incorporating the sampling weights when fitting two-level random coefficients models. Veiga et al. (2014) presented an extension of these methods, for the multivariate multilevel model in a longitudinal data analysis that incorporates the sampling weights, adapting to the rotating panels of Brazilian labour force survey (BR-LFS). The objective of this package is to adapt the methodology presented in Pfeffermann et al. (1998) and in Veiga et al. (2014) under the R environment through improved performance, given the high computational demand arising from the estimation process.
Pfeffermann et al. (1998) presented the probability weighted iterative generalized least square (PWIGLS) estimation method base on the already established iterative generalized least square (IGLS) algorithm Goldstein, 2011) for the two-level random coefficients model. In a nutshell, the proposed approach consists of replacing each of the sums of levels 1 and 2 of the original algorithm with weighted sums, where each level weight corresponds to the the inverse of the selection probability at that stage. Veiga et al. (2014) extended this method incorporating not only the extra level arising from the longitudinal structure, but also the complex covariance error structure, therefore accommodating the estimation of a multivariate multilevel model. Different error covariance linear structures were accommodated.
Let yij, xij and zij be the observed values for the ith level 1 unit within the jth level 2 unit. Therefore, defining yj = (y1j, ..., ynjj)T as a nj × 1 vector, Xj = (x1j, ..., xnjj)T as a nj × p matrix and Zj = (z1j, ..., znjj)T as a nj × q matrix. The model is represented by yj = Xjβ + rj rj ∼ N(0, Vj) where the composite error is rj = Zjuj + ej and Vj = ZjΣuZjT + Injσe2, where Inj is an identity matrix size nj × nj.
For the estimation procedure, Vj is expressed as a linear function of θ, which is the row vector formed with the s distinct elements of Σu and σe, such that: $$\boldsymbol{V}_{j}=\sum_{k=1}^{s} \theta_{k} \boldsymbol{G}_{kj}= \sum_{k=1}^{s} \theta_{k} ( \boldsymbol{Z}_{j} \boldsymbol{H}_{k j} \boldsymbol{Z}_{j}^{T} + \boldsymbol{I}_{n_{j}} \delta_{k s} ).$$
Note that, hereafter the notation for sample size, such as m instead of M and nj instead of Nj, are used.
Fixed effects estimation, considering the general case with q ≥ 1, where q is the number of random effects at level two, is taken by β̂(r) = P̂(r)−1q̂(r). Given that $$\boldsymbol{V}_{jr}^{-1}=\hat{\sigma}^{-2}_e \boldsymbol{D}_{j}^{-1}-\hat{\sigma}^{-2}_e \boldsymbol{D}_{j}^{-1} \boldsymbol{Z}_{j} \hat{\boldsymbol{A}}_j \boldsymbol{Z}_{j}^{T} \boldsymbol{D}_{j}^{-1}$$ is the inverse of Vj(θ̂(r − 1)), where r is for the iteration 1, 2, , and $$\hat{\boldsymbol{A}}_j=(\boldsymbol{Z}_j^T\boldsymbol{D}_j^{-1}\boldsymbol{Z}_j +\hat{\sigma}_e^2\hat{\boldsymbol{\Sigma}}_u^{-1})^{-1}$$ we can write and where Dj−1 = diag(wi|j) is a nj × nj diagonal matrix with wi|j in the main diagonal, T̂1j = XjTDj−1Xj, T̂2j = XjTDj−1Zj, t̂3j = XjTDj−1yj, t̂4j = ZjTDj−1yj, T̂5j = ZjTDj−1Zj, and re-writing $\hat{\boldsymbol{A}}_j=(\boldsymbol{\hat{T}_{5j}}+\hat{\sigma}_e^2\hat{\boldsymbol{\Sigma}}_u^{-1})^{-1}$.
Taking the random part of the model to be represented by θ, the estimation of the random effects θ̂(r) follows the matrix multiplication: θ̂(r) = R̂(r)−1ŝ(r).
The representation for both R̂(r) and ŝ(r) sums are defined by expressions for each entry in the matrix/vector. We can write and where T̂rj(r)is a s × s matrix, where s is the total number of parameters in θ, and the klth element is expressed as and t̂sj(r) is a vector of length s and the kth element is where $\hat{\boldsymbol{e}}_j =\left(\boldsymbol{y}_j-\boldsymbol{X}_j\boldsymbol{\hat{\beta}}^{(r)}\right)$, δks = 1 if k = s (0, otherwise), $\hat{N}_j=\sum_{i=1}^{n_j}w_{i|j}$, $\hat{\boldsymbol{C}}_{kj}= -\delta_{ks}\hat{\boldsymbol{A}}_j+\hat{\boldsymbol{B}}_{kj}-\hat{\boldsymbol{B}}_{kj}\boldsymbol{\hat{T}_{5j}}\hat{\boldsymbol{A}}_j$, $\hat{\boldsymbol{B}}_{kj}=\hat{\sigma}_e^2\hat{\boldsymbol{A}}_j\hat{\boldsymbol{\Sigma}}_u^{-1}\boldsymbol{H}_{k}-\delta_{ks}\hat{\boldsymbol{A}}_j$, Hk (k = 1...s) is a known q × q matrix, and the abth element of this matrix is equal to 1 if the abth entry in $\hat{\boldsymbol{\Sigma}}_u$ corresponds to θ̂k(0, otherwise).
Initial values for β̂ is given by
In order to calculate the initial estimate for θ̂, we have to split this parameter into level 2 and level 1 random effects, in order to get Σu and σe2 respectively. That enables the initial estimate to be applied separately to each part. With respect to Σu, the diagonal entries are filled by 0.5, which represents the level 2 variances, while the level 2 covariance(s) are represented by zero(s). The level 1 residual σe2 initial value is given by where $\hat{T}_{6j}^{(0)}=\boldsymbol{w}_{i|j}^T(\hat{\boldsymbol{e}}_j^{(0)}-\boldsymbol{\hat{u}_j}^{(0)})^2$, $\hat{\boldsymbol{e}}_j^{(0)}=\boldsymbol{y}_j-\boldsymbol{X}_j\hat{\boldsymbol{\beta}}^{(0)}$ and ûj(0)is a nj × 1 vector of repeated $\hat{u}_j^{(0)}=\frac{\boldsymbol{w}_{i|j}^T\boldsymbol{e}_j^{(0)}}{\sum_{i=1}^{n_j}w_{i|j}}$.
The variance estimators of $\hat{\boldsymbol{\beta}}$ can be expressed as where P̂ = limr → ∞P̂(r), $\boldsymbol{c}_j=\boldsymbol{X}_j^T\boldsymbol{D}_j^{-1}\boldsymbol{e}_j - \boldsymbol{\hat{T}}_{2j} \hat{\boldsymbol{A}}_j \boldsymbol{Z}_j^T\boldsymbol{D}_j^{-1}\boldsymbol{e}_j$ and ej = yj − Xjβ̂. The variance estimator of $\hat{\boldsymbol{\theta}}$ can be expressed as where R̂ = limr → ∞R̂(r), dj = t̂sj − T̂rjθ̂, t̂sj = limr → ∞t̂sj(r)and T̂rj = limr → ∞T̂rj(r).
The level 2 residuals are estimated according to equations and expressions from Appendix 2.2 in Goldstein (2011) . The estimated residuals at level h (here the level 2) for the jth group is given by where $\boldsymbol{R}_{hj}=\boldsymbol{Z}_j\hat{\boldsymbol{\Sigma}}_u$ and Vj = RhjZjT + σ̂e2Inj. The variance estimator of ûj is
In a two-stage sampling scheme, level 2 units are selected with sampling probability πj; at the second stage, level 1 units are selected within the jth level 2 unit with probability πi|j. The sampling weights used in the PWIGLS method are defined as wj = πj−1 and wi|j = πi|j−1.
In order to reduce sample bias, it is essential to scale the weights wj and wi|j. Pfeffermann et al. (1998) suggested their method called scale method 2, therefore transforming wj and wi|j respectively to and where $\tilde{w}=\sum_{j=1}^m w_j/m$ and $\tilde{w}_j=\sum_{i=1}^{n_j} w_{i|j}/n_j$. Hence, N̂j becomes nj in equation 3, as explained by Pfeffermann et al. (1998).
The package applies this scaling method.
Now, let the model be represented by Yj = Xjβ + rj rj ∼ N(0, Vj) where rj is defined as follows,
The estimation process for the multivariate multilevel model with weights is fully described in Veiga et al. (2014). It is a similar process as the PWIGLS for the random coefficients model described above. The differences are in the definition of: therefore, changes are observed in:Furthermore, we can write $$\hat{\boldsymbol{A}}_j =\left\{ \hat{\Sigma}_v^{-1} + \boldsymbol{Z}_j^{T} (\boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1}) \boldsymbol{Z}_j \right\}^{-1} \;$$ and $$\hat{\boldsymbol{V}}_{jr}^{-1} = \boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1} - (\boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1})\boldsymbol{Z}_j \hat{\boldsymbol{A}}_j \boldsymbol{Z}_j^{T}(\boldsymbol{W}_{j}\otimes \hat{\Sigma}_u^{-1}).$$ The fixed effects are then, as previously, estimated by solving the equations 1 and 2 and the random part of the model is estimated by solving the equations 3 and 4.
The format adopted for the estimation of two-level hierarchical
linear models with weights was based in the same format used by Bates et
al. (2015) package, lme4
. The format for the multivariate
hierarchical linear models with weights was based on the basic functions
of R for the estimation of
linear models (lm
, glm
). The main difference
in format between the two is that in the first, the use of random
coefficients at the cluster level is allowed.
For estimating two-level linear models with weights, the command required is
pwigls2 (formula, data = NULL, wj, wi_j)
where pwigls2
is the name of the function;
formula
is the formula where the response and explanatory
variables are placed in such as
Y = X_1+ X_2 + ... + X_p + (1 | group)
, so that the
variables with random effects are between the parentheses to the left of
the vertical bar and the group identifier variable to the right;
data
is a data frame
(optional) containing the
variables used in formula
; wj
is the vector of
weights corresponding to level 2 units, and wi_j
is the
vector of weights corresponding to level 1 units, conditional to their
respective level 2 unit.
For estimating multivariate multilevel linear models with weights, the command required is
wmlmm(formula, data = NULL, ID3, ID2, ID1, wj, wi_j, type, rot = NULL)
where wmlmm
is the name of the function;
formula
is the formula where the response and explanatory
variables are placed such as Y = X_1 + X_2 + ... + X_p
.
There is no need to reference the k points as explanatory variables;
ID3
, ID2
and ID1
are the
identifiers for level 2 units, level 1 units, and time-dummies for level
1 units, respectively; type
is the type of error covariance
structure that the user wants to estimate, that can be:
toep
, for Toeplitz; uns
for unstructured; and
genlin
for the general linear with dependency on
lag
; rot
is a vector of 0’s and 1’s that
identifies the occasions to automatically create the matrices A1, …, Ak
of for type = genlin
(see example below); and
data
and wj
have the same functionality here
as in the function pwigls2
, while wi_j
, for a
longitudinal data, corresponds to the longitudinal weights.
Bates, D., M ̈achler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48
Goldstein, H. (2011). Multilevel statistical models, volume 922. John Wiley & Sons.
Hox, J. J., Moerbeek, M., and Van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. Routledge.
Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., and Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society: series B (statistical methodology), 60(1):23–40.
Veiga, A., Smith, P. W., and Brown, J. J. (2014). The use of sample weights in multivariate multilevel models with an application to income data collected by using a rotating panel survey. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(1):65–84