Title: Fit Structural Equation Models to Multiply Imputed Data
Description: The primary purpose of 'lavaan.mi' is to extend the functionality of the R package 'lavaan', which implements structural equation modeling (SEM). When incomplete data have been multiply imputed, the imputed data sets can be analyzed by 'lavaan' using complete-data estimation methods, but results must be pooled across imputations (Rubin, 1987, <doi:10.1002/9780470316696>). The 'lavaan.mi' package automates the pooling of point and standard-error estimates, as well as a variety of test statistics, using a familiar interface that allows users to fit an SEM to multiple imputations as they would to a single data set using the 'lavaan' package.
Authors: Terrence D. Jorgensen [aut, cre]
Maintainer: Terrence D. Jorgensen <[email protected]>
License: GPL (>= 2)
Version: 0.1-0
Built: 2025-03-10 16:22:45 UTC
Source: CRAN
A version of the classic Holzinger and Swineford (1939) dataset, with
missing data imposed on variables x5 and x9:

x5 is missing not at random (MNAR) by deleting the lowest 30% of x5 values.

x9 is missing at random (MAR) conditional on age, by deleting x9 values for
the youngest 30% of subjects in the data.

The data are then dichotomized using a median split, and imputed 5 times
using the syntax shown in the example. The data include only the 9 tests
(x1 through x9) and school.
Terrence D. Jorgensen (University of Amsterdam; [email protected])
The lavaan package.
Holzinger, K., & Swineford, F. (1939). A study in factor analysis: The stability of a bifactor solution. Supplementary Educational Monograph, no. 48. Chicago, IL: University of Chicago Press.
lavaan::HolzingerSwineford1939
data(HolzingerSwineford1939, package = "lavaan")

## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr", "agemo", "school")]
set.seed(123)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## median split
HSbinary <- as.data.frame( lapply(HSMiss[ , paste0("x", 1:9)],
                                  FUN = cut, breaks = 2, labels = FALSE) )
HSbinary$school <- HSMiss$school

## impute binary missing data using mice package
library(mice)
set.seed(456)
miceImps <- mice(HSbinary)

## save imputations in a list of data.frames
binHS5imps <- list()
for (i in 1:miceImps$m) binHS5imps[[i]] <- complete(miceImps, action = i)
This is a utility function used to calculate the "D2" statistic for pooling
test statistics across multiple imputations. It is called by several
functions used for lavaan.mi objects, such as lavTestLRT.mi(),
lavTestWald.mi(), and lavTestScore.mi(), but it can be used in any scenario
in which a vector of test statistics (one from each imputation) and the
degrees of freedom of the test statistic are available. See Li, Meng,
Raghunathan, & Rubin (1991) and Enders (2010, chapter 8) for details about
how it is calculated.
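The D2 calculation itself is simple enough to sketch. Below is a minimal
illustration of the pooling rule described by Li et al. (1991) and Enders
(2010, chapter 8), written here only for exposition (it is not the package's
internal code); the helper name poolD2 is hypothetical.

## Minimal sketch of the D2 pooling rule (Li et al., 1991), assuming
## 'w' is a vector of m chi-squared statistics, each with 'k' df.
poolD2 <- function(w, k) {
  m    <- length(w)
  ariv <- (1 + 1/m) * var(sqrt(w))        # average relative increase in variance
  D2   <- (mean(w)/k - ariv * (m + 1)/(m - 1)) / (1 + ariv)  # pooled F statistic
  df2  <- k^(-3/m) * (m - 1) * (1 + 1/ariv)^2                # denominator df
  c(F = D2, df1 = k, df2 = df2,
    pvalue = pf(D2, k, df2, lower.tail = FALSE),
    ariv = ariv, fmi = ariv / (1 + ariv))
}

## toy usage: compare with calculate.D2(CHI, DF) from the Examples below
poolD2(rchisq(20, df = 3), k = 3)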
calculate.D2(w, DF = 0L, asymptotic = FALSE)
w: Numeric vector of test statistics (e.g., chi-squared or Wald z), one from
each imputation.

DF: degrees of freedom (df) of the test statistics in w. The default (0)
indicates z statistics, which are squared to yield 1-df chi-squared values.

asymptotic: logical. If FALSE (default), the pooled test is returned as an F
statistic; if TRUE, as an asymptotically chi-squared statistic.
A numeric vector containing the test statistic, df, its p value, and 2
missing-data diagnostics: the relative increase in variance (RIV, or the
average across parameters for multiparameter tests: ARIV) and the fraction
of missing information (FMI = ARIV / (1 + ARIV)).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
lavTestLRT.mi(), lavTestWald.mi(), lavTestScore.mi()
## generate a vector of chi-squared values, just for example
DF <- 3  # degrees of freedom
M <- 20  # number of imputations
CHI <- rchisq(M, DF)

## pool the "results"
calculate.D2(CHI, DF) # by default, an F statistic is returned
calculate.D2(CHI, DF, asymptotic = TRUE) # asymptotically chi-squared

## generate standard-normal values, for an example of Wald z tests
Z <- rnorm(M)
calculate.D2(Z) # default DF = 0 will square Z to make chisq(DF = 1)
## F test is equivalent to a t test with the denominator DF
A version of the classic Holzinger and Swineford (1939) dataset, with
missing data imposed on variables x5 and x9:

x5 is missing not at random (MNAR) by deleting the lowest 30% of x5 values.

x9 is missing at random (MAR) conditional on age, by deleting x9 values for
the youngest 30% of subjects in the data.

The data are imputed 20 times using the syntax shown in the example. The
data include only age and school variables, along with the 9 tests (x1
through x9).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
The lavaan package.
Holzinger, K., & Swineford, F. (1939). A study in factor analysis: The stability of a bifactor solution. Supplementary Educational Monograph, no. 48. Chicago, IL: University of Chicago Press.
lavaan::HolzingerSwineford1939
data(HolzingerSwineford1939, package = "lavaan")

## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr", "agemo", "school")]
set.seed(123)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## impute missing data with Amelia
library(Amelia)
set.seed(456)
HS.amelia <- amelia(HSMiss, m = 20, noms = "school", p2s = FALSE)
HS20imps <- HS.amelia$imputations
This function fits a lavaan model to a list of imputed data sets.
lavaan.mi(model, data, ...)

cfa.mi(model, data, ...)

sem.mi(model, data, ...)

growth.mi(model, data, ...)
model: The analysis model, which can be specified using lavaan model syntax
(see lavaan::model.syntax).

data: A list of imputed data sets (each a data.frame) containing the
variables used in the model.

...: Additional arguments to pass to lavaan::lavaanList() (e.g., lavaan
options, or FUN= to save custom output per imputation; see Examples).
A lavaan.mi object
This functionality was originally provided via runMI() in the semTools
package, but there are differences. See the README file on the GitHub page
for this package (find link in DESCRIPTION).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
poolSat() for a more efficient method to obtain SEM results for multiple
imputations
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)

summary(fit, fit.measures = TRUE, fmi = TRUE)
summary(fit, standardized = "std.all", rsquare = TRUE)

## You can pass other lavaanList() arguments, such as FUN=, which allows
## you to save any custom output from each imputation's fitted model.
## An example with ordered-categorical data:
data(binHS5imps) # import a list of 5 imputed data sets

## Define a function to save a list with 2 custom outputs per imputation:
##  - zero-cell tables
##  - the obsolete "WRMR" fit index
myCustomFunc <- function(object) {
  list(wrmr = lavaan::fitMeasures(object, "wrmr"),
       zeroCells = lavaan::lavInspect(object, "zero.cell.tables"))
}

## fit the model
catout <- cfa.mi(HS.model, data = binHS5imps, ordered = TRUE,
                 FUN = myCustomFunc)

## pooled results
summary(catout)

## extract custom output (per imputation)
sapply(catout@funList, function(x) x$wrmr) # WRMR for each imputation
catout@funList[[1]]$zeroCells # zero-cell tables for first imputation
catout@funList[[2]]$zeroCells # zero-cell tables for second imputation ...
This class extends the lavaan::lavaanList class, created by fitting a lavaan model to a list of data sets. In this case, the list of data sets comprises multiple imputations of missing data.
## S4 method for signature 'lavaan.mi'
show(object)

## S4 method for signature 'lavaan.mi'
summary(
  object,
  header = TRUE,
  fit.measures = FALSE,
  fm.args = list(standard.test = "default", scaled.test = "default",
                 rmsea.ci.level = 0.9, rmsea.h0.closefit = 0.05,
                 rmsea.h0.notclosefit = 0.08, robust = TRUE,
                 cat.check.pd = TRUE),
  estimates = TRUE,
  ci = FALSE,
  standardized = FALSE,
  std = standardized,
  cov.std = TRUE,
  rsquare = FALSE,
  fmi = FALSE,
  asymptotic = FALSE,
  scale.W = !asymptotic,
  omit.imps = c("no.conv", "no.se"),
  remove.unused = TRUE,
  modindices = FALSE,
  nd = 3L,
  ...
)

## S4 method for signature 'lavaan.mi'
nobs(object, total = TRUE)

## S4 method for signature 'lavaan.mi'
coef(object, type = "free", labels = TRUE, omit.imps = c("no.conv", "no.se"))

## S4 method for signature 'lavaan.mi'
vcov(
  object,
  type = c("pooled", "between", "within", "ariv"),
  scale.W = TRUE,
  omit.imps = c("no.conv", "no.se")
)

## S4 method for signature 'lavaan.mi'
fitted(object, omit.imps = c("no.conv", "no.se"))

## S4 method for signature 'lavaan.mi'
fitted.values(object, omit.imps = c("no.conv", "no.se"))

## S4 method for signature 'lavaan.mi'
fitMeasures(
  object,
  fit.measures = "all",
  baseline.model = NULL,
  h1.model = NULL,
  fm.args = list(standard.test = "default", scaled.test = "default",
                 rmsea.ci.level = 0.9, rmsea.h0.closefit = 0.05,
                 rmsea.h0.notclosefit = 0.08, robust = TRUE,
                 cat.check.pd = TRUE),
  output = "vector",
  omit.imps = c("no.conv", "no.se"),
  ...
)

## S4 method for signature 'lavaan.mi'
fitmeasures(
  object,
  fit.measures = "all",
  baseline.model = NULL,
  h1.model = NULL,
  fm.args = list(standard.test = "default", scaled.test = "default",
                 rmsea.ci.level = 0.9, rmsea.h0.closefit = 0.05,
                 rmsea.h0.notclosefit = 0.08, robust = TRUE,
                 cat.check.pd = TRUE),
  output = "vector",
  omit.imps = c("no.conv", "no.se"),
  ...
)
object: An object of class lavaan.mi.

header, fit.measures, fm.args, estimates, ci, standardized, std, cov.std,
rsquare, remove.unused, modindices, nd, output: See descriptions of the
same arguments for lavaan's summary() and fitMeasures() methods.

fmi: logical indicating whether to include the fraction of missing
information (FMI) for pooled estimates.

asymptotic: logical; if FALSE (default), pooled Wald tests of individual
parameters are reported as t statistics; if TRUE, as asymptotic z
statistics.

scale.W: logical passed to vcov() to control how the within- and
between-imputation components are combined when pooling the sampling
covariance matrix.

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results; by default, imputations that did not converge
("no.conv") or whose standard errors could not be computed ("no.se") are
omitted.

...: Additional arguments passed to lavTestLRT.mi() (e.g., when calculating
fit measures).

total: logical indicating whether nobs() returns the total sample size
(TRUE, default) or the sample size per group.

type: The meaning of this argument varies depending on which method it is
used for. Find detailed descriptions in the Value section under coef() and
vcov().

labels: logical indicating whether the vector returned by coef() should be
named.

baseline.model, h1.model: Optional lavaan.mi objects for the baseline
(null) and unrestricted (h1) models, used by fitMeasures() when calculating
incremental fit indices (see the Examples).

The methods return the following:

coef: The vector of pooled point estimates (see the type and labels
arguments).

vcov: By default (type = "pooled"), the pooled sampling covariance matrix
of the estimated parameters; type = "within" or "between" returns the
corresponding variance component, and type = "ariv" returns the average
relative increase in variance.

fitted.values: The model-implied moments (e.g., covariance matrix),
calculated from the pooled parameter estimates.

fitted: alias for fitted.values().

nobs: The number of observations (see the total argument).

fitMeasures: Fit indices calculated from pooled test statistics (see
lavaan::fitMeasures()).

fitmeasures: alias for fitMeasures().

show: Prints a brief description of the fitted object.

summary: Prints a summary of pooled results, analogous to lavaan's
summary() method.
coefList: list of estimated coefficients in matrix format (one per
imputation), as output by lavInspect(fit, "est").

phiList: list of model-implied latent-variable covariance matrices (one per
imputation), as output by lavInspect(fit, "cov.lv").

miList: list of modification indices output by lavaan::modindices().

lavListCall: call to lavaan::lavaanList() used to fit the model to the list
of imputed data sets in @DataList, stored as a list of arguments.

convergence: list of logical vectors indicating whether, for each imputed
data set, (1) the model converged on a solution, (2) SEs could be
calculated, (3) the (residual) covariance matrix of latent variables (Psi)
is non-positive-definite, and (4) the residual covariance matrix of
observed variables (Theta) is non-positive-definite.

version: Named character vector indicating the lavaan and lavaan.mi version
numbers.

DataList: The list of imputed data sets.

SampleStatsList: List of output from lavInspect(fit, "sampstat") applied to
each fitted model.

ParTableList, vcovList, testList, baselineList, h1List: See
lavaan::lavaanList. An additional element is added to the h1List: $PT is
the "saturated" model's parameter table, returned by
lavaan::lav_partable_unrestricted().

call, Options, ParTable, pta, Data, Model, meta, timingList, CacheList,
optimList, impliedList, loglikList, internalList, funList, external: By
default, lavaan.mi() does not populate the remaining @*List slots from the
lavaan::lavaanList class, but they can be added to the call using the
store.slots= argument (passed to lavaan::lavaanList() via ...).

See the lavaan.mi() function for details. Wrapper functions include
cfa.mi(), sem.mi(), and growth.mi().
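As a brief illustration of the slots listed above, the lines below sketch
how per-imputation information might be inspected from a fitted object
(assuming fit is a lavaan.mi object, as in the Examples below); only slot
names documented in this section are used.

## assuming 'fit' is a lavaan.mi object (see the Examples below)
length(fit@DataList)   # number of imputed data sets
fit@coefList[[1]]      # estimates from the 1st imputation, matrix format
fit@phiList[[1]]       # model-implied latent covariance matrix, 1st imputation
fit@convergence[[1]]   # convergence/SE/positive-definiteness checks, 1st imputation
fit@version            # lavaan and lavaan.mi version numbers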
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)

## vector of pooled coefficients
coef(fit)
## their pooled asymptotic covariance matrix
vcov(fit)
## which is the weighted sum of within- and between-imputation components
vcov(fit, type = "within")
vcov(fit, type = "between")

## covariance matrix of observed variables,
## as implied by pooled estimates
fitted(fit)

## custom null model for CFI
HS.parallel <- '
  visual  =~ x1 + 1*x2 + 1*x3
  textual =~ x4 + 1*x5 + 1*x6
  speed   =~ x7 + 1*x8 + 1*x9
'
fit0 <- cfa.mi(HS.parallel, data = HS20imps, orthogonal = TRUE)

fitMeasures(fit, baseline.model = fit0, fit.measures = "default",
            output = "text")

## See ?lavaan.mi help page for more examples
This function calculates residuals for sample moments (e.g., means and (co)variances) from a lavaan model fitted to multiple imputed data sets, along with summary and inferential statistics about the residuals.
## S4 method for signature 'lavaan.mi'
residuals(object, type = "raw", omit.imps = c("no.conv", "no.se"), ...)

## S4 method for signature 'lavaan.mi'
resid(object, type = "raw", omit.imps = c("no.conv", "no.se"), ...)

lavResiduals.mi(object, omit.imps = c("no.conv", "no.se"), ...)
object: An object of class lavaan.mi.

type: character indicating the type of residuals: "raw" (default) for
covariance residuals, "cor" or "cor.bollen" for correlation residuals, or
"cor.bentler" for standardized covariance residuals.

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).

...: Arguments passed to lavaan::lavResiduals().
A list of residuals and other information (see lavaan::lavResiduals()).

The standard residuals() (and resid() alias) method simply calls
lavResiduals.mi(..., zstat=FALSE, summary=FALSE).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
lavaan::lavResiduals() for details about other arguments.
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to 20 imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)

## default type = "cor.bentler" (standardized covariance residuals)
lavResiduals.mi(fit, zstat = FALSE)
## SRMR is in the $summary

## correlation residuals
lavResiduals.mi(fit, zstat = FALSE, type = "cor")
## CRMR is in the $summary

## raw covariance residuals
lavResiduals.mi(fit, type = "raw") # zstat=TRUE by default
## RMR is in the $summary
## "normalized" residuals are in $cov.z

## The standard resid() and residuals() method simply call lavResiduals.mi()
## with arguments to display only the residuals ("raw" by default).
resid(fit)
residuals(fit, type = "cor.bollen") # same as type = "cor"
Likelihood ratio test (LRT) for lavaan models fitted to multiple imputed data sets.
lavTestLRT.mi(
  object,
  ...,
  modnames = NULL,
  asANOVA = TRUE,
  pool.method = c("D4", "D3", "D2"),
  omit.imps = c("no.conv", "no.se"),
  asymptotic = FALSE,
  pool.robust = FALSE
)

## S4 method for signature 'lavaan.mi'
anova(object, ...)
object: An object of class lavaan.mi.

...: Additional objects of class lavaan.mi, as well as arguments passed to
lavaan::lavTestLRT().

modnames: Optional character vector of model names used to label the models
in the output.

asANOVA: logical; if TRUE (default), results are returned as an object of
class stats::anova. If FALSE, a vector (or a data.frame for multiple model
comparisons) is returned instead (see Value).

pool.method: character indicating the method used to pool the LRT across
imputations: "D4" (default), "D3", or "D2". Find additional details in
Enders (2010, chapter 8) and in Details below.

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).

asymptotic: logical; if FALSE (default), the pooled test is returned as an
F statistic; if TRUE, as an asymptotically chi-squared statistic.

pool.robust: logical; only relevant when a robust test statistic was
requested and pool.method = "D2". If TRUE, the robust test statistics are
pooled across imputations; if FALSE (default), the naive statistic is
pooled and the robust correction is applied afterward.
The "D2"
method is available using any estimator and test statistic.
When using a likelihood-based estimator, 2 additional methods are available
to pool the LRT.
The Meng & Rubin (1992) method, commonly referred to as "D3"
.
This method has many problems, discussed in Chan & Meng (2022).
The Chan & Meng (2022) method, referred to as "D4"
by
Grund et al. (2023), resolves problems with "D3"
.
When "D2"
is not explicitly requested in situations it is the only
applicable method, (e.g., DWLS for categorical outcomes), users are notified
that pool.method
was set to "D2"
.
pool.method = "Mplus"
implies "D3"
and asymptotic = TRUE
(see Asparouhov & Muthen, 2010).
Note that the anova()
method simply calls lavTestLRT.mi()
.
When asANOVA=TRUE, returns an object of class stats::anova with a test of
model fit for a single model (object) or test(s) of the difference(s) in
fit between nested models passed via ... (either an F or chi-squared
statistic, depending on the asymptotic argument), its degrees of freedom,
its p value, and 2 missing-data diagnostics: the relative increase in
variance (RIV = FMI / (1 - FMI)) and the fraction of missing information
(FMI = RIV / (1 + RIV)).

When asANOVA=FALSE, returns a vector containing the LRT statistic for a
single model or comparison of a single pair of models, or a data.frame of
multiple model comparisons. Robust statistics will also include the average
(across imputations) scaling factor and (if relevant) shift parameter(s),
unless pool.robust = TRUE. When using pool.method = "D3" or "D4", the
vector for a single model also includes its average log-likelihood and
information criteria.
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Based on source code for lavaan::lavTestLRT() by Yves Rosseel.
Asparouhov, T., & Muthen, B. (2010). Chi-square statistics with multiple imputation. Technical Report. Retrieved from http://www.statmodel.com/
Chan, K. W., & Meng, X. L. (2022). Multiple improvements of multiple imputation likelihood ratio tests. Statistica Sinica, 32, 1489–1514. doi:10.5705/ss.202019.0314
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Grund, S., Lüdtke, O., & Robitzsch, A. (2023). Pooling methods for likelihood ratio tests in multiply imputed data sets. Psychological Methods, 28(5), 1207–1221. doi:10.1037/met0000556
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Meng, X.-L., & Rubin, D. B. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79(1), 103–111. doi:10.2307/2337151
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
lavaan::lavTestLRT() for arguments that can be passed via ..., and use
lavaan::fitMeasures() to obtain fit indices calculated from pooled test
statistics.
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from ?lavaan::cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit1 <- cfa.mi(HS.model, data = HS20imps, estimator = "mlm")

## By default, pool.method = "D4".
## Must request an asymptotic chi-squared statistic
## in order to accommodate a robust correction.
lavTestLRT.mi(fit1, asymptotic = TRUE)
## or
anova(fit1, asymptotic = TRUE)

## Comparison with more constrained (nested) models: parallel indicators
HS.parallel <- '
  visual  =~ x1 + 1*x2 + 1*x3
  textual =~ x4 + 1*x5 + 1*x6
  speed   =~ x7 + 1*x8 + 1*x9
'
fitp <- cfa.mi(HS.parallel, data = HS20imps, estimator = "mlm")

## Even more constrained model: orthogonal factors
fit0 <- cfa.mi(HS.parallel, data = HS20imps, estimator = "mlm",
               orthogonal = TRUE)

## Compare 3 models, and pass the lavTestLRT(method=) argument
lavTestLRT.mi(fit1, fit0, fitp, asymptotic = TRUE,
              method = "satorra.bentler.2010")

## For a single model, you can request a vector instead of an anova-class
## table in order to see naive information criteria (only using D3 or D4),
## which are calculated using the average log-likelihood across imputations.
lavTestLRT.mi(fit1, asANOVA = FALSE)

## When using a least-squares (rather than maximum-likelihood) estimator,
## only the D2 method is available. For example, ordered-categorical data:
data(binHS5imps) # import a list of 5 imputed data sets

## fit model using default DWLS estimation
fit1c <- cfa.mi(HS.model , data = binHS5imps, ordered = TRUE)
fit0c <- cfa.mi(HS.parallel, data = binHS5imps, ordered = TRUE,
                orthogonal = TRUE)

## Using D2, you can either robustify the pooled naive statistic ...
lavTestLRT.mi(fit1c, fit0c, asymptotic = TRUE, pool.method = "D2")
## ... or pool the robust chi-squared statistic (NOT recommended)
lavTestLRT.mi(fit1c, fit0c, asymptotic = TRUE, pool.method = "D2",
              pool.robust = TRUE)

## When calculating fit indices, you can pass lavTestLRT.mi() arguments:
fitMeasures(fit1c, output = "text",
            # lavTestLRT.mi() arguments:
            pool.method = "D2", pool.robust = TRUE)
Score test (or "Lagrange multiplier" test) for lavaan models fitted to multiple imputed data sets. Statistics for releasing one or more fixed or constrained parameters in a model can be calculated by pooling the gradient and information matrices across imputed data sets in a method proposed by Mansolf, Jorgensen, & Enders (2020), which is analogous to the "D1" Wald test proposed by Li, Meng, Raghunathan, & Rubin (1991), or by pooling the complete-data score-test statistics across imputed data sets (i.e., "D2"; Li et al., 1991).
lavTestScore.mi(
  object,
  add = NULL,
  release = NULL,
  pool.method = c("D2", "D1"),
  scale.W = !asymptotic,
  omit.imps = c("no.conv", "no.se"),
  asymptotic = is.null(add),
  univariate = TRUE,
  cumulative = FALSE,
  epc = FALSE,
  standardized = epc,
  cov.std = epc,
  verbose = FALSE,
  warn = TRUE,
  information = "expected"
)
object: An object of class lavaan.mi.

add: Either a character string (using lavaan model syntax) or a parameter
table specifying currently fixed-to-zero parameters to test adding to the
model.

release: Vector of integers indicating which equality constraints to
release (the default tests releasing all of them).

pool.method: character indicating the method used to pool the score test
across imputations: "D2" (default) or "D1" (see description above).

scale.W: logical; controls how the within- and between-imputation
components are combined when pooling (only relevant for pool.method =
"D1").

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).

asymptotic: logical; if FALSE, the pooled test is returned as an F
statistic; if TRUE, as an asymptotically chi-squared statistic. Defaults to
is.null(add), i.e., TRUE unless parameters are added via the add argument.

univariate: logical; if TRUE (default), 1-df score tests (equivalent to
modification indices) are returned for each tested constraint.

cumulative: logical; if TRUE, cumulative multiparameter score tests are
also returned.

epc: logical; if TRUE, expected parameter changes (EPCs) are also returned.

standardized: If TRUE, standardized EPCs are also returned (defaults to the
value of epc).

cov.std: logical passed to the standardization routine (defaults to the
value of epc).

verbose: logical; whether to print additional information to the screen.

warn: logical; whether to print warnings.

information: character indicating the type of information matrix to use
(default is "expected").
A list containing at least one data.frame:

$test: The total score test, with columns for the score test statistic
(X2), its degrees of freedom (df), and its p value under the chi-squared
distribution (p.value). If asymptotic=FALSE, the average relative increase
in variance (ARIV) used to calculate the denominator df is also returned as
a missing-data diagnostic, along with the fraction of missing information
(FMI = ARIV / (1 + ARIV)).

$uni: Optional (if univariate=TRUE). Each 1-df score test, equivalent to
modification indices. Also includes EPCs if epc=TRUE, and RIV and FMI if
asymptotic=FALSE.

$cumulative: Optional (if cumulative=TRUE). Cumulative score tests, with
ARIV and FMI if asymptotic=FALSE.

$epc: Optional (if epc=TRUE). Parameter estimates, expected parameter
changes, and expected parameter values if ALL the tested constraints were
freed.

See lavaan::lavTestScore() for details.
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Based on source code for lavaan::lavTestScore() by Yves Rosseel.

The pool.method = "D1" method was proposed by Maxwell Mansolf (University
of California, Los Angeles; [email protected]).
Bentler, P. M., & Chou, C.-P. (1992). Some new covariance structure model improvement statistics. Sociological Methods & Research, 21(2), 259–282. doi:10.1177/0049124192021002006
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Mansolf, M., Jorgensen, T. D., & Enders, C. K. (2020). A multiple imputation score test for model modification in structural equation models. Psychological Methods, 25(4), 393–411. doi:10.1037/met0000243
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- ' speed =~ c(L1, L1)*x7 + c(L1, L1)*x8 + c(L1, L1)*x9 '
fit <- cfa.mi(HS.model, data = HS20imps, group = "school", std.lv = TRUE)

## Mode 1: Score test for releasing equality constraints

## default test: Li et al.'s (1991) "D2" method
lavTestScore.mi(fit, cumulative = TRUE)
## Li et al.'s (1991) "D1" method,
## adapted for score tests by Mansolf et al. (2020)
lavTestScore.mi(fit, pool.method = "D1")

## Mode 2: Score test for adding currently fixed-to-zero parameters
lavTestScore.mi(fit, add = 'x7 ~~ x8 + x9')
Wald test for testing a linear hypothesis about the parameters of lavaan models fitted to multiple imputed data sets. Statistics for constraining one or more free parameters in a model can be calculated from the pooled point estimates and asymptotic covariance matrix of model parameters using Rubin's (1987) rules, or by pooling the Wald test statistics across imputed data sets (Li, Meng, Raghunathan, & Rubin, 1991).
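To make the pooling logic concrete, here is a minimal sketch (not this
function's internal code) of a 1-df Wald test computed from pooled point
estimates and their pooled covariance matrix, using the coef() and vcov()
methods documented for lavaan.mi objects; it assumes fit is a lavaan.mi
object, and the parameter index is a placeholder to be matched against
names(coef(fit)).

## Minimal sketch of a 1-df Wald test from pooled results
theta <- coef(fit)   # pooled point estimates (named vector)
V     <- vcov(fit)   # pooled asymptotic covariance matrix
idx   <- 1           # placeholder: match names(theta) to the parameter of interest
wald  <- theta[idx]^2 / V[idx, idx]        # asymptotic (chi-squared) version
pchisq(wald, df = 1, lower.tail = FALSE)   # p value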
lavTestWald.mi(
  object,
  constraints = NULL,
  pool.method = c("D1", "D2"),
  asymptotic = FALSE,
  scale.W = !asymptotic,
  omit.imps = c("no.conv", "no.se"),
  verbose = FALSE,
  warn = TRUE
)
object: An object of class lavaan.mi.

constraints: A character string (using lavaan model syntax) specifying the
linear constraint(s) to test (see Details).

pool.method: character indicating the method used to pool the Wald test
across imputations: "D1" (default) or "D2" (see description above).

asymptotic: logical; if FALSE (default), the pooled test is returned as an
F statistic; if TRUE, as an asymptotically chi-squared statistic.

scale.W: logical; controls how the within- and between-imputation
components are combined when pooling the asymptotic covariance matrix of
parameter estimates (only relevant for pool.method = "D1").

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).

verbose: logical; whether to print additional information to the screen.

warn: logical; whether to print warnings.
The constraints are specified using the "==" operator. Both the left-hand
side and the right-hand side of the equality can contain a linear
combination of model parameters, or a constant (like zero). The model
parameters must be specified by their user-specified labels from the lavaan
model syntax (see lavaan::model.syntax). Names of defined parameters (using
the ":=" operator) can be included too.
A vector containing the Wald test statistic (either an F or chi-squared
statistic, depending on the asymptotic argument), the degrees of freedom
(numerator and denominator, if asymptotic = FALSE), and a p value. If
asymptotic = FALSE, the relative increase in variance (RIV, or average for
multiparameter tests: ARIV) used to calculate the denominator df is also
returned as a missing-data diagnostic, along with the fraction of missing
information (FMI = ARIV / (1 + ARIV)).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Based on source code for lavaan::lavTestWald() by Yves Rosseel.
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data. Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + b1*x2 + x3
  textual =~ x4 + b2*x5 + x6
  speed   =~ x7 + b3*x8 + x9
'
fit <- cfa.mi(HS.model, data = HS20imps)

## Testing whether a single parameter equals zero yields the 'chi-square'
## version of the Wald z statistic from the summary() output, or the
## 'F' version of the t statistic from the summary() output, depending
## whether asymptotic = TRUE or FALSE
lavTestWald.mi(fit, constraints = "b1 == 0")                     # default D1 statistic
lavTestWald.mi(fit, constraints = "b1 == 0", pool.method = "D2") # D2 statistic

## The real advantage is simultaneously testing several equality
## constraints, or testing more complex constraints:
con <- '
  2*b1 == b3
  b2 - b3 == 0
'
lavTestWald.mi(fit, constraints = con)                    # default F statistic
lavTestWald.mi(fit, constraints = con, asymptotic = TRUE) # chi-squared
Modification indices (1-df Lagrange multiplier tests) from a latent variable model fitted to multiple imputed data sets. Statistics for releasing one or more fixed or constrained parameters in a model can be calculated by pooling the gradient and information matrices across imputed data sets in a method proposed by Mansolf, Jorgensen, & Enders (2020), which is analogous to the "D1" Wald test proposed by Li, Meng, Raghunathan, & Rubin (1991), or by pooling the complete-data score-test statistics across imputed data sets (i.e., "D2"; Li et al., 1991).
modindices.mi(
  object,
  pool.method = c("D2", "D1"),
  omit.imps = c("no.conv", "no.se"),
  standardized = TRUE,
  cov.std = TRUE,
  information = "expected",
  power = FALSE,
  delta = 0.1,
  alpha = 0.05,
  high.power = 0.75,
  sort. = FALSE,
  minimum.value = 0,
  maximum.number = nrow(LIST),
  na.remove = TRUE,
  op = NULL
)

modificationIndices.mi(
  object,
  pool.method = c("D2", "D1"),
  omit.imps = c("no.conv", "no.se"),
  standardized = TRUE,
  cov.std = TRUE,
  information = "expected",
  power = FALSE,
  delta = 0.1,
  alpha = 0.05,
  high.power = 0.75,
  sort. = FALSE,
  minimum.value = 0,
  maximum.number = nrow(LIST),
  na.remove = TRUE,
  op = NULL
)

modificationindices.mi(
  object,
  pool.method = c("D2", "D1"),
  omit.imps = c("no.conv", "no.se"),
  standardized = TRUE,
  cov.std = TRUE,
  information = "expected",
  power = FALSE,
  delta = 0.1,
  alpha = 0.05,
  high.power = 0.75,
  sort. = FALSE,
  minimum.value = 0,
  maximum.number = nrow(LIST),
  na.remove = TRUE,
  op = NULL
)
object: An object of class lavaan.mi.

pool.method: character indicating the method used to pool the score tests
across imputations: "D2" (default) or "D1" (see description above).

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).

standardized: logical; if TRUE (default), standardized expected parameter
changes (SEPCs) are also returned.

cov.std: logical passed to the standardization routine.

information: character indicating the type of information matrix to use
(default is "expected").

power: logical; if TRUE, a post-hoc power calculation is included for each
modification index (using the delta, alpha, and high.power arguments).

delta: The value of the effect size, as used in the post-hoc power
computation, currently using the unstandardized metric of the EPC.

alpha: The significance level used for deciding if the modification index is statistically significant or not.

high.power: If the computed power is higher than this cutoff value, the
power is considered 'high'. If not, the power is considered 'low'. This
affects the values in the decision column(s) of the output.

sort.: logical; if TRUE, sort the output by the values of the modification
indices (largest first).

minimum.value: numeric; only modification indices larger than this value
are returned.

maximum.number: integer; the maximum number of modification indices to
return (the default returns all of them).

na.remove: logical; if TRUE (default), rows with NA values are removed from
the output.

op: character string; if provided, only modification indices for parameters
with this operator (e.g., "=~") are returned.
A data.frame containing modification indices and (S)EPCs. When pool.method
= "D2", each (S)EPC will be pooled by taking its average across
imputations. When pool.method = "D1", EPCs will be calculated in the
standard way using the pooled gradient and information, and SEPCs will be
calculated by standardizing the EPCs using model-implied (residual)
variances.
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Based on source code for lavaan::modindices() by Yves Rosseel.

The pool.method = "D1" method was proposed by Maxwell Mansolf (University
of California, Los Angeles; [email protected]).
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991). Significance levels from repeated p-values with multiply-imputed data.Statistica Sinica, 1(1), 65–92. Retrieved from https://www.jstor.org/stable/24303994
Mansolf, M., Jorgensen, T. D., & Enders, C. K. (2020). A multiple imputation score test for model modification in structural equation models. Psychological Methods, 25(4), 393–411. doi:10.1037/met0000243
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa.mi(HS.model, data = HS20imps)

modindices.mi(fit) # default: Li et al.'s (1991) "D2" method
## Li et al.'s (1991) "D1" method,
## adapted for score tests by Mansolf et al. (2020)
modindices.mi(fit, pool.method = "D1")
This function pools parameter estimates from a lavaan model fitted to multiple imputed data sets.
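As background, the sketch below illustrates Rubin's (1987) pooling rules for
a single parameter, which is the logic applied to each row of the parameter
table; it is a simplified illustration (the helper name and toy numbers are
hypothetical), not the package's internal code.

## Rubin's (1987) rules for one parameter, given m imputations:
## 'est' = vector of point estimates, 'se' = vector of their standard errors
poolRubin <- function(est, se) {
  m    <- length(est)
  Qbar <- mean(est)               # pooled point estimate
  W    <- mean(se^2)              # within-imputation variance
  B    <- var(est)                # between-imputation variance
  Tvar <- W + (1 + 1/m) * B       # total sampling variance
  riv  <- (1 + 1/m) * B / W       # relative increase in variance
  c(est = Qbar, se = sqrt(Tvar), riv = riv, fmi = riv / (1 + riv))
}

## toy usage with made-up numbers
poolRubin(est = c(0.52, 0.48, 0.55), se = c(0.10, 0.11, 0.10))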
parameterEstimates.mi(
  object,
  se = TRUE,
  zstat = se,
  pvalue = zstat,
  ci = TRUE,
  level = 0.95,
  fmi = FALSE,
  standardized = FALSE,
  cov.std = TRUE,
  rsquare = FALSE,
  asymptotic = FALSE,
  scale.W = !asymptotic,
  omit.imps = c("no.conv", "no.se"),
  remove.system.eq = TRUE,
  remove.eq = TRUE,
  remove.ineq = TRUE,
  remove.def = FALSE,
  remove.nonfree = FALSE,
  remove.unused = FALSE,
  output = "data.frame",
  header = FALSE
)
object: An object of class lavaan.mi.

se, zstat, pvalue, ci, level, standardized, cov.std, rsquare,
remove.system.eq, remove.eq, remove.ineq, remove.def, remove.nonfree,
remove.unused, output, header: See descriptions of the corresponding
arguments in lavaan::parameterEstimates().

fmi: logical indicating whether to report each pooled estimate's fraction
of missing information (FMI), a simple transformation of the relative
increase in variance: RIV = FMI / (1 - FMI).

asymptotic: logical; if FALSE (default), pooled Wald tests of individual
parameters are reported as t statistics; if TRUE, as asymptotic z
statistics.

scale.W: logical; controls how the within- and between-imputation
components are combined when pooling each estimate's sampling variance.

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).
A data.frame, analogous to lavaan::parameterEstimates(), but estimates,
SEs, and tests are pooled across imputations.
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley. doi:10.1002/9780470316696
standardizedSolution.mi() to obtain inferential statistics for pooled
standardized parameter estimates.
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to 20 imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)

## pooled estimates, with various optional features:
parameterEstimates.mi(fit, asymptotic = TRUE, rsquare = TRUE)
parameterEstimates.mi(fit, ci = FALSE, fmi = TRUE, output = "text")
parameterEstimates.mi(fit, standardized = "std.all", se = FALSE)
This function fits a saturated lavaan model to a list of imputed data sets, and returns a list of pooled summary statistics to treat as data.
poolSat(
  data,
  ...,
  return.fit = FALSE,
  scale.W = TRUE,
  omit.imps = c("no.conv", "no.se")
)
data: A list of imputed data sets (each a data.frame) containing the
variables to be summarized.

...: Additional arguments passed to lavaan when fitting the saturated model
to each imputation (e.g., meanstructure=, group=, ordered=, or se=; see
Examples).

return.fit: logical; if TRUE, the lavaan.mi object for the saturated model
is returned instead of the pooled summary statistics (see Value).

scale.W: logical; controls how the within- and between-imputation
components are combined when pooling the asymptotic covariance matrix of
the summary statistics.

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).
If return.fit=TRUE, a lavaan.mi object.

Otherwise, an object of class lavMoments, which is a list that contains at
least $sample.cov and $sample.nobs, potentially also $sample.mean,
$sample.th, $NACOV, and $WLS.V. It also contains $lavOptions, which will be
passed to lavaan(...).
The $lavOptions list will always set fixed.x=FALSE and conditional.x=FALSE.
Users should not override those options when calling lavaan::lavaan()
because doing so would yield incorrect SEs and test statistics. Computing
the correct $NACOV argument would depend on which specific variables are
treated as fixed, which would require an argument to poolSat() for users to
declare names of exogenous variables. This has not yet been programmed, but
that feature may be added in the future in order to reduce the number of
parameters to estimate. However, if "exogenous" predictors were incomplete
and imputed, then they are not truly fixed (i.e., unvarying across
samples), so treating them as fixed would be illogical and yield biased SEs
and test statistics.

The information returned by poolSat() must assume that any fitted SEM will
include all the variables in $sample.cov and (more importantly) in $NACOV.
Although lavaan can drop unused rows/columns from $sample.cov, it cannot be
expected to drop the corresponding sampling variances of those eliminated
(co)variances from $NACOV. Thus, it is necessary to use poolSat() to obtain
the appropriate summary statistics for any particular SEM (see Examples).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
Lee, T., & Cai, L. (2012). Alternative multiple imputation inference for mean and covariance structure modeling. Journal of Educational and Behavioral Statistics, 37(6), 675–702. doi:10.3102/1076998612458320
Chung, S., & Cai, L. (2019). Alternative multiple imputation inference for categorical structural equation modeling, Multivariate Behavioral Research, 54(3), 323–337. doi:10.1080/00273171.2018.1523000
lavaan.mi() for the traditional method (fit the SEM to each imputation and
pool results afterward).
data(HS20imps) # import a list of 20 imputed data sets

## fit saturated model to imputations, pool those results
impSubset1 <- lapply(HS20imps, "[", i = paste0("x", 1:9)) # only modeled variables
(prePooledData <- poolSat(impSubset1))
## Note: no means were returned (default lavOption() is meanstructure=FALSE)
(prePooledData <- poolSat(impSubset1, meanstructure = TRUE))

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to summary statistics in "prePooledData"
fit <- cfa(HS.model, data = prePooledData, std.lv = TRUE)
## By default, the "Scaled" column provides a "scaled.shifted" test
## statistic that maintains an approximately nominal Type I error rate.
summary(fit, fit.measures = TRUE, standardized = "std.all")
## Note that this scaled statistic does NOT account for deviations from
## normality, because the default normal-theory standard errors were
## requested when running poolSat(). See below about non-normality.

## Alternatively, "Browne's residual-based (ADF) test" is also available:
lavTest(fit, test = "browne.residual.adf", output = "text")

## Optionally, save the saturated-model lavaan.mi object, which
## could be helpful for diagnosing convergence problems per imputation.
satFit <- poolSat(impSubset1, return.fit = TRUE)


## FITTING MODELS TO DIFFERENT (SUBSETS OF) VARIABLES

## If you only want to analyze a subset of these variables,
mod.vis <- 'visual =~ x1 + x2 + x3'
## you will get an error:
try(
  fit.vis <- cfa(mod.vis, data = prePooledData) # error
)
## As explained in the "Note" section, you must use poolSat() again for
## this subset of variables
impSubset3 <- lapply(HS20imps, "[", i = paste0("x", 1:3)) # only modeled variables
visData <- poolSat(impSubset3)
fit.vis <- cfa(mod.vis, data = visData) # no problem


## OTHER lavaan OPTIONS

## fit saturated MULTIPLE-GROUP model to imputations
impSubset2 <- lapply(HS20imps, "[", i = c(paste0("x", 1:9), "school"))
(prePooledData2 <- poolSat(impSubset2, group = "school",
                           ## request standard errors that are ROBUST
                           ## to violations of the normality assumption:
                           se = "robust.sem"))
## Nonnormality-robust standard errors are implicitly incorporated into the
## pooled weight matrix (NACOV= argument), so they are
## AUTOMATICALLY applied when fitting the model:
fit.config <- cfa(HS.model, data = prePooledData2, group = "school",
                  std.lv = TRUE)
## standard errors and chi-squared test of fit both robust to nonnormality
summary(fit.config)


## CATEGORICAL OUTCOMES

## discretize the imputed data, for an example of 3-category data
HS3cat <- lapply(impSubset1, function(x) {
  as.data.frame( lapply(x, cut, breaks = 3, labels = FALSE) )
})
## pool polychoric correlations and thresholds
(prePooledData3 <- poolSat(HS3cat, ordered = paste0("x", 1:9)))

fitc <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
summary(fitc)

## Optionally, use unweighted least-squares estimation. However,
## you must first REMOVE the pooled weight matrix (WLS.V= argument)
## or replace it with an identity matrix of the same dimensions:
prePooledData4 <- prePooledData3
prePooledData4$WLS.V <- NULL
## or
prePooledData4$WLS.V <- diag(nrow(prePooledData3$WLS.V))
fitcu <- cfa(HS.model, data = prePooledData4, std.lv = TRUE,
             estimator = "ULS")
## Note that the SEs and test were still appropriately corrected:
summary(fitcu)
This function calculates pooled parameter estimates from a lavaan model fitted to multiple imputed data sets, then transforms the pooled estimates and their SEs using the delta method.
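For intuition about the delta method mentioned above, the sketch below
shows how a standard error for a transformed parameter can be obtained from
pooled estimates and their covariance matrix; the numbers are made up for
illustration, and the numDeriv package is used here only as a convenient
way to obtain a numeric gradient. This is not the package's internal code.

## Delta-method sketch: SE of g(theta) ~= sqrt( grad(g)' %*% V %*% grad(g) )
## Illustrative (made-up) pooled estimates and covariance matrix:
theta <- c(cov = 0.50, v1 = 2.00, v2 = 1.50)   # a covariance and two variances
V     <- diag(c(0.020, 0.050, 0.040))          # their pooled sampling covariance matrix
g     <- function(p) p[1] / sqrt(p[2] * p[3])  # standardize the covariance into a correlation

grad  <- numDeriv::grad(g, theta)              # numeric gradient of the transformation
se    <- sqrt(drop(t(grad) %*% V %*% grad))    # delta-method standard error
c(std.est = unname(g(theta)), se = se)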
standardizedSolution.mi(
  object,
  return.vcov = FALSE,
  omit.imps = c("no.conv", "no.se"),
  ...
)
object: An object of class lavaan.mi.

return.vcov: logical indicating whether to also return the pooled
(delta-method) asymptotic covariance matrix of the standardized parameter
estimates.

omit.imps: Character vector specifying criteria for omitting imputations
from pooled results (by default, those that did not converge or whose
standard errors could not be computed).

...: Arguments passed to lavaan::standardizedSolution() (e.g., type= or
output=; see Examples).
A data.frame containing standardized model parameters, analogous to
lavaan::standardizedSolution(). Delta-method SEs and CIs rely on asymptotic
theory, so only Wald z tests are available, analogous to setting
parameterEstimates.mi(fit, asymptotic = TRUE).
Terrence D. Jorgensen (University of Amsterdam; [email protected])
parameterEstimates.mi() for pooling unstandardized parameter estimates,
which can also add standardized point estimates to indicate effect size.
data(HS20imps) # import a list of 20 imputed data sets

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to 20 imputed data sets
fit <- cfa.mi(HS.model, data = HS20imps)

standardizedSolution.mi(fit) # default: type = "std.all"
## only standardize latent variables:
standardizedSolution.mi(fit, type = "std.lv",
                        output = "text") # display like summary()