Title: | Estimates and Plots Single-Level and Multilevel Latent Class Models |
---|---|
Description: | Efficiently estimates single- and multilevel latent class models with covariates, allowing for output visualization in all specifications. For more technical details, see Lyrvall et al (2023) <doi:10.48550/arXiv.2305.07276>. |
Authors: | Roberto Di Mari [aut, cre], Johan Lyrvall [aut], Zsuzsa Bakk [ctb], Jennifer Oser [ctb], Jouni Kuha [ctb] |
Maintainer: | Roberto Di Mari <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5.2 |
Built: | 2024-12-12 06:52:18 UTC |
Source: | CRAN |
Efficiently estimates single- and multilevel latent class models with covariates, allowing for output visualization in all specifications. For more technical details, see Lyrvall et al (2023) <doi:10.48550/arXiv.2305.07276>.
For estimating latent class models, see multiLCA.
For plotting latent class models, see plot.multiLCA.
Roberto Di Mari and Johan Lyrvall.
Maintainer: Roberto Di Mari <[email protected]>
Bakk, Z., & Kuha, J. (2018). Two-step estimation of models between latent classes and external variables. Psychometrika, 83, 871-892.
Bakk, Z., Di Mari, R., Oser, J., & Kuha, J. (2022). Two-stage multilevel latent class analysis with covariates in the presence of direct effects. Structural Equation Modeling: A Multidisciplinary Journal, 29(2), 267-277.
Di Mari, R., Bakk, Z., Oser, J., & Kuha, J. (2023). A two-step estimator for multilevel latent class analysis with covariates. Psychometrika.
Lukociene, O., Varriale, R., & Vermunt, J. K. (2010). The simultaneous decision(s) about the number of lower-and higher-level classes in multilevel latent class analysis. Sociological Methodology, 40(1), 247-283.
data = dataIEA
Y = colnames(dataIEA)[4+1:12]
out = multiLCA(data = data, Y = Y, iT = 2)
out
plot(out, horiz = FALSE)
Data set from the International Civic and Citizenship Education Study 2016 (Schulz et al., 2018). As part of a comprehensive evaluation of education systems, the IEA conducted surveys in 1999, 2009 and 2016 in school classes of 14-year olds to investigate civic education with the same scientific rigor as the evaluation of more traditional educational skills of language and mathematics. The present study focuses on the third wave of the survey that was conducted in 2016.
Questions regarding citizenship norms in all three waves asked respondents to explain their understanding of what a good adult citizen is or does. The survey then lists a variety of activities for respondents to rate in terms of how important these activities are in order to be considered a good adult citizen. The twelve items range from obeying the law and voting in elections, to protecting the environment and defending human rights.
Covariates included are customary determinants of citizenship norms from the literature at the individual-level of socio-economic measures and country-level measure of gross domestic product (GDP) per capita.
data("dataIEA")
A data frame with 90221 observations on the following 28 variables.
ICCS_year
Year of survey
COUNTRY
Country
IDSTUD
Study ID
TOTWGTS
Study weight
obey
Always obeying the law
rights
Taking part in activities promoting human rights
local
Participating in activities to benefit people in the local community
work
Working hard
envir
Taking part in activities to protect the environment
vote
Voting in every national election
history
Learning about the country's history
respect
Showing respect for government representatives
news
Following political issues in the newspaper, on the radio, on TV, or on the Internet
protest
Participating in peaceful protests against laws believed to be unjust
discuss
Engaging in political discussions
party
Joining a political party
female
Female
books
Number of books at home
edexp
Educational expectations
ed_mom
Mother education
ed_dad
Father education
nonnat_born
Non-native born
immigrantfam
Immigrant family
nonnat_lang
Non-native language level
gdp_constant
GDP
log_gdp_constant
Log GDP
gdp_currentusd
GDP in USD
log_gdp_currentusd
Log GDP in USD
Schulz, W., Ainley, J., Fraillon, J., Losito, B., Agrusti, G., & Friedman, T. (2018). Becoming citizens in a changing world: IEA International Civic and Citizenship Education Study 2016 international report. Springer.
Artificial multilevel data set.
data("dataTOY")
A data frame with 3000 observations on the following 13 variables.
id_high
High-level id
Y_1
Indicator n.1
Y_2
Indicator n.2
Y_3
Indicator n.3
Y_4
Indicator n.4
Y_5
Indicator n.5
Y_6
Indicator n.6
Y_7
Indicator n.7
Y_8
Indicator n.8
Y_9
Indicator n.9
Y_10
Indicator n.10
Z_low
Continuous low-level covariate
Z_high
Continuous high-level covariate
Di Mari, R., Bakk, Z., Oser, J., & Kuha, J. (2023). A two-step estimator for multilevel latent class analysis with covariates. Under review. Available from https://arxiv.org/abs/2303.06091.
The multiLCA function in the multilevLCA package estimates single- and multilevel measurement and structural latent class models. The function also supports two different strategies for model selection. Methodological details can be found in Bakk et al. (2022), Bakk and Kuha (2018), and Di Mari et al. (2023).
Different output visualization tools are available for all model specifications. See, e.g., plot.multiLCA.
multiLCA(
  data, Y, iT, id_high = NULL, iM = NULL, Z = NULL, Zh = NULL,
  extout = FALSE, dataout = TRUE, kmea = TRUE, sequential = TRUE,
  numFreeCores = 2, maxIter = 1e3, tol = 1e-8, reord = TRUE,
  fixedpars = 1, NRmaxit = 100, NRtol = 1e-6, verbose = TRUE
)
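data | Input matrix or dataframe. |
Y | Names of the columns in data containing the indicators (items). |
iT | Number of lower-level latent classes. |
id_high | Name of the column in data containing the higher-level id. Default: NULL. |
iM | Number of higher-level latent classes. Default: NULL. |
Z | Names of the columns in data containing the lower-level covariates. Default: NULL. |
Zh | Names of the columns in data containing the higher-level covariates. Default: NULL. |
extout | Whether to output extensive model and estimation information. Default: FALSE. |
dataout | Whether to match class predictions to the observed data. Default: TRUE. |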
kmea | Whether to compute starting values for the single-level model using K-means (TRUE) or K-modes (FALSE). Default: TRUE. |
sequential | Whether to perform sequential model selection (TRUE) or simultaneous, parallelized model selection (FALSE). Default: TRUE. |
numFreeCores | If performing parallelized model selection, the number of CPU cores to keep free. Default: 2. |
maxIter | Maximum number of iterations for the EM algorithm. Default: 1e3. |
tol | Tolerance for the EM algorithm. Default: 1e-8. |
reord | Whether to (re)order classes in decreasing order according to the probability of scoring yes on all items. Default: TRUE. |
fixedpars | One-step estimator (0), two-step estimator (1) or two-stage estimator (2). Default: 1. |
NRmaxit | Maximum number of iterations for the Newton-Raphson algorithm. Default: 100. |
NRtol | Tolerance for the Newton-Raphson algorithm. Default: 1e-6. |
verbose | Whether to print estimation progress. Default: TRUE. |
The indicator columns may be coded as a consecutive sequence of integers starting from 0, or as characters.
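As a small illustration of the integer coding described above, the following base R sketch recodes a character item to consecutive integers starting at 0. The vector `item` and its categories are made up for illustration; since character columns are also accepted, this step is optional.

```r
# Hypothetical responses for one item, stored as characters
item <- c("no", "yes", "yes", "no", "sometimes")

# factor() sorts the observed categories (here alphabetically) and numbers
# them from 1; subtracting 1 gives consecutive integers starting at 0.
# Pass `levels =` to factor() to control the category order explicitly.
item_int <- as.integer(factor(item)) - 1L

# Cross-tabulate to check which label maps to which integer code
table(item, item_int)
```
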
To directly estimate a latent class model, iT and (optionally) iM should each be specified as a single positive integer. To perform model selection over a range of consecutive positive integers as the number of latent classes, iT and/or iM may be specified in the form iT_min:iT_max and/or iM_min:iM_max. It is possible to specify iT = iT_min:iT_max with either iM = NULL or iM equal to a single positive integer, iM = iM_min:iM_max with iT equal to a single positive integer, or iT = iT_min:iT_max with iM = iM_min:iM_max. All model selection procedures return the output of the optimal model based on the BIC.
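To make the BIC-based choice concrete, here is a minimal sketch using the textbook definition BIC = -2 logL + k log(n). The log-likelihood values and sample size are invented, and which sample size the package plugs into its lower- and higher-level BICs is not stated here, so treat `n` as an assumption.

```r
# Hypothetical fits for single-level models with 1, 2 and 3 classes
n      <- 3000                        # assumed sample size
loglik <- c(-21000, -19500, -19470)   # made-up maximized log-likelihoods
# Free parameters for 10 binary items: iT * 10 response probabilities
# plus (iT - 1) class proportions
npar   <- c(10, 21, 32)

# Textbook BIC: smaller is better
bic <- -2 * loglik + npar * log(n)

# Number of classes with the lowest BIC
iT_opt <- which.min(bic)
iT_opt
```

In this made-up example the third class improves the log-likelihood too little to offset the penalty for its 11 extra parameters, so the two-class model is selected.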
In the case where both iT and iM are defined as a range of consecutive positive integers, model selection can be performed using the sequential three-stage approach (Lukociene et al., 2010) or a simultaneous approach. The sequential approach involves (first step) estimating iT_min:iT_max single-level models and identifying the optimal alternative iT_opt1 based on the BIC, (second step) estimating iM_min:iM_max|iT = iT_opt1 multilevel models and identifying the optimal alternative iM_opt2 based on the higher-level BIC, and (third step) estimating iT_min:iT_max|iM = iM_opt2 multilevel models and identifying the optimal alternative iT_opt3 based on the lower-level BIC. The simultaneous approach involves using multiple CPU cores on the local machine to estimate all combinations in iT = iT_min:iT_max, iM = iM_min:iM_max and identifying the optimal alternative based on the lower-level BIC.
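The control flow of the sequential three-step search can be sketched as follows. The function `fit_bic` is a hypothetical stand-in that returns made-up information criteria instead of fitting any model; only the order of the three steps mirrors the description above.

```r
# Hypothetical stand-in for model fitting; the returned values are invented.
# iM = NA denotes a single-level model.
fit_bic <- function(iT, iM = NA) {
  if (is.na(iM)) {
    c(BIC = (iT - 3)^2 + 100)
  } else {
    c(BIClow  = (iT - 3)^2 + (iM - 2)^2 + 90,
      BIChigh = (iM - 2)^2 + abs(iT - 3) + 50)
  }
}

iT_range <- 1:4
iM_range <- 1:3

# Step 1: single-level models over iT_range, compared on the BIC
bic1 <- sapply(iT_range, function(t) fit_bic(t)[["BIC"]])
iT_opt1 <- iT_range[which.min(bic1)]

# Step 2: multilevel models over iM_range with iT fixed at iT_opt1,
# compared on the higher-level BIC
bic2 <- sapply(iM_range, function(m) fit_bic(iT_opt1, m)[["BIChigh"]])
iM_opt2 <- iM_range[which.min(bic2)]

# Step 3: multilevel models over iT_range with iM fixed at iM_opt2,
# compared on the lower-level BIC
bic3 <- sapply(iT_range, function(t) fit_bic(t, iM_opt2)[["BIClow"]])
iT_opt3 <- iT_range[which.min(bic3)]

c(iT = iT_opt3, iM = iM_opt2)
```

The sequential search fits length(iT_range) + length(iM_range) + length(iT_range) models, whereas the simultaneous approach fits every combination in the grid.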
Single-level model estimation returns the following (all elements if extout = TRUE, otherwise a subset):
vPi | Class proportions |
mPhi | Response probabilities given the latent classes |
mU | Matrix of posterior class assignment (proportional assignment) |
mU_modal | Matrix of posterior class assignment (modal assignment) |
vU_modal | Vector of posterior class assignment (modal assignment) |
mClassErr | Expected number of classification errors |
mClassErrProb | Expected proportion of classification errors |
AvgClassErrProb | Average of mClassErrProb |
R2entr | Entropy-based R^2 |
BIC | Bayesian Information Criterion (BIC) |
AIC | Akaike Information Criterion (AIC) |
vGamma | Intercepts in logistic parametrization for class proportions |
mBeta | Intercepts in logistic parametrization for response probabilities |
parvec | Vector of logistic parameters |
SEs | Standard errors |
Varmat | Variance-covariance matrix |
iter | Number of iterations for EM algorithm |
eps | Difference between last two elements of log-likelihood sequence for EM algorithm |
LLKSeries | Full log-likelihood series for EM algorithm |
mScore | Contributions to log-likelihood score |
spec | Model specification |
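The relationship between proportional and modal assignment can be illustrated with a few lines of base R. The posterior matrix below is invented, and the cross-tabulation is one common way to count expected classification errors; the orientation and exact definitions the package uses for mClassErr and mClassErrProb are assumptions of this sketch.

```r
# Hypothetical posterior class probabilities: 4 units (rows), 2 classes (columns)
mU <- matrix(c(0.9, 0.1,
               0.2, 0.8,
               0.6, 0.4,
               0.3, 0.7), ncol = 2, byrow = TRUE)

# Modal assignment as a vector: most probable class for each unit
vU_modal <- max.col(mU, ties.method = "first")

# Modal assignment as a 0/1 indicator matrix with one row per unit
mU_modal <- diag(ncol(mU))[vU_modal, ]

# Expected counts: rows are the modally assigned class, columns are the
# class weights under proportional assignment (assumed orientation)
mClassErr <- t(mU_modal) %*% mU

# Row-normalize to obtain proportions
mClassErrProb <- mClassErr / rowSums(mClassErr)

# Overall expected share of misclassified units: off-diagonal mass
err_overall <- 1 - sum(diag(mClassErr)) / sum(mClassErr)
```
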
Single-level model estimation with covariates returns the following (all elements if extout = TRUE, otherwise a subset):
mPi | Class proportions given the covariates |
vPi_avg | Sample average of mPi |
mPhi | Response probabilities given the latent classes |
mU | Matrix of posterior class assignment (proportional assignment) |
mClassErr | Expected number of classification errors |
mClassErrProb | Expected proportion of classification errors |
AvgClassErrProb | Average of mClassErrProb |
R2entr | Entropy-based R^2 |
BIC | Bayesian Information Criterion (BIC) |
AIC | Akaike Information Criterion (AIC) |
cGamma | Intercept and slope parameters in logistic models for conditional class membership |
mBeta | Intercepts in logistic parametrization for response probabilities |
parvec | Vector of logistic parameters |
SEs_unc | Uncorrected standard errors |
SEs_cor | Corrected standard errors (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
SEs_cor_gamma | Corrected standard errors only for the gammas (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
mQ | Cross-derivatives for asymptotic standard error correction in two-step estimation (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
Varmat_unc | Uncorrected variance-covariance matrix |
Varmat_cor | Corrected variance-covariance matrix (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
mV2 | Inverse of information matrix for structural model |
iter | Number of iterations for EM algorithm |
eps | Difference between last two elements of log-likelihood sequence for EM algorithm |
LLKSeries | Full log-likelihood series for EM algorithm |
spec | Model specification |
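To show how intercept and slope parameters in a multinomial logistic model translate into class proportions given a covariate, here is a minimal sketch. The coefficient values are invented, the object `gamma` is a plain matrix rather than the package's cGamma object, and treating class 1 as the reference category is an assumption of this example.

```r
# Hypothetical coefficients for a 3-class model with one covariate.
# Rows: intercept and slope; columns: classes 2 and 3 versus class 1
# (class 1 as reference is an assumption of this sketch).
gamma <- matrix(c(-0.5,  1.0,
                   0.2, -0.8), nrow = 2,
                dimnames = list(c("(Intercept)", "Z_low"),
                                c("class2", "class3")))

# Three hypothetical covariate values, with a leading column of ones
Z <- c(-1, 0, 1)
X <- cbind(1, Z)

# Linear predictors; the reference class has predictor 0
eta <- cbind(0, X %*% gamma)

# Multinomial logit (softmax) transform: each row sums to 1
mPi <- exp(eta) / rowSums(exp(eta))
round(mPi, 3)

# Sample average of the unit-specific class proportions
vPi_avg <- colMeans(mPi)
```
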
Multilevel model estimation returns the following (all elements if extout = TRUE, otherwise a subset):
vOmega | Higher-level class proportions |
mPi | Lower-level class proportions given the higher-level latent classes |
mPhi | Response probabilities given the lower-level latent classes |
cPMX | Posterior joint class assignment (proportional assignment) |
cLogPMX | Log of cPMX |
cPX | Posterior lower-level class assignment given high-level class membership (proportional assignment) |
cLogPX | Log of cPX |
mSumPX | Posterior higher-level class assignment for lower-level units after marginalization over the lower-level classes (proportional assignment) |
mPW | Posterior higher-level class assignment for higher-level units (proportional assignment) |
mlogPW | Log of mPW |
mPW_N | Posterior higher-level class assignment for lower-level units (proportional assignment) |
mPMsumX | Posterior lower-level class assignment for lower-level units after marginalization over the higher-level classes (proportional assignment) |
R2entr_low | Lower-level entropy-based R^2 |
R2entr_high | Higher-level entropy-based R^2 |
BIClow | Lower-level Bayesian Information Criterion (BIC) |
BIChigh | Higher-level Bayesian Information Criterion (BIC) |
ICL_BIClow | Lower-level BIC-type approximation of the integrated complete likelihood |
ICL_BIChigh | Higher-level BIC-type approximation of the integrated complete likelihood |
AIC | Akaike Information Criterion (AIC) |
vAlpha | Intercepts in logistic parametrization for higher-level class proportions |
mGamma | Intercepts in logistic parametrization for conditional lower-level class proportions |
mBeta | Intercepts in logistic parametrization for response probabilities |
parvec | Vector of logistic parameters |
SEs | Standard errors |
Varmat | Variance-covariance matrix |
Infomat | Expected information matrix |
iter | Number of iterations for EM algorithm |
eps | Difference between last two elements of log-likelihood sequence for EM algorithm |
LLKSeries | Full log-likelihood series for EM algorithm |
vLLK | Current log-likelihood for higher-level units |
mScore | Contributions to log-likelihood score |
spec | Model specification |
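The marginalizations mentioned in the table above can be illustrated on a made-up joint posterior. The array layout used here (units by lower-level classes by higher-level classes) is an assumption for illustration and may not match how the package stores cPMX.

```r
set.seed(1)
n  <- 5   # lower-level units
iT <- 3   # lower-level classes
iM <- 2   # higher-level classes

# Hypothetical joint posterior, normalized so each unit's entries sum to 1
joint <- array(runif(n * iT * iM), dim = c(n, iT, iM))
joint <- sweep(joint, 1, apply(joint, 1, sum), "/")

# Marginalize over the higher-level classes: n x iT matrix of
# lower-level class posteriors
post_low <- apply(joint, c(1, 2), sum)

# Marginalize over the lower-level classes: n x iM matrix of
# higher-level class posteriors for the lower-level units
post_high <- apply(joint, c(1, 3), sum)
```

Both marginal matrices have rows that sum to 1, because each is obtained by summing one unit's joint posterior over one of its two class dimensions.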
Multilevel model estimation with lower-level covariates returns the following (all elements if extout = TRUE, otherwise a subset):
vOmega | Higher-level class proportions |
mPi | Lower-level class proportions given the higher-level latent classes and the covariates |
mPi_avg | Sample average of mPi |
mPhi | Response probabilities given the lower-level latent classes |
cPMX | Posterior joint class assignment (proportional assignment) |
cLogPMX | Log of cPMX |
cPX | Posterior lower-level class assignment given high-level class membership (proportional assignment) |
cLogPX | Log of cPX |
mSumPX | Posterior higher-level class assignment for lower-level units after marginalization over the lower-level classes (proportional assignment) |
mPW | Posterior higher-level class assignment for higher-level units (proportional assignment) |
mlogPW | Log of mPW |
mPW_N | Posterior higher-level class assignment for lower-level units (proportional assignment) |
mPMsumX | Posterior lower-level class assignment for lower-level units after marginalization over the higher-level classes (proportional assignment) |
R2entr_low | Lower-level entropy-based R^2 |
R2entr_high | Higher-level entropy-based R^2 |
BIClow | Lower-level Bayesian Information Criterion (BIC) |
BIChigh | Higher-level Bayesian Information Criterion (BIC) |
ICL_BIClow | Lower-level BIC-type approximation of the integrated complete likelihood |
ICL_BIChigh | Higher-level BIC-type approximation of the integrated complete likelihood |
AIC | Akaike Information Criterion (AIC) |
vAlpha | Intercepts in logistic parametrization for higher-level class proportions |
cGamma | Intercept and slope parameters in logistic models for conditional lower-level class membership |
mBeta | Intercepts in logistic parametrization for response probabilities |
parvec | Vector of logistic parameters |
SEs_unc | Uncorrected standard errors |
SEs_cor | Corrected standard errors (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
SEs_cor_gamma | Corrected standard errors only for the gammas (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
mQ | Cross-derivatives for asymptotic standard error correction in two-step estimation (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
Varmat_unc | Uncorrected variance-covariance matrix |
Varmat_cor | Corrected variance-covariance matrix (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
Infomat | Expected information matrix |
cGamma_Info | Expected information matrix only for the gammas |
mV2 | Inverse of information matrix for structural model |
iter | Number of iterations for EM algorithm |
eps | Difference between last two elements of log-likelihood sequence for EM algorithm |
LLKSeries | Full log-likelihood series for EM algorithm |
vLLK | Current log-likelihood for higher-level units |
mScore | Contributions to log-likelihood score |
mGamma_Score | Contributions to log-likelihood score only for the gammas |
spec | Model specification |
Multilevel model estimation with lower- and higher-level covariates returns the following (all elements if extout = TRUE, otherwise a subset):
mOmega | Higher-level class proportions given the covariates |
vOmega_avg | Higher-level class proportions averaged over higher-level units |
mPi | Lower-level class proportions given the higher-level latent classes and the covariates |
mPi_avg | Sample average of mPi |
mPhi | Response probabilities given the lower-level latent classes |
cPMX | Posterior joint class assignment (proportional assignment) |
cLogPMX | Log of cPMX |
cPX | Posterior lower-level class assignment given high-level class membership (proportional assignment) |
cLogPX | Log of cPX |
mSumPX | Posterior higher-level class assignment for lower-level units after marginalization over the lower-level classes (proportional assignment) |
mPW | Posterior higher-level class assignment for higher-level units (proportional assignment) |
mlogPW | Log of mPW |
mPW_N | Posterior higher-level class assignment for lower-level units (proportional assignment) |
mPMsumX | Posterior lower-level class assignment for lower-level units after marginalization over the higher-level classes (proportional assignment) |
R2entr_low | Lower-level entropy-based R^2 |
R2entr_high | Higher-level entropy-based R^2 |
BIClow | Lower-level Bayesian Information Criterion (BIC) |
BIChigh | Higher-level Bayesian Information Criterion (BIC) |
ICL_BIClow | Lower-level BIC-type approximation of the integrated complete likelihood |
ICL_BIChigh | Higher-level BIC-type approximation of the integrated complete likelihood |
AIC | Akaike Information Criterion (AIC) |
mAlpha | Intercept and slope parameters in logistic models for conditional higher-level class membership |
cGamma | Intercept and slope parameters in logistic models for conditional lower-level class membership |
mBeta | Intercepts in logistic parametrization for response probabilities |
parvec | Vector of logistic parameters |
SEs_unc | Uncorrected standard errors |
SEs_cor | Corrected standard errors (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
SEs_cor_alpha | Corrected standard errors only for the alphas (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
SEs_cor_gamma | Corrected standard errors only for the gammas (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
mQ | Cross-derivatives for asymptotic standard error correction in two-step estimation (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
Varmat_unc | Uncorrected variance-covariance matrix |
Varmat_cor | Corrected variance-covariance matrix (see Bakk & Kuha, 2018; Di Mari et al., 2023) |
Infomat | Expected information matrix |
cAlpha_Info | Expected information matrix only for the alphas |
cGamma_Info | Expected information matrix only for the gammas |
mV2 | Inverse of information matrix for structural model |
iter | Number of iterations for EM algorithm |
eps | Difference between last two elements of log-likelihood sequence for EM algorithm |
LLKSeries | Full log-likelihood series for EM algorithm |
vLLK | Current log-likelihood for higher-level units |
mScore | Contributions to log-likelihood score |
mAlpha_Score | Contributions to log-likelihood score only for the alphas |
mGamma_Score | Contributions to log-likelihood score only for the gammas |
spec | Model specification |
Bakk, Z., & Kuha, J. (2018). Two-step estimation of models between latent classes and external variables. Psychometrika, 83, 871-892.
Bakk, Z., Di Mari, R., Oser, J., & Kuha, J. (2022). Two-stage multilevel latent class analysis with covariates in the presence of direct effects. Structural Equation Modeling: A Multidisciplinary Journal, 29(2), 267-277.
Di Mari, R., Bakk, Z., Oser, J., & Kuha, J. (2023). A two-step estimator for multilevel latent class analysis with covariates. Psychometrika.
Lukociene, O., Varriale, R., & Vermunt, J. K. (2010). The simultaneous decision(s) about the number of lower-and higher-level classes in multilevel latent class analysis. Sociological Methodology, 40(1), 247-283.
# Use the artificial data set
data = dataTOY
# Define vector with names of columns with items
Y = colnames(data)[1+1:10]
# Define name of column with higher-level id
id_high = "id_high"
# Define vector with names of columns with lower-level covariates
Z = c("Z_low")
# Define vector with names of columns with higher-level covariates
Zh = c("Z_high")
# Single-level 3-class LC model with covariates
out = multiLCA(data, Y, 3, Z = Z, verbose = FALSE)
out
# Multilevel LC model
out = multiLCA(data, Y, 3, id_high, 2, verbose = FALSE)
out
# Multilevel LC model with lower-level covariates
out = multiLCA(data, Y, 3, id_high, 2, Z, verbose = FALSE)
out
# Multilevel LC model with lower- and higher-level covariates
out = multiLCA(data, Y, 3, id_high, 2, Z, Zh, verbose = FALSE)
out
# Model selection over single-level models with 1-3 classes
out = multiLCA(data, Y, 1:3, verbose = FALSE)
out
# Model selection over multilevel models with 1-3 lower-level classes and
# 2 higher-level classes
out = multiLCA(data, Y, 1:3, id_high, 2, verbose = FALSE)
out
# Model selection over multilevel models with 3 lower-level classes and
# 1-2 higher-level classes
out = multiLCA(data, Y, 3, id_high, 1:2, verbose = FALSE)
out
# Model selection over multilevel models with 1-3 lower-level classes and
# 1-2 higher-level classes using the default sequential approach
out = multiLCA(data, Y, 1:3, id_high, 1:2, verbose = FALSE)
out
Visualizes conditional response probabilities estimated by the multiLCA function. The method works for both single- and multilevel models.
Let out denote the list object returned by the multiLCA function. Executing plot(out) visualizes the conditional response probabilities given by the mPhi matrix in out.
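For readers who want a rough idea of what a profile plot of conditional response probabilities looks like, the following base R sketch draws one from a made-up matrix. It is not the package's plot method: the matrix values, its layout (items in rows, classes in columns) and the styling are all assumptions of this example.

```r
# Hypothetical response probabilities: rows are items, columns are classes
mPhi <- matrix(c(0.9, 0.2,
                 0.8, 0.3,
                 0.7, 0.1,
                 0.6, 0.4),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("Y_", 1:4), c("Class 1", "Class 2")))

# One line per class, one point per item
matplot(mPhi, type = "b", pch = 19, lty = 1, ylim = c(0, 1),
        xaxt = "n", xlab = "Item", ylab = "Response probability")

# Item labels on the x-axis, drawn perpendicular to the axis (las = 2)
axis(1, at = seq_len(nrow(mPhi)), labels = rownames(mPhi), las = 2)

# Legend colors follow matplot's default column colors 1, 2, ...
legend("topright", legend = colnames(mPhi), col = 1:2, pch = 19, lty = 1)
```
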
## S3 method for class 'multiLCA'
plot(x, horiz = FALSE, clab = NULL, ...)
x | The object returned by the multiLCA function. |
horiz | Whether item labels should be oriented horizontally (TRUE) or vertically (FALSE). Default: FALSE. |
clab | A character vector with user-specified class labels, if available, in the order "Class 1", "Class 2", ... under the default settings, i.e. top-to-bottom. Default: NULL. |
... | Additional plotting arguments. |
No return value
# Use IEA data
data = dataIEA
# Define vector with names of columns with items
Y = colnames(data)[4+1:12]
# Define number of (low-level) classes
iT = 3
# Estimate single-level measurement model
out = multiLCA(data = data, Y = Y, iT = iT)
out
# Plot conditional response probabilities with default settings
plot(out)
# Plot with vertical item labels and custom class labels
plot(out, horiz = FALSE, clab = c("Maximal", "Engaged", "Subject"))