Title: | Stochastic EM Algorithms for Latent Variable Models with a High-Dimensional Latent Space |
---|---|
Description: | Provides stochastic EM algorithms for latent variable models with a high-dimensional latent space. So far, we provide functions for confirmatory item factor analysis based on the multidimensional two parameter logistic (M2PL) model and the generalized multidimensional partial credit model. These functions scale well for problems with many latent traits (e.g., thirty or even more) and are virtually tuning-free. The computation is facilitated by multiprocessing 'OpenMP' API. For more information, please refer to: Zhang, S., Chen, Y., & Liu, Y. (2018). An Improved Stochastic EM Algorithm for Large-scale Full-information Item Factor Analysis. British Journal of Mathematical and Statistical Psychology. <doi:10.1111/bmsp.12153>. |
Authors: | Siliang Zhang [aut, cre], Yunxiao Chen [aut], Jorge Nocedal [cph], Naoaki Okazaki [cph] |
Maintainer: | Siliang Zhang <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2024-11-25 06:39:58 UTC |
Source: | CRAN |
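For orientation, the M2PL model named in the description is commonly written as shown here. The symbols are chosen to mirror the argument names used in the function pages (loadings a_j, intercepts d_j, latent traits theta_i, covariance Sigma, design entries q_jk); this is a sketch of the standard form, not a quotation from the package or the paper.

```latex
P(Y_{ij} = 1 \mid \theta_i)
  = \frac{\exp\left(a_j^\top \theta_i + d_j\right)}
         {1 + \exp\left(a_j^\top \theta_i + d_j\right)},
\qquad \theta_i \sim N(0, \Sigma),
\qquad a_{jk} = 0 \ \text{whenever}\ q_{jk} = 0 .
```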
The dataset contains the simulation setting and the response data.
data_sim_mirt
An object of class list of length 9.
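To make the idea of a "simulation setting plus response data" concrete, the following base R sketch simulates 0/1 responses from an M2PL model. It does not reproduce the contents of data_sim_mirt; the sizes, parameter ranges, and variable names are illustrative assumptions.

```r
# Minimal sketch: simulate 0/1 responses from an M2PL model.
# All names and sizes are illustrative assumptions, not the
# contents of data_sim_mirt.
set.seed(1)
N <- 200  # respondents
J <- 10   # items
K <- 2    # latent traits

# Q: each item measures exactly one trait (simple structure)
Q <- matrix(0, J, K)
Q[cbind(seq_len(J), rep(seq_len(K), length.out = J))] <- 1

# Loadings respect Q: zero wherever Q is zero
A <- Q * matrix(runif(J * K, 0.5, 1.5), J, K)
d <- rnorm(J)

# Correlated latent traits, theta_i ~ N(0, sigma)
sigma <- matrix(0.3, K, K)
diag(sigma) <- 1
theta <- matrix(rnorm(N * K), N, K) %*% chol(sigma)

# P(Y_ij = 1) = plogis(a_j' theta_i + d_j)
eta <- theta %*% t(A) + matrix(d, N, J, byrow = TRUE)
response <- matrix(rbinom(N * J, 1, plogis(eta)), N, J)
```

The resulting `response` and `Q` have the shapes that StEM_mirt expects for its first two arguments.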
The dataset contains the simulation setting and the response data.
data_sim_pcirt
An object of class list of length 10.
Stochastic EM algorithm for the multivariate item response theory model
StEM_mirt(response, Q, A0, d0, theta0, sigma0, m = 200, TT = 20, max_attempt = 40, tol = 1.5, precision = 0.01, parallel = FALSE)
response | N by J matrix of 0/1 responses, where N is the number of respondents and J is the number of items. |
Q | J by K matrix of 0/1 entries, where J is the number of items and K is the number of latent traits. Entry (j, k) indicates whether item j measures latent trait k. |
A0 | J by K matrix, the initial value of the loading matrix, satisfying the constraints given by Q. |
d0 | Length J vector, the initial value of the intercept parameters. |
theta0 | N by K matrix, the initial value of the latent traits for each respondent. |
sigma0 | K by K matrix, the initial value of the correlation matrix of the latent traits. |
m | Length of the Markov chain window used to choose the burn-in size. Default: 200. |
TT | Batch size. Default: 20. |
max_attempt | Maximum number of batches before stopping. Default: 40. |
tol | Tolerance for the Geweke statistic used to determine the burn-in size. Default: 1.5. |
precision | Precision value used to decide when the algorithm stops. Default: 0.01. |
parallel | Whether to enable parallel computing. Default: FALSE. |
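The role of tol can be illustrated with a simplified Geweke-style z-score computed on a window of length m. This is a sketch under stated assumptions: it uses plain sample variances rather than the spectral-density estimates of the full Geweke diagnostic, the function name geweke_z is hypothetical, and it is not the package's internal burn-in rule.

```r
# Simplified Geweke-style z-score for one scalar chain.
# Assumption: plain sample variances replace the spectral-density
# estimates used by the full diagnostic.
geweke_z <- function(x, first = 0.1, last = 0.5) {
  n <- length(x)
  a <- x[seq_len(floor(first * n))]       # early part of the window
  b <- x[(n - floor(last * n) + 1):n]     # late part of the window
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

set.seed(1)
m <- 200
stationary <- rnorm(m)                          # chain that has settled
drifting <- rnorm(m) + seq(3, 0, length.out = m)  # chain still moving

# Compare the absolute statistic with the tolerance (tol = 1.5)
z_stationary <- geweke_z(stationary)
z_drifting <- geweke_z(drifting)
abs(z_drifting) < 1.5
```

A window whose early and late means differ by much more than their sampling error gives a large absolute z-score, which suggests that the chain has not yet finished burning in.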
The function returns a list with the following components:
The estimated loading matrix.
The estimated value of intercept parameters.
The estimated value of correlation matrix of latent traits.
The burn-in size.
Zhang, S., Chen, Y. and Liu, Y. (2018). An Improved Stochastic EM Algorithm for Large-Scale Full-Information Item Factor Analysis. British Journal of Mathematical and Statistical Psychology. <doi:10.1111/bmsp.12153>.
Liu, D. C. and Nocedal, J. (1989). On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming B, 45(3), 503-528.
# Run a toy example based on the M2PL model
# Load a simulated dataset
attach(data_sim_mirt)
# Generate starting values for the algorithm
A0 <- Q
d0 <- rep(0, J)
theta0 <- matrix(rnorm(N * K, 0, 1), N)
sigma0 <- diag(1, K)
# Do the confirmatory MIRT analysis
# (to enable multicore processing, set parallel = TRUE)
mirt_res <- StEM_mirt(response, Q, A0, d0, theta0, sigma0)
Stochastic EM algorithm for the generalized partial credit model
StEM_pcirt(response, Q, A0, D0, theta0, sigma0, m = 200, TT = 20, max_attempt = 40, tol = 1.5, precision = 0.015, parallel = F)
response | N by J matrix of responses coded 0, 1, ..., M-1, where N is the number of respondents and J is the number of items. |
Q | J by K matrix of 0/1 entries, where J is the number of items and K is the number of latent traits. Entry (j, k) indicates whether item j measures latent trait k. |
A0 | J by K matrix, the initial value of the loading matrix. |
D0 | J by M matrix, the initial value of the intercept parameters, where M is the number of response categories. |
theta0 | N by K matrix, the initial value of the latent traits for each respondent. |
sigma0 | K by K matrix, the initial value of the correlation matrix of the latent traits. |
m | Length of the Markov chain window used to choose the burn-in size. Default: 200. |
TT | Batch size. Default: 20. |
max_attempt | Maximum number of attempts if the precision criterion is not met. Default: 40. |
tol | Tolerance for the Geweke statistic used to determine the burn-in size. Default: 1.5. |
precision | Preset precision value used to determine the length of the Markov chain. Default: 0.015. |
parallel | Whether to enable parallel computing. Default: FALSE. |
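Because several arguments must agree on N, J, K, and M, it can help to check the starting values before calling StEM_pcirt. The helper below is hypothetical (check_pcirt_start is not part of the package) and only encodes the shapes listed in the argument table; the toy inputs are illustrative.

```r
# Hypothetical helper (not part of the package): verify that the
# starting values have the shapes listed in the argument table.
check_pcirt_start <- function(response, Q, A0, D0, theta0, sigma0) {
  N <- nrow(response)
  J <- ncol(response)
  K <- ncol(Q)
  stopifnot(
    nrow(Q) == J, all(Q %in% c(0, 1)),
    all(dim(A0) == c(J, K)),
    nrow(D0) == J,
    all(response %in% (seq_len(ncol(D0)) - 1)),  # codes 0, ..., M-1
    all(dim(theta0) == c(N, K)),
    all(dim(sigma0) == c(K, K)), isSymmetric(sigma0)
  )
  invisible(TRUE)
}

# Toy inputs with N = 50 respondents, J = 6 items, K = 2 traits,
# and M = 3 response categories
set.seed(1)
N <- 50; J <- 6; K <- 2; M <- 3
Q <- cbind(rep(c(1, 0), J / 2), rep(c(0, 1), J / 2))
response <- matrix(sample(0:(M - 1), N * J, replace = TRUE), N, J)
A0 <- Q
D0 <- matrix(1, J, M)
D0[, 1] <- 0
theta0 <- matrix(rnorm(N * K), N, K)
sigma0 <- diag(1, K)

ok <- check_pcirt_start(response, Q, A0, D0, theta0, sigma0)
```

The check stops with an error on the first mismatched dimension, which is usually easier to read than a failure inside the compiled sampler.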
The function returns a list with the following components:
The estimated loading matrix.
The estimated value of intercept parameters.
The estimated value of correlation matrix of latent traits.
The burn-in size.
Zhang, S., Chen, Y. and Liu, Y. (2018). An Improved Stochastic EM Algorithm for Large-Scale Full-Information Item Factor Analysis. British Journal of Mathematical and Statistical Psychology. <doi:10.1111/bmsp.12153>.
Liu, D. C. and Nocedal, J. (1989). On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming B, 45(3), 503-528.
# Run a toy example based on the partial credit model
# Load a simulated dataset
attach(data_sim_pcirt)
# Generate starting values for the algorithm
A0 <- Q
D0 <- matrix(1, J, M)
D0[, 1] <- 0
theta0 <- matrix(rnorm(N * K), N, K)
sigma0 <- diag(1, K)
# Do the confirmatory partial credit model analysis
# (to enable multicore processing, set parallel = TRUE)
pcirt_res <- StEM_pcirt(response, Q, A0, D0, theta0, sigma0)