We first simulate data from a model in which only some of the eight regression paths are nonzero:
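library(OpenMx)

# Eight predictors (x1..x8) and five indicators (y1..y5) of one latent factor
manifests <- c(paste0('x',1:8), paste0('y',1:5))
set.seed(12)
sim.mod <- mxModel(
  "sim", type="RAM", manifestVars = manifests, latentVars = 'f1',
  # Regressions of f1 on x1..x8; only c6, c7, and c8 are nonzero
  mxPath(paste0('x',1:8), 'f1', values=c(0,0,0,0,0,.2,.5,.8),
         labels=paste0('c',1:8)),
  # Factor loadings
  mxPath('f1', paste0('y',1:5), values=1),
  # Covariances among the predictors
  mxPath(paste0('x',1:8), arrows=2, connect = "unique.bivariate",
         values=rnorm(8*7/2, sd=.2)),
  # Variances of the predictors and residual variances of the indicators
  mxPath(paste0('x',1:8), arrows=2, values=1),
  mxPath(paste0('y',1:5), arrows=2, values=1),
  # Variance of f1, fixed
  mxPath('f1', arrows=2, values=1, free=FALSE),
  # Means: free for the manifest variables, fixed for f1
  mxPath('one', manifests),
  mxPath('one', 'f1', free=FALSE))
dat.sim <- mxGenerateData(sim.mod, nrows = 100)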
We then run the model so that we can better see its structure.
One of the difficulties in using regularization is that the penalty has to be calibrated for each particular problem. When running this code, I first tried the default penalty range, but it was too small given that some parameters were greater than 0.4. A plot of this initial run showed that some parameters weren’t penalized enough, so I increased the penalty step to 1.2, with 41 different values. This range includes a model in which most estimates are zero. It isn’t always necessary to test a sequence of penalties that would set “large” parameters to zero: the model may fail to converge at those penalties, or there may be no uncertainty about whether those parameters belong in the model.
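To make the calibration intuition concrete, here is a minimal base R sketch of soft-thresholding, the closed-form lasso solution in the textbook case of an orthonormal design. It is only an illustration under that assumption and is not how OpenMx estimates the penalized model. The function name `soft_threshold`, the vector `b`, and the penalty values in `lambdas` are all hypothetical.

```r
# Soft-thresholding: the closed-form lasso solution for an orthonormal design.
# This is a textbook illustration, not OpenMx's estimator.
soft_threshold <- function(b, lambda) {
  sign(b) * pmax(abs(b) - lambda, 0)
}

# Hypothetical unpenalized estimates, set equal to the simulated values of c1..c8.
b <- c(0, 0, 0, 0, 0, 0.2, 0.5, 0.8)
names(b) <- paste0("c", 1:8)

# Illustrative penalty values (not the grid that OpenMx searches over).
lambdas <- c(0, 0.1, 0.4, 0.8)

# One row per penalty, one column per coefficient.
shrunk <- t(sapply(lambdas, function(l) soft_threshold(b, l)))
rownames(shrunk) <- paste0("lambda=", lambdas)
print(round(shrunk, 2))

# Number of coefficients that are still nonzero at each penalty.
n_nonzero <- rowSums(shrunk != 0)
print(n_nonzero)
```

In this toy setting, a penalty of 0.4 cannot zero out a coefficient whose magnitude exceeds 0.4, which mirrors the reasoning above. The penalty scale in an actual OpenMx model need not correspond one-to-one with this simplified case. The penalty search itself is set up with `mxPenaltySearch`: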
regFit <- mxPenaltySearch(mxModel(
  fit,
  mxPenaltyLASSO(paste0('c',1:8), "lasso", lambda.step=1.2),
  mxMatrix('Full', 1, 1, free=TRUE, values=0, labels="lambda")))
A status code 6 warning is issued because the parameters affected by regularization have relatively large gradients.
The gradient check is done only for the model with the best EBIC. Next, we can get a summary of the models tested to check whether there were any optimization failures.
Looks good. We can also look at summaries of some of the results:
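library(reshape2)
library(ggplot2)
# Keep the penalized paths and the penalty value for each model tested
est <- detail[,c(paste0('c',1:8), 'lambda')]
# One line per path across penalties; the dashed line marks the selected lambda
ggplot(melt(est, id.vars = 'lambda')) +
  geom_line(aes(x=lambda, y=value, color=variable)) +
  geom_vline(aes(xintercept=coef(regFit)['lambda']),
             linetype="dashed", alpha=.5)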
Here we can see that the penalty range was large enough to set most parameter estimates to zero. The best-fitting model is indicated by the dashed vertical line.
OpenMx uses EBIC to choose the final model. We can now look at the parameter estimates of the best-fitting model.
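For readers unfamiliar with the criterion, the following is a small sketch of the extended BIC in the form given by Chen and Chen: the usual BIC plus a term that grows with the number of candidate models of the same size. Whether OpenMx uses exactly this variant, and which value of gamma it applies, are assumptions to check against the OpenMx documentation. The function name `ebic` and all numbers in the example are hypothetical.

```r
# Extended BIC (Chen & Chen form), written as a plain function.
#   minus2LL : -2 * log-likelihood of the fitted model
#   k        : number of nonzero penalized parameters in the model
#   n        : sample size
#   p        : number of candidate (penalized) parameters
#   gamma    : weight of the extra model-space penalty; gamma = 0 gives BIC
ebic <- function(minus2LL, k, n, p, gamma) {
  minus2LL + k * log(n) + 2 * gamma * lchoose(p, k)
}

# Hypothetical comparison of two models with made-up fit values:
# a sparse model (2 of 8 paths kept) and a denser one (5 of 8 kept).
sparse <- ebic(minus2LL = 4210, k = 2, n = 100, p = 8, gamma = 0.5)
dense  <- ebic(minus2LL = 4205, k = 5, n = 100, p = 8, gamma = 0.5)

# The model with the smaller EBIC is preferred.
preferred <- if (sparse < dense) "sparse" else "dense"
print(c(sparse = sparse, dense = dense))
print(preferred)
```

The model-space term is what separates EBIC from BIC: with many candidate paths, it makes the criterion more conservative about keeping paths with small effects.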
In this final model, the regression paths for x1, x2, x3, x4, x5, and x6 are set to zero. The paths for x7 and x8 are correctly identified as true paths, while the small simulated effect of x6 (0.2) is shrunk to zero. Compare these results with the maximum likelihood estimates.
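One way to make that comparison is to line up the two sets of estimates side by side. The helper below is a hypothetical sketch in base R: `compare_estimates`, its `tol` argument, and the example vectors are illustrative and are not part of OpenMx. It assumes both inputs are named numeric vectors, such as those returned by `coef()`.

```r
# Put ML and penalized estimates side by side for a chosen set of labels.
#   ml, penalized : named numeric vectors of estimates
#   labels        : names of the parameters to compare
#   tol           : estimates smaller than this in magnitude count as zero
compare_estimates <- function(ml, penalized, labels, tol = 1e-3) {
  out <- data.frame(
    parameter = labels,
    ML        = unname(ml[labels]),
    penalized = unname(penalized[labels])
  )
  out$shrinkage <- out$ML - out$penalized
  out$zeroed    <- abs(out$penalized) < tol
  out
}

# Hypothetical estimates, used only to show the shape of the output.
ml_est  <- c(c1 = 0.05, c2 = -0.08, c7 = 0.55, c8 = 0.75)
pen_est <- c(c1 = 0,    c2 = 0,     c7 = 0.30, c8 = 0.52)

cmp <- compare_estimates(ml_est, pen_est, c("c1", "c2", "c7", "c8"))
print(cmp)
```

With the objects from this walkthrough, the call would look like `compare_estimates(coef(fit), coef(regFit), paste0('c', 1:8))`, assuming `fit` holds the unpenalized maximum likelihood fit.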