This vignette will demonstrate latent variable modeling and confirmatory analysis via the common factor model. Factor analysis is a rudimentary application of OpenMx, and hopefully will serve as an easily understandable introduction to the data structures and workflow patterns necessary to define and fit a model using OpenMx.
OpenMX handles model definition through the MxModel
class, instantiated via the mxModel()
function. Model
definition requires input data for analysis and the model specification
itself. Input data can be either in the form of a set of variable
observations or by having a covariance matrix and associated variable
means. The model specification can either be produced via specifying a
path diagram or, for advanced users, the matrices to define the desired
RAM or LISREL model.
We’ll walk through the construction of a single and multiple factor
model first through specifying the path diagram, and then an alternate
form of model specification manually defining the matrices in a RAM-type
model. Additionally, we will demonstrate how to specify data for
analysis using either raw data or a covariance matrix of manifest
variables. Scripts for a variety of
implementations are located within the /demo/
directory of
the OpenMx GitHub repository and are named in accordance with their
specific implementation (eg. “OneFactorModel_PathRaw.R”,
“TwoFactorModel_MatrixCov.R”, etc.)
The common factor model supposes that observed variables measure or indicate the same latent variable. While there are a number of exploratory approaches that do not assume the number of latent factor(s), this example uses a confirmatory approach that assumes a single factor. A path diagram verions of the common factor model for a set of variables x1 − x6 are given below.
The common factor model can be extended to include multiple latent variables. The model for any person and path diagram of the common factor model for a set of variables x1 − x3 and y1 − y3 are given below.
While 19 parameters are displayed in the equation and path diagram above (six manifest variances, six manifest means, six factor loadings and one factor variance), we must constrain either the factor variance or one factor loading to a constant to identify the model and scale the latent variable. As such, this model contains 18 parameters. The means and covariance matrix for six observed variables contain 27 degrees of freedom, and thus our model contains 9 degrees of freedom.
Our first step to running this model is to include the data to be
analyzed. The MxData
class instantiated through
mxData()
is used for representing data within the
MxModel
.
As mentioned, a raw set of observations on variables can be provided
as data. The data for this example is available from the
myFADataRaw
dataset, which contains 500 observations on
nine observed variables. In accordance with the path diagram above, we
specify the first six variables for analysis in the model.
library(OpenMx)
data(myFADataRaw)
dataRawOneFactor <- mxData( observed=myFADataRaw[,c("x1","x2","x3","x4","x5","x6")], type="raw" )
Alternatively, our data could be specificed via a covariance matrix
with variable means and number of observations. A type argument of
"cov"
is specified in the below example with a covariance
matrix (other inputs being the same).
myFADataCov<-matrix(
c(0.997, 0.642, 0.611, 0.672, 0.637, 0.677,
0.642, 1.025, 0.608, 0.668, 0.643, 0.676,
0.611, 0.608, 0.984, 0.633, 0.657, 0.626,
0.672, 0.668, 0.633, 1.003, 0.676, 0.665,
0.637, 0.643, 0.657, 0.676, 1.028, 0.654,
0.677, 0.676, 0.626, 0.665, 0.654, 1.020),
nrow=6,
dimnames=list(
c("x1","x2","x3","x4","x5","x6"),
c("x1","x2","x3","x4","x5","x6"))
)
myFADataMeans <- c(2.988, 3.011, 2.986, 3.053, 3.016, 3.010)
names(myFADataMeans) <- c("x1","x2","x3","x4","x5","x6")
dataCovOneFactor <- mxData( observed=myFADataCov, type="cov", numObs=500,
mean=myFADataMeans )
The only difference in setting up data for multiple factor analysis
is to label the manifest variables appropriately. The below code reads
in the first six variables from the myFADataRaw
dataset
once again, but this time the labels for yn observed
variables for the second factor, F2 are changed to match
the two factor path diagram above. Variable naming is arbitrary and does
not affect the model structure.
Setting up a covariance data input for multiple factors involves
similar changes from the single factor case. The below code block sets
up a covariance matrix of 9 observed variables, 6 of which are labeled
as xn, and
3 of which are yn. For the
MxData
selection, x1 − x3
and y1 − y3
are used in line with our desired path diagram.
twoFactorCov <- matrix(
c(0.997, 0.642, 0.611, 0.342, 0.299, 0.337,
0.642, 1.025, 0.608, 0.273, 0.282, 0.287,
0.611, 0.608, 0.984, 0.286, 0.287, 0.264,
0.342, 0.273, 0.286, 0.993, 0.472, 0.467,
0.299, 0.282, 0.287, 0.472, 0.978, 0.507,
0.337, 0.287, 0.264, 0.467, 0.507, 1.059),
nrow=6,
dimnames=list(
c("x1", "x2", "x3", "y1", "y2", "y3"),
c("x1", "x2", "x3", "y1", "y2", "y3")),
)
twoFactorMeans <- c(2.988, 3.011, 2.986, 2.955, 2.956, 2.967)
names(twoFactorMeans) <- c("x1", "x2", "x3", "y1", "y2", "y3")
# Prepare Data
# -----------------------------------------------------------------------------
dataCovTwoFactor <- mxData( observed=twoFactorCov, type="cov", numObs=500, means=twoFactorMeans )
One method of model specification is through defining the model via
the existing path diagram. In this instance, our model instantiation
will take as arguments the MxData
defined previously, as
well as lists of manifest variables & latent variables, and an
arbitrary number of MxPath
objects defining the paths
between these variables. First, we will start by defining our manifest
and latent variables in accordance to the path diagram.
Next, we define our various paths between variables. The
mxPath()
function has a number of parameters for
instantiating a similar set of paths in a single call, and all of these
can be grouped together within the mxModel()
call
later.
First off, we define the latent variable variance path. The
from
field without a specified to
value
evaluates as if the paths go back to their origin, as we want with our
variances. The arrows
field of 2
indicates a
double-headed path in this instance. This mxPath()
call
effectively represents one path, the variance on the latent variable
F1, which we
default pre-fitting to 1 and label "varF1"
The
free
field of TRUE
indicates that this value
will be optimized once we fit the model.
Additionally, we need residual variance paths for the observed
variables. These can all be created in one mxPath()
call by
specifying a list in from
, as well as initial values and
labels for these variances.
# residual variances
resVars <- mxPath( from=c("x1","x2","x3","x4","x5","x6"), arrows=2,
free=TRUE, values=c(1,1,1,1,1,1),
labels=c("e1","e2","e3","e4","e5","e6") )
Next come the factor loadings. These are specified as directed paths
(regressions) of the manifest variables on the latent variable
"F1"
. As we have to scale the latent variable, the first
factor loading has been given a fixed value of one by setting the first
elements of the free
and values
arguments to
FALSE
and 1
, respectively. Alternatively, the
latent variable could have been scaled by fixing the factor variance to
1 in the previous mxPath()
function and freely estimating
all factor loadings. The five factor loadings that are freely estimated
are all given starting values of 1
and labels
"l2"
through "l6"
.
# factor loadings
facLoads <- mxPath( from="F1", to=c("x1","x2","x3","x4","x5","x6"), arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE), values=c(1,1,1,1,1,1),
labels =c("l1","l2","l3","l4","l5","l6") )
Lastly, we must specify the mean structure for this model. As there are a total of seven variables in this model (six manifest and one latent), we have the potential for seven means. However, we must constrain at least one mean to a constant value, as there is not sufficient information to yield seven mean and intercept estimates from the six observed means. The six observed variables receive freely estimated intercepts, while the factor mean is fixed to a value of zero in the code below.
# means
means <- mxPath( from="one", to=c("x1","x2","x3","x4","x5","x6","F1"), arrows=1,
free=c(T,T,T,T,T,T,FALSE), values=c(1,1,1,1,1,1,0),
labels =c("meanx1","meanx2","meanx3","meanx4","meanx5","meanx6",NA) )
Finally, we insantiate our unfitted model using
mxModel()
. This model begins with a name (“Common Factor
Model Path Specification”) for the model and a type="RAM"
argument. The name for the model may be omitted, or may be specified in
any other place in the model using the name argument. Including
type="RAM"
allows the mxModel function to interpret the
mxPath functions that follow and turn those paths into the corresponding
RAM specification matrices. The raw data specification is included as
the model data.
We will focus on the changes this model makes to the
mxPath
functions and their relevant inputs. To start, the
last three variables of our manifestVars
argument have
changed from x4, x5, x6
to y1, y2, y3,
and latentVars
now includes a second variable,
F_2
. The mxPath
functions which detail
residual variances and intercepts accommodate the changes in manifest
and latent variables but carry out identical functions to the common
factor model.
manifestVars <- c("x1","x2","x3","y1","y2","y3")
latentVars <- c("F1","F2")
# residual variances
resVars <- mxPath( from=c("x1", "x2", "x3", "y1", "y2", "y3"), arrows=2,
free=TRUE, values=c(1,1,1,1,1,1),
labels=c("e1","e2","e3","e4","e5","e6") )
# means
means <- mxPath( from="one", to=c("x1","x2","x3","y1","y2","y3","F1","F2"),
arrows=1,
free=c(T,T,T,T,T,T,F,F), values=c(1,1,1,1,1,1,0,0),
labels=c("meanx1","meanx2","meanx3",
"meany1","meany2","meany3",NA,NA) )
The remaining mxPath
functions provide some changes to
the model. The latVars
mxPath
function
specifies the variances and covariance of the two latent variables. Like
previous examples, we’ve omitted the to
argument for this
set of two-headed paths. Unlike previous examples, we’ve set the
connect
argument to unique.pairs
, which
creates all unique paths between the variables. As omitting the
to
argument is identical to putting identical variables in
the from
and to
arguments, we are creating all
unique paths from and to our two latent variables. This results in three
paths: from F1 to
F1 (the variance of
F1), from F1 to F2 (the covariance of the
latent variables), and from F2 to F2 (the variance of F2).
# latent variances and covariance
latVars <- mxPath( from=c("F1","F2"), arrows=2, connect="unique.pairs",
free=TRUE, values=c(1,.5,1), labels=c("varF1","cov","varF2") )
You can check your understanding by printing out the paths to the console.
The final two mxPath
functions define the factor
loadings for each of the latent variables. We’ve split these loadings
into two functions, one for each latent variable. The first loading for
each latent variable is fixed to a value of one, just as in the previous
example.
# factor loadings for x variables
facLoadsX <- mxPath( from="F1", to=c("x1","x2","x3"), arrows=1,
free=c(F,T,T), values=c(1,1,1), labels=c("l1","l2","l3") )
# factor loadings for y variables
facLoadsY <- mxPath( from="F2", to=c("y1","y2","y3"), arrows=1,
free=c(F,T,T), values=c(1,1,1), labels=c("l4","l5","l6") )
Finally, we instantiate the mxModel
object. Specifying
the two factor model is virtually identical to the single factor case,
with the only change being our manifest and latent variable labels.
Alternatively to specifying the paths for an MxModel
, we
have the option of manually specifying the matrices in a RAM model. In
this case, MxModel
specification is carried out using
mxMatrix
functions to create matrices. In the following
section, we will go through the necessary setup to do manual model setup
for the same one and two factor examples as path specification.
The A matrix specifies all of the asymmetric paths or regressions in our model. In the common factor model, these parameters are the factor loadings. This matrix is square, and contains as many rows and columns as variables in the model (manifest and latent, typically in that order). Regressions are specified in the A matrix by placing a free parameter in the row of the dependent variable and the column of independent variable.
The common factor model requires that one parameter (typically either
a factor loading or factor variance) be constrained to a constant value.
In our model, we will constrain the first factor loading to a value of
1, and let all other loadings be freely estimated. All factor loadings
have a starting value of one and labels of "l1"
-
"l6"
.
# asymmetric paths
matrA <- mxMatrix( type="Full", nrow=7, ncol=7,
free= c(F,F,F,F,F,F,F,
F,F,F,F,F,F,T,
F,F,F,F,F,F,T,
F,F,F,F,F,F,T,
F,F,F,F,F,F,T,
F,F,F,F,F,F,T,
F,F,F,F,F,F,F),
values=c(0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,0),
labels=c(NA,NA,NA,NA,NA,NA,"l1",
NA,NA,NA,NA,NA,NA,"l2",
NA,NA,NA,NA,NA,NA,"l3",
NA,NA,NA,NA,NA,NA,"l4",
NA,NA,NA,NA,NA,NA,"l5",
NA,NA,NA,NA,NA,NA,"l6",
NA,NA,NA,NA,NA,NA,NA),
byrow=TRUE, name="A" )
The second matrix in a RAM model is the S matrix, which specifies the
symmetric or covariance paths in our model. This matrix
is symmetric and square, and contains as many rows and columns as
variables in the model (manifest and latent, typically in that order).
The symmetric paths in our model consist of six residual variances and
one factor variance. All of these variances are given starting values of
one and labels "e1"
- "e6"
and
"varF1"
.
# symmetric paths
matrS <- mxMatrix( type="Symm", nrow=7, ncol=7,
free= c(T,F,F,F,F,F,F,
F,T,F,F,F,F,F,
F,F,T,F,F,F,F,
F,F,F,T,F,F,F,
F,F,F,F,T,F,F,
F,F,F,F,F,T,F,
F,F,F,F,F,F,T),
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0,
0,0,0,0,0,1,0,
0,0,0,0,0,0,1),
labels=c("e1",NA, NA, NA, NA, NA, NA,
NA, "e2", NA, NA, NA, NA, NA,
NA, NA, "e3", NA, NA, NA, NA,
NA, NA, NA, "e4", NA, NA, NA,
NA, NA, NA, NA, "e5", NA, NA,
NA, NA, NA, NA, NA, "e6", NA,
NA, NA, NA, NA, NA, NA, "varF1"),
byrow=TRUE, name="S" )
The third matrix in our RAM model is the F or filter matrix. Our data contains six observed variables, but the A and S matrices contain seven rows and columns. For our model to define the covariances present in our data, we must have some way of projecting the relationships defined in the A and S matrices onto our data. The F matrix filters the latent variables out of the expected covariance matrix, and can also be used to reorder variables.
The F matrix will always
contain the same number of rows as manifest variables and columns as
total (manifest and latent) variables. If the manifest variables in the
A and S matrices precede the latent
variables and are in the same order as the data, then the F matrix will be the horizontal
adhesion of an identity matrix and a zero matrix. This matrix never
contains free parameters, and is made with the mxMatrix
function below.
# filter matrix
matrF <- mxMatrix( type="Full", nrow=6, ncol=7,
free=FALSE,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0,
0,0,0,0,0,1,0),
byrow=TRUE, name="F" )
The last matrix of our model is the M matrix, which defines the
means and intercepts for our model. This matrix
describes all of the regressions on the constant in a path model, or the
means conditional on the means of exogenous variables. This matrix
contains a single row, and one column for every manifest and latent
variable in the model. In our model, the latent variable has a
constrained mean of zero, while the manifest variables have freely
estimated means, labeled "meanx1"
through
"meanx6"
.
# means
matrM <- mxMatrix( type="Full", nrow=1, ncol=7,
free=c(T,T,T,T,T,T,F),
values=c(1,1,1,1,1,1,0),
labels=c("meanx1","meanx2","meanx3",
"meanx4","meanx5","meanx6",NA),
name="M" )
The final parts of this model are the expectation function and the fit function. The expectation defines how the specified matrices combine to create the expected covariance matrix of the data. The fit defines how the expectation is compared to the data to create a single scalar number that is minimized. In a RAM specified model, the expected covariance matrix is defined as:
The expected means are defined as:
The free parameters in the model can then be estimated using maximum
likelihood for covariance and means data, and full information maximum
likelihood for raw data. Although users may define their own expected
covariance matrices using mxExpectationNormal
and other
functions in OpenMx, the mxExpectationRAM
function computes
the expected covariance and means matrices when the A, S, F and M matrices are specified. The M matrix is required both for raw
data and for covariance data that includes a means vector. The
mxExpectationRAM
function takes four arguments, which are
the names of the A, S, F and M matrices in your model. The
mxFitFunctionML
yields maximum likelihood estimates of
structural equation models. It uses full information maximum likelihood
when the data are raw.
exp <- mxExpectationRAM("A","S","F","M",
dimnames=c(manifestVars, latentVars))
funML <- mxFitFunctionML()
We now have all of the prerequisite structures for defining our
model. As opposed to the path specification, the model now includes data
of observed variables (i.e., raw data or manifest variable covariances
in MxData
form), model matrices, an expectation function,
and a fit function. As such, the model has all the required elements to
define the expected covariance matrix and estimate parameters.
To account for the second latent variable F2, we must modify the constituent RAM matricies.
The A matrix, shown below,
is used to specify the regressions of the manifest variables on the
factors. The first three manifest variables
("x1"
-"x3"
) are regressed on
"F1"
, and the second three manifest variables
("y1"
-"y3"
) are regressed on
"F2"
. We must again constrain the model to identify and
scale the latent variables, which we do by constraining the first
loading for each latent variable to a value of one.
# asymmetric paths
matrA <- mxMatrix( type="Full", nrow=8, ncol=8,
free= c(F,F,F,F,F,F,F,F,
F,F,F,F,F,F,T,F,
F,F,F,F,F,F,T,F,
F,F,F,F,F,F,F,F,
F,F,F,F,F,F,F,T,
F,F,F,F,F,F,F,T,
F,F,F,F,F,F,F,F,
F,F,F,F,F,F,F,F),
values=c(0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0),
labels=c(NA,NA,NA,NA,NA,NA,"l1",NA,
NA,NA,NA,NA,NA,NA,"l2",NA,
NA,NA,NA,NA,NA,NA,"l3",NA,
NA,NA,NA,NA,NA,NA,NA,"l4",
NA,NA,NA,NA,NA,NA,NA,"l5",
NA,NA,NA,NA,NA,NA,NA,"l6",
NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,NA,NA,NA,NA),
byrow=TRUE, name="A" )
The S matrix has an additional row and column, and two additional parameters. For the two factor model, we must add a variance term for the second latent variable and a covariance between the two latent variables.
# symmetric paths
matrS <- mxMatrix( type="Symm", nrow=8, ncol=8,
free= c(T,F,F,F,F,F,F,F,
F,T,F,F,F,F,F,F,
F,F,T,F,F,F,F,F,
F,F,F,T,F,F,F,F,
F,F,F,F,T,F,F,F,
F,F,F,F,F,T,F,F,
F,F,F,F,F,F,T,T,
F,F,F,F,F,F,T,T),
values=c(1,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,
0,0,0,1,0,0,0,0,
0,0,0,0,1,0,0,0,
0,0,0,0,0,1,0,0,
0,0,0,0,0,0,1,.5,
0,0,0,0,0,0,.5,1),
labels=c("e1",NA, NA, NA, NA, NA, NA, NA,
NA, "e2", NA, NA, NA, NA, NA, NA,
NA, NA, "e3", NA, NA, NA, NA, NA,
NA, NA, NA, "e4", NA, NA, NA, NA,
NA, NA, NA, NA, "e5", NA, NA, NA,
NA, NA, NA, NA, NA, "e6", NA, NA,
NA, NA, NA, NA, NA, NA,"varF1","cov",
NA, NA, NA, NA, NA, NA,"cov","varF2"),
byrow=TRUE, name="S" )
The F and M matrices contain only minor changes. The F matrix is now of order 6x8, but the additional column is simply a column of zeros. The M matrix contains an additional column (with only a single row), which contains the mean of the second latent variable. As this model does not contain a parameter for that latent variable, this mean is constrained to zero.
matrF <- mxMatrix( type="Full", nrow=6, ncol=8,
free=FALSE,
values=c(1,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,
0,0,0,1,0,0,0,0,
0,0,0,0,1,0,0,0,
0,0,0,0,0,1,0,0),
byrow=TRUE, name="F" )
matrM <- mxMatrix( type="Full", nrow=1, ncol=8,
free=c(T,T,T,T,T,T,F,F),
values=c(1,1,1,1,1,1,0,0),
labels=c("meanx1","meanx2","meanx3",
"meanx4","meanx5","meanx6",NA,NA),
name="M" )
Finally, we specify our expectation and fit functions as well as
instantiating the MxModel
itself. These are completely
identical to the one factor model, only with the data variable changed
from dataRawOneFactor
to dataRawTwoFactor
.
Our model can now be run using the mxRun
function.
mxRun
returns an MxModel
structured on its
input MxModel
with the free parameters fitted. This fitting
does not modify the input model, and as such mxRun
must
assign its output to a new object.
summary(twoFactorMatrixFit)
will provide us with the
estimation and standard errors on all free paramaters, along with
several model statistics and criteria for assessing model fit.
If more detailed output is desired, accessing the
$output
field on an MxModel
object will
provide more detailed information including the fit matricies in raw
form, the derived Hessian matrix, and data about execution environment
and timing.