The Lavaan Model Syntax
Description
The lavaan model syntax describes a latent variable model. The function lavaanify turns it into a table that represents the full model as specified by the user. We refer to this table as the parameter table.
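As a minimal sketch, assuming the lavaan package is installed, the call below turns a one-line model into a parameter table. Because every auto.* argument keeps its lavaanify() default of FALSE, only the three loadings written in the syntax appear. The printed columns (lhs, op, rhs, free, ustart) are among those of the returned table.

```r
library(lavaan)

# One factor measured by three indicators; no auto.* options are switched on,
# so only the user-specified loadings end up in the parameter table.
pt <- lavaanify("f1 =~ y1 + y2 + y3")
print(pt[, c("lhs", "op", "rhs", "free", "ustart")])
```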
Usage
lavaanify(model = NULL, meanstructure = FALSE, int.ov.free = FALSE,
int.lv.free = FALSE, marker.int.zero = FALSE,
orthogonal = FALSE, orthogonal.y = FALSE,
orthogonal.x = FALSE, orthogonal.efa = FALSE, std.lv = FALSE,
correlation = FALSE, effect.coding = "", conditional.x = FALSE,
fixed.x = FALSE, parameterization = "delta", constraints = NULL,
ceq.simple = FALSE, auto = FALSE, model.type = "sem",
auto.fix.first = FALSE, auto.fix.single = FALSE, auto.var = FALSE,
auto.cov.lv.x = FALSE, auto.cov.y = FALSE, auto.th = FALSE,
auto.delta = FALSE, auto.efa = FALSE,
varTable = NULL, ngroups = 1L, nthresholds = NULL,
group.equal = NULL, group.partial = NULL, group.w.free = FALSE,
debug = FALSE, warn = TRUE, as.data.frame. = TRUE)
lavParTable(model = NULL, meanstructure = FALSE, int.ov.free = FALSE,
int.lv.free = FALSE, marker.int.zero = FALSE,
orthogonal = FALSE, orthogonal.y = FALSE,
orthogonal.x = FALSE, orthogonal.efa = FALSE, std.lv = FALSE,
correlation = FALSE, effect.coding = "", conditional.x = FALSE,
fixed.x = FALSE, parameterization = "delta", constraints = NULL,
ceq.simple = FALSE, auto = FALSE, model.type = "sem",
auto.fix.first = FALSE, auto.fix.single = FALSE, auto.var = FALSE,
auto.cov.lv.x = FALSE, auto.cov.y = FALSE, auto.th = FALSE,
auto.delta = FALSE, auto.efa = FALSE,
varTable = NULL, ngroups = 1L, nthresholds = NULL,
group.equal = NULL, group.partial = NULL, group.w.free = FALSE,
debug = FALSE, warn = TRUE, as.data.frame. = TRUE)
lavParseModelString(model.syntax = '', as.data.frame. = FALSE,
parser = "old", warn = TRUE, debug = FALSE)
Arguments
model 
A description of the user-specified model. Typically, the model is described using the lavaan model syntax; see Details for more information. Alternatively, a parameter table (e.g., the output of lavParseModelString) is also accepted.

model.syntax 
The model syntax specifying the model. Must be a literal
string.

meanstructure 
If TRUE, intercepts/means will be added to the model for both observed and latent variables.

int.ov.free 
If FALSE, the intercepts of the observed variables are fixed to zero.

int.lv.free
If FALSE, the intercepts of the latent variables are fixed to zero.

marker.int.zero 
Logical. Only relevant if the metric of each latent variable is set by fixing the first factor loading to unity. If TRUE, it implies meanstructure = TRUE and std.lv = FALSE, and it fixes the intercepts of the marker indicators to zero, while freeing the means/intercepts of the latent variables. Only works correctly for single-group, single-level models.

orthogonal 
If TRUE, all covariances among latent variables are set to zero.

orthogonal.y
If TRUE, only the covariances among endogenous latent variables are set to zero.

orthogonal.x
If TRUE, only the covariances among exogenous latent variables are set to zero.

orthogonal.efa
If TRUE, only the covariances among latent variables involved in rotation are set to zero.

std.lv 
If TRUE, the metric of each latent variable is determined by fixing their variances to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of the first indicator to 1.0. If there are multiple groups, std.lv = TRUE and "loadings" is included in the group.equal argument, then only the latent variances of the first group will be fixed to 1.0, while the latent variances of other groups are set free.
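The two ways of setting the metric can be compared directly in the parameter table. This is a sketch assuming lavaan is installed; auto = TRUE is used so that the first loading and the factor variance are present as rows, and the helper functions first.loading() and factor.var() are purely illustrative names.

```r
library(lavaan)

model <- "f1 =~ y1 + y2 + y3"

# Default metric: the loading of the first indicator (y1) is fixed to 1.0
pt.marker <- lavaanify(model, auto = TRUE)

# std.lv = TRUE: all loadings are free and the factor variance is fixed to 1.0
pt.std <- lavaanify(model, auto = TRUE, std.lv = TRUE)

# Hypothetical helpers that select the rows of interest
first.loading <- function(pt) pt$op == "=~" & pt$rhs == "y1"
factor.var    <- function(pt) pt$op == "~~" & pt$lhs == "f1" & pt$rhs == "f1"

cols <- c("lhs", "op", "rhs", "free", "ustart")
print(pt.marker[first.loading(pt.marker) | factor.var(pt.marker), cols])
print(pt.std[first.loading(pt.std) | factor.var(pt.std), cols])
```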

correlation 
If TRUE, a correlation structure is fitted. For continuous data, this implies that the (residual) variances are no longer parameters of the model.

effect.coding 
Can be logical or character string. If logical and TRUE, this implies effect.coding = c("loadings", "intercepts"). If logical and FALSE, it is set equal to the empty string. If "loadings" is included, equality constraints are used so that the average of the factor loadings (per latent variable) equals 1. Note that this should not be used together with std.lv = TRUE. If "intercepts" is included, equality constraints are used so that the sum of the intercepts (belonging to the indicators of a single latent variable) equals zero. As a result, the latent mean will be freely estimated and will usually equal the average of the means of the involved indicators.

conditional.x 
If TRUE, we set up the model conditional on the exogenous ‘x’ covariates; the model-implied sample statistics only include the non-x variables. If FALSE, the exogenous ‘x’ variables are modeled jointly with the other variables, and the model-implied statistics reflect both sets of variables.

fixed.x 
If TRUE, the exogenous ‘x’ covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values. If FALSE, they are considered random, and the means, variances and covariances are free parameters.

parameterization
Currently only used if data is categorical. If "delta", the delta parameterization is used. If "theta", the theta parameterization is used.

constraints 
Additional (in)equality constraints. See details for
more information.

ceq.simple 
If TRUE, and no other general constraints are used in the model, simple equality constraints are represented in the parameter table as duplicated free parameters (instead of extra rows with op = "==").

auto
If TRUE, the default values are used for the auto.* arguments, depending on the value of model.type.

model.type
Either "sem" or "growth"; only used if auto = TRUE.
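To see what auto = TRUE adds, a sketch (assuming lavaan is installed) can compare the table built from the bare syntax with the one built using the auto.* defaults. For this one-factor model, the residual variances of y1 to y3 and the factor variance are added, and the first loading becomes fixed.

```r
library(lavaan)

model <- "f1 =~ y1 + y2 + y3"

pt.user <- lavaanify(model)               # only the rows written in the syntax
pt.auto <- lavaanify(model, auto = TRUE)  # auto.* defaults for model.type = "sem"

# Three loadings (first one fixed), three residual variances, one factor variance
print(pt.auto[, c("lhs", "op", "rhs", "free", "ustart")])
```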

auto.fix.first 
If TRUE, the factor loading of the first indicator is set to 1.0 for every latent variable.

auto.fix.single
If TRUE, the residual variance (if included) of an observed indicator is set to zero if it is the only indicator of a latent variable.

auto.var
If TRUE, the (residual) variances of both observed and latent variables are set free.

auto.cov.lv.x
If TRUE, the covariances of exogenous latent variables are included in the model and set free.

auto.cov.y
If TRUE, the covariances of dependent variables (both observed and latent) are included in the model and set free.

auto.th
If TRUE, thresholds for limited dependent variables are included in the model and set free.

auto.delta
If TRUE, response scaling parameters for limited dependent variables are included in the model and set free.

auto.efa
If TRUE, the necessary constraints are imposed to make the (unrotated) exploratory factor analysis blocks identifiable: for each block, factor variances are set to 1, factor covariances are constrained to be zero, and factor loadings are constrained to follow an echelon pattern.

varTable 
The variable table containing information about the
observed variables in the model.

ngroups 
The number of (independent) groups.

nthresholds 
Either a single integer or a named vector of integers.
If nthresholds is a single integer, all endogenous
variables are assumed to be ordered with nthresholds indicating
the number of thresholds needed in the model. If nthresholds is a
named vector, it indicates the number of thresholds for these ordered
variables only. This argument should not be used in combination with
varTable.

group.equal 
A vector of character strings. Only used in a multiple group analysis. Can be one or more of the following: "loadings", "intercepts", "means", "regressions", "residuals" or "covariances", specifying the pattern of equality constraints across multiple groups. When (in the model syntax) a vector of labels is used as a modifier for a certain parameter, this will override the group.equal setting if it applies to this parameter. See also the Multiple groups section below for using modifiers in multiple groups.

group.partial 
A vector of character strings containing the labels
of the parameters which should be free in all groups (thereby
overriding the group.equal argument for some specific parameters).
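A sketch of how group.equal and group.partial interact, assuming lavaan is installed. With two groups and the default first-loading metric, only the loadings of y2 and y3 are free, so group.equal = "loadings" should yield two cross-group equality rows (op = "=="); releasing f1=~y3 through group.partial should leave one. The "lhs op rhs" string format without spaces is assumed for group.partial.

```r
library(lavaan)

model <- "f1 =~ y1 + y2 + y3"

# Two groups; the free loadings are constrained equal across groups
pt.eq <- lavaanify(model, auto = TRUE, ngroups = 2,
                   group.equal = "loadings")

# Same, but the loading of y3 is released again (free in all groups)
pt.partial <- lavaanify(model, auto = TRUE, ngroups = 2,
                        group.equal = "loadings",
                        group.partial = "f1=~y3")

print(pt.eq[pt.eq$op == "==", c("lhs", "op", "rhs")])
```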

group.w.free 
Logical. If TRUE, the group frequencies are considered to be free parameters in the model. In this case, a Poisson model is fitted to estimate the group frequencies. If FALSE (the default), the group frequencies are fixed to their observed values.

as.data.frame.
If TRUE, return the list of model parameters as a data.frame.

parser 
Character. If "old", use the original/classic parser. If "new", use the new/ldw parser. The default is "new".

warn 
If TRUE, some (possibly harmless) warnings are printed out.

debug
If TRUE, debugging information is printed out.

Details
The model syntax consists of one or more formula-like expressions, each one describing a specific part of the model. The model syntax can be read from a file (using readLines), or can be specified as a literal string enclosed by single quotes as in the example below.
myModel <- '
# 1. latent variable definitions
f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6
f3 =~ y7 + y8 +
y9 + y10
f4 =~ y11 + y12 + y13
! this is also a comment
# 2. regressions
f1 ~ f3 + f4
f2 ~ f4
y1 + y2 ~ x1 + x2 + x3
# 3. (co)variances
y1 ~~ y1
y2 ~~ y4 + y5
f1 ~~ f2
# 4. intercepts
f1 ~ 1; y5 ~ 1
# 5. thresholds
y11 | t1 + t2 + t3
y12 | t1
y13 | t1 + t2
# 6. scaling factors
y11 ~*~ y11
y12 ~*~ y12
y13 ~*~ y13
# 7. formative factors
f5 <~ z1 + z2 + z3 + z4
'
Blank lines and comments can be used in between the formulas, and formulas can
be split over multiple lines. Both the sharp (#) and the exclamation (!)
characters can be used to start a comment. Multiple formulas can be placed
on a single line if they are separated by a semicolon (;).
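These layout rules can be checked with a short sketch, assuming lavaan is installed. The syntax below uses both comment characters, splits one formula over two lines and puts two formulas on one line; it is also written to a temporary file and read back with readLines(), with the lines joined by newlines before parsing. Both routes should give the same five rows.

```r
library(lavaan)

# Blank lines, both comment characters, a formula split over two lines, and
# two formulas on one line separated by a semicolon
syntax <- "
  # loadings
  f1 =~ y1 + y2 +
        y3
  ! a residual covariance and an intercept on a single line
  y1 ~~ y2; f1 ~ 1
"

# Store the syntax in a temporary file and read it back;
# readLines() returns one element per line, so join them again
path <- tempfile(fileext = ".txt")
writeLines(syntax, path)
syntax.from.file <- paste(readLines(path), collapse = "\n")

pt.string <- lavaanify(syntax)
pt.file   <- lavaanify(syntax.from.file)
print(pt.string[, c("lhs", "op", "rhs")])
```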
There can be seven types of formula-like expressions in the model syntax:

Latent variable definitions: The "=~" operator can be used to define (continuous) latent variables. The name of the latent variable is on the left of the "=~" operator, while the terms on the right, separated by "+" operators, are the indicators of the latent variable. The operator "=~" can be read as “is manifested by”.

Regressions: The "~" operator specifies a regression. The dependent variable is on the left of a "~" operator and the independent variables, separated by "+" operators, are on the right. These regression formulas are similar to the way ordinary linear regression formulas are used in R, but they may include latent variables. Interaction terms are currently not supported.

Variances and covariances: The "~~" (‘double tilde’) operator specifies (residual) variances of an observed or latent variable, or a set of covariances between one variable and several other variables (either observed or latent). Several variables, separated by "+" operators, can appear on the right. This way, several pairwise (co)variances involving the same left-hand variable can be expressed in a single expression. The distinction between variances and residual variances is made automatically.

Intercepts: A special case of a regression formula can be used to specify an intercept (or a mean) of either an observed or a latent variable. The variable name is on the left of a "~" operator. On the right is only the number "1" representing the intercept. Including an intercept formula in the model automatically implies meanstructure = TRUE. The distinction between intercepts and means is made automatically.

Thresholds: The "|" operator can be used to define the thresholds of categorical endogenous variables (on the left-hand side of the operator). By convention, the thresholds (on the right-hand side, separated by the "+" operator) are named "t1", "t2", etcetera.

Scaling factors: The "~*~" operator defines a scale factor. The variable name on the left-hand side must be the same as the variable name on the right-hand side. Scale factors are used in the Delta parameterization, in a multiple group analysis when factor indicators are categorical.

Formative factors: The "<~" operator can be used to define a formative factor (named on the left-hand side of the operator), in a similar way to how a reflective factor is defined (using the "=~" operator). This is just syntactic sugar to define a phantom latent variable (equivalent to using "f =~ 0"). In addition, the (residual) variance of the formative factor is fixed to zero.
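The op column of the parameter table records which of these expression types produced each row. A sketch assuming lavaan is installed (the variable names, including the ordinal u1, are illustrative only): intercepts are stored with op "~1" and an empty rhs, and each threshold becomes its own "|" row.

```r
library(lavaan)

pt <- lavaanify("
  f1 =~ y1 + y2 + y3     # latent variable definition
  f1 ~ x1                # regression
  y2 ~~ y4 + y5          # two covariances with the same left-hand side
  y3 ~ 1                 # intercept
  u1 | t1 + t2           # two thresholds of an ordinal variable
")
print(pt[, c("lhs", "op", "rhs")])
```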
There are 4 additional operators, also with left- and right-hand sides, that can be included in model syntax. Three of them are used to specify (in)equality constraints on estimated parameters (==, >, and <), and those are demonstrated in a later section about (In)equality constraints. The final additional operator (:=) can be used to define “new” parameters that are functions of one or more other estimated parameters. The := operator is demonstrated in a section about User-defined parameters.
Usually, only a single variable name appears on the left side of an operator. However, if multiple variable names are specified, separated by the "+" operator, the formula is repeated for each element on the left side (as for example in the third regression formula in the example above). The only exception is scaling factors, where only a single element is allowed on the left-hand side.
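A quick sketch of the repetition rule, assuming lavaan is installed: a formula with two variables on the left is expanded into one regression row per left-hand and right-hand pair.

```r
library(lavaan)

# The formula is repeated for y1 and y2, giving six regression coefficients
pt <- lavaanify("y1 + y2 ~ x1 + x2 + x3")
print(pt[, c("lhs", "op", "rhs")])
```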
In the right-hand side of these formula-like expressions, each element can be modified (using the "*" operator) by either a numeric constant, an expression resulting in a numeric constant, an expression resulting in a character vector, or one of three special functions: start(), label() and equal(). This provides the user with a mechanism to fix parameters, to provide alternative starting values, to label the parameters, and to define equality constraints among model parameters. All "*" expressions are referred to as modifiers. They are explained in more detail in the following sections.
Fixing parameters
It is often desirable to fix a model parameter that is otherwise (by default) free. Any parameter in a model can be fixed by using a modifier resulting in a numerical constant. Here are some examples:

Fixing the regression coefficient of the predictor x2:
y ~ x1 + 2.4*x2 + x3

Specifying an orthogonal (zero) covariance between two latent
variables:
f1 ~~ 0*f2

Specifying an intercept and a linear slope in a growth
model:
i =~ 1*y11 + 1*y12 + 1*y13 + 1*y14
s =~ 0*y11 + 1*y12 + 2*y13 + 3*y14
Instead of a numeric constant, one can use a mathematical function that returns a numeric constant, for example sqrt(10). Multiplying with NA will force the corresponding parameter to be free.
Additionally, the == operator can be used to set a labeled parameter equal to a specific numeric value. This will be demonstrated in the section below about (In)equality constraints.
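A minimal sketch, assuming lavaan is installed, of how these modifiers appear in the parameter table. A numeric modifier sets free to 0 and stores the value in ustart, while NA* releases the first loading that auto.fix.first = TRUE would otherwise fix. Here the factor variance is fixed to 1 so the factor still has a metric.

```r
library(lavaan)

pt <- lavaanify("
  y  ~ x1 + 2.4*x2 + x3   # fixed regression coefficient
  f1 =~ NA*y1 + y2 + y3   # free the loading that auto.fix.first would fix
  f1 ~~ 1*f1              # fix the factor variance instead
", auto.fix.first = TRUE)
print(pt[, c("lhs", "op", "rhs", "free", "ustart")])
```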
Starting values
User-provided starting values can be given by using the special function start(), containing a numeric constant. For example:
y ~ x1 + start(1.0)*x2 + x3
Note that if a starting value is provided, the parameter is not
automatically considered to be free.
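In contrast with a numeric modifier such as 2.4*x2, start() only supplies a starting value and does not fix anything. In a sketch assuming lavaan is installed, the x2 coefficient therefore remains a free parameter (free > 0) in the parameter table.

```r
library(lavaan)

# start() provides a starting value only; the x2 coefficient stays free
pt <- lavaanify("y ~ x1 + start(1.0)*x2 + x3")
print(pt[, c("lhs", "op", "rhs", "free")])
```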
Parameter labels and equality constraints
Each free parameter in a model is automatically given a name (or label). The name given to a model parameter consists of three parts, coerced to a single character vector. The first part is the name of the variable in the left-hand side of the formula where the parameter was implied. The middle part is based on the special ‘operator’ used in the formula. This can be either one of "=~", "~" or "~~". The third part is the name of the variable in the right-hand side of the formula where the parameter was implied, or "1" if it is an intercept. The three parts are pasted together in a single string. For example, the name of the fixed regression coefficient in the regression formula y ~ x1 + 2.4*x2 + x3 is the string "y~x2". The name of the parameter corresponding to the covariance between two latent variables in the formula f1 ~~ f2 is the string "f1~~f2".
Although this automatic labeling of parameters is convenient, users may specify their own labels for specific parameters simply by pre-multiplying the corresponding term (on the right-hand side of the operator only) by a character string (starting with a letter). For example, in the formula f1 =~ x1 + x2 + mylabel*x3, the parameter corresponding with the factor loading of x3 will be named "mylabel". An alternative way to specify the label is as follows: f1 =~ x1 + x2 + label("mylabel")*x3, where the label is the argument of the special function label(); this can be useful if the label contains a space, or an operator (like "~").
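User-specified labels are stored in the label column of the parameter table, and unlabeled parameters get an empty string there. The sketch below assumes lavaan is installed; the label "loading.x6" is an arbitrary example, and single quotes delimit the R string so that label() can take a double-quoted argument.

```r
library(lavaan)

pt <- lavaanify('
  f1 =~ x1 + x2 + mylabel*x3
  f2 =~ x4 + x5 + label("loading.x6")*x6
')
print(pt[, c("lhs", "op", "rhs", "label")])
```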
To constrain a parameter to be equal to another target parameter, there are two ways. If you have specified your own labels, you can use the fact that equal labels imply equal parameter values. If you rely on automatic parameter labels, you can use the special function equal(). The argument of equal() is the (automatic or user-specified) name of the target parameter. For example, in the confirmatory factor analysis example below, the intercepts of the three indicators of each latent variable are constrained to be equal to each other. For the first three, we have used the default names. For the last three, we have given all three intercepts the same custom label (int2).
model <- '
# two latent variables with fixed loadings
f1 =~ 1*y1a + 1*y1b + 1*y1c
f2 =~ 1*y2a + 1*y2b + 1*y2c
# intercepts constrained to be equal
# using the default names
y1a ~ 1
y1b ~ equal("y1a~1") * 1
y1c ~ equal("y1a~1") * 1
# intercepts constrained to be equal
# using a custom label
y2a ~ int2*1
y2b ~ int2*1
y2c ~ int2*1
'
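A sketch, assuming lavaan is installed, that passes the second half of this example to lavaanify(). The shared label int2 appears on all three intercept rows. By default the equality is written as extra rows with op = "==", whereas ceq.simple = TRUE should instead give the three intercepts the same free-parameter number.

```r
library(lavaan)

model <- '
  f2 =~ 1*y2a + 1*y2b + 1*y2c
  y2a ~ int2*1
  y2b ~ int2*1
  y2c ~ int2*1
'
pt        <- lavaanify(model)                      # equality as "==" rows
pt.simple <- lavaanify(model, ceq.simple = TRUE)   # equality as shared free numbers

print(pt[, c("lhs", "op", "rhs", "label", "free")])
print(pt.simple[pt.simple$op == "~1", c("lhs", "op", "label", "free")])
```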
Multiple groups
In a multiple group analysis, modifiers that contain a single element
should be replaced by a vector, having the same length as the number
of groups. If you provide a single element, it will be recycled
for all the groups. This may be dangerous, in particular when the modifier
is a label. In that case, the (same) label is copied across all groups,
and this would imply an equality constraint across groups.
Therefore, when using modifiers in a multiple group setting, it is always
safer (and cleaner) to specify the same number of
elements as the number of groups. Consider this example with two groups:
HS.model < ' visual =~ x1 + 0.5*x2 + c(0.6, 0.8)*x3
textual =~ x4 + start(c(1.2, 0.6))*x5 + x6
speed =~ x7 + x8 + c(x9.group1, x9.group2)*x9 '
In this example, the factor loading of the ‘x2’ indicator is fixed to the
value 0.5 for both groups. However, the factor loadings of the ‘x3’ indicator
are fixed to 0.6 and 0.8 for group 1 and group 2 respectively. The same
logic is used for all modifiers. Note that character vectors can contain
unquoted strings.
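Passing this model to lavaanify() with ngroups = 2 shows how each modifier is recycled or split across groups. This sketch assumes lavaan is installed, and that the rows of group 1 come before those of group 2 in the returned table.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + 0.5*x2 + c(0.6, 0.8)*x3
              textual =~ x4 + start(c(1.2, 0.6))*x5 + x6
              speed   =~ x7 + x8 + c(x9.group1, x9.group2)*x9 '

pt <- lavaanify(HS.model, ngroups = 2)
print(pt[pt$rhs %in% c("x2", "x3", "x9"),
         c("group", "lhs", "op", "rhs", "free", "ustart", "label")])
```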
Multiple modifiers
In the model syntax, you can specify a variable more than once on the right hand
side of an operator; therefore, several ‘modifiers’ can be applied
simultaneously; for example, if you want to fix the value of a parameter and
also label that parameter, you can use something like:
f1 =~ x1 + x2 + 4*x3 + x3.loading*x3
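The two modifiers on x3 are merged into a single parameter-table row that is both fixed (to 4) and labeled. This is a sketch, assuming lavaan is installed.

```r
library(lavaan)

pt <- lavaanify("f1 =~ x1 + x2 + 4*x3 + x3.loading*x3")
print(pt[, c("lhs", "op", "rhs", "free", "ustart", "label")])
```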
(In)equality constraints
The == operator can be used either to fix a parameter to a specific value, or to set an estimated parameter equal to another parameter. Adapting the example in the Parameter labels and equality constraints section, we could have used different labels for the second factor's intercepts:
y2a ~ int1*1
y2b ~ int2*1
y2c ~ int3*1
Then, we could fix the first intercept to zero by including in the syntax an
operation that indicates the parameter's label equals that value:
int1 == 0
Whereas we could still estimate the other two intercepts under an equality
constraint by setting their different labels equal to each other:
int2 == int3
Optimization can be less efficient when constraining parameters this way (see the documentation linked under See also for more information). But the flexibility might be advantageous. For example, the constraints could be specified in a separate character-string object, which can be passed to the lavaan(..., constraints=) argument, enabling users to compare results with(out) the constraints.
Inequality constraints work much the same way, using the < or > operator to indicate which estimated parameter is hypothesized to be greater/less than either a specific value or another estimated parameter. For example, a variance can be constrained to be non-negative:
y1a ~~ var1a*y1a
## hypothesized constraint:
var1a > 0
Or the factor loading of a particular indicator might be expected to exceed
other indicators' loadings:
f1 =~ L1*y1a + L2*y1b + L3*y1c
## hypothesized constraints:
L1 > L2
L3 < L1
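In the parameter table, each (in)equality constraint becomes its own row with op "==", "<" or ">". The sketch below assumes lavaan is installed and keeps the constraints in a separate string, as suggested above, before pasting it onto the model for lavaanify().

```r
library(lavaan)

model <- '
  f1 =~ L1*y1a + L2*y1b + L3*y1c
  y2a ~ int1*1
  y2b ~ int2*1
  y2c ~ int3*1
'
# Constraints kept in their own string, e.g. to fit with and without them
con <- '
  int1 == 0
  int2 == int3
  L1 > L2
  L3 < L1
'
pt <- lavaanify(paste(model, con, sep = "\n"))
print(pt[pt$op %in% c("==", "<", ">"), c("lhs", "op", "rhs")])
```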
Userdefined parameters
Functions of parameters can be useful to test particular hypotheses. Following from the Multiple groups example, we might be interested in which group's factor loading is larger (i.e., an estimate of differential item functioning (DIF) when the latent scales are linked by anchor items with equal loadings).
speed =~ c(L7, L7)*x7 + c(L8, L8)*x8 + c(L9.group1, L9.group2)*x9
## user-defined parameter:
DIF_L9 := L9.group1 - L9.group2
Note that this hypothesis is easily tested without a user-defined parameter by using the lavTestWald() function. However, a user-defined parameter additionally provides an estimate of the parameter being tested.
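A sketch of how the user-defined parameter is recorded, assuming lavaan is installed: with ngroups = 2, lavaanify() should produce a single ":=" row named DIF_L9, alongside one row for each group-specific x9 loading label.

```r
library(lavaan)

model <- '
  speed =~ c(L7, L7)*x7 + c(L8, L8)*x8 + c(L9.group1, L9.group2)*x9
  ## user-defined parameter: difference between the group-specific loadings
  DIF_L9 := L9.group1 - L9.group2
'
pt <- lavaanify(model, ngroups = 2)
print(pt[pt$op == ":=", c("lhs", "op", "rhs")])
```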
User-defined parameters are particularly useful for specifying indirect effects in models of mediation. For example:
model <- ' # direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# user defined parameters:
# indirect effect (a*b)
ab := a*b
# total effect (defined using another user-defined parameter)
total := ab + c
'
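To see the defined parameters estimated, the mediation model can be fitted with sem() to simulated data. This is a sketch assuming lavaan is installed; the data set, sample size and true coefficients (0.5, 0.7, 0.2) are arbitrary and chosen for illustration only. The rows with op ":=" in parameterEstimates() hold the estimates of ab and total, and each equals the corresponding function of the estimated a, b and c.

```r
library(lavaan)

# Hypothetical simulated data: X -> M -> Y plus a direct X -> Y path
set.seed(1234)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.7 * M + 0.2 * X + rnorm(n)
Data <- data.frame(X = X, M = M, Y = Y)

model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect and total effects
             ab := a*b
             total := ab + c
         '
fit <- sem(model, data = Data)
pe  <- parameterEstimates(fit)
print(pe[pe$op == ":=", c("label", "est", "se", "pvalue")])
```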
References
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. doi:10.18637/jss.v048.i02