Lavaan Package in RStudio
RStudio is a free, integrated development environment for the R programming language that provides
a range of powerful, user-friendly tools for data analysis and visualization. In addition to the core R
language, RStudio supports a vast array of packages that extend its functionality and allow users to
perform a variety of tasks, from data manipulation and cleaning to advanced statistical modelling and
machine learning. These packages can be easily installed and loaded within RStudio, and often include
detailed documentation and examples to help users get started. This tutorial elaborates on the lavaan package for R, developed by Prof. Yves Rosseel of Ghent University. Lavaan is a powerful R package that can be used for specifying and estimating complex models, including path analysis,
confirmatory factor analysis, and structural equation modelling. Lavaan also offers a range of advanced
features, such as multiple-group analysis, mediation and moderation analysis, and latent growth curve
modelling. The package supports various estimation methods, including maximum likelihood, robust
maximum likelihood, and weighted least squares, and provides comprehensive output summaries and
model fit indices to assess the adequacy of the model. Lavaan is an indispensable tool for researchers
and practitioners in fields such as psychology, education, biological and social sciences. To use lavaan,
you must have a recent version (4.0.0 or higher) of R installed. You can download the latest R version
from this page: http://cran.r-project.org/. RStudio can be downloaded here: https://posit.co/.
This tutorial will mainly focus on confirmatory factor analysis, structural equation modelling, multiple
group analysis, and data visualization. If you want to learn more about lavaan and its other features,
we refer to the lavaan tutorial of Prof. Yves Rosseel via https://lavaan.ugent.be/tutorial/index.html, on which the examples in this document are based. However, this tutorial goes into more detail regarding model specification, model identification, parameter estimation, model evaluation (including
model fit and model validity), model modification, assessing measurement invariance, and calculating
modification indices. An overview of the six chapters in this document can be found below.
Index
1. Introduction (p. 2)
2. Model syntax of the lavaan package (p. 5)
3. Functions and concepts (p. 6)
4. Confirmatory factor analysis (p. 7)
   Five steps in confirmatory factor analysis (p. 7)
   Multiple group analysis (p. 19)
   Categorical data (p. 25)
   Visualizing the model (p. 26)
5. Structural equation modelling (p. 27)
   Five steps in structural equation modelling (p. 27)
   Multiple group analysis (p. 32)
   Visualizing the model (p. 37)
6. Sources (p. 38)
1. Introduction
Before introducing the lavaan package, we need to understand the difference between
latent and observable variables. Observable variables, such as ‘carbon dioxide emissions [ppm]’, age,
‘concentration [g/L]’, Likert scale responses in consumer questionnaires, etc., can be directly observed
or measured and are stored in a structured way in a dataset. In contrast, latent variables are variables
that can only be inferred or measured indirectly from observable variables through a mathematical
model. Examples of latent variables are psychological traits such as ‘anxiety’ and ‘risk perceptions’, but
also ‘climate change’ is a good example. Latent variables are therefore never present in a dataset.
In this tutorial, latent variables will be measured and causal relationships between latent and observable
variables will be investigated by using three closely related techniques, i.e. confirmatory factor analysis
(CFA), path analysis (PA), and structural equation modelling (SEM). Note that these techniques are
not suitable as exploratory tools. If you want to use them, you must first develop a hypothetical model
that includes underlying (i.e. not measurable) latent variables, and/or that captures important causal
relationships between latent and observable variables. Only after designing this model and collecting the data needed to validate it can you use CFA, PA or SEM to validate (or, if needed, amend) your model. If you don’t have a model, you will need to apply techniques developed for the exploration of latent variable problems (e.g. Exploratory Factor Analysis or EFA).
CFA is most often applied to measure latent variables based on correlated variation among the indicators in a dataset. CFA is also used to reduce data dimensions, standardize the scale of multiple
observable variables, and account for correlations inherent in a dataset. PA, on the other hand, aims
to find causal relationships amongst latent and observable variables by creating a path diagram. Lastly,
SEM combines CFA and PA and consists of two models; a measurement model and a structural model.
The measurement model measures the latent variables, while the structural model is used to test all
hypothetical causal dependencies between latent and observable variables, based on PA.
The measurement model is essentially a linear regression model. For a single indicator y1 (note that indicator is used as a synonym for observable variable), this model can be written as:

y1 = τ1 + λ1·η + ε1

where τ1 is the intercept of the indicator, λ1 is the loading of the latent variable η on the indicator y1
(loadings are also called regression weights or regression coefficients), and ε1 is the residual of the in-
dicator. However, a latent variable should always be inferred or measured by more than one indicator
(otherwise, you don’t need a latent variable and you can just use the observable variable). For three
indicators and one latent variable, the measurement model becomes:

y1 = τ1 + λ1·η + ε1
y2 = τ2 + λ2·η + ε2
y3 = τ3 + λ3·η + ε3
where yi are the indicators or observable variables, τi are the indicator intercepts, λi are the loadings,
interpreted as the correlation of the indicators with the latent variable η, and εi are the residuals of
the model, i.e. what’s left over after accounting for the latent variable, or thus, what the latent variable
does not explain. The index refers to the indicator. For example, τ1 refers to the intercept of the first
indicator, λ2 is the loading of the latent variable on the second indicator, etc.
In the structural model, we are interested in identifying causal relationships amongst the observable
and latent variables. To describe these causal relationships, the model-implied covariance matrix Σ(Θ)
is typically used. Note that this is not the same matrix as the observed population variance-covariance
matrix Σ, which comes directly from the data. The formula for the model-implied covariance matrix is:

Σ(Θ) = Λ Ψ Λ′ + Θε
where Λ is the loading matrix (consisting of the same loadings λ as the measurement model), Ψ is the
variance-covariance matrix of the latent variables (if there is only one latent variable, Ψ is a scalar Ψii),
and Θε is the variance-covariance matrix of the residuals. The dimensions of Σ(Θ) and Σ are always
the same. For three indicators and one latent variable, Σ(Θ) can be written as:

Σ(Θ) = [ λ1²ψ11 + θ11    λ1λ2ψ11         λ1λ3ψ11       ]
       [ λ2λ1ψ11         λ2²ψ11 + θ22    λ2λ3ψ11       ]
       [ λ3λ1ψ11         λ3λ2ψ11         λ3²ψ11 + θ33  ]

where ψ11 is the variance of the latent variable η and θ11, θ22, θ33 are the residual variances.
To simplify the interpretation of the measurement model and the structural model, a path diagram
can be used. The table below provides the typically used symbols in a path diagram.
▪ Residual (ε)
▪ Indicator intercept (τ) or latent mean, i.e. the intercept of a latent variable (see later)
▪ Path between a latent and an observable variable, i.e. a loading λ, or a path from an intercept to a variable
For example, in the figure below, the diagram on the left depicts the regression of a latent variable on
an indicator (essentially a measurement model), while the diagram on the right depicts the variance of
a latent variable (and not the covariance, as the covariance of a variable with itself is the variance).
In the path diagram below, all measurement model parameters are coloured in green and all structural
model parameters are coloured in blue. Note that the loadings λ are shared between these models
but are coloured in green here for simplicity, and that only variances (and no covariances) are shown.
We will always assume that the residuals are independent. This implies that the variance-covariance
matrix of the residuals Θε is always a diagonal matrix. The path diagram above (with three indicators
and one latent variable) can thus be translated into the measurement model yi = τi + λi·η + εi (for i = 1, 2, 3) and the structural model Σ(Θ) = Λ Ψ Λ′ + Θε, with Θε diagonal.
Sometimes, the terminology of exogenous and endogenous variables is used. An exogenous variable is
an independent variable (latent or observable) that explains an endogenous, dependent variable (latent
or observable), and from which a causal path originates. This implies that:
▪ A structural model always specifies causal relationships from exogenous to endogenous variables.
▪ Exogenous latent or observable variables influence, either directly or indirectly, endogenous latent
or observable variables.
For example, in the path diagram above, all observable variables (y) are endogenous, while the latent
variable (η) is exogenous. However, this will not always be the case. The path diagram below gives an
overview of the most common types of variables and relationships in CFA, PA and SEM.
2. Model syntax of the lavaan package
The name lavaan refers to latent variable analysis. At the heart of the lavaan package is the ‘model
syntax’, which is a description of the model to be estimated. Four formula types that are especially important are summarized in the table below.

Formula type                  Operator   Mnemonic
Latent variable definition    =~         is measured by
Regression                    ~          is regressed on
(Residual) (co)variance       ~~         is correlated with
Intercept                     ~1         intercept
A regression formula always has the following form: d1 + d2 ~ i1 + i2 + i3. In this formula, the tilde (“~”)
is the regression operator. On the left-hand side of the operator, the dependent variables are defined,
while on the right-hand side, the independent variables are defined, separated by the “+” operator. In
lavaan, a typical model is simply a system of regression formulas, where some variables may be latent.
Latent variables should always be defined by the “=~” operator. In addition, intercepts and (residual) (co)variances can be added to the model manually by using the “~1” and “~~” operators, respectively. Note that not all operators will be needed for every model. An example is given below.
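The original example is not reproduced in this text; below is a minimal sketch of what such a model syntax could look like, with illustrative formulas chosen to match the description that follows (it is not the original example).

## Illustrative model syntax (sketch):
exampleModel <- '
  # latent variable definitions ("=~" operator)
  f1 =~ y3 + y4 + y5
  f2 =~ y6 + y7 + y8
  f3 =~ y9 + y10
  # regressions ("~" operator)
  y1 + y2 ~ f1 + f2
  f1 ~ f2 + f3
  f2 ~ f3
  # residual covariance ("~~" operator)
  y1 ~~ y2
  # intercept ("~1" operator)
  y1 ~ 1
'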
In the example above, f1, f2 and f3 are the latent variables, while y1 to y10 are the observable variables.
Note that only y1, y2, f1 and f2 are dependent (endogenous) variables, while all other variables (latent
and observable) are independent (exogenous).
¹ Intercept of an indicator: “indicator intercept”; intercept of a latent variable: “latent mean”. See also Page 19.
3. Functions and concepts
Functions in lavaan are always indicated with brackets. For example: cfa() is a function, while CFA is a concept.
The column ‘page’ indicates where a function or concept is first explained or used.
4. Confirmatory factor analysis
We start with an example of CFA, using the cfa() function, which is a user-friendly function for fitting
CFA models. The lavaan package contains a built-in dataset called HolzingerSwineford1939. This data-
set consists of the mental ability test scores of 301 7th and 8th grade children from two different schools
(Pasteur and Grant-White). In our dataset, only 9 of the original 26 observable variables are included.
The CFA model that is proposed consists of 3 latent variables (or factors), each with 3 indicators:
With x1: visual perception, x2: cubes, x3: lozenges, x4: paragraph comprehension, x5: sentence comple-
tion, x6: word meaning, x7: speeded addition, x8: speeded counting of dots, and x9: speeded discrimina-
tion of straight and curved capitals. Figure 1 shows the path diagram of this model. In this example, all
latent variables are exogenous (independent), while all observable variables are endogenous.
Figure 1 - Path diagram of the CFA model for the HolzingerSwineford1939 dataset (residuals are not depicted). Note:
although path diagrams have their origins in path analysis, we can also use them in CFA (and of course also in SEM).
There are five logical steps in CFA and SEM: (i) model specification (defining the hypothesized relation-
ships among the variables based on prior knowledge), (ii) model identification (checking if the model
is over-identified, just-identified, or under-identified), (iii) parameter estimation (only possible if the
model is just-identified or over-identified), (iv) model evaluation (assessing model fit with quantitative
indices), and (v) model modification (post hoc modification of the model to improve model fit). We
will follow this sequence in subsequent paragraphs. However, first we need to pre-analyse the dataset.
## Load Libraries.
## If a package is not installed yet, use: install.packages("package name").
library(lavaan)
library(lavaanPlot)
library(dplyr)
library(semTools)
library(knitr)
library(mvnormalTest)
Before we analyse the dataset, we should check if the indicators (x1 to x9) are normally distributed, as
the maximum likelihood (ML) estimator, which we will use to estimate our model parameters, assumes
normality. Note that this assumption applies to the residuals and is thus only relevant for observable,
dependent variables (in this model, the observable variables, x1 to x9, are all dependent); in contrast,
independent variables can take any distributional form at all. If the observable, dependent variables are
not normally distributed, a more robust form of the maximum likelihood (ML) estimator needs to be
applied. Luckily, there are multiple robust forms of this estimator available in the lavaan package.
## Load the built-in dataset and extract the indicators (all rows, columns 7 to 15).
data <- HolzingerSwineford1939
indicators <- data[,7:15]
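The call that produces the normality output below is not shown in the extracted text; given the mvnormalTest package loaded above (and the identical check in the SEM example later on), it is presumably the Mardia test:

## Assess multivariate (Mardia) and univariate (Shapiro-Wilk) normality.
mardia(indicators)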
$mv.test
Test Statistic p-value Result
1 Skewness 344.9053 0 NO
2 Kurtosis 3.2344 0.0012 NO
3 MV Normality <NA> <NA> NO
$uv.shapiro
W p-value UV.Normality
x1 0.9928 0.1582 Yes
x2 0.9697 0 No
x3 0.9523 0 No
x4 0.9827 0.0011 No
x5 0.9769 1e-04 No
x6 0.9538 0 No
x7 0.9908 0.056 Yes
x8 0.9807 4e-04 No
x9 0.9942 0.307 Yes
We can conclude with 95% confidence that the indicators don’t jointly follow a multivariate normal distribution (skewness or asymmetry: p < 0.05; kurtosis or thickness of the tails: p < 0.05). The Shapiro-Wilk test also indicates that 6 out of 9 indicators are not normally distributed (only for x1, x7 and x9 is p > 0.05). This
means that we will have to use a more robust form of the maximum likelihood estimator.
Step 1. Model specification
We can now enter the model syntax.
HS.model <- 'visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
Note that covariances between the latent variables, although present in Figure 1, are not written in
the model syntax. The reason why this model syntax is so short is that the cfa() function, which we
will use in step 2, takes care of several things. First, all exogenous (independent) latent variables are
correlated by default. Second, again by default, the loading of a latent variable on its first indicator is
fixed to 1 (this means that λ = 1 for the paths ‘x1 - visual’, ‘x4 - textual’ and ‘x7 - speed’), thereby fixing
the scale of the latent variable. Third, residual (co)variances (for or between the observable variables)
are added automatically. In this way, the model syntax is kept concise. On the other hand, the user
remains in control: this ‘default’ behaviour can be overridden or switched off.
Step 2. Model identification
Model identification can also be performed in RStudio. In lavaan, we can use the cfa() function to fit the HS.model to the dataset. This function will also perform the model identification.
Note also that we use "MLR" as a robust alternative to the ML estimator² because not all dependent
variables are normally distributed (see Step 0).
## Fit the model to a confirmatory factor analysis (CFA) model.
## If maximum likelihood estimations are used (ML or a robust variant of ML,
## such as MLR), lavaan will by default base the analysis on the so-called
## biased sample covariance matrix, where the elements are divided by N instead
## of N-1. To use the unbiased sample covariance matrix, and N-1 as the
## multiplier to compute the Chi-square statistic, the argument
## "likelihood = wishart" can be added.
² "MLR" is a maximum likelihood estimation technique with robust (Huber-White) standard errors and a scaled test statistic that is (asymptotically) equal to the Yuan-Bentler test statistic. MLR can be used for both complete and incomplete datasets. Alternatives for MLR: https://lavaan.ugent.be/tutorial/est.html.
fit <- cfa(HS.model, data, estimator = "MLR", likelihood = "wishart")
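The output below is produced by the summary() function; the exact call is not visible in the extracted text, but it was presumably along these lines:

## Summarize the fitted model with fit measures, standardized estimates and R².
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)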
lavaan 0.6.14 ended normally after 35 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Number of observations 301
Parameter Estimates:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual =~
x1 1.000 0.901 0.772
x2 0.554 0.132 4.198 0.000 0.499 0.424
x3 0.729 0.141 5.178 0.000 0.657 0.581
textual =~
x4 1.000 0.991 0.852
x5 1.113 0.066 16.974 0.000 1.103 0.855
x6 0.926 0.061 15.114 0.000 0.918 0.838
speed =~
x7 1.000 0.621 0.570
x8 1.180 0.130 9.061 0.000 0.732 0.723
x9 1.082 0.266 4.067 0.000 0.671 0.665
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual ~~
textual 0.410 0.099 4.117 0.000 0.459 0.459
speed 0.263 0.060 4.373 0.000 0.471 0.471
textual ~~
speed 0.174 0.056 3.086 0.002 0.283 0.283
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.551 0.157 3.515 0.000 0.551 0.404
.x2 1.138 0.112 10.151 0.000 1.138 0.821
.x3 0.847 0.100 8.433 0.000 0.847 0.662
.x4 0.372 0.050 7.394 0.000 0.372 0.275
.x5 0.448 0.057 7.883 0.000 0.448 0.269
.x6 0.357 0.047 7.670 0.000 0.357 0.298
.x7 0.802 0.097 8.236 0.000 0.802 0.676
.x8 0.489 0.120 4.087 0.000 0.489 0.477
.x9 0.568 0.119 4.776 0.000 0.568 0.558
visual 0.812 0.181 4.494 0.000 1.000 1.000
textual 0.983 0.121 8.089 0.000 1.000 1.000
speed 0.385 0.107 3.602 0.000 1.000 1.000
R-Square:
Estimate
x1 0.596
x2 0.179
x3 0.338
x4 0.725
x5 0.731
x6 0.702
x7 0.324
x8 0.523
▪ The Comparative Fit Index (CFI) represents the amount of variance that has been accounted for
in a covariance matrix and ranges from 0.0 to 1.0. A higher CFI indicates a better model fit. In
practice, the CFI should be ≥ 0.95. The CFI is less affected by sample size than the χ2 statistic.
▪ The Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square
Residual (SRMR): the RMSEA is a “badness of fit” index where 0 indicates a perfect fit and higher
values indicate a lack of fit. It is useful for detecting model misspecification and is less sensitive to
sample size than the χ2 statistic. In practice, the RMSEA should be ≤ 0.06. The SRMR is similar to
the RMSEA and should be ≤ 0.05 for a close model fit; 0.05 - 0.08 for a reasonable model fit; and
≥ 0.10 for a poor model fit.
▪ The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC): these are
both relative measures from the perspective of model selection rather than null hypothesis testing.
AIC offers a relative estimation of the information lost when the given model is used to generate
data, while BIC is an estimation of how parsimonious a model is among several candidate models.
The AIC and BIC are not useful in testing the null hypothesis but are useful for selecting the model
with the least overfitting, whereby the model with the lowest AIC- or BIC-value is least overfitted.
▪ While not often used in CFA or SEM, we can also interpret the R-Square (R²) value. Any variable
that has an arrow pointing to it in the path diagram (i.e. a dependent, endogenous variable) has an
R² value attached to it. The R² value indicates the proportion of variance of a dependent variable that is explained by the corresponding independent variable(s). There is no strict cut-off value for an acceptable R² value, but R² values ≥ 0.500 are desirable and higher values are better.
In this example, the CFI-index is too low (0.930), the RMSEA is too high (0.093) and the SRMR is okay
(0.065). Based on this, we can conclude that the fit is rather bad. We will address this issue in Step 5
(model modification). The R² values range from 0.179 to 0.731. It appears from these values that each endogenous variable has a substantial relationship with its corresponding exogenous variable, but also that there are large differences (highest R² value for x5 on textual, lowest for x2 on visual).
Interpretation: covariances
The values in the last two columns are the standardized covariances between the latent variables. These values are a measure of the relationship between two latent factors: a covariance evaluates to what extent two latent factors change together. A positive covariance indicates that two latent factors tend to move in the same direction, while a negative covariance indicates that they tend to move in opposite directions. As the P(>|z|) values are all below 0.05, all three covariances are significantly different from zero. In other words, the extent to which two latent variables in this example change together is always significantly different from zero.
Interpretation: variances
The values in the last two columns are the standardized variances of the observed and latent variables.
As the P(>|z|) values are always equal to 0.000 and thus < 0.05, the variance of every variable (observed
or latent) in this example is significantly different from zero.
Note that in the Variances section, there is a dot (.) before the observable variable names. This is because
they are dependent (or endogenous) variables (predicted by the latent variables), and therefore, the
value for the variance that is printed in the output is an estimate of the residual variance: the left-over
variance not explained by the latent variables. In contrast, there is no dot before the latent variable
names, because they are exogenous variables in this model (there are no single-headed arrows pointing
to them). The values for the variances here are the estimated total variances of the latent variables.
Extracting information
It is worth mentioning that a few functions exist that can be used to extract specific elements of the summary() output as a data.frame.
Function coef()
## Extract the estimated values of all free parameters in the model, together
## with their names (factor loadings or regression weights of the latent
## variables on the indicators: =~, variances and covariances: ~~).
coef(fit)
Note that the 21 parameter values displayed here are not standardized, as coef() does not allow the
argument “standardized = TRUE”. Also remember that the loading λ of a latent variable on its first
indicator is fixed to 1 (λ = 1 for the paths ‘x1 - visual’, ‘x4 - textual’ and ‘x7 - speed’), which is why these
parameter values are not displayed here (but we know that they are fixed to 1).
Function parameterEstimates()
## Extract the estimated values of all (free) parameters in the model, together
## with their standard error (se), z-value (z), p-value (pvalue) and 95%
## confidence interval (ci.lower and ci.upper). The lhs (left-hand side), op
## (operator) and rhs (right-hand side) columns define the parameter.
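The call itself is not shown in the extracted text; given the standardized columns described below (including std.all and std.nox), it was presumably:

parameterEstimates(fit, standardized = TRUE)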
This function displays all estimated parameter values together with their standardized counterparts. In addition, a column std.nox is added. To calculate this column, the variances of exogenous observable variables are ignored. Since all observable variables in this example are endogenous, this column is exactly the same as "std.all". Note that we have 24 parameters, of which 21 are free and 3 are fixed (p-value: NA).
Function vcov()
## Extract the estimated covariance matrix of the free model parameters.
vcov(fit)
vsl=~2 vsl=~3 txt=~5 txt=~6 spd=~8 spd=~9 x1~~x1 x2~~x2 x3~~x3 x4~~x4 x5~~x5 x6~~x6 x7~~x7 x8~~x8 x9~~x9 vsl~~v txtl~~t spd~~s vsl~~t vsl~~s txtl~~s
visual=~x2 0.017
visual=~x3 0.015 0.020
textual=~x5 0.001 0.000 0.004
textual=~x6 0.001 0.001 0.001 0.004
speed=~x8 0.003 0.002 0.001 0.001 0.017
speed=~x9 0.006 0.007 -0.001 -0.001 0.009 0.071
x1~~x1 0.015 0.018 0.000 0.001 0.002 0.008 0.025
x2~~x2 -0.004 -0.004 0.000 0.000 0.000 -0.003 -0.004 0.013
x3~~x3 -0.008 -0.010 0.000 -0.001 -0.002 -0.005 -0.009 0.003 0.010
x4~~x4 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.000 0.003
x5~~x5 -0.001 0.000 -0.002 0.000 0.000 0.000 0.000 0.000 0.000 -0.001 0.003
x6~~x6 0.000 0.000 0.000 -0.001 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.002
x7~~x7 0.001 0.000 0.000 0.000 0.002 0.017 0.001 -0.001 -0.001 0.001 0.000 0.001 0.009
x8~~x8 0.001 0.001 -0.001 -0.001 -0.001 0.024 0.002 -0.001 -0.001 0.000 0.000 0.001 0.006 0.014
x9~~x9 -0.001 -0.001 0.000 0.001 0.000 -0.025 -0.002 0.000 0.001 0.000 0.000 -0.001 -0.007 -0.011 0.014
visual~~visual -0.017 -0.021 -0.001 -0.001 -0.001 -0.002 -0.021 0.004 0.008 0.000 0.001 0.000 0.001 0.001 -0.001 0.033
textual~~textual -0.002 -0.002 -0.005 -0.004 -0.002 0.004 -0.001 0.001 0.001 -0.002 0.001 0.001 0.001 0.002 -0.002 0.004 0.015
speed~~speed -0.003 -0.002 0.000 0.000 -0.007 -0.024 -0.003 0.001 0.002 -0.001 0.000 -0.001 -0.006 -0.007 0.007 0.001 0.000 0.011
visual~~textual -0.007 -0.009 -0.002 -0.001 -0.001 0.001 -0.008 0.001 0.004 -0.001 0.001 0.001 0.001 0.001 -0.001 0.013 0.006 0.001 0.010
visual~~speed -0.003 -0.002 -0.001 0.000 -0.002 0.006 -0.002 0.001 0.001 0.000 0.000 0.000 0.002 0.003 -0.004 0.005 0.002 0.000 0.003 0.004
textual~~speed -0.001 -0.001 -0.001 0.000 -0.003 0.001 -0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.001 -0.001 0.002 0.003 0.001 0.003 0.002 0.003
Note that the model-implied (fitted) covariance matrix contains [9*(9+1)]/2 = 45 unique values. The
covariance matrix of the parameter estimates contains [21*(21+1)]/2 = 231 unique values.
Function lavResiduals()
## To extract the residuals of the fitted model (i.e. the differences between
## the observed covariance matrix and the model-implied covariance matrix),
## the function lavResiduals() can be used.
lavResiduals(fit, type = "cor.bentler")
$type
[1] "cor.bentler"
$cov
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 0.000
x2 -0.030 0.000
x3 -0.008 0.094 0.000
x4 0.071 -0.012 -0.068 0.000
x5 -0.009 -0.027 -0.151 0.005 0.000
x6 0.060 0.030 -0.026 -0.009 0.003 0.000
x7 -0.140 -0.189 -0.084 0.037 -0.036 -0.014 0.000
x8 -0.039 -0.052 -0.012 -0.067 -0.036 -0.022 0.075 0.000
x9 0.149 0.073 0.147 0.048 0.067 0.056 -0.038 -0.032 0.000
$cov.z
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 0.000
x2 -1.089 0.000
x3 -0.269 2.211 0.000
x4 2.022 -0.243 -1.918 0.000
x5 -0.278 -0.529 -3.712 0.345 0.000
x6 1.675 0.604 -0.697 -0.688 0.242 0.000
x7 -3.476 -3.712 -1.917 0.896 -0.842 -0.366 0.000
x8 -1.241 -1.157 -0.301 -1.945 -1.008 -0.646 4.465 0.000
x9 3.370 1.747 3.433 1.086 1.667 1.278 -2.086 -2.871 0.000
$summary
cov
srmr 0.065
srmr.se 0.007
srmr.exactfit.z 4.227
srmr.exactfit.pvalue 0.000
usrmr 0.055
usrmr.se 0.010
usrmr.ci.lower 0.039
usrmr.ci.upper 0.072
usrmr.closefit.h0.value 0.050
usrmr.closefit.z 0.538
usrmr.closefit.pvalue 0.295
Note that summary statistics are provided, and that you can choose the type of residual information:
▪ When type = "raw" is used, the function lavResiduals provides the unscaled difference between the observed
and the model-implied covariance matrix.
▪ When type = "cor.bentler" is used, the function lavResiduals provides the rescaled difference between the
observed and the model-implied covariance matrix (rescaled by dividing all elements by the square roots of
the corresponding variances of the observed covariance matrix).
▪ Other options: type = "standardized", type = "standardized.mplus", type = "cor.bollen".
Function fitMeasures()
## To extract important model fit measures (e.g. χ2 statistic, CFI-index, SRMR,
## RMSEA, etc.) of the fitted model, the function fitMeasures() can be used.
fitMeasures(fit,c("chisq.scaled","df.scaled","pvalue.scaled","aic","bic",
"cfi.robust","rmsea.robust","srmr"))
▪ Add parameters to the model
It is also possible to add parameters to the model (outer loadings, but also variances, covariances
or regressions). However, before adding a parameter to the model, critically consider whether this action violates model validity. For instance, don’t add an indicator to a latent variable definition if it has nothing to do with the latent variable. This seems logical, but we will later discuss the lavaan function modindices(), which determines which parameters should preferentially be added to a CFA/SEM model; sometimes, this function will identify relationships that are statistically significant but not causal. So, be critical when you use this function.
We will now discuss an important strategy that can be used to add parameters to a CFA or SEM
model that is not well-fitted: the determination of modification indices (MI). A modification index
is an estimate of the amount by which the χ2 statistic would be improved (i.e. reduced) if a certain
parameter restriction were to be removed from the model. Most commonly, modification indices
reflect the improvement in model fit that would result if a previously omitted parameter were to
be added and freely estimated. This parameter can be a loading, a regression coefficient, or even
a correlated residual. If a parameter is added based on a large MI, this is called a “post hoc, data-
driven model modification”. MIs > 3.84 indicate that the model fit would be improved (3.84 is the
critical value for the χ2 statistic with 1 df), and that the p-value for the added parameter would
be < 0.05. MIs > 10.83 indicate that the added parameter would have a p-value < 0.001.
Although MIs can be useful in identifying sources of misfit in a model, using them also carries risks.
First, they are completely determined by the data and are thus devoid of theory. The largest MIs
might be associated with parameters that are unsupported by theory and instead represent some
idiosyncratic characteristics of the data. Second, simulation research has suggested that using MIs
to guide model modifications rarely leads to the true population model.
However, like many things in statistics, MIs can be beneficial if used in a thoughtful, judicious way.
For instance, they can be used as another goodness-of-fit indicator. For example, a small number
of large MIs indicates that important parameters were omitted, while a large number of small MIs
indicates that a lot of less-important parameters were not considered. MIs are also used when
assessing measurement invariance (or the lack thereof) across groups in CFA and SEM models.
In lavaan, modification indices can be requested by the function modindices(). By default, MIs are
printed out for each non-free (fixed-to-zero) parameter. The MIs are supplemented by the expec-
ted parameter change (epc) values. The last three columns contain the standardized epc values
(sepc.lv: only standardizing latent variables; sepc.all: standardizing all variables; sepc.nox: standardi-
zing all but exogenous observable variables).
## Function modindices().
modindices(fit, sort = TRUE, minimum.value = 3.84)
50 x1 ~~ x9 7.335 0.138 0.138 0.247 0.247
65 x4 ~~ x6 6.220 -0.236 -0.236 -0.646 -0.646
66 x4 ~~ x7 5.920 0.099 0.099 0.180 0.180
48 x1 ~~ x7 5.420 -0.130 -0.130 -0.195 -0.195
77 x7 ~~ x9 5.183 -0.187 -0.187 -0.278 -0.278
36 textual =~ x9 4.796 0.138 0.137 0.136 0.136
29 visual =~ x8 4.295 -0.210 -0.190 -0.187 -0.187
63 x3 ~~ x9 4.126 0.102 0.102 0.147 0.147
As an example, we will now add the parameters for which the MI > 10.83 to the model. In practice, this means that we consider that indicators x9 (speeded discrimination of straight and curved capitals) and x7 (speeded addition) also load on the latent variable “visual”, that x7 and x8 are correlated (x8: speeded counting of dots), and that x8 and x9 are correlated. As these relationships seem plausible and not only statistically relevant, we will add them to the model and repeat steps 2, 3 and 4.
## Step 1.
HS.model2 <- 'visual =~ x1 + x2 + x3 + x7 + x9
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
x7 ~~ x8
x8 ~~ x9'
## Step 2. The output is not displayed here, but the residual df was equal to
## 20 (> 0), which means that HS.model2 is also over-identified.
fit2 <- cfa(HS.model2, data, estimator = "MLR", likelihood = "wishart")
Although variances cannot be negative, lavaan can produce variance estimates that are negative. The
solution is then called inadmissible. Negative variances are theoretically impossible, so the solution is
considered improper and the other estimates are not reliable. Therefore, we will omit the parameters that caused this warning message (x7 ~~ x8 and x8 ~~ x9; you can check this for yourself). We will also omit “visual =~ x7”, as this loading is not significant (p > 0.05) (execute the summary() function to see this).
## Step 1.
HS.model2 <- 'visual =~ x1 + x2 + x3 + x9
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
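The code that refits the modified model and extracts its fit measures (discussed below) is not shown in the extracted text; a sketch consistent with the earlier steps:

## Steps 2 & 3: refit the modified model with the robust ML estimator.
fit2 <- cfa(HS.model2, data, estimator = "MLR", likelihood = "wishart")
## Step 4: extract the fit measures discussed below.
fitMeasures(fit2, c("chisq.scaled", "df.scaled", "pvalue.scaled", "aic", "bic",
                    "cfi.robust", "rmsea.robust", "srmr"))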
From this output, we can observe that:
▪ The χ2 statistic is reduced, while the p-value of this statistic is increased (which we desired). The
p-value is still < 0.05, but we know that the χ2 statistic is sensitive to sample size. In addition, the
CFI-index is now ≥ 0.95 and also the RMSEA and SRMR are improved. From this, we can conclude
that the model fit is improved by altering the model specification.
▪ Since 1 parameter was added to the model, df.scaled decreased from 24 to 23.
▪ The AIC and BIC of HS.model2 are lower than the AIC and BIC of HS.model, which means that
the HS.model2 is less overfitted than the HS.model.
Overall, the model fit was improved by altering the first latent variable definition. As the validity of the model does not seem to be negatively affected by this adjustment, the modification can be justified. However, always
remember that model validity is more important than model fit, and that model fit is dependent on
sample size, while model validity is not. In addition, always consider that you should report model
modifications, whether guided by modification indices or other considerations, so that reviewers and
readers of your research can also judge these decisions.
Multiple group analysis
The idea behind measurement invariance is that when we compare any measurement across groups,
this comparison should reflect true differences rather than measurement differences. When data are
continuous, testing for measurement invariance involves testing for three invariance types: configural
invariance, weak or metric invariance, and strong or scalar invariance.
▪ Configural invariance: the same latent variable structure is imposed on all groups (i.e. the
number and pattern of latent variables is the same for all groups, which means that the same
indicators are loading on the same latent variables for all groups). However, loadings (λ) and
indicator intercepts (τ) can be freely estimated.
▪ Weak or metric invariance posits the same constraints as configural invariance, but with the
additional constraint that the loadings (λ) are equal across groups.
▪ Strong or scalar invariance posits the same constraints as configural and metric invariance, but
with the additional constraint that the indicator intercepts (τ) are equal across groups.
In the code below, it is shown how these three invariance impositions can be checked in lavaan.
## Remark 1: In this example, measurement invariance will be checked for the
## three latent variables that are present in our model (visual, textual and
## speed) because we want to compare the values of their latent means between
## the two school groups that are present in this dataset (Pasteur and Grant-
## White). In other words: we want to determine which school group performs
## the best for the latent variables visual, textual and speed.
## Remark 2: The function in lavaan that can be used to test for measurement
## invariance is lavTestLRT(). This function is similar to ANOVA and compares
## lavaan models, based on a χ2 statistic. A p-value > 0.05 is desired, as this
## indicates that measurement invariance is supported by the dataset.
## Note that we don’t have to check for configural invariance in lavaan, as we
## imposed the same latent variable structure on all groups in the model
## specification (the model specification is the same for both groups). This
## is why the code below will only provide 2 p-values: the first p-value will
## test for the presence of weak invariance; the second p-value will test for
## the presence of strong invariance (if present: p > 0.05).
## Remark 3: In the code below, note that we omit the arguments estimator =
## "MLR" and likelihood = "wishart". We do this because lavTestLRT() will
## execute these corrections automatically. If we still used these two
## arguments, lavaan would display the following note (not an error, just a note):
## The “Chisq” column contains standard test statistics, not the robust
## test that should be reported per model. A robust difference test is a
## function of two standard statistics.
## In other words, using these two arguments would lead to a non-corrected
## χ2 statistic and a potentially false interpretation of measurement
## invariance, which is of course not desired.
-------------------------------------------------------------------------------
Testing for measurement invariance
-------------------------------------------------------------------------------
rm(list = ls())
data <- HolzingerSwineford1939
HS.model2 <- 'visual =~ x1 + x2 + x3 + x9
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
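The three model fits and the lavTestLRT() comparison are not shown in the extracted text; a sketch consistent with the remarks above (the object names fitA, fitB and fitC are assumptions, chosen to be consistent with fitE used further on):

## Configural invariance: same model structure in both school groups.
fitA <- cfa(HS.model2, data, group = "school")
## Weak (metric) invariance: equal loadings across groups.
fitB <- cfa(HS.model2, data, group = "school", group.equal = "loadings")
## Strong (scalar) invariance: equal loadings and indicator intercepts.
fitC <- cfa(HS.model2, data, group = "school",
            group.equal = c("loadings", "intercepts"))
## Compare the nested models with χ2 difference tests.
lavTestLRT(fitA, fitB, fitC)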
Because we provided three model fits, the function lavTestLRT() will produce two tests: the first test
compares the first model to the second model, while the second test compares the second model to
the third model. Because the first p-value is non-significant, we may conclude that weak invariance (or
metric invariance, i.e. equal loadings across groups) is supported in this dataset. However, because the
second p-value is significant, strong invariance (or scalar invariance, i.e. equal indicator intercepts across
groups) is not³. Therefore, we can’t compare the values of latent means across the two groups directly.
However, if full measurement invariance is not supported, Byrne et al. (1989) have proposed a solution
that is based on partial measurement invariance. This can be either partial metric invariance (some, but
³ This can, among other things, be caused by the restricted size of the dataset, especially because we divided the dataset into two smaller groups.
not all loadings λ are equated across groups), or partial scalar invariance (some, but not all indicator
intercepts τ are equal across groups). However, for both these types of partial measurement invariance,
at least 2 indicators of a latent variable must have equal loadings or indicator intercepts across groups.
Partial metric and/or scalar measurement invariance allows researchers to at least compare the group-
specific (standardized) latent means. If less than two indicators per latent variable have equal loadings
and/or indicator intercepts, multiple group comparisons in CFA and SEM may become problematic.
Since metric invariance was established for our model fit, there is no need to introduce partial metric
invariance. However, as scalar invariance was not established, we will introduce partial scalar invariance
in our model fit. Remember: this means that we will allow some indicator intercepts to differ across groups. In practice, we need to calculate the modification indices of all indicator intercepts τ,
after which we have to constrain invariant τ while setting free non-invariant τ. However, we also have
to make sure that at least two indicators of a latent variable still have equal intercepts across groups4.
If this procedure results in partial scalar invariance, we can reliably compare the values of latent means
across the two groups by using the summary() function.
-------------------------------------------------------------------------------
Introducing partial scalar (measurement) invariance
-------------------------------------------------------------------------------
## Step 1: calculating the modification indices. It is important to realize
## that the function modindices() will only consider fixed-to-zero parameters.
## Therefore, we need to fix all indicator intercepts to zero first.
## This can be done by amending the model specification. Note that
## (i) the operator (op) in lavaan for (indicator) intercepts is ~1
## (ii) we can fix a parameter in a lavaan formula by pre-multiplying the
## corresponding variable in the formula with a numerical value
## (iii) to fix a parameter to zero, pre-multiply with 0 (if you want to
## force a parameter to be freely estimated, pre-multiply with NA)
## (iv) because all indicator intercepts are fixed to the same value
## (i.e. zero), we only need one line of code and we can separate them
## by an addition operator or + sign (if you would want to fix all
## indicator intercepts to a different value, e.g. 5, the code would be:
## x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 ~ 5*1,
## and if you would want to set all indicator intercepts free, the code
## would be: x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 ~ NA*1)
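The corresponding code is not shown in the extracted text; a sketch, assuming the amended model is called HS.modeltest and the fit fitD (names chosen to be consistent with the references further on):

## Fix all indicator intercepts to zero in the amended model specification.
HS.modeltest <- 'visual =~ x1 + x2 + x3 + x9
                 textual =~ x4 + x5 + x6
                 speed =~ x7 + x8 + x9
                 x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 ~ 0*1'
## Fit with equal loadings across the two school groups and inspect the
## modification indices of the (fixed-to-zero) indicator intercepts.
fitD <- cfa(HS.modeltest, data, group = "school", group.equal = "loadings")
modindices(fitD, sort = TRUE)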
⁴ For partial metric invariance, the solution would be similar, i.e. calculating the modification indices of all loadings λ, constraining invariant λ and setting free non-invariant λ, while making sure that at least two indicators per latent variable still have equal loadings across groups.
From this output, we can see that especially indicators x3 and x7 have intercept estimates that are non-invariant across groups (i.e. high modification indices). Therefore, we will set free the intercepts of these two indicators, while constraining all other indicator intercepts in the model. Note that “x3 ~1” and “x7 ~1” are shown twice because these indicator intercepts are now estimated for each group separately (and thus are no longer equal). As the intercept estimate of x3 is strongly non-invariant across the two groups, it is especially important to set this intercept free. Also note that we can’t set free the intercept of x8: if we did, the latent variable “speed” would only have one indicator left with equal intercepts across groups (i.e. x9), since we already freed the intercept of x7.
-------------------------------------------------------------------------------
Introducing partial scalar (measurement) invariance
-------------------------------------------------------------------------------
## Step 2: constrain invariant τ and set free non-invariant τ.
## Attention: make sure that you use “HS.model2” and not “HS.modeltest”.
fitE <- cfa(HS.model2, data, group = "school", group.equal =
c("loadings", "intercepts"), group.partial = c("x3~1","x7~1"))
We can conclude from the output above that both metric invariance and partial scalar invariance are
present. Therefore, we can now reliably compare the values of latent means across the two groups.
Root Mean Square Error of Approximation:
RMSEA 0.067
90 Percent confidence interval - lower 0.042
90 Percent confidence interval - upper 0.090
P-value H_0: RMSEA <= 0.050 0.123
P-value H_0: RMSEA >= 0.080 0.180
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Group 1 [Pasteur]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual =~
x1 1.000 0.868 0.744
x2 (.p2.) 0.617 0.099 6.257 0.000 0.535 0.432
x3 (.p3.) 0.804 0.103 7.836 0.000 0.697 0.602
x9 (.p4.) 0.453 0.081 5.577 0.000 0.393 0.391
textual =~
x4 1.000 0.938 0.814
x5 (.p6.) 1.121 0.066 16.984 0.000 1.052 0.831
x6 (.p7.) 0.931 0.056 16.587 0.000 0.874 0.862
speed =~
x7 1.000 0.600 0.558
x8 (.p9.) 1.195 0.164 7.300 0.000 0.717 0.732
x9 (.10.) 0.682 0.107 6.359 0.000 0.409 0.407
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual ~~
textual 0.370 0.092 4.033 0.000 0.454 0.454
speed 0.084 0.064 1.314 0.189 0.162 0.162
textual ~~
speed 0.149 0.063 2.365 0.018 0.265 0.265
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 (.26.) 4.921 0.091 54.361 0.000 4.921 4.219
.x2 (.27.) 6.092 0.078 77.977 0.000 6.092 4.921
.x3 2.487 0.093 26.803 0.000 2.487 2.146
.x9 (.29.) 5.390 0.070 76.980 0.000 5.390 5.360
.x4 (.30.) 2.778 0.087 31.938 0.000 2.778 2.411
.x5 (.31.) 4.034 0.096 41.832 0.000 4.034 3.186
.x6 (.32.) 1.926 0.079 24.426 0.000 1.926 1.899
.x7 4.432 0.086 51.521 0.000 4.432 4.125
.x8 (.34.) 5.574 0.077 72.605 0.000 5.574 5.693
visual 0.000 0.000 0.000
textual 0.000 0.000 0.000
speed 0.000 0.000 0.000
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.608 0.127 4.774 0.000 0.608 0.447
.x2 1.247 0.153 8.123 0.000 1.247 0.813
.x3 0.857 0.123 6.964 0.000 0.857 0.638
.x9 0.637 0.088 7.272 0.000 0.637 0.630
.x4 0.447 0.069 6.431 0.000 0.447 0.337
.x5 0.497 0.082 6.084 0.000 0.497 0.310
.x6 0.265 0.050 5.276 0.000 0.265 0.258
.x7 0.795 0.114 6.981 0.000 0.795 0.689
.x8 0.445 0.107 4.174 0.000 0.445 0.464
visual 0.753 0.156 4.819 0.000 1.000 1.000
textual 0.880 0.131 6.692 0.000 1.000 1.000
speed 0.360 0.089 4.023 0.000 1.000 1.000
Group 2 [Grant-White]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual =~
x1 1.000 0.855 0.734
x2 (.p2.) 0.617 0.099 6.257 0.000 0.528 0.477
x3 (.p3.) 0.804 0.103 7.836 0.000 0.687 0.663
x9 (.p4.) 0.453 0.081 5.577 0.000 0.387 0.382
textual =~
x4 1.000 0.932 0.847
x5 (.p6.) 1.121 0.066 16.984 0.000 1.046 0.862
x6 (.p7.) 0.931 0.056 16.587 0.000 0.868 0.795
speed =~
x7 1.000 0.727 0.702
x8 (.p9.) 1.195 0.164 7.300 0.000 0.869 0.833
x9 (.10.) 0.682 0.107 6.359 0.000 0.496 0.489
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual ~~
textual 0.435 0.096 4.544 0.000 0.545 0.545
speed 0.244 0.078 3.139 0.002 0.393 0.393
textual ~~
speed 0.182 0.072 2.526 0.012 0.268 0.268
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 (.26.) 4.921 0.091 54.361 0.000 4.921 4.225
.x2 (.27.) 6.092 0.078 77.977 0.000 6.092 5.508
.x3 1.970 0.105 18.726 0.000 1.970 1.900
.x9 (.29.) 5.390 0.070 76.980 0.000 5.390 5.319
.x4 (.30.) 2.778 0.087 31.938 0.000 2.778 2.524
.x5 (.31.) 4.034 0.096 41.832 0.000 4.034 3.327
.x6 (.32.) 1.926 0.079 24.426 0.000 1.926 1.764
.x7 4.000 0.096 41.737 0.000 4.000 3.862
.x8 (.34.) 5.574 0.077 72.605 0.000 5.574 5.345
visual 0.032 0.125 0.253 0.800 0.037 0.037
textual 0.576 0.117 4.919 0.000 0.618 0.618
speed -0.079 0.095 -0.834 0.404 -0.109 -0.109
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.625 0.117 5.360 0.000 0.625 0.461
.x2 0.945 0.122 7.767 0.000 0.945 0.773
.x3 0.603 0.094 6.413 0.000 0.603 0.561
.x9 0.480 0.073 6.616 0.000 0.480 0.467
.x4 0.342 0.062 5.534 0.000 0.342 0.282
.x5 0.377 0.073 5.135 0.000 0.377 0.256
.x6 0.438 0.067 6.572 0.000 0.438 0.367
.x7 0.545 0.092 5.928 0.000 0.545 0.508
.x8 0.332 0.101 3.299 0.001 0.332 0.306
visual 0.732 0.154 4.757 0.000 1.000 1.000
textual 0.869 0.131 6.660 0.000 1.000 1.000
speed 0.528 0.119 4.431 0.000 1.000 1.000
▪ In the tab “Intercepts”, the indicator intercepts and latent means are now shown by the summary()
function. Note that, by default, the cfa() and sem() functions fix the intercepts to zero (this was
the case in Step 3 & 4. Parameter estimation & Model evaluation). Otherwise, the model wouldn’t
be estimable. Only because we now have two groups, the intercepts can differ from zero.
▪ The loadings and indicator intercepts are equated across the two groups, which was desired. Note that this is indicated in lavaan by the .p2. to .10. labels (loadings) and the .26. to .34. labels (indicator intercepts). For x3 and x7, the intercepts are not labelled, as these indicator intercepts were set free.
▪ The model fit, based on the χ2 statistic, CFI-index, RMSEA and SRMR, is close to reasonable.
▪ The latent means of group 1 (Pasteur) are all fixed to 0.000. We can consider group 1 as the reference group, which is necessary to compare Pasteur with Grant-White. The latent means of group 2 (Grant-White) are 0.032 (visual), 0.576 (textual) and -0.079 (speed). Notice that we cannot interpret these values directly; we can only compare them between the two groups. For instance, we can conclude that, after allowing partial scalar measurement invariance, the students of the Grant-White school perform better overall on the latent variable textual (p = 0.000), while significant differences between the two schools for the latent variables visual (p = 0.800) and speed (p = 0.404) are not present (significance level: α = 0.05).
Extra: You can also test for strict invariance or residual invariance, i.e. the variances of the indicators
are also equated across groups. However, this is not a prerequisite for testing latent mean differences
because the indicator residuals are not part of the latent variable(s), so their invariance is inconsequen-
tial to the interpretation of latent mean differences. However, residual invariance is still reported in
many tests of measurement invariance. If you want to account for strict invariance in lavaan, you can
design a CFA/SEM model where group.equal = c("intercepts", "loadings", "residuals"). However, including
this strict invariance model in the lavTestLRT() function to decide if latent mean difference testing can
be executed, may lead to an over-rejection of your group comparison model.
Categorical data
Binary, ordinal and nominal observable variables result in categorical data. It makes a big difference whether these categorical variables are exogenous (independent) or endogenous (dependent) in the model. Exogenous categorical variables (option 1) only need to be recoded (e.g. as dummy variables), just as in ordinary regression. Endogenous categorical variables (option 2) must be declared as ordered, which can be done in two ways.
2A. Declare them as ‘ordered’ in your data.frame before you run the analysis. For instance, if you need
to declare four observable variables a1, a2, a3 and a4 as ordinal in your data.frame “data”, use:
data[,c("a1","a2","a3","a4")] <- lapply(data[,c("a1","a2","a3","a4")], ordered)
2B. Use the ordered argument when using one of the fitting functions, i.e. cfa() or sem(). For example,
if you have four binary or ordinal observable variables a1, a2, a3 and a4, you can use:
fit <- cfa(model, data, ordered = c("a1","a2","a3","a4"))
If all the endogenous observable variables are to be treated as categorical, you can use the argument
“ordered = TRUE” as a shortcut. When the “ordered = …” argument is used, lavaan will automatically
switch to the WLSMV estimator: it will use diagonally weighted least squares (DWLS) to estimate the
model parameters, and it will use the full weight matrix to compute robust standard errors
as well as a mean- and variance-adjusted test statistic. Other options are ULSMV and PML.
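Visualizing the model
The fitted model can be visualized with the lavaanPlot package (loaded earlier). The labels object passed to lavaanPlot() below is not defined in the extracted text; it is presumably a named list that maps variable names to readable labels, for example:

labels <- list(visual = "Visual ability", textual = "Textual ability",
               speed = "Speed ability")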
lavaanPlot(model = fit2,
labels = labels,
node_options = list(shape = "box", fontname = "Helvetica"),
edge_options = list(color = "grey"),
coefs = TRUE,
sig = 0.05)
https://cran.r-project.org/web/packages/lavaanPlot/vignettes/Intro_to_lavaanPlot.html
https://cran.r-project.org/web/packages/lavaanPlot/lavaanPlot.pdf
5. Structural equation modelling
In our second example, we will study structural equation modelling (SEM), using the sem() function.
The lavaan package contains a built-in dataset called PoliticalDemocracy. This dataset consists of 75
observations with regard to 11 observable variables that are a measure for political democracy (dem)
or industrialization (ind) in developing countries in 1960 or 1965. The proposed SEM model is shown
in Figure 2 and consists of 3 latent variables (dem60, dem65, ind60) and the original 11 indicators:
Figure 2 - Path diagram of the SEM model for the PoliticalDemocracy dataset.
In the path diagram above, all observable variables are endogenous (dependent). The latent variables
dem60 and dem65 are also endogenous, while the latent variable ind60 is exogenous (independent).
## Load data.
data <- PoliticalDemocracy
head(data)
y1 y2 y3 y4 y5 y6 y7 y8 x1 x2 x3
1 2.50 0.000000 3.333333 0.000000 1.250000 0.000000 3.726360 3.333333 4.442651 3.637586 2.557615
2 1.25 0.000000 3.333333 0.000000 6.250000 1.100000 6.666666 0.736999 5.384495 5.062595 3.568079
3 7.50 8.800000 9.999998 9.199991 8.750000 8.094061 9.999998 8.211809 5.961005 6.255750 5.224433
4 8.90 8.800000 9.999998 9.199991 8.907948 8.127979 9.999998 4.615086 6.285998 7.567863 6.267495
5 10.00 3.333333 9.999998 6.666666 7.500000 3.333333 9.999998 6.666666 5.863631 6.818924 4.573679
6 7.50 3.333333 6.666666 6.666666 6.250000 1.100000 6.666666 0.368500 5.533389 5.135798 3.892270
## Assessing normality.
mardia(data)
$mv.test
Test Statistic p-value Result
1 Skewness 344.4944 0.0101 NO
2 Kurtosis -0.2581 0.7963 YES
3 MV Normality <NA> <NA> NO
$uv.shapiro
W p-value UV.Normality
y1 0.9316 6e-04 No
y2 0.8311 0 No
y3 0.8547 0 No
y4 0.8994 0 No
y5 0.9598 0.0179 No
y6 0.8107 0 No
y7 0.8715 0 No
y8 0.9006 0 No
x1 0.9751 0.1454 Yes
x2 0.9763 0.1715 Yes
x3 0.9734 0.1145 Yes
Based on the tests above, we can conclude with 95% confidence that 8 of the 11 indicators don’t follow a normal distribution. Thus, we will use a more robust form of the maximum likelihood estimator.
We already know from Section 4 that all exogenous (independent) latent variables are correlated by
default by the sem() function. However, this is not the case for endogenous latent variables, which is
why we need to add three regressions (dem60 ~ ind60, dem65 ~ ind60, dem65 ~ dem60) in the model
specification. We also know from Section 4 that residual (co)variances are added automatically. Still,
we have added six residual covariances in the model specification (y1 ~~ y5 to y6 ~~ y8). This is only
done when it is believed that observable variables have something in common that is not captured by
the latent variables. In the case of y1 ~~ y5 for instance, both indicators refer to identical scores (expert
ratings of the freedom of press), but measured in two different years (1960 and 1965, resp.). As there
is no latent variable that measures this ‘relationship’, we have added this residual covariance.
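Step 1. Model specification
The model syntax itself is not reproduced in the extracted text; based on the description above (and the lavaan tutorial on which this document is based), it reads:

model <- '
  ## Measurement model (latent variable definitions)
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  ## Structural model (regressions)
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
  ## Residual covariances
  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'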
Step 2. Model identification
Without RStudio:
▪ The number of unique elements in the model-implied covariance matrix is equal to:
K · (K + 1) / 2 = 11 · (11 + 1) / 2 = 66

with K = the number of observable variables = 11
▪ The number of free parameters is equal to the sum of the number of loadings (11), the number
of regressions (3), the number of variances (14), and the number of covariances (6), minus the
number of loadings that correspond to the first indicator of a latent variable (3). This gives us 31
free parameters, which is smaller than the number of unique elements in the model-implied cova-
riance matrix (66). The residual df = 66 - 31 = 35 > 0. The model is thus over-identified.
With RStudio:
## Fit the model to a SEM model.
fit <- sem(model, data, estimator = "MLR", likelihood = "wishart")
Model Test Baseline Model:
Test statistic 720.912
Degrees of freedom 55
P-value 0.000
Root Mean Square Error of Approximation:
RMSEA 0.032
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.091
P-value H_0: RMSEA <= 0.050 0.629
P-value H_0: RMSEA >= 0.080 0.107
Standardized Root Mean Square Residual:
SRMR 0.044
Parameter Estimates:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ind60 =~
x1 1.000 0.674 0.920
x2 2.180 0.139 15.636 0.000 1.470 0.973
x3 1.819 0.153 11.887 0.000 1.226 0.872
dem60 =~
y1 1.000 2.238 0.850
y2 1.257 0.184 6.842 0.000 2.813 0.717
y3 1.058 0.152 6.940 0.000 2.367 0.722
y4 1.265 0.146 8.664 0.000 2.831 0.846
dem65 =~
y5 1.000 2.117 0.808
y6 1.186 0.170 6.977 0.000 2.510 0.746
y7 1.280 0.161 7.948 0.000 2.709 0.824
y8 1.266 0.159 7.953 0.000 2.680 0.828
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dem60 ~
ind60 1.483 0.402 3.691 0.000 0.447 0.447
dem65 ~
ind60 0.572 0.223 2.569 0.010 0.182 0.182
dem60 0.837 0.099 8.457 0.000 0.885 0.885
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y1 ~~
.y5 0.632 0.366 1.729 0.084 0.632 0.296
.y2 ~~
.y4 1.331 0.716 1.858 0.063 1.331 0.273
.y6 2.182 0.749 2.914 0.004 2.182 0.356
.y3 ~~
.y7 0.806 0.620 1.299 0.194 0.806 0.191
.y4 ~~
.y8 0.353 0.451 0.782 0.434 0.353 0.109
.y6 ~~
.y8 1.374 0.580 2.370 0.018 1.374 0.338
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.083 0.020 4.156 0.000 0.083 0.154
.x2 0.121 0.071 1.707 0.088 0.121 0.053
.x3 0.473 0.092 5.142 0.000 0.473 0.239
.y1 1.917 0.453 4.227 0.000 1.917 0.277
.y2 7.473 1.402 5.331 0.000 7.473 0.486
.y3 5.136 0.971 5.289 0.000 5.136 0.478
.y4 3.190 0.754 4.232 0.000 3.190 0.285
.y5 2.383 0.490 4.863 0.000 2.383 0.347
.y6 5.021 0.933 5.382 0.000 5.021 0.443
.y7 3.478 0.727 4.781 0.000 3.478 0.322
.y8 3.298 0.709 4.653 0.000 3.298 0.315
ind60 0.454 0.088 5.138 0.000 1.000 1.000
.dem60 4.009 0.940 4.266 0.000 0.800 0.800
.dem65 0.175 0.219 0.798 0.425 0.039 0.039
R-Square:
Estimate
x1 0.846
x2 0.947
x3 0.761
y1 0.723
y2 0.514
y3 0.522
y4 0.715
y5 0.653
y6 0.557
y7 0.678
y8 0.685
dem60 0.200
dem65 0.961
parameters is reduced by this modification. Because of this, and also because the model fit was already
very close, we will not make model modifications in this example.
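The fit indices referred to above can be extracted from the fitted object in a single call; a minimal
sketch, assuming the fit object from Step 2:
## Extract the most commonly reported fit indices from the fitted SEM model.
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))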
Weak invariance is supported in this dataset, but strong invariance is not. Therefore, we will fix
all indicator intercepts to zero and calculate the MIs.
modeltest <- ' ## Measurement model (latent variable definitions)
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
Warning message:
In lav_object_post_check(object) :
lavaan WARNING: some estimates of variances are negative
Negative variances can occur due to (i) model misspecification, (ii) small sample size, (iii) sampling
fluctuations, (iv) observable variables that have different scales, (v) observable, dependent variables
that are skewed or non-normally distributed, (vi) missing data, (vii) outliers, etc.
Given the impossibility of negative variances, researchers always need to determine the reason for
their occurrence. In this case, the negative variances are most probably caused by randomly dividing
our dataset into two small(er) datasets without any factual evidence that this division holds true.
Because we only need fitD to calculate our modification indices, and not to compare the values of
latent means, we can easily avoid the negative variances by fixing the variances of the corresponding
variables to their values in the complete dataset (i.e. no group division, as we know that no negative
variances occurred when we used the full dataset in Steps 3 and 4). However, if negative variances
were to appear in fitE, the model we will use to compare the values of latent means, this solution would
not be appropriate because it can lead to deviating, incorrect estimates and misinterpretations in the
comparison of groups. To solve these kinds of problems, we refer to two reference works:
(i) Kolenikov, S., & Bollen, K.A. (2012). Testing negative error variances: Is a Heywood case a symptom of
misspecification? Sociological Methods & Research, 41(1). 124-167. doi: 10.1177/0049124112442138.
(ii) Savalei, V., & Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural equation
modelling. Psychological Methods, 13(2). 150-170. doi: 10.1037/1082-989X.13.2.150.
R-Square:
Estimate
x1 NA
x2 0.284
x3 0.168
y1 0.803
y2 0.260
y3 0.530
y4 0.387
y5 0.723
y6 0.230
y7 0.588
y8 0.518
dem60 0.060
dem65 0.959
## Step 2: As the R² value cannot be calculated for x1, we will calculate the variance
## of x1 in the full dataset and use this value to fix the variance of x1 in our model.
var(data$x1)
[1] 0.5371487
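The modification indices discussed next are not reproduced here. A sketch of how they could be
requested, assuming the refitted, constrained multiple-group model is stored in fitD as in the text:
## Request modification indices from the constrained group model (fitD)
## and inspect the intercept ("~1") rows, largest MI first.
mi <- modindices(fitD)
mi_int <- mi[mi$op == "~1", ]
mi_int[order(mi_int$mi, decreasing = TRUE), ]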
The intercepts of ind60, x1 and y4 in particular have high MIs. However, note that ind60 is an exogenous
(independent) latent variable. As the intercept of ind60 is thus a latent mean and not an indicator
intercept, there is no need to include “ind60 ~ 1” in our fitE model, since we did not fix ind60 to zero in
fitD. In other words, the latent mean of ind60 was already free: including it in fitE will not increase the
χ² statistic. We will only free the intercepts of x1 and y4. Note that we cannot free the intercept of x3:
if we did, the latent variable ind60 would have only one indicator left with equal intercepts across
groups (i.e. x2), since we already freed the intercept of x1.
fitE <- sem(model, groupdata, group = "Country",
group.equal = c("loadings", "intercepts"),
group.partial = c("y4~1", "x1~1")) ## Note: no negative variances
## are present!
lavTestLRT(fitA, fitB, fitE)
Since metric invariance and partial scalar invariance now hold (although the second p-value, i.e. the
p-value that evaluates partial scalar invariance, is only slightly larger than 0.05), we can compare the
latent means across the two groups (Costa Rica and Moldova).
summary(fitE, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6.14 ended normally after 114 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 87
Number of equality constraints 17
Number of observations per group:
Costa Rica 37
Moldova 38
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Group 1 [Costa Rica]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ind60 =~
x1 1.000 0.651 0.938
x2 (.p2.) 2.276 0.133 17.049 0.000 1.481 0.997
x3 (.p3.) 1.902 0.159 11.960 0.000 1.237 0.863
dem60 =~
y1 1.000 2.230 0.829
y2 (.p5.) 1.267 0.179 7.085 0.000 2.826 0.736
y3 (.p6.) 1.021 0.150 6.789 0.000 2.277 0.721
y4 (.p7.) 1.201 0.136 8.866 0.000 2.679 0.895
dem65 =~
y5 1.000 2.099 0.795
y6 (.p9.) 1.249 0.171 7.300 0.000 2.622 0.799
y7 (.10.) 1.299 0.163 7.988 0.000 2.726 0.849
y8 (.11.) 1.343 0.159 8.428 0.000 2.819 0.808
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dem60 ~
ind60 1.858 0.545 3.409 0.001 0.542 0.542
dem65 ~
ind60 0.959 0.334 2.875 0.004 0.297 0.297
dem60 0.734 0.121 6.070 0.000 0.780 0.780
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y1 ~~
.y5 0.704 0.520 1.353 0.176 0.704 0.292
.y2 ~~
.y4 1.692 0.932 1.816 0.069 1.692 0.486
.y6 1.259 0.792 1.590 0.112 1.259 0.246
.y3 ~~
.y7 1.409 0.787 1.791 0.073 1.409 0.380
.y4 ~~
.y8 -0.403 0.504 -0.801 0.423 -0.403 -0.146
.y6 ~~
.y8 1.201 0.847 1.419 0.156 1.201 0.296
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 5.340 0.114 46.849 0.000 5.340 7.702
.x2 (.36.) 5.180 0.244 21.213 0.000 5.180 3.487
.x3 (.37.) 3.883 0.222 17.499 0.000 3.883 2.709
.y1 (.38.) 5.702 0.423 13.482 0.000 5.702 2.120
.y2 (.39.) 4.514 0.576 7.838 0.000 4.514 1.175
.y3 (.40.) 6.833 0.472 14.474 0.000 6.833 2.163
.y4 5.416 0.487 11.113 0.000 5.416 1.808
.y5 (.42.) 5.358 0.402 13.333 0.000 5.358 2.029
.y6 (.43.) 3.163 0.507 6.242 0.000 3.163 0.964
.y7 (.44.) 6.480 0.505 12.831 0.000 6.480 2.019
.y8 (.45.) 4.315 0.526 8.201 0.000 4.315 1.237
ind60 0.000 0.000 0.000
.dem60 0.000 0.000 0.000
.dem65 0.000 0.000 0.000
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.057 0.018 3.126 0.002 0.057 0.120
.x2 0.014 0.065 0.209 0.835 0.014 0.006
.x3 0.524 0.130 4.015 0.000 0.524 0.255
.y1 2.264 0.681 3.324 0.001 2.264 0.313
.y2 6.767 1.844 3.670 0.000 6.767 0.459
.y3 4.790 1.251 3.830 0.000 4.790 0.480
.y4 1.793 0.746 2.403 0.016 1.793 0.200
.y5 2.572 0.692 3.717 0.000 2.572 0.369
.y6 3.885 1.061 3.663 0.000 3.885 0.361
.y7 2.874 0.840 3.424 0.001 2.874 0.279
.y8 4.230 1.211 3.494 0.000 4.230 0.347
ind60 0.423 0.108 3.905 0.000 1.000 1.000
.dem60 3.513 1.101 3.190 0.001 0.706 0.706
.dem65 0.230 0.273 0.842 0.400 0.052 0.052
Group 2 [Moldova]:
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ind60 =~
x1 1.000 0.584 0.895
x2 (.p2.) 2.276 0.133 17.049 0.000 1.329 0.943
x3 (.p3.) 1.902 0.159 11.960 0.000 1.110 0.865
dem60 =~
y1 1.000 2.185 0.872
y2 (.p5.) 1.267 0.179 7.085 0.000 2.769 0.712
y3 (.p6.) 1.021 0.150 6.789 0.000 2.232 0.673
y4 (.p7.) 1.201 0.136 8.866 0.000 2.626 0.787
dem65 =~
y5 1.000 2.020 0.793
y6 (.p9.) 1.249 0.171 7.300 0.000 2.522 0.741
y7 (.10.) 1.299 0.163 7.988 0.000 2.623 0.795
y8 (.11.) 1.343 0.159 8.428 0.000 2.713 0.893
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
dem60 ~
ind60 1.049 0.657 1.597 0.110 0.280 0.280
dem65 ~
ind60 0.539 0.288 1.874 0.061 0.156 0.156
dem60 0.861 0.108 7.955 0.000 0.932 0.932
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y1 ~~
.y5 0.807 0.482 1.674 0.094 0.807 0.423
.y2 ~~
.y4 1.557 0.968 1.607 0.108 1.557 0.276
.y6 2.325 1.088 2.138 0.033 2.325 0.372
.y3 ~~
.y7 0.664 0.943 0.704 0.482 0.664 0.135
.y4 ~~
.y8 1.003 0.619 1.621 0.105 1.003 0.356
.y6 ~~
.y8 0.947 0.652 1.453 0.146 0.947 0.303
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 5.120 0.123 41.727 0.000 5.120 7.852
.x2 (.36.) 5.180 0.244 21.213 0.000 5.180 3.678
.x3 (.37.) 3.883 0.222 17.499 0.000 3.883 3.025
.y1 (.38.) 5.702 0.423 13.482 0.000 5.702 2.274
.y2 (.39.) 4.514 0.576 7.838 0.000 4.514 1.160
.y3 (.40.) 6.833 0.472 14.474 0.000 6.833 2.061
.y4 4.191 0.594 7.059 0.000 4.191 1.256
.y5 (.42.) 5.358 0.402 13.333 0.000 5.358 2.104
.y6 (.43.) 3.163 0.507 6.242 0.000 3.163 0.929
.y7 (.44.) 6.480 0.505 12.831 0.000 6.480 1.965
.y8 (.45.) 4.315 0.526 8.201 0.000 4.315 1.420
ind60 -0.344 0.148 -2.330 0.020 -0.589 -0.589
.dem60 -0.152 0.569 -0.267 0.790 -0.069 -0.069
.dem65 0.179 0.274 0.651 0.515 0.088 0.088
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.084 0.028 3.061 0.002 0.084 0.199
.x2 0.218 0.111 1.967 0.049 0.218 0.110
.x3 0.415 0.121 3.445 0.001 0.415 0.252
.y1 1.511 0.538 2.811 0.005 1.511 0.240
.y2 7.478 1.904 3.927 0.000 7.478 0.494
.y3 6.011 1.522 3.951 0.000 6.011 0.547
.y4 4.241 1.185 3.579 0.000 4.241 0.381
.y5 2.406 0.675 3.567 0.000 2.406 0.371
.y6 5.217 1.364 3.825 0.000 5.217 0.451
.y7 3.998 1.099 3.639 0.000 3.998 0.368
.y8 1.871 0.688 2.720 0.007 1.871 0.203
ind60 0.341 0.088 3.865 0.000 1.000 1.000
.dem60 4.401 1.281 3.435 0.001 0.922 0.922
.dem65 0.107 0.251 0.427 0.669 0.026 0.026
From this output, we can conclude that the model fit, based on the χ² statistic, CFI, RMSEA and SRMR,
is close to acceptable (only the RMSEA and SRMR are slightly too high). The latent means of group 1
(Costa Rica, the reference group) are all fixed to 0.000. The latent means of group 2 (Moldova) are
-0.344 (ind60, p = 0.020), -0.152 (dem60, p > 0.05) and 0.179 (dem65, p > 0.05). After allowing partial
scalar measurement invariance, we can thus conclude that the degree of industrialization of Costa Rica
in 1960 was significantly higher than that of Moldova. Regarding political democracy in 1960 and 1965,
the SEM model did not indicate significant differences between the two countries.
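The latent mean estimates quoted above can also be pulled out of fitE directly; a minimal sketch:
## Extract the latent mean estimates (with standard errors, p-values and 95% CIs)
## for both groups from the partially invariant model.
pe <- parameterEstimates(fitE)
pe[pe$op == "~1" & pe$lhs %in% c("ind60", "dem60", "dem65"),
   c("lhs", "group", "est", "se", "pvalue", "ci.lower", "ci.upper")]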
Visualizing the model
library(lavaanPlot)
labels = list(ind60 = "Industrialization 1960",
dem60 = "Democracy 1960",
dem65 = "Democracy 1965",
y1 = "Press freedom 1960",
y2 = "Political opposition 1960",
y3 = "Fairness of elections 1960",
y4 = "Effective legislature 1960",
y5 = "Press freedom 1965",
y6 = "Political opposition 1965",
y7 = "Fairness of elections 1965",
y8 = "Effective legislature 1965",
x1 = "GNP PC 1960",
x2 = "Energy consumption PC 1960",
x3 = "Labour force % in industry 1960")
lavaanPlot(model = fit,
           labels = labels,
           node_options = list(shape = "box", fontname = "Helvetica"),
           edge_options = list(color = "grey"),
           coefs = TRUE, ## display path coefficients on the edges
           sig = 0.05) ## only show coefficients significant at the 5% level
6. Sources
Bean, Jerry. (2021). Using R for Social Work Research. Chapter 4: Confirmatory Factor Analysis, and
Chapter 5: Structural Equation Modelling. College of Social Work, Ohio State University.
Retrieved from https://bookdown.org/bean_jerry/using_r_for_social_work_research/.
Buchanan, Erin. (n.d.). Heywood Cases on the Latent Variable. Structural equation modelling with
lavaan in R. Retrieved from
https://projector-video-pdf-converter.datacamp.com/6419/chapter3.pdf.
Byrne, B.M., Shavelson, R.J., & Muthén, B. (1989). Testing for the equivalence of factor covariance
and mean structures: The issue of partial measurement invariance. Psychological Bulletin,
105(3). 456-466. doi: 10.1037/0033-2909.105.3.456.
Center for Statistical Training. (2019a). Can I estimate a structural equation model if the sample data
are not normally distributed? Retrieved from https://centerstat.org/can-i-estimate-an-sem-if-
the-sample-data-are-not-normally-distributed/.
Center for Statistical Training. (2019b). What are modification indices and should I use them when
fitting SEMs to my own data? Retrieved from https://centerstat.org/what-are-modification-
indices-and-should-i-use-them-when-fitting-sems-to-my-own-data/.
Collier, Joel E. (2020). Applied structural equation modelling using AMOS: Basic to advanced
techniques. Routledge. ISBN: 9780367435264.
Fan, Y., Chen, J., Shirkey, G., Ranjeet, J., Wu, S.R., Park, H., & Shao, C. (2016). Applications of
structural equation modelling (SEM) in ecological research: An updated review.
Ecological Processes, 5. doi: 10.1186/s13717-016-0063-3.
Henseler, J., Ringle, C.M., & Sarstedt, M. (2016). Testing measurement invariance of composites
using partial least squares. International Marketing Review, 33(3). 405-431.
doi: 10.1108/IMR-09-2014-0304.
Kolenikov, S., & Bollen, K.A. (2012). Testing negative error variances: Is a Heywood case a symptom
of misspecification? Sociological Methods & Research, 41(1). 124-167. doi:
10.1177/0049124112442138.
Lishinski, Alex. (2021). Intro to lavaanPlot. Retrieved from
https://cran.r-project.org/web/packages/lavaanPlot/vignettes/Intro_to_lavaanPlot.html.
Lishinski, Alex. (2022). Package ‘lavaanPlot’. Version 0.6.2. Retrieved from
https://cran.r-project.org/web/packages/lavaanPlot/lavaanPlot.pdf.
Meuleman, B., Żółtak, T., Pokropek, A., Davidov, E., Muthén, B., Oberski, D. L., Billiet, J., & Schmidt,
P. (2022). Why Measurement Invariance is Important in Comparative Research. A Response
to Welzel et al. (2021). Sociological Methods & Research, 0(0).
doi: 10.1177/00491241221091755.
Putnick, D.L., Bornstein, M.H. (2016). Measurement Invariance Conventions and Reporting: The
State of the Art and Future Directions for Psychological Research.
Developmental Review, 41. 71-90. doi: 10.1016/j.dr.2016.06.004.
Ramlall, Indranarain. (2016). Applied Structural Equation Modelling for Researchers and
Practitioners. Emerald Group Publishing Limited, Bingley. doi: 10.1108/9781786358820.
Rhudy, J.L., Arnau, R.C., Huber, F.A., Lannon, E.W., Kuhn, B.L., Palit, S., Payne, M.F., Sturycz, C.A.,
Hellman, N., Guereca, Y.M., Toledo, T.A., Shadlow, J.O. (2020). Examining Configural,
Metric, and Scalar Invariance of the Pain Catastrophizing Scale in Native American and Non-
Hispanic White Adults in the Oklahoma Study of Native American Pain Risk (OK-SNAP).
Journal of Pain Research, 13. 961-969. doi: 10.2147/JPR.S242126.
Rosseel, Yves, Vanbrabant, Leonard, van de Schoot, Rens. (2016). Being negative can be good for
your test. 4th Constrained Statistical Inference Meeting March 4, 2016. Tilburg University.
Retrieved from https://users.ugent.be/~yrosseel/lavaan/csi.pdf.
Rosseel, Yves. (2017). Mplus estimators: MLM and MLR. Department of Data Analysis. Ghent
University (Belgium). Retrieved from https://users.ugent.be/~yrosseel/lavaan/utrecht2010.pdf.
Rosseel, Yves. (2023a). Package ‘lavaan’. Version 0.6-14. Retrieved from
https://cran.r-project.org/web/packages/lavaan/lavaan.pdf.
Rosseel, Yves. (2023b). The lavaan tutorial. Department of Data Analysis. Ghent University
(Belgium). Retrieved from https://lavaan.ugent.be/tutorial/index.html.
Savalei, V., & Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural
equation modelling. Psychological Methods, 13(2). 150-170.
doi: 10.1037/1082-989X.13.2.150.
UCLA. (2021a). Confirmatory Factor Analysis in R with lavaan. Advanced Research Computing
Statistical Methods and Data Analysis. Retrieved from
https://stats.oarc.ucla.edu/r/seminars/rcfa/.
UCLA. (2021b). Introduction to Structural Equation Modelling in R with lavaan. Advanced Research
Computing Statistical Methods and Data Analysis. Retrieved from
https://stats.oarc.ucla.edu/r/seminars/rsem/#s1b.
Xu, Kate. (2012). Multiple group measurement invariance analysis in lavaan. Department of
Psychiatry. University of Cambridge. Retrieved from
https://users.ugent.be/~yrosseel/lavaan/multiplegroup6Dec2012.pdf.