In this example, we show how to use lslx to conduct a multi-group semi-confirmatory factor analysis. The example uses the data set HolzingerSwineford1939 from the package lavaan. Hence, lavaan must be installed.
In the following specification, x1-x9 are assumed to be measurements of three latent factors: visual, textual, and speed.
model <-
'
visual :=> fix(1) * x1 + x2 + x3
textual :=> fix(1) * x4 + x5 + x6
speed :=> fix(1) * x7 + x8 + x9
'
The operator :=> means that the latent factor on the left-hand side (LHS) is defined by the observed variables on the right-hand side (RHS). In this model, visual is mainly measured by x1-x3, textual is mainly measured by x4-x6, and speed is mainly measured by x7-x9. The loadings of x1, x4, and x7 are fixed at 1 for scale setting. The above specification applies to both groups. Details of the model syntax can be found in the section Model Syntax via ?lslx.
lslx is written as an R6 class. Every time we conduct an analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named r6_lslx.
library(lslx)
r6_lslx <- lslx$new(model = model,
data = lavaan::HolzingerSwineford1939,
group_variable = "school",
reference_group = "Pasteur")
An 'lslx' R6 class is initialized via 'data'.
Response Variable(s): x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factor(s): visual textual speed
Group(s): Grant-White Pasteur
Reference Group: Pasteur
NOTE: Because Pasteur is set as reference, coefficients in other groups actually represent increments from the reference.
Here, lslx is the object generator for lslx objects, and new is the built-in method of lslx for generating a new lslx object. Initializing an lslx object requires users to specify a model (argument model) and a data set to be fitted (argument data). The data set must contain all the observed variables specified in the given model. Because a multi-group analysis is considered in this example, the variable for group labeling (argument group_variable) must also be specified. In lslx, two types of parameterization can be used in multi-group analysis. The first type is the same as in traditional multi-group SEM, which treats the model parameters in each group separately. The second type sets one group as the reference (argument reference_group) and treats the model parameters in the other groups as increments with respect to the reference. Under the second type of parameterization, group heterogeneity can be explored efficiently by treating the increments as penalized parameters: an increment shrunk to zero indicates that the parameter is equal across groups. In this example, Pasteur is set as the reference. Hence, the parameters in Grant-White reflect differences from the reference.
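To make the reference/increment parameterization concrete, the base R sketch below converts one parameter's group-wise values into a reference value plus increments and back. The helper functions to_reference_increment and from_reference_increment are hypothetical illustrations, not part of lslx, and the intercept values are made up.

```r
# Hypothetical helpers for illustration only; they are NOT lslx functions.
# A named vector holds one parameter's value in each group.
to_reference_increment <- function(values, reference) {
  out <- values - values[[reference]]     # other groups: difference from reference
  out[[reference]] <- values[[reference]] # reference group keeps its own value
  out
}

from_reference_increment <- function(components, reference) {
  out <- components + components[[reference]]  # add the reference value back
  out[[reference]] <- components[[reference]]  # reference is already a full value
  out
}

# Hypothetical intercept of one observed variable in each school
group_values <- c("Pasteur" = 2.40, "Grant-White" = 2.10)

components <- to_reference_increment(group_values, reference = "Pasteur")
print(components)

# Converting back recovers the group-wise values
restored <- from_reference_increment(components, reference = "Pasteur")
stopifnot(isTRUE(all.equal(restored, group_values)))
```

If the Grant-White increment were exactly zero, both schools would share the Pasteur value. This is why penalizing increments toward zero amounts to searching for invariant parameters.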
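After an lslx object is initialized, the heterogeneity of a multi-group model can be quickly respecified with the free_heterogeneity, fix_heterogeneity, and penalize_heterogeneity methods. The block argument selects the type of relation: "y<-f" refers to loadings of observed variables on factors, and "y<-1" refers to intercepts of observed variables. The following code sets the loadings x2<-visual, x3<-visual, x5<-textual, x6<-textual, x8<-speed, and x9<-speed, as well as all nine intercepts x1<-1 through x9<-1, in Grant-White as penalized parameters. The loadings x1<-visual, x4<-textual, and x7<-speed are fixed for scale setting and are therefore not affected. Note that the parameters in Grant-White represent differences because Pasteur is set as the reference.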
r6_lslx$penalize_heterogeneity(block = "y<-f", group = "Grant-White")
The relation x2<-visual under Grant-White is set as PENALIZED with starting value = 0.
The relation x3<-visual under Grant-White is set as PENALIZED with starting value = 0.
The relation x5<-textual under Grant-White is set as PENALIZED with starting value = 0.
The relation x6<-textual under Grant-White is set as PENALIZED with starting value = 0.
The relation x8<-speed under Grant-White is set as PENALIZED with starting value = 0.
The relation x9<-speed under Grant-White is set as PENALIZED with starting value = 0.
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment.
NOTE: Please check whether the starting value for the increment represents a difference.
r6_lslx$penalize_heterogeneity(block = "y<-1", group = "Grant-White")
The relation x1<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x2<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x3<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x4<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x5<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x6<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x7<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x8<-1 under Grant-White is set as PENALIZED with starting value = 0.
The relation x9<-1 under Grant-White is set as PENALIZED with starting value = 0.
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment.
NOTE: Please check whether the starting value for the increment represents a difference.
Because equal latent factor means across groups may not be a reasonable assumption when examining measurement invariance, the following code relaxes this assumption by freeing the factor mean increments visual<-1, textual<-1, and speed<-1 in Grant-White.
r6_lslx$free_directed(left = c("visual", "textual", "speed"),
right = "1",
group = "Grant-White")
The relation visual<-1 under Grant-White is set as FREE with starting value = NA.
The relation textual<-1 under Grant-White is set as FREE with starting value = NA.
The relation speed<-1 under Grant-White is set as FREE with starting value = NA.
NOTE: Because Pasteur is set as reference, a relation under other group actually represents an increment.
NOTE: Please check whether the starting value for the increment represents a difference.
To see more methods for modifying a specified model, please check the section Set-Related Method via ?lslx.
Once the model has been respecified, the fit_lasso method can be used to fit the specified model to the given data with the lasso penalty function.
r6_lslx$fit_lasso(lambda_grid = seq(.01, .30, .01))
CONGRATS: The algorithm converged under all specified penalty levels.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
The fit_lasso method requires users to specify the penalty levels to be considered (argument lambda_grid). In this example, the lambda grid is seq(.01, .30, .01), that is, the 30 values 0.01, 0.02, ..., 0.30. All the fitting results are stored in the fitting field of r6_lslx.
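The lasso penalty is what allows an increment to be estimated as exactly zero. In the simplest one-parameter case, minimizing 0.5 * (theta - z)^2 + lambda * |theta|, where z is the unpenalized estimate, gives the soft-thresholding rule sign(z) * max(|z| - lambda, 0). The sketch below applies this rule over the same lambda grid to a hypothetical unpenalized increment z = 0.15. It is only a one-dimensional illustration of lasso shrinkage. lslx minimizes a penalized ML loss over all coefficients jointly with an iterative algorithm and does not use this closed form.

```r
# Soft-thresholding: minimizer of 0.5 * (theta - z)^2 + lambda * abs(theta)
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

lambda_grid <- seq(.01, .30, .01)  # same grid as in the fit_lasso call
z <- 0.15                          # hypothetical unpenalized increment

path <- soft_threshold(z, lambda_grid)

# The estimate shrinks linearly in lambda and stays at 0 once lambda >= |z|
print(data.frame(lambda = lambda_grid,
                 estimate = round(path, 3))[c(1, 5, 10, 14, 20, 30), ])
```

A larger lambda sets more increments to zero. The fit under each lambda therefore corresponds to a different pattern of invariant and non-invariant parameters.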
Unlike traditional SEM analysis, lslx fits the model to the data under every penalty level considered. To summarize the fitting result, a selector that determines an optimal penalty level must be specified. Available selectors can be found in the section Penalty Level Selection via ?lslx. The following code summarizes the fitting result under the penalty level selected by the Bayesian information criterion (BIC).
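Conceptually, a selector evaluates a criterion at every penalty level and keeps the level with the smallest value. The sketch below uses made-up loss values and degrees of freedom for three penalty levels. These are not output from this example. It also uses a BIC of the form loss - log(n) / n * df, a form that appears to reproduce the bic value printed in the summary of this example. It illustrates the idea and is not lslx's internal code.

```r
# Made-up fit results at three penalty levels (illustration only)
fits <- data.frame(
  lambda = c(0.05, 0.12, 0.25),
  loss   = c(0.440, 0.462, 0.530),  # loss grows as lambda increases
  df     = c(52, 58, 61)            # df grows as more increments hit zero
)
n_obs <- 301

# BIC in the form loss - log(n) / n * df; smaller is better
fits$bic <- fits$loss - log(n_obs) / n_obs * fits$df

best <- fits[which.min(fits$bic), ]
print(fits)
print(best$lambda)
```

The criterion trades off fit (the loss) against parsimony (the degrees of freedom gained by setting coefficients to zero). Other selectors differ only in the weight placed on the degrees of freedom.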
r6_lslx$summarize(selector = "bic")
General Information
number of observations 301
number of complete observations 301
number of missing patterns none
number of groups 2
number of responses 9
number of factors 3
number of free coefficients 48
number of penalized coefficients 15
Fitting Information
penalty method lasso
lambda grid 0.01 - 0.3
delta grid none
algorithm fisher
missing method none
tolerance for convergence 0.001
Saturated Model Information
loss value 0.000
number of non-zero coefficients 108.000
degree of freedom 0.000
Baseline Model Information
loss value 3.181
number of non-zero coefficients 36.000
degree of freedom 72.000
Numerical Condition
lambda 0.120
delta none
objective value 0.518
objective gradient absolute maximum 0.001
objective Hessian convexity 0.459
number of iterations 2.000
loss value 0.459
number of non-zero coefficients 50.000
degree of freedom 58.000
robust degree of freedom 60.256
scaling factor 1.039
Information Criteria
Akaike information criterion (aic) 0.074
Akaike information criterion with penalty 3 (aic3) -0.119
consistent Akaike information criterion (caic) -0.833
Bayesian information criterion (bic) -0.640
adjusted Bayesian information criterion (abic) -0.029
Haughton Bayesian information criterion (hbic) -0.286
robust Akaike information criterion (raic) 0.059
robust Akaike information criterion with penalty 3 (raic3) -0.141
robust consistent Akaike information criterion (rcaic) -0.883
robust Bayesian information criterion (rbic) -0.683
robust adjusted Bayesian information criterion (rabic) -0.048
robust Haughton Bayesian information criterion (rhbic) -0.315
Fit Indices
root mean square error of approximation (rmsea) 0.096
comparative fit index (cfi) 0.909
non-normed fit index (nnfi) 0.887
standardized root mean of residual (srmr) 0.096
Likelihood Ratio Test
statistic df p-value
unadjusted 138.268 58.000 0.000
mean-adjusted 133.092 58.000 0.000
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.096 0.071 0.120
mean-adjusted 0.095 0.069 0.120
Coefficient Test (Group = "Pasteur", Standard Error = "sandwich", Alpha Level = 0.05)
Factor Loading (reference component)
type estimate std.error z-value p-value lower upper
x1<-visual fixed 1.000 - - - - -
x2<-visual free 0.582 0.132 4.402 0.000 0.323 0.841
x3<-visual free 0.774 0.148 5.210 0.000 0.483 1.065
x4<-textual fixed 1.000 - - - - -
x5<-textual free 1.120 0.067 16.598 0.000 0.987 1.252
x6<-textual free 0.932 0.063 14.679 0.000 0.807 1.056
x7<-speed fixed 1.000 - - - - -
x8<-speed free 1.170 0.132 8.873 0.000 0.912 1.429
x9<-speed free 1.028 0.203 5.059 0.000 0.630 1.427
Covariance (reference component)
type estimate std.error z-value p-value lower upper
textual<->visual free 0.414 0.131 3.154 0.001 0.157 0.671
speed<->visual free 0.174 0.066 2.629 0.004 0.044 0.304
speed<->textual free 0.176 0.061 2.905 0.002 0.057 0.294
Variance (reference component)
type estimate std.error z-value p-value lower upper
visual<->visual free 0.820 0.228 3.592 0.000 0.372 1.267
textual<->textual free 0.880 0.135 6.540 0.000 0.616 1.143
speed<->speed free 0.312 0.085 3.667 0.000 0.145 0.479
x1<->x1 free 0.537 0.173 3.107 0.001 0.198 0.876
x2<->x2 free 1.285 0.171 7.527 0.000 0.950 1.619
x3<->x3 free 0.905 0.129 6.998 0.000 0.651 1.158
x4<->x4 free 0.445 0.070 6.325 0.000 0.307 0.584
x5<->x5 free 0.503 0.083 6.025 0.000 0.339 0.666
x6<->x6 free 0.263 0.058 4.525 0.000 0.149 0.377
x7<->x7 free 0.859 0.115 7.462 0.000 0.633 1.085
x8<->x8 free 0.526 0.091 5.752 0.000 0.347 0.706
x9<->x9 free 0.655 0.116 5.627 0.000 0.427 0.883
Intercept (reference component)
type estimate std.error z-value p-value lower upper
x1<-1 free 4.957 0.092 53.727 0.000 4.776 5.138
x2<-1 free 6.119 0.082 74.358 0.000 5.958 6.281
x3<-1 free 2.382 0.094 25.243 0.000 2.197 2.567
x4<-1 free 2.778 0.087 31.910 0.000 2.607 2.949
x5<-1 free 4.035 0.103 39.168 0.000 3.833 4.237
x6<-1 free 1.926 0.075 25.774 0.000 1.779 2.072
x7<-1 free 4.333 0.088 49.375 0.000 4.161 4.505
x8<-1 free 5.602 0.075 75.022 0.000 5.455 5.748
x9<-1 free 5.438 0.072 75.735 0.000 5.297 5.579
Coefficient Test (Group = "Grant-White", Standard Error = "sandwich", Alpha Level = 0.05)
Factor Loading (increment component)
type estimate std.error z-value p-value lower upper
x1<-visual fixed 0.000 - - - - -
x2<-visual pen 0.000 - - - - -
x3<-visual pen 0.000 - - - - -
x4<-textual fixed 0.000 - - - - -
x5<-textual pen 0.000 - - - - -
x6<-textual pen 0.000 - - - - -
x7<-speed fixed 0.000 - - - - -
x8<-speed pen 0.000 - - - - -
x9<-speed pen 0.000 - - - - -
Covariance (increment component)
type estimate std.error z-value p-value lower upper
textual<->visual free 0.017 0.145 0.115 0.454 -0.268 0.301
speed<->visual free 0.149 0.108 1.385 0.083 -0.062 0.361
speed<->textual free 0.053 0.110 0.484 0.314 -0.162 0.268
Variance (increment component)
type estimate std.error z-value p-value lower upper
visual<->visual free -0.089 0.204 -0.437 0.331 -0.488 0.310
textual<->textual free -0.009 0.167 -0.056 0.478 -0.336 0.317
speed<->speed free 0.175 0.097 1.814 0.035 -0.014 0.364
x1<->x1 free 0.100 0.177 0.564 0.286 -0.248 0.448
x2<->x2 free -0.332 0.225 -1.472 0.071 -0.773 0.110
x3<->x3 free -0.285 0.144 -1.972 0.024 -0.568 -0.002
x4<->x4 free -0.102 0.093 -1.095 0.137 -0.286 0.081
x5<->x5 free -0.126 0.103 -1.226 0.110 -0.328 0.075
x6<->x6 free 0.174 0.093 1.873 0.031 -0.008 0.356
x7<->x7 free -0.253 0.138 -1.829 0.034 -0.524 0.018
x8<->x8 free -0.108 0.141 -0.764 0.223 -0.384 0.169
x9<->x9 free -0.128 0.142 -0.904 0.183 -0.407 0.150
Intercept (increment component)
type estimate std.error z-value p-value lower upper
visual<-1 free -0.047 0.130 -0.361 0.359 -0.303 0.209
textual<-1 free 0.576 0.120 4.790 0.000 0.340 0.812
speed<-1 free -0.125 0.094 -1.334 0.091 -0.308 0.059
x1<-1 pen 0.000 - - - - -
x2<-1 pen 0.000 - - - - -
x3<-1 pen -0.273 0.121 -2.252 0.012 -0.510 -0.035
x4<-1 pen 0.000 - - - - -
x5<-1 pen 0.000 - - - - -
x6<-1 pen 0.000 - - - - -
x7<-1 pen -0.212 0.113 -1.867 0.031 -0.434 0.011
x8<-1 pen 0.000 - - - - -
x9<-1 pen 0.000 - - - - -
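Several numbers in the summary can be reproduced by hand from the counts and loss values it reports. The base R sketch below relies on the following definitions, which are inferred from the printed output rather than taken from the lslx source:

- The likelihood ratio statistic is n times the loss value.
- The degrees of freedom equal the number of non-zero coefficients of the saturated model minus that of the fitted model.
- Each information criterion has the form loss - penalty * df / n.
- The RMSEA of this two-group model is scaled by sqrt(2), the number of groups.
- The objective value is the loss plus lambda times the sum of the absolute non-zero penalized coefficients.

Under these assumptions, the recomputed values agree with the summary to the third decimal.

```r
# Values copied from the summary printed above
n_obs       <- 301     # number of observations
n_groups    <- 2       # number of groups
n_resp      <- 9       # number of responses (x1-x9)
stat_model  <- 138.268 # unadjusted likelihood ratio statistic
loss_base   <- 3.181   # loss value of the baseline model
n_coef      <- 50      # non-zero coefficients at the selected lambda
n_coef_bl   <- 36      # non-zero coefficients of the baseline model
lambda      <- 0.12    # selected penalty level
pen_nonzero <- c(-0.273, -0.212)  # non-zero penalized increments (x3<-1, x7<-1)

# Saturated model: per group, p(p + 1)/2 (co)variances plus p means
n_moments <- n_groups * (n_resp * (n_resp + 1) / 2 + n_resp)  # 108
df_model  <- n_moments - n_coef                                # 58
df_base   <- n_moments - n_coef_bl                             # 72

# Assumed: statistic = n * loss
loss_model <- stat_model / n_obs
stat_base  <- loss_base * n_obs

# Assumed: each criterion is loss - penalty * df / n
crit <- c(
  aic  = loss_model - 2 / n_obs * df_model,
  aic3 = loss_model - 3 / n_obs * df_model,
  caic = loss_model - (log(n_obs) + 1) / n_obs * df_model,
  bic  = loss_model - log(n_obs) / n_obs * df_model,
  abic = loss_model - log((n_obs + 2) / 24) / n_obs * df_model,
  hbic = loss_model - log(n_obs / (2 * pi)) / n_obs * df_model
)

# Fit indices; the sqrt(n_groups) factor in RMSEA is an assumption
rmsea <- sqrt(n_groups) *
  sqrt(max((stat_model - df_model) / (n_obs * df_model), 0))
cfi  <- 1 - max(stat_model - df_model, 0) /
  max(stat_base - df_base, stat_model - df_model, 0)
nnfi <- (stat_base / df_base - stat_model / df_model) /
  (stat_base / df_base - 1)

# Assumed: objective = loss + lambda * sum of |penalized coefficients|
objective <- loss_model + lambda * sum(abs(pen_nonzero))

print(round(c(df_model = df_model, df_base = df_base, crit,
              rmsea = rmsea, cfi = cfi, nnfi = nnfi,
              objective = objective), 3))
```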
In this example, all of the penalized loading increments in Grant-White are shrunk to zero, so the loadings appear to be invariant across the two groups. However, the increments of the intercepts of x3 and x7 are non-zero (-0.273 and -0.212), so these two intercepts seem not to be invariant. The summarize method also shows the results of significance tests for the coefficients. In lslx, the default standard errors are calculated with the sandwich formula whenever raw data are available. The sandwich standard errors are generally valid even when the model is misspecified and the data are not normal. However, they may not be valid after an optimal penalty level has been selected, because the uncertainty of the selection step is not taken into account.
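The printed p-values and confidence limits can be checked with base R. In the sketch below, the estimates, standard errors, and z-values are copied from the Grant-White intercept increments for x3 and x7 in the summary. Comparing the two tail probabilities suggests that the printed p-values correspond to a one-sided normal tail, whereas the lower and upper limits form a two-sided 95% Wald interval. This reading is inferred from the numbers, not from the lslx documentation. The sketch also adds each increment to the Pasteur intercept to obtain the implied Grant-White intercept.

```r
# Grant-White intercept increments copied from the summary above
est <- c("x3<-1" = -0.273, "x7<-1" = -0.212)  # estimates
se  <- c("x3<-1" =  0.121, "x7<-1" =  0.113)  # sandwich standard errors
z   <- c("x3<-1" = -2.252, "x7<-1" = -1.867)  # printed z-values
ref <- c("x3<-1" =  2.382, "x7<-1" =  4.333)  # Pasteur (reference) intercepts

p_one <- pnorm(-abs(z))      # one-sided normal tail probability
p_two <- 2 * pnorm(-abs(z))  # conventional two-sided p-value

# Two-sided 95% Wald interval for each increment
ci_lower <- est - qnorm(0.975) * se
ci_upper <- est + qnorm(0.975) * se

# Implied Grant-White intercept = reference value + increment
implied <- ref + est

print(round(data.frame(p_one, p_two, ci_lower, ci_upper, implied), 3))
```

Small differences in the third decimal of the interval limits arise because the inputs are rounded. Note that the printed interval for x7<-1 (-0.434 to 0.011) includes zero. The evidence that this intercept differs between the schools is therefore weaker than for x3<-1.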