In this example, we show how to use lslx to conduct semi-confirmatory structural equation modeling. The example uses the PoliticalDemocracy data from the lavaan package; hence, lavaan must be installed.
In the following specification, x1 - x3 and y1 - y8 are assumed to be measurements of three latent factors: ind60, dem60, and dem65.
model <-
'
fix(1) * x1 + x2 + x3 <=: ind60
fix(1) * y1 + y2 + y3 + y4 <=: dem60
fix(1) * y5 + y6 + y7 + y8 <=: dem65
dem60 <= ind60
dem65 <= ind60 + dem60
'
The operator <=: means that the RHS latent factor is defined by the LHS observed variables. In particular, the loadings are freely estimated, except where a prefix such as fix(1) fixes a loading at the given value. In this model, ind60 is measured by x1 - x3, dem60 is mainly measured by y1 - y4, and dem65 is mainly measured by y5 - y8. The operator <= means that the regression coefficients from the RHS variables to the LHS variables are freely estimated. In this model, dem60 is influenced by ind60, and dem65 is influenced by dem60 and ind60. Details of the model syntax can be found in the Model Syntax section via ?lslx.
lslx is written as an R6 class. Every time we conduct an analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named r6_lslx.
library(lslx)
r6_lslx <- lslx$new(model = model,
                    sample_cov = cov(lavaan::PoliticalDemocracy),
                    sample_size = nrow(lavaan::PoliticalDemocracy))
NOTE: Because argument 'sample_cov' doesn't contain group name(s), default group name(s) is created.
NOTE: Because argument 'sample_mean' is missing, default 'sample_mean' is created.
An 'lslx' R6 class is initialized via 'sample_cov'.
Response Variable(s): x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8
Latent Factor(s): ind60 dem60 dem65
Here, lslx is the object generator for lslx objects and new is the built-in method of lslx to generate a new lslx object. Initializing an lslx object requires a model specification (argument model) and the sample moments to be fitted (arguments sample_cov and sample_size). The sample moments must contain all the observed variables specified in the given model.
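Later in this example it is noted that sandwich standard errors are only available when raw data are supplied. As a minimal sketch of that alternative, the following code initializes an object from the raw data frame instead of the sample moments. It assumes that the data argument of lslx$new accepts a data frame of raw observations; the object name r6_raw is only illustrative, and the block is guarded so that it runs only when both packages are installed.

```r
# Sketch only: runs when both lslx and lavaan are installed.
if (requireNamespace("lslx", quietly = TRUE) &&
    requireNamespace("lavaan", quietly = TRUE)) {
  # Same model specification as in the example above.
  model <-
'
fix(1) * x1 + x2 + x3 <=: ind60
fix(1) * y1 + y2 + y3 + y4 <=: dem60
fix(1) * y5 + y6 + y7 + y8 <=: dem65
dem60 <= ind60
dem65 <= ind60 + dem60
'
  # Assumption: 'data' takes the raw observations, so that sample moments
  # are computed internally rather than passed via 'sample_cov'.
  r6_raw <- lslx::lslx$new(model = model,
                           data = lavaan::PoliticalDemocracy)
}
```

The rest of the workflow (penalizing coefficients, fitting, and summarizing) would then proceed on r6_raw in the same way as on r6_lslx.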
After an lslx object is initialized, the model can be respecified by the free_coefficient, fix_coefficient, and penalize_coefficient methods. The following code sets y1<->y5, y2<->y4, y2<->y6, y3<->y7, y4<->y8, and y6<->y8 as penalized parameters.
r6_lslx$penalize_coefficient(name = c("y1<->y5",
                                      "y2<->y4",
                                      "y2<->y6",
                                      "y3<->y7",
                                      "y4<->y8",
                                      "y6<->y8"))
The relation y5<->y1 under G is set as PENALIZED with starting value = 0.
The relation y4<->y2 under G is set as PENALIZED with starting value = 0.
The relation y6<->y2 under G is set as PENALIZED with starting value = 0.
The relation y7<->y3 under G is set as PENALIZED with starting value = 0.
The relation y8<->y4 under G is set as PENALIZED with starting value = 0.
The relation y8<->y6 under G is set as PENALIZED with starting value = 0.
To see more methods for respecifying the model, please check the Set-Related Method section via ?lslx.
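The other two respecification methods named above can be used in the same way. The sketch below builds a separate, illustrative object (r6_demo, not part of the original example) and switches one residual covariance between the penalized, free, and fixed states. It assumes that free_coefficient and fix_coefficient accept the same name argument as penalize_coefficient, and it makes no claim about the default value at which a coefficient is fixed.

```r
# Sketch only: runs when both lslx and lavaan are installed.
if (requireNamespace("lslx", quietly = TRUE) &&
    requireNamespace("lavaan", quietly = TRUE)) {
  model <-
'
fix(1) * x1 + x2 + x3 <=: ind60
fix(1) * y1 + y2 + y3 + y4 <=: dem60
fix(1) * y5 + y6 + y7 + y8 <=: dem65
dem60 <= ind60
dem65 <= ind60 + dem60
'
  r6_demo <- lslx::lslx$new(model = model,
                            sample_cov = cov(lavaan::PoliticalDemocracy),
                            sample_size = nrow(lavaan::PoliticalDemocracy))

  # Estimate the residual covariance with a penalty ...
  r6_demo$penalize_coefficient(name = "y1<->y5")
  # ... or estimate it freely, without any penalty ...
  r6_demo$free_coefficient(name = "y1<->y5")
  # ... or fix it, so that it is no longer estimated.
  r6_demo$fix_coefficient(name = "y1<->y5")
}
```

Each call overrides the previous status of the coefficient, so only the last call determines how y1<->y5 is treated when the model is fitted.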
After an lslx object is initialized, the fit_mcp method can be used to fit the specified model to the given data with the MCP penalty function.
r6_lslx$fit_mcp(lambda_grid = seq(.01, .30, .01),
                delta_grid = Inf)
CONGRATS: The algorithm converged under all specified penalty levels.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
The fit_mcp method requires users to specify the penalty levels to be considered (arguments lambda_grid and delta_grid). In this example, the lambda grid is seq(.01, .30, .01) and the delta grid is Inf. Note that delta = Inf makes the MCP equivalent to the lasso penalty. All the fitting results are stored in the fitting field of r6_lslx.
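To illustrate why delta = Inf reduces the MCP to the lasso, the following base R sketch implements the MCP in its usual textbook form, lambda * |theta| - theta^2 / (2 * delta) for |theta| <= lambda * delta and lambda^2 * delta / 2 otherwise. The function names are illustrative and are not part of lslx; the parameterization is assumed to match the one used by fit_mcp.

```r
# MCP penalty for coefficient(s) theta with penalty level lambda and
# convexity level delta (textbook form; assumed to match lslx).
mcp_penalty <- function(theta, lambda, delta) {
  a <- abs(theta)
  if (is.infinite(delta)) {
    # The quadratic term vanishes, leaving the lasso penalty.
    return(lambda * a)
  }
  ifelse(a <= lambda * delta,
         lambda * a - a^2 / (2 * delta),  # concave part near zero
         lambda^2 * delta / 2)            # constant part for large |theta|
}

# Lasso penalty for comparison.
lasso_penalty <- function(theta, lambda) lambda * abs(theta)

theta <- seq(-2, 2, by = 0.5)

# With delta = Inf the two penalties coincide.
print(all.equal(mcp_penalty(theta, lambda = 0.1, delta = Inf),
                lasso_penalty(theta, lambda = 0.1)))

# With a finite delta, large coefficients receive a constant penalty.
print(mcp_penalty(theta, lambda = 0.1, delta = 3))
```

With a finite delta, the penalty stops growing once |theta| exceeds lambda * delta, which is why the MCP shrinks large coefficients less than the lasso does.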
Unlike traditional SEM analysis, lslx fits the model to the data under all the penalty levels considered. To summarize the fitting result, a selector that determines an optimal penalty level must be specified. Available selectors can be found in the Penalty Level Selection section via ?lslx. The following code summarizes the fitting result under the penalty level selected by the Akaike information criterion (AIC).
r6_lslx$summarize(selector = "aic")
General Information
number of observations 75
number of complete observations 75
number of missing patterns none
number of groups 1
number of responses 11
number of factors 3
number of free coefficients 36
number of penalized coefficients 6
Fitting Information
penalty method mcp
lambda grid 0.01 - 0.3
delta grid Inf
algorithm fisher
missing method none
tolerance for convergence 0.001
Saturated Model Information
loss value 0.000
number of non-zero coefficients 77.000
degree of freedom 0.000
Baseline Model Information
loss value 9.739
number of non-zero coefficients 22.000
degree of freedom 55.000
Numerical Condition
lambda 0.010
delta Inf
objective value 0.572
objective gradient absolute maximum 0.000
objective Hessian convexity 0.029
number of iterations 3.000
loss value 0.511
number of non-zero coefficients 42.000
degree of freedom 35.000
robust degree of freedom NaN
scaling factor NaN
Information Criteria
Akaike information criterion (aic) -0.423
Akaike information criterion with penalty 3 (aic3) -0.889
consistent Akaike information criterion (caic) -1.971
Bayesian information criterion (bic) -1.504
adjusted Bayesian information criterion (abic) -0.033
Haughton Bayesian information criterion (hbic) -0.646
robust Akaike information criterion (raic) NaN
robust Akaike information criterion with penalty 3 (raic3) NaN
robust consistent Akaike information criterion (rcaic) NaN
robust Bayesian information criterion (rbic) NaN
robust adjusted Bayesian information criterion (rabic) NaN
robust Haughton Bayesian information criterion (rhbic) NaN
Fit Indices
root mean square error of approximation (rmsea) 0.035
comparative fit index (cfi) 0.995
non-normed fit index (nnfi) 0.992
standardized root mean of residual (srmr) 0.045
Likelihood Ratio Test
statistic df p-value
unadjusted 38.302 35.000 0.322
mean-adjusted - - -
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.035 0.000 0.101
mean-adjusted - - -
Coefficient Test (Standard Error = "observed_fisher", Alpha Level = 0.05)
Factor Loading
type estimate std.error z-value p-value lower upper
x1<-ind60 fixed 1.000 - - - - -
x2<-ind60 free 2.181 0.139 15.665 0.000 1.908 2.453
x3<-ind60 free 1.819 0.152 11.941 0.000 1.520 2.117
y1<-dem60 fixed 1.000 - - - - -
y2<-dem60 free 1.268 0.184 6.901 0.000 0.908 1.628
y3<-dem60 free 1.060 0.148 7.156 0.000 0.769 1.350
y4<-dem60 free 1.272 0.150 8.491 0.000 0.978 1.565
y5<-dem65 fixed 1.000 - - - - -
y6<-dem65 free 1.192 0.170 7.002 0.000 0.859 1.526
y7<-dem65 free 1.283 0.160 8.023 0.000 0.969 1.596
y8<-dem65 free 1.272 0.163 7.824 0.000 0.953 1.591
Regression
type estimate std.error z-value p-value lower upper
dem60<-ind60 free 1.480 0.396 3.736 0.000 0.703 2.256
dem65<-ind60 free 0.564 0.232 2.437 0.007 0.110 1.019
dem65<-dem60 free 0.839 0.099 8.499 0.000 0.646 1.033
Covariance
type estimate std.error z-value p-value lower upper
y5<->y1 pen 0.612 0.363 1.686 0.046 -0.099 1.324
y4<->y2 pen 1.185 0.671 1.765 0.039 -0.131 2.501
y6<->y2 pen 2.011 0.693 2.901 0.002 0.652 3.369
y7<->y3 pen 0.705 0.611 1.154 0.124 -0.493 1.903
y8<->y4 pen 0.315 0.449 0.701 0.242 -0.565 1.194
y8<->y6 pen 1.286 0.553 2.326 0.010 0.203 2.370
Variance
type estimate std.error z-value p-value lower upper
ind60<->ind60 free 0.454 0.088 5.168 0.000 0.282 0.627
dem60<->dem60 free 3.995 0.951 4.201 0.000 2.131 5.859
dem65<->dem65 free 0.171 0.220 0.779 0.218 -0.260 0.603
x1<->x1 free 0.083 0.020 4.139 0.000 0.044 0.122
x2<->x2 free 0.121 0.071 1.711 0.044 -0.018 0.261
x3<->x3 free 0.473 0.090 5.232 0.000 0.296 0.651
y1<->y1 free 1.920 0.463 4.152 0.000 1.014 2.827
y2<->y2 free 7.185 1.284 5.597 0.000 4.669 9.701
y3<->y3 free 5.106 0.960 5.316 0.000 3.224 6.989
y4<->y4 free 3.097 0.733 4.226 0.000 1.661 4.534
y5<->y5 free 2.377 0.485 4.901 0.000 1.427 3.328
y6<->y6 free 4.854 0.859 5.649 0.000 3.170 6.538
y7<->y7 free 3.438 0.718 4.788 0.000 2.031 4.846
y8<->y8 free 3.216 0.688 4.675 0.000 1.868 4.564
Intercept
type estimate std.error z-value p-value lower upper
x1<-1 free 0.000 0.085 0.000 0.500 -0.166 0.166
x2<-1 free 0.000 0.174 0.000 0.500 -0.342 0.342
x3<-1 free 0.000 0.162 0.000 0.500 -0.318 0.318
y1<-1 free 0.000 0.304 0.000 0.500 -0.595 0.595
y2<-1 free 0.000 0.450 0.000 0.500 -0.883 0.883
y3<-1 free 0.000 0.378 0.000 0.500 -0.741 0.741
y4<-1 free 0.000 0.386 0.000 0.500 -0.756 0.756
y5<-1 free 0.000 0.302 0.000 0.500 -0.592 0.592
y6<-1 free 0.000 0.387 0.000 0.500 -0.758 0.758
y7<-1 free 0.000 0.379 0.000 0.500 -0.743 0.743
y8<-1 free 0.000 0.373 0.000 0.500 -0.731 0.731
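Several quantities in the output above can be reproduced by hand from the reported loss value, sample size, and number of non-zero coefficients. The sketch below uses only base R and the rounded values printed above. The formulas (each criterion as loss minus a penalty times the degrees of freedom divided by the sample size) are inferred from the printed numbers rather than taken from the lslx source, so they should be read as an assumption; small discrepancies in the third decimal come from rounding of the inputs.

```r
# Values copied from the summary output above (rounded to three decimals).
n_obs     <- 75      # number of observations
n_resp    <- 11      # number of responses
loss      <- 0.511   # loss value under the selected penalty level
n_nonzero <- 42      # number of non-zero coefficients
lr_stat   <- 38.302  # unadjusted likelihood ratio statistic

# Saturated model: all variances and covariances plus all means.
n_moment <- n_resp * (n_resp + 1) / 2 + n_resp
df       <- n_moment - n_nonzero

# Assumed form of each criterion: loss - penalty * df / n_obs.
penalty <- c(aic  = 2,
             aic3 = 3,
             caic = log(n_obs) + 1,
             bic  = log(n_obs),
             abic = log((n_obs + 2) / 24),
             hbic = log(n_obs / (2 * pi)))
ic <- loss - penalty * df / n_obs
print(round(ic, 3))

# Likelihood ratio test p-value and RMSEA point estimate.
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
rmsea   <- sqrt(max((lr_stat - df) / (df * n_obs), 0))
print(round(c(p_value = p_value, rmsea = rmsea), 3))
```

Under these assumed formulas a smaller value indicates a better trade-off between fit and complexity, which is how a selector such as "aic" can pick one penalty level from the grid.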
In this example, we can see that the penalized likelihood (PL) estimate under the selected penalty level doesn't contain any zero value, which indicates that all of the penalized covariances between measurements are relevant. The summarize method also shows the results of significance tests for the coefficients. In lslx, the default standard errors are calculated based on the sandwich formula whenever raw data are available. In this example, because raw data are not used for lslx object initialization, the standard errors are calculated by using the observed Fisher information matrix. They may not be valid when the model is misspecified and the data are not normal. Also, they are generally invalid after choosing a penalty level.
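When reading the coefficient tests, it helps to see how the printed columns relate to each other. The base R sketch below takes the dem65<-ind60 row from the output above and recomputes its p-value and confidence interval. Judging from the printed numbers, the p-values appear to be one-sided while the intervals are two-sided at the stated alpha level; this is an inference from the output, not a documented statement about lslx.

```r
# Values copied from the 'dem65<-ind60' row of the summary output.
estimate  <- 0.564
std_error <- 0.232
z_value   <- 2.437
alpha     <- 0.05

# One-sided (upper-tail) p-value, which matches the printed 0.007.
p_one_sided <- pnorm(abs(z_value), lower.tail = FALSE)

# Two-sided p-value, shown for comparison only.
p_two_sided <- 2 * p_one_sided

# Two-sided Wald confidence interval at level 1 - alpha.
ci <- estimate + c(-1, 1) * qnorm(1 - alpha / 2) * std_error

print(round(c(p_one_sided = p_one_sided, p_two_sided = p_two_sided), 3))
print(round(ci, 3))
```

This also explains rows such as y5<->y1, where the printed p-value is below 0.05 even though the interval contains zero.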
In lslx, many quantities related to SEM can be extracted by the extract-related methods. For example, the coefficient estimates and their asymptotic variances under the penalty level selected by the BIC can be obtained by
r6_lslx$extract_coefficient(selector = "bic")
x1<-1|G x2<-1|G x3<-1|G y1<-1|G y2<-1|G y3<-1|G
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
y4<-1|G y5<-1|G y6<-1|G y7<-1|G y8<-1|G dem60<-ind60|G
0.0000 0.0000 0.0000 0.0000 0.0000 1.4799
dem65<-ind60|G dem65<-dem60|G x1<-ind60|G x2<-ind60|G x3<-ind60|G y1<-dem60|G
0.5645 0.8391 1.0000 2.1805 1.8185 1.0000
y2<-dem60|G y3<-dem60|G y4<-dem60|G y5<-dem65|G y6<-dem65|G y7<-dem65|G
1.2682 1.0596 1.2717 1.0000 1.1922 1.2828
y8<-dem65|G ind60<->ind60|G dem60<->dem60|G dem65<->dem65|G x1<->x1|G x2<->x2|G
1.2719 0.4545 3.9948 0.1714 0.0829 0.1215
x3<->x3|G y1<->y1|G y5<->y1|G y2<->y2|G y4<->y2|G y6<->y2|G
0.4733 1.9203 0.6124 7.1852 1.1854 2.0109
y3<->y3|G y7<->y3|G y4<->y4|G y8<->y4|G y5<->y5|G y6<->y6|G
5.1063 0.7051 3.0973 0.3146 2.3774 4.8543
y8<->y6|G y7<->y7|G y8<->y8|G
1.2864 3.4384 3.2156
diag(r6_lslx$extract_coefficient_acov(selector = "bic"))
x1<-1|G x2<-1|G x3<-1|G y1<-1|G y2<-1|G y3<-1|G
0.007165 0.030431 0.026350 0.092137 0.202812 0.142779
y4<-1|G y5<-1|G y6<-1|G y7<-1|G y8<-1|G dem60<-ind60|G
0.148890 0.091261 0.149386 0.143854 0.139237 0.156929
dem65<-ind60|G dem65<-dem60|G x1<-ind60|G x2<-ind60|G x3<-ind60|G y1<-dem60|G
0.053667 0.009748 NA 0.019375 0.023195 NA
y2<-dem60|G y3<-dem60|G y4<-dem60|G y5<-dem65|G y6<-dem65|G y7<-dem65|G
0.033772 0.021924 0.022429 NA 0.028990 0.025562
y8<-dem65|G ind60<->ind60|G dem60<->dem60|G dem65<->dem65|G x1<->x1|G x2<->x2|G
0.026431 0.007734 0.904346 0.048399 0.000401 0.005045
x3<->x3|G y1<->y1|G y5<->y1|G y2<->y2|G y4<->y2|G y6<->y2|G
0.008183 0.213933 0.131906 1.647994 0.450813 0.480474
y3<->y3|G y7<->y3|G y4<->y4|G y8<->y4|G y5<->y5|G y6<->y6|G
0.922531 0.373563 0.537139 0.201409 0.235288 0.738362
y8<->y6|G y7<->y7|G y8<->y8|G
0.305785 0.515648 0.473059
The NA entries mean that the corresponding coefficients are fixed or identified as zero by the penalty.
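The diagonal of the asymptotic covariance matrix holds variances, so standard errors follow by taking square roots. The base R sketch below does this for a few entries copied from the output above; in practice the same sqrt() call could be applied to the full diag(...) result. The vector name acov_diag is illustrative.

```r
# A few diagonal entries copied from the output above.
acov_diag <- c("x1<-1|G"        = 0.007165,
               "dem60<-ind60|G" = 0.156929,
               "x1<-ind60|G"    = NA_real_,  # fixed loading, no variance
               "x2<-ind60|G"    = 0.019375,
               "y8<->y6|G"      = 0.305785)

# Standard errors are the square roots of the variances; NA stays NA.
std_error <- sqrt(acov_diag)
print(round(std_error, 3))
```

The resulting values agree, up to rounding, with the std.error column of the summary output, and the fixed loading x1<-ind60 remains NA.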