In this example, we show how to use lslx to conduct semi-confirmatory factor analysis with missing data. The example uses the data set HolzingerSwineford1939 from the package lavaan; hence, lavaan must be installed. Because HolzingerSwineford1939 contains no missing values, we use the code from semTools to create NAs (see the example of the twostage function in semTools).
data <- lavaan::HolzingerSwineford1939
# make x5 missing whenever x1 falls in its lowest 30 percent
data$x5 <- ifelse(data$x1 <= quantile(data$x1, .3), NA, data$x5)
# make x9 missing whenever age (in years) falls in its lowest 30 percent
data$age <- data$ageyr + data$agemo/12
data$x9 <- ifelse(data$age <= quantile(data$age, .3), NA, data$x9)
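As a quick sanity check, we can count how many NAs the code above introduced (a sketch; the exact counts depend on ties at the quantile cutoffs):

```r
# number of missing values created in x5 and x9
colSums(is.na(data[c("x5", "x9")]))
```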
By construction, the missingness of x5 depends on the value of x1, and the missingness of x9 depends on the age variable. Note that age is created from ageyr and agemo. Since ageyr and agemo are not variables of substantive interest, they are treated as auxiliary variables in the later analysis.
The following model specification is the same as in our example of semi-confirmatory factor analysis (see vignette("factor-analysis")).
model <-
'
visual :=> x1 + x2 + x3
textual :=> x4 + x5 + x6
speed :=> x7 + x8 + x9
visual :~> x4 + x5 + x6 + x7 + x8 + x9
textual :~> x1 + x2 + x3 + x7 + x8 + x9
speed :~> x1 + x2 + x3 + x4 + x5 + x6
visual <=> fix(1) * visual
textual <=> fix(1) * textual
speed <=> fix(1) * speed
'
To initialize an lslx object with auxiliary variables, we need to specify the auxiliary_variable argument. The auxiliary_variable argument only accepts numeric variables. If a categorical variable is to be used as an auxiliary variable, the user should first transform it into a set of dummy variables, for example via the model.matrix function.
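For illustration, the two-level factor school in HolzingerSwineford1939 could be recoded as dummy variables this way (a sketch only; school is not used as an auxiliary variable in the analysis below):

```r
# recode the factor "school" into 0/1 dummy columns;
# "- 1" drops the intercept so every level gets its own column
school_dummy <- model.matrix(~ school - 1, data = data)
head(school_dummy)
```

The resulting numeric columns could then be bound to the data set and passed to auxiliary_variable by name.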
library(lslx)
r6_lslx <- lslx$new(model = model,
data = data,
auxiliary_variable = c("ageyr", "agemo"))
An 'lslx' R6 class is initialized via 'data'.
Response Variable(s): x1 x2 x3 x4 x5 x6 x7 x8 x9
Latent Factor(s): visual textual speed
Auxiliary Variable(s): ageyr agemo
So far, the specified auxiliary variables are only stored in the lslx object. They are actually used when the fit-related methods are implemented.
r6_lslx$fit(penalty_method = "mcp",
lambda_grid = seq(.01, .30, .01),
delta_grid = c(5, 10))
CONGRATS: The algorithm converged under all specified penalty levels.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
By default, the fit-related methods implement the two-step method (possibly with auxiliary variables) for handling missing values. The user can specify the missing-data method explicitly via the missing_method argument. The other method available in the current version is listwise deletion; however, listwise deletion has no theoretical advantage over the two-step method.
The following code summarizes the fitting result under the penalty level selected by the Bayesian information criterion (BIC). The number of missing patterns shows how many missing patterns are present in the data set (including the complete pattern). If the lslx object is initialized via raw data, by default a corrected sandwich standard error is used for the coefficient tests. The correction is based on the asymptotic covariance of the saturated model derived by full-information maximum likelihood. The mean-adjusted likelihood ratio test is also based on this quantity. For reference, please see the Missing Data section in ?lslx.
r6_lslx$summarize(selector = "bic")
General Information
number of observations 301.000
number of complete observations 138.000
number of missing patterns 4.000
number of groups 1.000
number of responses 9.000
number of factors 3.000
number of free coefficients 30.000
number of penalized coefficients 18.000
Fitting Information
penalty method mcp
lambda grid 0.01 - 0.3
delta grid 5 - 10
algorithm fisher
missing method two stage
tolerance for convergence 0.001
Saturated Model Information
loss value 0.000
number of non-zero coefficients 54.000
degree of freedom 0.000
Baseline Model Information
loss value 2.937
number of non-zero coefficients 18.000
degree of freedom 36.000
Numerical Condition
lambda 0.150
delta 5.000
objective value 0.214
objective gradient absolute maximum 0.000
objective Hessian convexity 0.683
number of iterations 3.000
loss value 0.151
number of non-zero coefficients 33.000
degree of freedom 21.000
robust degree of freedom 26.715
scaling factor 1.272
Information Criteria
Akaike information criterion (aic) 0.012
Akaike information criterion with penalty 3 (aic3) -0.058
consistent Akaike information criterion (caic) -0.317
Bayesian information criterion (bic) -0.247
adjusted Bayesian information criterion (abic) -0.026
Haughton Bayesian information criterion (hbic) -0.119
robust Akaike information criterion (raic) -0.026
robust Akaike information criterion with penalty 3 (raic3) -0.115
robust consistent Akaike information criterion (rcaic) -0.444
robust Bayesian information criterion (rbic) -0.355
robust adjusted Bayesian information criterion (rabic) -0.074
robust Haughton Bayesian information criterion (rhbic) -0.192
Fit Indices
root mean square error of approximation (rmsea) 0.062
comparative fit index (cfi) 0.971
non-normed fit index (nnfi) 0.950
standardized root mean of residual (srmr) 0.040
Likelihood Ratio Test
statistic df p-value
unadjusted 45.502 21.000 0.001
mean-adjusted 35.768 21.000 0.023
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.062 0.032 0.092
mean-adjusted 0.055 0.006 0.090
Coefficient Test (Standard Error = "sandwich", Alpha Level = 0.05)
Factor Loading
type estimate std.error z-value p-value lower upper
x1<-visual free 0.935 0.107 8.701 0.000 0.724 1.146
x2<-visual free 0.475 0.094 5.065 0.000 0.291 0.658
x3<-visual free 0.630 0.082 7.681 0.000 0.469 0.791
x4<-visual pen 0.000 - - - - -
x5<-visual pen -0.160 0.119 -1.345 0.089 -0.393 0.073
x6<-visual pen 0.000 - - - - -
x7<-visual pen -0.084 0.101 -0.835 0.202 -0.281 0.113
x8<-visual pen 0.000 - - - - -
x9<-visual pen 0.230 0.080 2.886 0.002 0.074 0.385
x1<-textual pen 0.000 - - - - -
x2<-textual pen 0.000 - - - - -
x3<-textual pen 0.000 - - - - -
x4<-textual free 0.978 0.062 15.839 0.000 0.857 1.099
x5<-textual free 1.130 0.079 14.281 0.000 0.975 1.285
x6<-textual free 0.917 0.059 15.651 0.000 0.802 1.031
x7<-textual pen 0.000 - - - - -
x8<-textual pen 0.000 - - - - -
x9<-textual pen 0.000 - - - - -
x1<-speed pen 0.000 - - - - -
x2<-speed pen 0.000 - - - - -
x3<-speed pen 0.000 - - - - -
x4<-speed pen 0.000 - - - - -
x5<-speed pen 0.000 - - - - -
x6<-speed pen 0.000 - - - - -
x7<-speed free 0.721 0.101 7.163 0.000 0.524 0.918
x8<-speed free 0.766 0.084 9.164 0.000 0.602 0.930
x9<-speed free 0.461 0.072 6.398 0.000 0.320 0.603
Covariance
type estimate std.error z-value p-value lower upper
textual<->visual free 0.463 0.070 6.574 0.000 0.325 0.601
speed<->visual free 0.332 0.085 3.885 0.000 0.164 0.499
speed<->textual free 0.220 0.081 2.707 0.003 0.061 0.379
Variance
type estimate std.error z-value p-value lower upper
visual<->visual fixed 1.000 - - - - -
textual<->textual fixed 1.000 - - - - -
speed<->speed fixed 1.000 - - - - -
x1<->x1 free 0.461 0.186 2.474 0.007 0.096 0.826
x2<->x2 free 1.150 0.116 9.893 0.000 0.922 1.378
x3<->x3 free 0.867 0.099 8.795 0.000 0.674 1.061
x4<->x4 free 0.389 0.053 7.274 0.000 0.284 0.494
x5<->x5 free 0.395 0.070 5.652 0.000 0.258 0.531
x6<->x6 free 0.351 0.046 7.641 0.000 0.261 0.441
x7<->x7 free 0.688 0.115 6.003 0.000 0.463 0.913
x8<->x8 free 0.433 0.112 3.882 0.000 0.214 0.652
x9<->x9 free 0.618 0.076 8.094 0.000 0.468 0.768
Intercept
type estimate std.error z-value p-value lower upper
x1<-1 free 4.936 0.067 73.473 0.000 4.804 5.067
x2<-1 free 6.088 0.068 89.855 0.000 5.955 6.221
x3<-1 free 2.250 0.065 34.579 0.000 2.123 2.378
x4<-1 free 3.061 0.067 45.694 0.000 2.930 3.192
x5<-1 free 4.420 0.093 47.369 0.000 4.237 4.603
x6<-1 free 2.186 0.063 34.667 0.000 2.062 2.309
x7<-1 free 4.186 0.063 66.766 0.000 4.063 4.309
x8<-1 free 5.527 0.058 94.854 0.000 5.413 5.641
x9<-1 free 5.366 0.074 72.693 0.000 5.221 5.510