Yoon, Steiner, and Reinhardt (2003) studied the time that patients admitted to the emergency department (ED) of the University of Alberta Hospital between midnight January 23 and midnight January 29, 1999, spent in each of five stages of ED assessment and treatment: registration, triage assessment, nursing assessment, physician assessment, and disposition decision. While Yoon et al. analyzed predictors of the total length of stay in the emergency ward, we will follow the analyses in Smithson and Broomell (2022), who examine the proportions of the patients’ stays spent in the various stages.
Smithson and Broomell observed that the data include a substantial number of zeros (e.g., 696 out of 894 patients spending no time in the decision stage). They reduced the zeros by aggregating the decision and physician stages, and aggregating the registration and triage stages. The resulting composition had three parts: Registration-triage, nursing assessment, and physician-decision. We will use that composition here.
Our example focuses on the proportion of time spent in the Registration-triage stage. Patients arriving by ambulance tended to have more life-threatening conditions than those arriving as "walk-ins", so we expect the ambulance-arrivals to spend a smaller proportion of their time in this preliminary stage, because many of them need to be rushed into treatment. The more serious cases also typically required lengthy nursing and treatment times, so we expect that a longer length of stay will predict a lower proportion of time spent in the Registration-triage stage.
A quick examination of the relevant variables reveals that the log of the length of stay adequately corrects skew in that variable, and the split between ambulance-arrivals and walk-ins has adequate numbers of cases in both categories.
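As a numerical complement to the histograms, sample skewness can be compared before and after the log transformation. The sketch below uses simulated log-normal values as a stand-in for length of stay (it does not use the yoon data), and `skewness_hat` is a hypothetical helper defined here for illustration, not a function from any package.

```r
# Hypothetical illustration with simulated data; not the yoon data set.
set.seed(1)
los_sim <- rlnorm(894, meanlog = 1, sdlog = 0.8)  # right-skewed stand-in for LOS

# Moment-based sample skewness (helper defined here for illustration only)
skewness_hat <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_raw <- skewness_hat(los_sim)       # clearly positive for log-normal data
skew_log <- skewness_hat(log(los_sim))  # close to zero after the log transform
c(raw = skew_raw, log = skew_log)
```

A skewness near zero on the log scale is the numerical counterpart of the roughly symmetric second histogram.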
# Ambulance-arrival vs walk-in split shows 683 walk-ins and 211 ambulance-arrivals
table(yoon$Ambulance)
##
## 0 1
## 683 211
#
# Length of stay (in hours) is skewed, but the log of it isn't:
hist(yoon$LOSh, breaks = 50, main = "", xlab = "LOS", col = "gray")
hist(log(yoon$LOSh), breaks = 50, main = "", xlab = "ln(LOS)", col = "gray")
loglosh <- log(yoon$LOSh)
#
This example compares outer-position distributions for goodness-of-fit. The Cauchit-ArcSinh outer-W model turns out to have the best fit, and it identifies significant effects of both ambulance arrival and log of length of stay in the expected directions in the \(\theta\) (skewness) submodel. Note that the coefficients for Ambulance and loglosh are positive: because \(\theta\) tracks skew, a positive coefficient predicts a decrease in the median proportion of time spent in the Registration-triage stage.
There is only a marginally significant effect of loglosh in the \(\sigma\) (dispersion) submodel. Nonetheless, a model without the dispersion submodel effects suffers a significant decline in goodness-of-fit. However, a model with interaction-effect terms does not significantly improve fit over the main-effects model.
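The p-values of these likelihood-ratio comparisons can be checked by hand. The sketch below assumes only base R and the LR statistics that anova() reports for this example (45.107 and 0.19441, each on 2 degrees of freedom); the variable names are illustrative.

```r
# LR statistics as reported by anova() for this example (2 df each)
lr_dispersion  <- 45.107    # full model vs. intercept-only dispersion submodel
lr_interaction <- 0.19441   # interaction model vs. main-effects model

# Upper-tail chi-square probabilities give the LR-test p-values
p_dispersion  <- pchisq(lr_dispersion,  df = 2, lower.tail = FALSE)
p_interaction <- pchisq(lr_interaction, df = 2, lower.tail = FALSE)
signif(c(dispersion = p_dispersion, interaction = p_interaction), 4)
```

Each test has 2 degrees of freedom because the compared models differ by two coefficients: Ambulance and loglosh in the dispersion submodel in the first case, and one interaction term in each submodel in the second.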
Does a 3-parameter model contribute anything more? Note that we cannot compare 2- and 3-parameter models with likelihood-ratio tests because they are not nested. However, we can examine their AIC values. These turn out to be nearly identical: the 3-parameter AIC is lower by only about 0.34. Moreover, there are no significant effects in the \(\mu\) submodel, so the more parsimonious 2-parameter model appears to be our best alternative.
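The AIC comparison can be reproduced from the reported log-likelihoods using AIC = -2 logLik + 2k. The sketch assumes k = 6 estimated coefficients for the 2-parameter model (the df reported by logLik()) and k = 9 for the 3-parameter model (those six plus the three \(\mu\) coefficients listed in its summary); the function and variable names are illustrative.

```r
# Log-likelihoods reported for the two models in this example
loglik_2par <- 935.8294   # sigma and theta submodels, 6 coefficients
loglik_3par <- 938.9992   # adds a 3-coefficient mu submodel, 9 in total

aic <- function(loglik, k) -2 * loglik + 2 * k  # AIC = -2 logLik + 2k

aic_2par <- aic(loglik_2par, 6)
aic_3par <- aic(loglik_3par, 9)
c(two_par = aic_2par, three_par = aic_3par, difference = aic_2par - aic_3par)
```

The gain in log-likelihood from the \(\mu\) submodel is almost exactly offset by the penalty for its three additional coefficients.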
# Examine alternative distribution models' goodness-of-fit:
fit1 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "arcsinh", sd = "arcsinh", inner = FALSE, version = "V", data = yoon); logLik(fit1)
## 'log Lik.' 911.3165 (df=6)
fit2 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "arcsinh", sd = "cauchy", inner = FALSE, version = "V", data = yoon); logLik(fit2)
## 'log Lik.' 897.9554 (df=6)
fit3 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "cauchit", sd = "arcsinh", inner = FALSE, version = "V", data = yoon); logLik(fit3)
## 'log Lik.' 922.6401 (df=6)
fit4 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "cauchit", sd = "arcsinh", inner = FALSE, version = "W", data = yoon); logLik(fit4)
## 'log Lik.' 935.8294 (df=6)
fit5 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "cauchit", sd = "cauchy", inner = FALSE, version = "V", data = yoon); logLik(fit5)
## 'log Lik.' 883.0797 (df=6)
fit6 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "cauchit", sd = "cauchy", inner = FALSE, version = "W", data = yoon); logLik(fit6)
## 'log Lik.' 928.6193 (df=6)
fit7 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "t2", sd = "t2", inner = FALSE, version = "V", data = yoon); logLik(fit7)
## 'log Lik.' 927.1892 (df=6)
fit8 <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "t2", sd = "t2", inner = FALSE, version = "W", data = yoon); logLik(fit8)
## 'log Lik.' 880.8514 (df=6)
#
# The Cauchit-ArcSinh outer-W model fits best.
summary(fit4)
## Family: cauchit arcsinh
## Call: cdfquantregFT(formula = pregptriage ~ Ambulance + loglosh | Ambulance +
## loglosh, data = yoon, fd = "cauchit", sd = "arcsinh", inner = FALSE,
## version = "W")
##
## No coefficients in location submodel
##
## Sigma coefficients (Dispersion submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4177 0.1330 -3.141 0.00168 **
## Ambulance -0.1109 0.6242 -0.178 0.85897
## loglosh 0.2429 0.1323 1.835 0.06645 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta coefficients (Skewness submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.3622 0.1429 9.536 < 2e-16 ***
## Ambulance 1.4479 0.6460 2.241 0.025 *
## loglosh 0.6019 0.1389 4.335 1.46e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Converge: successful completion
## Log-Likelihood: 935.8294
#
# Does a model without the effects in the dispersion submodel fit as well?
# The answer is no:
fit4b <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | 1, fd = "cauchit", sd = "arcsinh", inner = FALSE, version = "W", data = yoon); anova(fit4b, fit4)
## Likelihood ratio tests
##
## Resid. Df -2Loglik Df LR stat Pr(>Chi)
## 1 890 -1826.5
## 2 888 -1871.7 2 45.107 1.603e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# How about a 3-parameter model?
fit4c <- cdfquantregFT(pregptriage ~ Ambulance + loglosh | Ambulance + loglosh, fd = "cauchit", sd = "arcsinh", mu.fo = mu.fo ~ Ambulance + loglosh, inner = FALSE, version = "W", data = yoon)
c(AIC(fit4), AIC(fit4c))
## [1] -1859.659 -1859.998
# The AIC values are nearly identical. Moreover, there are no significant mu effects:
summary(fit4c)
## Family: cauchit arcsinh
## Call: cdfquantregFT(formula = pregptriage ~ Ambulance + loglosh | Ambulance +
## loglosh, data = yoon, mu.fo = mu.fo ~ Ambulance + loglosh,
## fd = "cauchit", sd = "arcsinh", inner = FALSE, version = "W")
##
## Mu coefficients (Location submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.2159 0.1360 -1.587 0.112
## Ambulance 0.2287 0.3933 0.582 0.561
## loglosh -0.1405 0.1419 -0.990 0.322
##
## Sigma coefficients (Dispersion submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3335 0.1218 -2.738 0.00617 **
## Ambulance -0.2530 0.3097 -0.817 0.41397
## loglosh 0.3163 0.1198 2.641 0.00827 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta coefficients (Skewness submodel)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1347 0.1701 6.670 2.56e-11 ***
## Ambulance 1.6702 0.3277 5.097 3.44e-07 ***
## loglosh 0.5586 0.1361 4.105 4.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Converge: successful completion
## Log-Likelihood: 938.9992
#
# Finally, what about a 2-parameter model with moderator effects?
# No significant improvement in model fit:
fit4d <- cdfquantregFT(pregptriage ~ Ambulance*loglosh | Ambulance*loglosh, fd = "cauchit", sd = "arcsinh", inner = FALSE, version = "W", data = yoon); anova(fit4, fit4d)
## Likelihood ratio tests
##
## Resid. Df -2Loglik Df LR stat Pr(>Chi)
## 1 888 -1871.7
## 2 886 -1871.8 2 0.19441 0.9074
#
An examination of the parameter estimate correlation matrix reveals two correlations whose magnitudes are above 0.85, but the model appears stable and converges to the same solution from alternative starting-values.
The marginal fitted pdfs for the walk-ins and ambulance-arrivals correspond fairly closely to their respective empirical distributions. The rank-order correlation between the fitted cdf and the dependent variable is 0.767, suggesting that the empirical quantiles are reasonably well modeled.
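A rank-order (Spearman) correlation suits this check because it depends only on the ordering of the values, so it captures a monotone relationship even when that relationship is far from linear. A minimal base-R sketch with simulated values (hypothetical data, unrelated to yoon) illustrates the property:

```r
# Hypothetical illustration with simulated proportions; not the yoon data set.
set.seed(2)
prop_sim <- runif(200)        # simulated proportions in (0, 1)
mono_sim <- qlogis(prop_sim)  # a strictly increasing, nonlinear transformation

rho_spearman <- cor(prop_sim, mono_sim, method = "spearman")  # rank-based
r_pearson    <- cor(prop_sim, mono_sim, method = "pearson")   # linear
c(spearman = rho_spearman, pearson = r_pearson)
```

In the model check itself the fitted cdf parameters vary from patient to patient, so the correlation falls short of 1; the closer it is to 1, the better the fitted cdf preserves the ordering of the observed proportions.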
# Parameter-estimate correlations
cov2cor(vcov(fit4))
## (sigma)_(Intercept) (sigma)_Ambulance (sigma)_loglosh
## (sigma)_(Intercept) 1.000000000 -0.01452776 -0.6743849
## (sigma)_Ambulance -0.014527759 1.00000000 -0.1648889
## (sigma)_loglosh -0.674384884 -0.16488894 1.0000000
## (theta)_(Intercept) -0.803816131 -0.05094655 0.5617546
## (theta)_Ambulance 0.009483982 -0.98253988 0.1571836
## (theta)_loglosh 0.574435351 0.20734673 -0.9030172
## (theta)_(Intercept) (theta)_Ambulance (theta)_loglosh
## (sigma)_(Intercept) -0.80381613 0.009483982 0.5744354
## (sigma)_Ambulance -0.05094655 -0.982539881 0.2073467
## (sigma)_loglosh 0.56175458 0.157183639 -0.9030172
## (theta)_(Intercept) 1.00000000 0.046492189 -0.6838369
## (theta)_Ambulance 0.04649219 1.000000000 -0.2174752
## (theta)_loglosh -0.68383687 -0.217475215 1.0000000
#
# Fit the marginal distributions to the walk-ins:
library(MASS)  # provides truehist()
uniqdat <- sort(unique(yoon$pregptriage[yoon$Ambulance == 0]))
densfit <- rep(0, length(uniqdat))
for (i in seq_along(uniqdat)) {
  # Fitted density at each unique observed proportion, with loglosh held at its mean
  densfit[i] <- pdfft(uniqdat[i], sigma = exp(coef(fit4)[4] + mean(loglosh)*coef(fit4)[6]), theta = coef(fit4)[1] + mean(loglosh)*coef(fit4)[3], fd = "cauchit", sd = "arcsinh", mu = NULL, inner = FALSE, version = "W")
}
truehist(yoon$pregptriage[yoon$Ambulance == 0], nbins = 100, main = "Walk-ins", xlab = "proportion", ylab = "pdf")
lines(uniqdat, densfit, lwd = 2)
# Fit the marginal distributions to the ambulance-arrivals:
uniqdat <- sort(unique(yoon$pregptriage[yoon$Ambulance == 1]))
densfit <- rep(0, length(uniqdat))
for (i in seq_along(uniqdat)) {
  # Same as above, with the Ambulance coefficients added to both submodels
  densfit[i] <- pdfft(uniqdat[i], sigma = exp(coef(fit4)[4] + coef(fit4)[5] + mean(loglosh)*coef(fit4)[6]), theta = coef(fit4)[1] + coef(fit4)[2] + mean(loglosh)*coef(fit4)[3], fd = "cauchit", sd = "arcsinh", mu = NULL, inner = FALSE, version = "W")
}
truehist(yoon$pregptriage[yoon$Ambulance == 1], nbins = 100, main = "Ambulance-arrivals", xlab = "proportion", ylab = "pdf")
lines(uniqdat, densfit, lwd = 2)
#
# The fitted cdf should be monotonically related to the dependent variable.
# How well does our model do?
fittedcdf <- rep(0, length(yoon$pregptriage))
for (i in seq_along(yoon$pregptriage)) {
  # Fitted cdf for patient i, using that patient's own covariate values
  fittedcdf[i] <- cdfft(yoon$pregptriage[i], sigma = exp(coef(fit4)[4] + yoon$Ambulance[i]*coef(fit4)[5] + loglosh[i]*coef(fit4)[6]), theta = coef(fit4)[1] + yoon$Ambulance[i]*coef(fit4)[2] + loglosh[i]*coef(fit4)[3], fd = "cauchit", sd = "arcsinh", mu = NULL, inner = FALSE, version = "W")
}
cor(fittedcdf, yoon$pregptriage, method = "spearman")
## [1] 0.7672581
#
Yoon, P., Steiner, I. & Reinhardt, G. (2003). Analysis of factors influencing length of stay in the emergency department. Canadian Journal of Emergency Medicine, 5, 155-161.
Smithson, M. & Broomell, S.B. (online 31/01/2022). Compositional data analysis tutorial. Psychological Methods.