README

The cdnbcr package provides tools for modeling time-to-event data in the presence of a cure fraction, considering correlated competing risks. The Correlated Destructive Negative Binomial Cure Rate (CDNBCR) model describes the time until an event occurs while accounting for a cure fraction, assuming multiple initial competing causes with potential correlation. The model is destructive in the sense that some initial competing causes contributing to the occurrence of the event can be eliminated. Thus, the model is particularly useful in applications where interventions (e.g., treatments) may eliminate or inactivate some competing causes. Among the package’s contributions, we implemented an Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters and we provide diagnostic plots based on Cox-Snell residuals.

Installation

devtools::install_github("rdmatheus/cdnbcr")

To run the above command, it is necessary that the devtools package is previously installed on R. If not, install it using the following command:

install.packages("devtools")

After installing the devtools package, if you are using Windows, install the most current RTools program. Finally, run the command devtools::install_github("rdmatheus/cdnbcr"), and then the package will be installed on your computer.

Example

The cdnbcr package provides functions for estimation and inference for the parameters as well as plot tools for assessing the goodness-of-fit of the model. Below is an example of some functions usage and available methods.

## Loading packages
library(cdnbcr)
library(survival)

e1690 dataset (see ?e1690)

The e1690 object contains observations of a well-known dataset from a Phase III cutaneous melanoma clinical trial. The study was conducted by the Eastern Cooperative Oncology Group (ECOG) and aimed to evaluate the effectiveness of the Interferon alpha-2b chemotherapy in preventing recurrence after surgery. The observations include 417 patients from 1991 to 1995, with follow-up until 1998.

Correlated destructive fit

The main function of the package is the cdnbcr() function, which implements the EM algorithm for the estimation of the CDNBCR model parameters.

## Correlated destructive fit
fit <- cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 1 |
                sex + trt + thickness + age, data = e1690)

class(fit)
#> [1] "cdnbcr"

It returns an object of the "cdnbcr" class, which has common usage methods (e.g., print, summary, plot, among others). The methods currently available for the "cdnbcr" class are presented below.

methods(class = "cdnbcr")
#>  [1] AIC          coef         logLik       model.frame  model.matrix
#>  [6] plot         predict      print        residuals    summary     
#> [11] vcov        
#> see '?methods' for accessing help and source code

## print
fit
#> Call:
#> cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 
#>     1 | sex + trt + thickness + age, data = e1690)
#> 
#> Regression model for theta (log link):
#>   nodeII  nodeIII   nodeIV 
#> 1.100414 1.592870 2.418928 
#> 
#> Regression model for p (logit link):
#>     (Intercept)       sexFemale trtChemotherapy       thickness             age 
#>      -0.8283426      -1.9336639       0.5597650       0.4299035       0.0533924 
#> 
#> Dispersion, dependence, and baseline parameters:
#>       phi     alpha       mu    sigma
#>  2.598226 0.9781685 3.394487 2.287898
#> 
#> ---
#> Log-likelihood: -502.7273
#> AIC: 1029.455   BIC: 1077.852
#> Number of EM iterations:

## summary
summary(fit)
#> Call:
#> cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 
#>     1 | sex + trt + thickness + age, data = e1690)
#> 
#> Summary for residuals:
#>       Min        1Q    Median        3Q       Max 
#> 0.0000000 0.0000000 0.3721069 0.9213613 1.2807131 
#> 
#> Regression model for theta (log link):
#>         Estimate Std. error t value Pr(>|t|)    
#> nodeII   1.10041    0.44635  2.4654 0.013687 *  
#> nodeIII  1.59287    0.54042  2.9475 0.003204 ** 
#> nodeIV   2.41893    0.51421  4.7042  2.5e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Regression model for p (logit link):
#>                  Estimate Std. error t value Pr(>|t|)
#> (Intercept)     -0.828343   1.679746 -0.4931   0.6219
#> sexFemale       -1.933664   1.299001 -1.4886   0.1366
#> trtChemotherapy  0.559765   0.788898  0.7096   0.4780
#> thickness        0.429904   0.363671  1.1821   0.2372
#> age              0.053392   0.033935  1.5734   0.1156
#> 
#> Dispersion, dependence, and baseline parameters:
#>                  phi     alpha        mu     sigma
#> Estimate   2.5982261 0.9781685 3.3944872 2.2878976
#> Std. error 0.7441392 0.0594861 0.3503527 0.2204368
#> 
#> ---
#> Log-lik value: -502.7273 
#> AIC: 1029.455 and BIC: 1077.852 
#> EM iterations:

## plot
par(mfrow = c(1, 2))
plot(fit, ask = FALSE)

par(mfrow = c(1, 1))

By default, the predict() function returns the fitted survival time or the fitted cure rate for the data that has been observed. If new individuals are added to the sample, the survival function and cure rate can also be estimated with the predict() function. See the example below

## Suppose we are interested in obtaining predictions for the following individuals:
newdata <- data.frame(trt = factor(c("Control", "Control", "Chemotherapy", "Chemotherapy")),
                     age = median(e1690$age),
                     sex = factor(c("Male", "Female", "Male", "Female")),
                     thickness = median(e1690$thickness),
                     nodeII = c(0, 0, 0, 0),
                     nodeIII = c(0, 0, 0, 0),
                     nodeIV = c(1, 1, 1, 1))
newdata
#>            trt     age    sex thickness nodeII nodeIII nodeIV
#> 1      Control 47.1075   Male       3.1      0       0      1
#> 2      Control 47.1075 Female       3.1      0       0      1
#> 3 Chemotherapy 47.1075   Male       3.1      0       0      1
#> 4 Chemotherapy 47.1075 Female       3.1      0       0      1

## Fitted survival curves
pred <- predict(fit, newdata)
plot(pred[1, ], type = "l", ylim = c(0, 1), xlab = "Time", ylab = "Survival")
lines(pred[2, ], col = 2, lty = 2)
lines(pred[3, ], col = 3, lty = 3)
lines(pred[4, ], col = 4, lty = 4)
legend("topright", legend = c("trt: Control, sex: Male",
                             "trt: Control, sex: Female",
                             "trt: Chemotherapy, sex: Male",
                             "trt: Chemotherapy, sex: Female"),
      col = 1:4, lty = 1:4)

## Predicted cure rates
predict(fit, newdata, type = "cure")
#> [1] 0.2957616 0.4190912 0.2847148 0.3633011
## Predicted expected number of initial competing causes
predict(fit, newdata, type = "theta")
#> [1] 11.23381 11.23381 11.23381 11.23381
## Predicted activation probability of an initial competing cause
predict(fit, newdata, type = "p")
#> [1] 0.9534491 0.7476045 0.9728620 0.8383012

The use of the EM algorithm also makes it possible to extract latent variables underlying the definition of the destructive model, such as the number of initial competing causes for the occurrence of the event (called “M”) and the remaining causes that were not destroyed (called “D”). These variables are made available in the "cdnbcr" object. See below.

M <- fit$latent$M
plot(M, ylab = "Initial competing causes", pch = 16, cex = 0.8)


D <- fit$latent$D
plot(D, ylab = "Remaining competing causes", pch = 16, cex = 0.8)

Uncorrelated destructive fit (alpha = 0)

The parameter \(\alpha\) describes the dependence of the CDNBCR model. An important special case occurs when \(\alpha = 0\), i.e. an uncorrelated destructive model. The uncorrelated model can be specified with the cdnbcr() function with the argument alpha = FALSE, which sets the parameter \(\alpha = 0\) in the parameter estimation process.

fit0 <- cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 1 |
                 sex + trt + thickness + age, alpha = FALSE, data = e1690)

summary(fit0)
#> Call:
#> cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 
#>     1 | sex + trt + thickness + age, data = e1690, alpha = FALSE)
#> 
#> Summary for residuals:
#>       Min        1Q    Median        3Q       Max 
#> 0.0000000 0.0000000 0.3738455 0.9595133 1.2372406 
#> 
#> Regression model for theta (log link):
#>         Estimate Std. error t value Pr(>|t|)    
#> nodeII   0.98674    0.43676  2.2592 0.023869 *  
#> nodeIII  1.50469    0.54006  2.7861 0.005334 ** 
#> nodeIV   2.33377    0.54286  4.2990 1.72e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Regression model for p (logit link):
#>                  Estimate Std. error t value Pr(>|t|)
#> (Intercept)     -1.482456   1.389650 -1.0668   0.2861
#> sexFemale       -0.675326   0.856211 -0.7887   0.4303
#> trtChemotherapy -0.525835   1.004527 -0.5235   0.6007
#> thickness        0.187150   0.208348  0.8983   0.3690
#> age              0.045449   0.037213  1.2213   0.2220
#> 
#> Dispersion, dependence, and baseline parameters:
#>                 phi        mu     sigma
#> Estimate   2.565274 3.1279706 2.1749280
#> Std. error 1.008614 0.3881323 0.2319883
#> 
#> ---
#> Log-lik value: -505.0737 
#> AIC: 1032.147 and BIC: 1076.511 
#> EM iterations:

`cdnbcr`: Correlated Destructive Negative Binomial Cure Rate Model

Installation

Example

`e1690` dataset (see `?e1690`)

Correlated destructive fit

Uncorrelated destructive fit (alpha = 0)

cdnbcr: Correlated Destructive Negative Binomial Cure Rate Model

Installation

Example

e1690 dataset (see ?e1690)

Correlated destructive fit

Uncorrelated destructive fit (alpha = 0)

`cdnbcr`: Correlated Destructive Negative Binomial Cure Rate Model

`e1690` dataset (see `?e1690`)