Getting Started with rdstagger

Overview

rdstagger implements a unified framework that combines three identification strategies:

  1. Regression Discontinuity (RD): treatment is assigned by a cutoff in a running variable
  2. Staggered difference-in-differences (DiD): cohorts adopt treatment at different times
  3. Network Interference: spillover effects propagate through a known network

This vignette walks through a complete analysis using simulated data.

Installation

# CRAN
install.packages("rdstagger")

# GitHub (development)
remotes::install_github("causalfragility-lab/rdstagger")

Step 1: Simulate Data

library(rdstagger)

set.seed(42)
sim <- sim_rdstagger(
  n               = 400,
  nperiods        = 8,
  n_cohorts       = 3,
  cutoff          = 0,
  bw              = 1,
  network_density = 0.08,
  true_direct     = 0.30,
  true_spill      = 0.10,
  outcome_type    = "continuous"
)

head(sim$data)
#>   id period          y        x   g treated neighbor_treated spillover_share
#> 1  1      1 -1.0045007 2.056438 Inf       0                0      0.00000000
#> 2  1      2 -0.5559569 2.056438 Inf       0                1      0.06896552
#> 3  1      3  0.6360552 2.056438 Inf       0                1      0.20689655
#> 4  1      4 -1.4064010 2.056438 Inf       0                1      0.20689655
#> 5  1      5  0.7920164 2.056438 Inf       0                1      0.24137931
#> 6  1      6  0.2741520 2.056438 Inf       0                1      0.24137931
sim$true_params
#> $direct
#> [1] 0.3
#> 
#> $spill
#> [1] 0.1
#> 
#> $cutoff
#> [1] 0
#> 
#> $bw
#> [1] 1
#> 
#> $cohorts
#> [1] 2 3 5
#> 
#> $n_never_treated
#> [1] 290
#> 
#> $outcome_type
#> [1] "continuous"

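sim$data is a long panel with one row per unit and period. Judging from the columns printed above, it contains:

  - id: unit identifier
  - period: time period (1 to 8 here)
  - y: outcome
  - x: running variable, constant within a unit
  - g: cohort, i.e. the period of first treatment; Inf marks never-treated units
  - treated: treatment indicator
  - neighbor_treated: indicator that appears to flag units with at least one treated network neighbor
  - spillover_share: what appears to be the share of a unit's network neighbors that are treated

The simulation object also carries the network (sim$network, used in Step 2) and the true parameter values (sim$true_params).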

Step 2: Estimate ATT(g,t)

res <- rdstagger_attgt(
  data    = sim$data,
  yname   = "y",
  xname   = "x",
  cutoff  = 0,
  gname   = "g",
  tname   = "period",
  idname  = "id",
  network = sim$network,
  bw      = 1.5,
  boot    = FALSE        # set TRUE for bootstrap inference
)

print(res)
#> 
#> rdstagger ATT(g,t) Estimates
#> ============================
#> Bandwidth:     1.5000
#> Cohorts:       2, 3, 5
#> Periods:       1, 2, 3, 4, 5, 6, 7, 8
#> Control group: nevertreated
#> 
#> Post-treatment ATT(g,t):
#>  cohort period     att      se  ci_lower ci_upper      pval
#>       2      2 0.39466 0.03607  0.323957   0.4654 7.394e-28
#>       2      3 0.25049 0.04054  0.171024   0.3300 6.486e-10
#>       2      4 0.25812 0.04007  0.179574   0.3367 1.186e-10
#>       2      5 0.25039 0.04050  0.171002   0.3298 6.340e-10
#>       2      6 0.07271 0.04159 -0.008793   0.1542 8.037e-02
#>       2      7 0.20881 0.03588  0.138480   0.2791 5.917e-09
#>       2      8 0.11068 0.03497  0.042142   0.1792 1.551e-03
#>       3      3 0.46768 0.02988  0.409107   0.5262 3.324e-55
#>       3      4 0.22545 0.03005  0.166555   0.2844 6.267e-14
#>       3      5 0.27265 0.03043  0.213016   0.3323 3.232e-19
#>       3      6 0.21745 0.03168  0.155359   0.2795 6.682e-12
#>       3      7 0.26473 0.02840  0.209077   0.3204 1.135e-20
#>       3      8 0.29497 0.02721  0.241637   0.3483 2.207e-27
#>       5      5 0.29713 0.02718  0.243856   0.3504 8.177e-28
#>       5      6 0.27365 0.02879  0.217229   0.3301 1.972e-21
#>       5      7 0.48238 0.02715  0.429167   0.5356 1.273e-70
#>       5      8 0.19985 0.02730  0.146344   0.2534 2.471e-13
#> 
#> Spillover estimates available. Use x$spillgt to view.
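
The interval and p-value columns look like standard normal-approximation quantities. As a check, and not as a statement of how rdstagger computes them internally, the following base R sketch reproduces the cohort 2, period 6 row from its att and se. The inputs are copied from the printed output, so agreement is only up to rounding.

```r
# Values copied from the printed row for cohort 2, period 6.
att <- 0.07271
se  <- 0.04159

z        <- qnorm(0.975)               # two-sided 95% critical value
ci_lower <- att - z * se
ci_upper <- att + z * se
pval     <- 2 * pnorm(-abs(att / se))  # two-sided normal p-value

# Compare with the printed ci_lower, ci_upper, and pval for that row.
signif(c(ci_lower = ci_lower, ci_upper = ci_upper, pval = pval), 4)
```

This is the only post-treatment cell whose interval covers zero, which matches its printed p-value of about 0.08.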

Step 3: Pre-Treatment Falsification Test

pt <- rdstagger_pretest(res, method = "both")
print(pt)
#> 
#> rdstagger Pre-Treatment Falsification Test
#> ==========================================
#> Pre-treatment ATT(g,t) cells: 7
#> 
#> Joint test (H0: all pre-treatment ATT = 0):
#>   Chi-squared(7) = 64.2348,  p-value = 0.0000
#>   Result: FAIL (evidence of pre-trends)
#> 
#> Individual cell tests:
#>  cohort period      att      se   tstat      pval sig05
#>       2      1  0.00000 0.03470  0.0000 1.000e+00 FALSE
#>       3      1  0.10223 0.03243  3.1524 1.619e-03  TRUE
#>       3      2  0.00000 0.02823  0.0000 1.000e+00 FALSE
#>       5      1  0.02722 0.03439  0.7916 4.286e-01 FALSE
#>       5      2 -0.04483 0.02809 -1.5959 1.105e-01 FALSE
#>       5      3 -0.19938 0.02788 -7.1501 8.673e-13  TRUE
#>       5      4  0.00000 0.02757  0.0000 1.000e+00 FALSE

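The joint test checks whether all pre-treatment ATT(g,t) are zero. A p-value above 0.05 means the test finds no evidence of pre-treatment trends, which is consistent with (though it does not prove) parallel trends within the bandwidth. In this simulated run the test rejects: Chi-squared(7) = 64.2348 with a p-value below 0.0001, driven mainly by the cells for cohort 3 in period 1 and cohort 5 in period 3. The three cells with an ATT of exactly zero are the periods immediately before adoption (g - 1), which appear to serve as the reference period.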
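
The printed joint statistic can be recovered from the individual cell tests. The sketch below sums the squared t-statistics, which is the Wald statistic one would get if the cells were treated as independent. That rdstagger_pretest() computes the statistic this way is an assumption; the sketch only shows that the printed numbers are consistent with it.

```r
# t-statistics copied from the "Individual cell tests" table.
tstat <- c(0.0000, 3.1524, 0.0000, 0.7916, -1.5959, -7.1501, 0.0000)

chi2 <- sum(tstat^2)        # sum of squared t-statistics
df   <- length(tstat)       # 7, matching the printed Chi-squared(7)
pval <- pchisq(chi2, df = df, lower.tail = FALSE)

# Same statistic with the three exactly-zero cells dropped from the count.
df_nonzero   <- sum(tstat != 0)
pval_nonzero <- pchisq(chi2, df = df_nonzero, lower.tail = FALSE)

c(chi2 = chi2, pval = pval, pval_nonzero = pval_nonzero)
```

Dropping the zero cells lowers the degrees of freedom from 7 to 4 but leaves the conclusion unchanged here, since both p-values are far below 0.05.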

Step 4: Aggregate into Event Study

agg <- rdstagger_agg(res, type = "dynamic")
print(agg)
#> 
#> rdstagger Aggregation -- type: dynamic
#> ========================================
#> Overall ATT: 0.2672
#> 
#>  event_time      att      se n_cells pre_post ci_lower ci_upper      pval
#>          -4  0.02722 0.03439       1      pre -0.04018  0.09463 4.286e-01
#>          -3 -0.04483 0.02809       1      pre -0.09989  0.01023 1.105e-01
#>          -2 -0.04857 0.03024       2      pre -0.10785  0.01070 1.083e-01
#>          -1  0.00000 0.03034       3      pre -0.05947  0.05947 1.000e+00
#>           0  0.38649 0.03127       3     post  0.32520  0.44778 4.290e-35
#>           1  0.24986 0.03354       3     post  0.18412  0.31561 9.410e-14
#>           2  0.33772 0.03301       3     post  0.27302  0.40242 1.442e-24
#>           3  0.22256 0.03361       3     post  0.15668  0.28844 3.558e-11
#>           4  0.16872 0.03561       2     post  0.09893  0.23851 2.153e-06
#>           5  0.25189 0.03184       2     post  0.18948  0.31430 2.568e-15
#>           6  0.11068 0.03497       1     post  0.04214  0.17923 1.551e-03
plot(agg)

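The event study plot shows ATT estimates by event time, the number of periods relative to treatment adoption. Pre-treatment estimates (event time < 0) should be close to zero. Here every pre-treatment interval covers zero, and the estimate at event time -1 is exactly zero because it averages the three cells that are zero in the Step 3 table. Averaging can hide cell-level violations: event time -2 combines the estimates 0.10223 and -0.19938, both individually significant in Step 3.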
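
To make the mapping from ATT(g,t) cells to event time concrete, the base R sketch below recomputes two rows of the table from cells printed in Steps 2 and 3. It assumes event time is period minus cohort and that cells are combined with an unweighted mean. Both assumptions match the printed numbers, but the package may weight cells differently in general.

```r
# ATT(g,t) cells copied from the Step 2 and Step 3 output.
cells <- data.frame(
  cohort = c(2, 3, 5, 3, 5),
  period = c(2, 3, 5, 1, 3),
  att    = c(0.39466, 0.46768, 0.29713, 0.10223, -0.19938)
)

# Event time: periods elapsed since the cohort adopted treatment.
cells$event_time <- cells$period - cells$cohort

# Unweighted mean of the cells that share an event time.
dyn <- aggregate(att ~ event_time, data = cells, FUN = mean)
dyn
```

The two resulting rows can be compared with event times -2 and 0 in the printed table.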

Step 5: Cohort-Level Aggregation

agg_group <- rdstagger_agg(res, type = "group")
print(agg_group)
#> 
#> rdstagger Aggregation -- type: group
#> ========================================
#> Overall ATT: 0.2672
#> 
#>  cohort    att      se n_cells ci_lower ci_upper      pval
#>       2 0.2208 0.03860       7   0.1452   0.2965 1.061e-08
#>       3 0.2905 0.02964       6   0.2324   0.3486 1.130e-22
#>       5 0.3133 0.02761       4   0.2591   0.3674 7.907e-30
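
The cohort rows and the overall ATT can be checked against the post-treatment cells from Step 2. The sketch below assumes unweighted means, which reproduces the printed values in this example; it is not a claim about the weighting rdstagger_agg() uses in general.

```r
# Post-treatment ATT(g,t) cells copied from the Step 2 output.
post <- data.frame(
  cohort = rep(c(2, 3, 5), times = c(7, 6, 4)),
  att = c(
    0.39466, 0.25049, 0.25812, 0.25039, 0.07271, 0.20881, 0.11068,  # cohort 2
    0.46768, 0.22545, 0.27265, 0.21745, 0.26473, 0.29497,           # cohort 3
    0.29713, 0.27365, 0.48238, 0.19985                              # cohort 5
  )
)

by_cohort <- tapply(post$att, post$cohort, mean)  # one mean per cohort
overall   <- mean(post$att)                       # mean over all 17 cells

round(by_cohort, 4)
round(overall, 4)
```

In this output the overall ATT is the mean over all 17 post-treatment cells, not the mean of the three cohort means, so cohorts with more post-treatment periods carry more weight.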

Step 6: Spillover Estimates

Spillover estimates are stored in res$spillgt:

if (!is.null(res$spillgt)) {
  head(res$spillgt)
}
#>   cohort period   spill_att         se    ci_lower   ci_upper         pval
#> 1      2      1  0.00000000 0.05670614 -0.11114199  0.1111420 1.000000e+00
#> 2      2      2  0.05977016 0.05287300 -0.04385902  0.1633993 2.582877e-01
#> 3      2      3 -0.26743384 0.05353595 -0.37236237 -0.1625053 5.871190e-07
#> 4      2      4  0.31510538 0.05241815  0.21236770  0.4178431 1.839511e-09
#> 5      2      5  0.42893951 0.05066956  0.32962899  0.5282500 2.552139e-17
#> 6      2      6  0.37123905 0.05114066  0.27100520  0.4714729 3.894564e-13
#>   n_exposed pre_post
#> 1       160      pre
#> 2       160     post
#> 3       160     post
#> 4       160     post
#> 5       160     post
#> 6       160     post
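
Because the data are simulated, the estimates can be compared with the value used to generate them. The sketch below checks which of the printed post-treatment intervals cover true_spill = 0.10. It assumes that spill_att targets the same quantity as true_spill, which this vignette does not state, so treat the comparison as illustrative only.

```r
# Post-treatment rows (cohort 2, periods 2 to 6) copied from res$spillgt above.
spill <- data.frame(
  period   = 2:6,
  ci_lower = c(-0.04385902, -0.37236237, 0.21236770, 0.32962899, 0.27100520),
  ci_upper = c( 0.1633993,  -0.1625053,  0.4178431,  0.5282500,  0.4714729)
)

true_spill <- 0.10  # from sim$true_params$spill

# Does each 95% interval contain the simulated spillover value?
spill$covers_truth <- spill$ci_lower <= true_spill & true_spill <= spill$ci_upper
spill
```

Only the period 2 interval covers 0.10 among these rows, which suggests inspecting the cell-level spillover estimates before relying on any single one.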

Bandwidth Selection

Instead of fixing bw by hand, a bandwidth can be selected from the data with rdstagger_bw():

bw_sel <- rdstagger_bw(
  data   = sim$data,
  yname  = "y",
  xname  = "x",
  cutoff = 0,
  gname  = "g",
  tname  = "period"
)
bw_sel$bw_common
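
For intuition about what the bandwidth controls, the base R sketch below marks which units fall inside a window around the cutoff. It uses a hypothetical toy data frame and assumes the bandwidth acts as a hard window, |x - cutoff| <= bw. rdstagger may instead apply kernel weights or another rule, so this only illustrates the idea.

```r
set.seed(1)

# Hypothetical toy data: one running-variable value per unit.
toy <- data.frame(id = 1:10, x = round(rnorm(10, sd = 2), 2))

cutoff <- 0
bw     <- 1.5

toy$in_window <- abs(toy$x - cutoff) <= bw  # within bw of the cutoff
toy$above     <- toy$x >= cutoff            # side of the cutoff

# Units retained under a hard window, split by side of the cutoff.
table(in_window = toy$in_window, above = toy$above)
```

A wider bandwidth keeps more units but includes units farther from the cutoff, which is the usual bias-variance trade-off in RD designs.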

References

Callaway, B., & Sant’Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.

Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6), 2295–2326.

Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16(1), S1–S23.