The causaldef Methodology: Theory and Practice

Introduction: The Safety Manifesto

How do we calculate the price of acting on imperfect data?

Causal inference is typically framed as a geometry problem: “How close is our observational experiment to the target interventional experiment?” While true, this abstraction often hides the stakes.

The causaldef package is built on a different premise: If you rely on observational data, you are accepting a safety risk. You need to calculate the cost of being wrong.

Flash Forward: The Destination

Before we dive into the math, here is where we are going. This is the Safety Floor: the minimum unavoidable risk you accept when you choose not to run a Randomized Controlled Trial (RCT).

Figure 1: Regret Bounds vs Deficiency. The transfer penalty and minimax floor scale linearly in \(\delta\):

\[ \text{Regret}_{do}(\pi) \leq \text{Regret}_{obs}(\pi) + M\delta \]

This is not just about bias—this is calculating the safety price for your decision. You are about to learn how to calculate, bound, and minimize this deficiency (\(\delta\)).

The Core Theory: Markov Kernels in Plain English

Formally, we use Le Cam’s theory of statistical experiments (Akdemir 2026). We seek a Markov kernel \(K\) such that:

\[ P_{do} \approx K P_{obs} \]

In plain English:

We ask whether some post-processing \(K\) of the observational experiment can mimic the interventional one. The deficiency \(\delta\) is the smallest total-variation gap any such \(K\) can achieve. If you act as though \(\delta\) were zero when it is not, you need to know how much regret you’re signing up for.
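A toy discrete example makes the kernel concrete (all numbers here are made up). In this case a kernel that splits some mass happens to transport \(P_{obs}\) exactly, so the achievable \(\delta\) is 0; real observational experiments rarely allow this:

```r
# Toy example: P_obs and P_do are distributions over 3 outcomes.
p_obs <- c(0.5, 0.3, 0.2)
p_do  <- c(0.4, 0.4, 0.2)

# A Markov kernel K: each row is a distribution over target outcomes
# (rows sum to 1). Row 1 splits outcome 1's mass 80/20 across outcomes 1 and 2.
K <- rbind(c(0.8, 0.2, 0.0),
           c(0.0, 1.0, 0.0),
           c(0.0, 0.0, 1.0))

# Push P_obs through K: p_mapped[j] = sum_i K[i, j] * p_obs[i]
p_mapped <- as.vector(t(K) %*% p_obs)

# Total-variation distance between the mapped experiment and the target
tv <- 0.5 * sum(abs(p_mapped - p_do))
tv  # effectively 0, up to floating-point error
```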


Theory Overview: The Car and Destination Analogy

To bridge the gap between Le Cam’s abstract theory and your practical decision, use this analogy:

  • Observational experiment: your car (parked where it is)
  • Interventional experiment (RCT): your destination (where you want to be)
  • Deficiency (\(\delta\)): the physical distance between where the car is parked and the destination
  • IPW, AIPW estimators: drivers trying to drive the car closer to the destination
  • Safety floor: the risk you accept by walking the remaining miles through unknown territory

When we calculate \(\delta\), we are simply asking: “How far away do I still have to walk?”


Workflow

We will demonstrate the package workflow using a simulated medical decision problem.

library(causaldef)
library(ggplot2)

set.seed(2026)

# DAG Helper (using ggplot2 only)
plot_dag <- function(coords, edges, title = NULL) {
  edges_df <- merge(edges, coords, by.x = "from", by.y = "name")
  colnames(edges_df)[c(3,4)] <- c("x_start", "y_start")
  edges_df <- merge(edges_df, coords, by.x = "to", by.y = "name")
  colnames(edges_df)[c(5,6)] <- c("x_end", "y_end")
  
  ggplot2::ggplot(coords, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_segment(data = edges_df, ggplot2::aes(x = x_start, y = y_start, xend = x_end, yend = y_end),
                 arrow = ggplot2::arrow(length = ggplot2::unit(0.3, "cm"), type = "closed"), 
                 color = "gray40", linewidth = 1, alpha = 0.8) +
    ggplot2::geom_point(size = 14, color = "white", fill = "#4682B4", shape = 21, stroke = 1.5) +
    ggplot2::geom_text(ggplot2::aes(label = name), fontface = "bold", size = 4, color = "white") +
    ggplot2::ggtitle(title) + 
    ggplot2::theme_void(base_size = 14) +
    ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold", margin = ggplot2::margin(b = 10))) +
    ggplot2::coord_fixed()
}

Step 1: Data Generation

Consider a scenario where we want to estimate the effect of a new treatment A on patient recovery Y, but assignment is confounded by severity W.
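The generating code is not shown here, so the following is one plausible stand-in (the coefficients are hypothetical): severity W drives both treatment uptake and recovery, which is exactly the confounding we need to untangle from A’s effect.

```r
set.seed(2026)  # reproducibility
n <- 5000

# Confounder: disease severity
W <- rnorm(n)

# Treatment assignment depends on severity (this is the confounding)
A <- rbinom(n, 1, plogis(-0.5 + 1.2 * W))

# Recovery depends on treatment (true log-odds effect +0.8) and on severity
Y <- rbinom(n, 1, plogis(0.3 + 0.8 * A - 1.0 * W))

sim_data <- data.frame(W = W, A = A, Y = Y)
```

Sicker patients are treated more often but recover less often, so the naive comparison of treated vs. untreated understates the benefit of A.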

Step 2: Specification

We define the causal problem using causal_spec.
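A sketch of the call, assuming causal_spec mirrors the causal_spec_survival constructor shown at the end of this vignette (the argument names and the "ATE" label are assumptions, not confirmed API):

```r
spec <- causal_spec(
  data = sim_data,      # the simulated data from Step 1 (name hypothetical)
  treatment = "A",
  outcome = "Y",
  covariates = "W",
  estimand = "ATE"      # average treatment effect
)
```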

Step 3: Deficiency Estimation

We wish to know if we can “transport” our observational experiment to the ideal randomized experiment. We compare multiple adjustment strategies (our “drivers”).
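Assuming estimate_deficiency accepts the same methods argument it takes in the survival example below (the "aipw" method name is an assumption), the comparison might look like:

```r
# `spec` stands for the causal_spec() object from Step 2
def_results <- estimate_deficiency(spec, methods = c("unadjusted", "ipw", "aipw"))
print(def_results)
```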

Interpretation of Deficiency Results: The Distance to Walk

We evaluate our drivers based on how close they got us to the RCT:

  • Unadjusted (\(\delta \approx 0.26\)): only drove us part of the way, leaving a large distributional mismatch (TV distance \(\approx 0.26\)).
  • IPW / AIPW (\(\delta \approx 0.005\)): drove us about 98% of the way to the RCT; the remaining TV distance is negligible.

We use a Traffic Light system to interpret \(\delta\):

  • 🟢 Green (\(\delta < 0.05\)): excellent; very close to RCT quality.
  • 🟡 Yellow (\(0.05 \le \delta \le 0.15\)): a good approximation, but some residual mismatch.
  • 🔴 Red (\(\delta > 0.15\)): adjustment insufficient; the remaining TV distance is too large.
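The zones are easy to encode. A minimal helper (hypothetical; not part of the package API):

```r
# Map a deficiency estimate to its traffic-light zone
delta_zone <- function(delta) {
  if (delta < 0.05) "green"
  else if (delta <= 0.15) "yellow"
  else "red"
}

delta_zone(0.005)  # "green" (AIPW-quality adjustment)
delta_zone(0.26)   # "red"   (unadjusted)
```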

Step 4: Diagnostics (The “Negative Control Trap”)

Warning: Do not fall into the trap of thinking a high p-value means “safe”.

We use the Negative Control outcome Y_nc to test our assumptions. Y_nc shares confounders with Y but is causally unrelated to A.

Critical Interpretation: The Deductible

  • delta_bound: This is your Noise Floor (or minimum deductible). Even with a high p-value, you cannot be more precise than this. If your estimated effect is smaller than this noise floor, you have a problem.
  • kappa (\(\kappa\)): This is the Skepticism Dial. It is the multiplier linking the negative control to the actual outcome. \(\kappa=1\) assumes the confounding strength is identical for Y and Y_nc.
  • P-value: A high p-value simply means “We failed to catch an error”, not that an error isn’t there.

Interpretation: “This is your noise floor. Your effect estimate must exceed this to be meaningful.”
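The arithmetic behind the deductible is simple. With made-up numbers (the package’s actual computation may differ):

```r
# Discrepancy measured on the negative control Y_nc (hypothetical value)
delta_nc <- 0.02

# Skepticism dial: how much more strongly Y could be confounded than Y_nc
kappa <- 1.5

# Noise floor: effects smaller than this are indistinguishable
# from residual bias
delta_bound <- kappa * delta_nc
delta_bound
```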

Step 5: Effect Estimation

Having validated our strategy (checked the car’s position), we estimate the causal effect.
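Mirroring the survival workflow shown later in this vignette, effect estimation plausibly takes the deficiency results plus a target method; a sketch (the variable name and the "aipw" label are assumptions):

```r
# `def_results` stands for the object returned by estimate_deficiency() in Step 3
effect <- estimate_effect(def_results, target_method = "aipw")
print(effect)
```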

Step 6: Policy Regret Bounds (Transfer Penalty and Safety Floor)

This is the most critical step. We convert \(\delta\) into decision-theoretic quantities.

\[ \text{Transfer Penalty} = M\delta,\qquad \text{Minimax Safety Floor} = (M/2)\delta \]

where \(M\) is the stakes (the worst-case cost of acting on a wrong decision, in the same units as your regret) and \(\delta\) is the estimated deficiency.

Decision Framework

  • Example: If Stakes (\(M\)) = $10M and \(\delta\) = 0.01, then:
    • Transfer penalty = $100,000
    • Minimax safety floor = $50,000
  • Decision:

    • If expected profit ($5M) \(\gg\) transfer penalty ($100K): the Signal is Clear.
    • If expected profit ($120K) \(\approx\) transfer penalty ($100K): that’s Gambling.
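These numbers are easy to verify directly (a quick sanity check, not package functionality):

```r
M     <- 10e6   # stakes: worst-case cost of a wrong decision ($10M)
delta <- 0.01   # estimated deficiency

transfer_penalty <- M * delta         # upper bound on extra regret: $100K
safety_floor     <- (M / 2) * delta   # minimax floor: $50K
```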

Step 7: Sensitivity Analysis & Decision

Finally, if we cannot rule out unobserved confounding, we map the Confounding Frontier. We must overlay our Safety Floor to see if we are in “Unsafe Territory”.
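Under the hood, the frontier is just a grid of transfer penalties compared against what is at stake. A minimal stand-in (the mapping from confounding strength \(\gamma\) to worst-case \(\delta\) is entirely hypothetical):

```r
M      <- 10e6   # stakes
profit <- 5e6    # expected profit of acting

# Hypothetical: worst-case deficiency grows with unobserved-confounding strength
gamma       <- seq(0, 0.5, by = 0.1)
delta_worst <- 0.005 + 1.2 * gamma

penalty <- M * delta_worst
unsafe  <- penalty > profit   # penalty swamps the expected profit

data.frame(gamma, delta_worst, penalty, unsafe)
```

In this toy grid only the strongest confounding scenario crosses into unsafe territory; the contour where `penalty == profit` is the frontier.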

Go/No-Go Decision

The contour line represents the border of an acceptable transfer penalty.

Ask yourself: “Does the unsafe territory in the heat map push beyond your safety floor?”

  • If Yes: the analysis has failed. Do not publish; run the RCT.
  • If No: the result is reliable within these bounds.

Advanced: Survival Analysis

causaldef supports survival specifications and proxy-based diagnostics for time-to-event settings, with effect estimation available on compatible survival runtimes.

The survival examples below are evaluated only when a compatible survival runtime is available. In the current support matrix, that means R >= 4.0.

data("hct_outcomes")

# Event: Relapse (1) vs Censored (0)
hct_outcomes$relapse <- ifelse(hct_outcomes$event_status == "Relapse", 1, 0)

spec_surv <- causal_spec_survival(
  data = hct_outcomes,
  treatment = "conditioning_intensity",
  time = "time_to_event",
  event = "relapse",
  covariates = c("age", "disease_status", "kps"),
  estimand = "RMST",
  horizon = 24
)

# Estimate deficiency (Distance between Obs and Target Survival Expts)
surv_results <- estimate_deficiency(spec_surv, methods = c("unadjusted", "iptw"))

# Estimate Effect (RMST Difference + Survival Curves)
surv_effect <- estimate_effect(surv_results, target_method = "iptw")

print(surv_effect)
plot(surv_effect)

Conclusion

The causaldef package shifts the focus from “p-values” to information distance (\(\delta\)) and regret. By quantifying how far our data is from the ideal experiment, we can calculate the price of safety and make rigorous decisions.