Negative Control Diagnostics in causaldef

Theoretical Background

What is a Negative Control Outcome?

A negative control outcome ($Y'$) is a variable that:

- Shares confounders with the true outcome $Y$: it is affected by the same unmeasured variables $U$ that confound the treatment-outcome relationship
- Is NOT causally affected by treatment $A$: the true causal effect of $A$ on $Y'$ is zero

The Diagnostic Logic

The key insight is:

If your adjustment strategy correctly removes confounding, then the residual association between $A$ and $Y'$ should be zero, up to sampling noise.

If you observe an association between $A$ and $Y'$ after adjustment that is distinguishable from zero, this indicates that confounding remains and your causal estimates for $Y$ may be biased.
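
The logic can be sketched in base R without any causaldef functions. The simulation below, the helper name `weighted_nc_diff`, and the use of a normalized inverse-probability-weighted mean difference as the residual association are all assumptions made for illustration; they are not claimed to match the package's internal statistic.

```r
set.seed(1)
n <- 2000

# Confounder, treatment, and a negative control that depends on U but not on A
U     <- rnorm(n)
A     <- rbinom(n, 1, plogis(0.3 + 0.8 * U))
Y_nc  <- 0.5 + 1.2 * U + rnorm(n, sd = 0.8)
noise <- rnorm(n)  # a covariate that carries no information about U

# Hypothetical helper: IPTW-weighted difference in mean Y' between arms
weighted_nc_diff <- function(a, y_nc, x) {
  ps <- fitted(glm(a ~ x, family = binomial()))  # estimated propensity score
  w  <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))     # inverse-probability weights
  weighted.mean(y_nc[a == 1], w[a == 1]) -
    weighted.mean(y_nc[a == 0], w[a == 0])
}

resid_good <- weighted_nc_diff(A, Y_nc, U)      # adjusts for the true confounder
resid_bad  <- weighted_nc_diff(A, Y_nc, noise)  # adjusts for an irrelevant covariate

c(adjust_for_U = resid_good, adjust_for_noise = resid_bad)
```

Adjusting for $U$ should leave a residual association close to zero, while adjusting for the irrelevant covariate should leave most of the confounded association in place.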

Negative Control Sensitivity Bound (manuscript `thm:nc_bound`)

The causaldef package combines two ingredients:

- a screening test for residual association between treatment and the negative control after adjustment, and
- the manuscript’s negative control sensitivity bound (`thm:nc_bound`):

\[\delta(\hat{K}) \leq \kappa \cdot \delta_{NC}(\hat{K})\]

where:

- $\delta(\hat{K})$ is the true deficiency (what we want to know)
- $\delta_{NC}(\hat{K})$ is a negative-control association proxy (what we can measure)
- $\kappa$ is an alignment constant reflecting how well $Y'$ proxies for $Y$’s confounding

Practical Example

Simulating Data with a Negative Control

Let’s create a dataset where we have:

- An unmeasured confounder $U$
- An observed covariate $W$ (correlated with $U$)
- Binary treatment $A$
- Outcome $Y$ affected by $A$ and $U$
- Negative control $Y'$ affected only by $U$ (not $A$)

library(causaldef)
set.seed(42)

n <- 500

# Unmeasured confounder
U <- rnorm(n)

# Observed covariate (partially captures U)
W <- 0.7 * U + rnorm(n, sd = 0.5)

# Treatment assignment (driven directly by U; W is only a noisy proxy for U)
ps_true <- plogis(0.3 + 0.8 * U)
A <- rbinom(n, 1, ps_true)

# True causal effect
beta_true <- 2.0

# Outcome (affected by A and U)
Y <- 1 + beta_true * A + 1.5 * U + rnorm(n)

# Negative control outcome (affected by U only, NOT by A)
Y_nc <- 0.5 + 1.2 * U + rnorm(n, sd = 0.8)

# Create data frame
df <- data.frame(W = W, A = A, Y = Y, Y_nc = Y_nc)
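
As a quick sanity check on this design, the sketch below re-runs the same simulation in base R (so it is self-contained and does not need causaldef) and looks at two things: how well $W$ tracks $U$, and how far the unadjusted contrasts are from their true values. The names `cor_UW`, `naive_est`, and `naive_nc` are introduced here for illustration only.

```r
set.seed(42)
n <- 500

U <- rnorm(n)
W <- 0.7 * U + rnorm(n, sd = 0.5)
A <- rbinom(n, 1, plogis(0.3 + 0.8 * U))
beta_true <- 2.0
Y    <- 1 + beta_true * A + 1.5 * U + rnorm(n)
Y_nc <- 0.5 + 1.2 * U + rnorm(n, sd = 0.8)

# W is an imperfect proxy: its population correlation with U is
# 0.7 / sqrt(0.7^2 + 0.5^2)
cor_UW <- cor(U, W)

# Unadjusted contrasts: both are contaminated by U
naive_est <- mean(Y[A == 1]) - mean(Y[A == 0])        # compare with beta_true
naive_nc  <- mean(Y_nc[A == 1]) - mean(Y_nc[A == 0])  # true effect on Y' is zero

round(c(cor_UW = cor_UW, naive_est = naive_est, naive_nc = naive_nc), 2)
```

Because $U$ raises both the treatment probability and the outcomes, the naive estimate should overshoot `beta_true`, and the negative control should show a spurious positive association with $A$ before any adjustment.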

Creating the Causal Specification

We specify the causal problem including the negative control:

spec <- causal_spec(
  data = df,
  treatment = "A",
  outcome = "Y",
  covariates = "W",
  negative_control = "Y_nc"
)
#> ✔ Created causal specification: n=500, 1 covariate(s)

print(spec)
#> 
#> -- Causal Specification --------------------------------------------------
#> 
#> * Treatment: A ( binary )
#> * Outcome: Y ( continuous )
#> * Covariates: W 
#> * Sample size: 500 
#> * Estimand: ATE 
#> * Negative control: Y_nc

Running the Negative Control Diagnostic

Now we test whether our IPTW adjustment successfully removes confounding:

nc_result <- nc_diagnostic(
  spec,
  method = "iptw",
  alpha = 0.05,
  n_boot = 200
)

print(nc_result)

Interpreting the Results

The diagnostic returns:

- `screening$statistic`: Weighted residual association between $A$ and $Y'$ after adjustment
- `p_value`: Permutation p-value for that residual association
- `delta_nc`: The observed negative-control association proxy
- `delta_bound`: Upper bound on true deficiency ($\kappa \times \delta_{NC}$)
- `falsified`: Whether the residual-association screening test rejects
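
To make the `p_value` field concrete, here is one simple way a permutation p-value for a weighted residual association could be formed in base R. This is a sketch under assumptions: the statistic (`nc_stat`), the choice to shuffle treatment labels and refit the propensity model, the decision rule, and the simulated data are illustrative and are not taken from the causaldef source. Shuffling $A$ tests the stronger null that treatment is unrelated to both the covariate and $Y'$, so the package's actual scheme may differ.

```r
set.seed(7)
n <- 500
U    <- rnorm(n)
A    <- rbinom(n, 1, plogis(0.3 + 0.8 * U))
Y_nc <- 0.5 + 1.2 * U + rnorm(n, sd = 0.8)
W    <- rnorm(n)  # deliberately useless covariate, so confounding remains

# Hypothetical statistic: IPTW-weighted difference in mean Y' between arms
nc_stat <- function(a, y_nc, x) {
  ps <- fitted(glm(a ~ x, family = binomial()))
  w  <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))
  weighted.mean(y_nc[a == 1], w[a == 1]) -
    weighted.mean(y_nc[a == 0], w[a == 0])
}

obs  <- nc_stat(A, Y_nc, W)
perm <- replicate(199, nc_stat(sample(A), Y_nc, W))  # draws under shuffled labels

# Two-sided permutation p-value with the usual +1 correction
p_value   <- (1 + sum(abs(perm) >= abs(obs))) / (length(perm) + 1)
falsified <- p_value < 0.05  # assumed decision rule at alpha = 0.05

c(statistic = obs, p_value = p_value, falsified = falsified)
```

With a covariate that carries no information about $U$, the observed statistic should sit far in the tail of the shuffled-label distribution, which is the situation the `falsified` flag is meant to catch.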

Scenarios

Scenario 1: Adjustment Succeeds

If $W$ fully captures $U$, the negative control test should not falsify, apart from the test's false-positive rate:

# When W = U (no unmeasured confounding)
df_full <- df
df_full$W <- U  # Perfect proxy

spec_full <- causal_spec(
  df_full, "A", "Y", "W", negative_control = "Y_nc"
)

nc_full <- nc_diagnostic(spec_full, method = "iptw", n_boot = 100)
print(nc_full)
# Expect: falsified = FALSE

Scenario 2: Adjustment Fails

When $W$ carries no information about $U$, adjustment leaves the confounding intact and the test is expected to falsify:

# When W is noise (no information about U)
df_bad <- df
df_bad$W <- rnorm(n)  # Useless proxy

spec_bad <- causal_spec(
  df_bad, "A", "Y", "W", negative_control = "Y_nc"
)

nc_bad <- nc_diagnostic(spec_bad, method = "iptw", n_boot = 100)
print(nc_bad)
# Expect: falsified = TRUE

Choosing Good Negative Control Outcomes

Ideal Properties

The best negative control outcomes have:

- Strong confounding alignment: $Y'$ shares the same unmeasured confounders as $Y$
- Zero treatment effect: No plausible mechanism by which $A$ affects $Y'$
- Measurable: Available in your dataset

Examples by Domain

Domain	Treatment	Outcome	Possible Negative Control
Cardiovascular	Statin use	CVD events	Accidental injuries
Oncology	Chemotherapy	Tumor response	Hospital-acquired infections
Economics	Job training	Earnings in 1978	Earnings in 1974 (pre-treatment)
Epidemiology	Vaccination	Flu incidence	Unrelated disease incidence

Combining with Deficiency Estimation

The negative control diagnostic complements deficiency estimation:

# Step 1: Estimate deficiency
def_results <- estimate_deficiency(
  spec,
  methods = c("unadjusted", "iptw", "aipw"),
  n_boot = 100
)

print(def_results)

# Step 2: Run negative control diagnostic on best method
best_method <- names(which.min(def_results$estimates))
nc_check <- nc_diagnostic(spec, method = best_method, n_boot = 100)

# Step 3: Compute policy bounds if assumptions not falsified
if (!nc_check$falsified) {
  bounds <- policy_regret_bound(
    def_results,
    utility_range = c(-5, 10),
    method = best_method
  )
  print(bounds)
} else {
  warning("Causal assumptions falsified. Consider additional covariates.")
}

Advanced: Estimating Kappa

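The alignment constant $\kappa$ controls the bound’s tightness: a smaller $\kappa$ gives a tighter bound. The default $\kappa = 1$ is conservative relative to any smaller value. If domain knowledge supports a specific $\kappa$, you can set it explicitly: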

# If you believe Y' has 80% of Y's confounding structure:
nc_tight <- nc_diagnostic(
  spec,
  method = "iptw",
  kappa = 0.8,
  n_boot = 100
)

print(nc_tight)
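
Since `delta_bound` is listed under Interpreting the Results as $\kappa \times \delta_{NC}$, the effect of $\kappa$ is a simple rescaling. The sketch below uses a made-up value of `delta_nc` (0.05 is hypothetical, not an output of the example above) to show how the bound moves with $\kappa$.

```r
delta_nc <- 0.05               # hypothetical proxy value, for illustration only
kappa    <- c(0.5, 0.8, 1.0)   # candidate alignment constants

# The bound scales linearly in kappa
delta_bound <- kappa * delta_nc
data.frame(kappa = kappa, delta_bound = delta_bound)
```

A value of $\kappa$ below 1 tightens the bound proportionally, so it is worth choosing only when the claimed alignment between $Y'$ and $Y$ can be defended.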

Summary

Function or field	Purpose
`nc_diagnostic()`	Screen for residual association and compute a sensitivity bound
`delta_nc`	Observable negative-control association proxy
`delta_bound`	Upper bound on true deficiency
`falsified`	Screening rejection of residual association

Negative control diagnostics provide a data-driven way to assess causal assumptions. Use them alongside deficiency estimation for robust causal inference.

References

Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison. DOI: 10.5281/zenodo.18367347. See thm:nc_bound (Negative Control Sensitivity Bound).
Lipsitch, M., Tchetgen Tchetgen, E., & Cohen, T. (2010). Negative controls: A tool for detecting confounding and bias in observational studies. Epidemiology, 21(3), 383-388.
Shi, X., Miao, W., & Tchetgen Tchetgen, E. (2020). A selective review of negative control methods in epidemiology. Current Epidemiology Reports, 7, 190-202.

Negative Control Diagnostics in causaldef

Deniz Akdemir

2026-03-26
