ReproStat does not just return a single score. It returns a collection of stability summaries, each reflecting a different way a model result can vary under perturbation.
This article explains how to read those outputs in a way that is useful for real analysis work.
The package asks:
If I perturb the observed data in reasonable ways and refit the same model many times, how much do the key outputs move?
That is different from asking how uncertain a single fit is: ReproStat is focused on the stability of fitted outputs under repeated data perturbation.
Everything else in the package builds from this diagnostic object (referred to as `diag_obj` below).
```r
coef_stability(diag_obj)
#>  (Intercept)           wt           hp         disp
#> 5.766188e+00 9.574906e-01 1.350558e-04 8.814208e-05
```

This measures how much estimated coefficients vary across perturbation runs.
Interpretation:

- Smaller values mean the coefficient barely moves across perturbation runs.
- Values that are large relative to the coefficient's own magnitude indicate an estimate that is sensitive to perturbation.

Use this when you care about the magnitude of effects, not only whether they are statistically significant.
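Conceptually, this summary can be reproduced with a hand-rolled perturbation loop. The sketch below uses bootstrap resampling and `lm()` on `mtcars` as a stand-in; it illustrates the idea only and is not ReproStat's actual implementation:

```r
# Illustrative sketch: per-coefficient variance across bootstrap refits.
# ReproStat's actual perturbation scheme and defaults may differ.
set.seed(1)
fits <- replicate(200, {
  idx <- sample(nrow(mtcars), replace = TRUE)           # one perturbation run
  coef(lm(mpg ~ wt + hp + disp, data = mtcars[idx, ]))  # refit, keep coefficients
})
# fits is a 4 x 200 matrix: one row per coefficient, one column per run
apply(fits, 1, var)  # how much each coefficient moved across runs
```

A coefficient whose variance is tiny relative to its estimate is stable in the sense this diagnostic measures.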
This reports the proportion of perturbation runs in which each
coefficient is significant at the chosen alpha.
Interpretation:
- Values near 1 mean the term is almost always significant.
- Values near 0 mean the term is almost never significant.
- Values near 0.5 mean the significance decision is unstable.

This is especially helpful when a predictor looks “significant” in the base fit but may be borderline under small perturbations.
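A minimal base-R sketch of the same computation, the proportion of refits in which each term is significant at a chosen alpha (illustrative only, not ReproStat's implementation):

```r
# Illustrative sketch: significance rate per term across bootstrap refits.
set.seed(1)
alpha <- 0.05
sig <- replicate(200, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  p <- summary(lm(mpg ~ wt + hp + disp, data = mtcars[idx, ]))$coefficients[, "Pr(>|t|)"]
  p < alpha  # TRUE when the term is significant in this run
})
rowMeans(sig)  # proportion of runs in which each term is significant
```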
For standard regression backends, this reflects sign consistency across perturbations. For `glmnet`, it reflects non-zero selection frequency.
Interpretation:

- Values near 1 mean the variable behaves consistently (same sign, or consistently selected) across runs.
- Values near 0.5 mean the variable's behavior is unstable.

This is often the most intuitive measure when the practical question is: “Does this variable keep showing up in the same way?”
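For the standard-regression case, sign consistency can be sketched as the fraction of refits in which each coefficient keeps the sign of the base fit (illustrative only, not ReproStat's implementation):

```r
# Illustrative sketch: how often each coefficient keeps the base fit's sign.
set.seed(1)
base_sign <- sign(coef(lm(mpg ~ wt + hp + disp, data = mtcars)))
same_sign <- replicate(200, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  sign(coef(lm(mpg ~ wt + hp + disp, data = mtcars[idx, ]))) == base_sign
})
rowMeans(same_sign)  # per-coefficient sign-consistency rate, in [0, 1]
```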
```r
prediction_stability(diag_obj)
#> $pointwise_variance
#>  [1] 0.3662248 0.3309453 0.5611693 0.6449963 0.8857157 0.4844340 0.6296128
#>  [8] 0.7715900 0.5702159 0.6549459 0.6549459 0.7078378 0.3601373 0.3973861
#> [15] 2.0421895 2.1812327 2.0343783 0.7530433 1.4530479 1.0200743 0.4814946
#> [22] 0.5898336 0.4856211 0.6416806 1.1922447 0.9322830 0.6916174 1.3695036
#> [29] 1.0020490 0.9246772 3.3817941 0.5130429
#>
#> $mean_variance
#> [1] 0.9284364
```

Prediction stability summarizes how much the model’s predictions vary across perturbation runs.
Interpretation:

- Low pointwise variances mean individual predictions are reproducible across runs.
- A high mean variance, or a few observations with much larger variance than the rest, signals prediction instability worth inspecting.

This is useful when the model will be used for scoring, ranking, or decision support rather than only interpretation.
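The underlying idea can be sketched by having each refit predict on the original observations and then measuring how much each observation's prediction moves (illustrative only, not ReproStat's implementation):

```r
# Illustrative sketch: pointwise prediction variance across bootstrap refits.
set.seed(1)
preds <- replicate(200, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  predict(lm(mpg ~ wt + hp + disp, data = mtcars[idx, ]), newdata = mtcars)
})
# preds is 32 x 200: one row per original observation, one column per run
pointwise_variance <- apply(preds, 1, var)
mean_variance <- mean(pointwise_variance)
```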
```r
reproducibility_index(diag_obj)
#> $index
#> [1] 88.71268
#>
#> $components
#>       coef     pvalue  selection prediction
#>  0.9270762  0.8311111  0.8155556  0.9747641
```

The Reproducibility Index aggregates multiple stability dimensions into a single 0-100 score.
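For the output shown above, the printed index is consistent with an unweighted mean of the four component scores scaled to 0-100. Whether ReproStat actually weights the components equally is an assumption here; check the reference documentation for the aggregation rule it uses:

```r
# Check: 100 * mean(components) is consistent with the printed $index.
# Equal weighting is assumed for illustration, not confirmed by the package.
components <- c(coef = 0.9270762, pvalue = 0.8311111,
                selection = 0.8155556, prediction = 0.9747641)
100 * mean(components)  # approximately 88.71268, the $index value above
```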
A practical reading guide is:

- 90-100: highly stable under the chosen perturbation scheme
- 70-89: reasonably stable overall
- 50-69: mixed stability; inspect components carefully
- < 50: low stability; the model is sensitive in ways worth investigating

These are not universal cutoffs. They are interpretive anchors.
Two models can have similar RI values for different reasons: one may combine stable coefficients with unstable predictions, while another shows the opposite pattern.
For that reason, the RI should be treated as a summary, not a replacement for the component-level diagnostics.
ReproStat can also report an uncertainty interval for the RI. This interval reflects uncertainty in the RI induced by the stored perturbation draws, and it is useful when you want to know whether the reported RI is itself stable or highly variable.
Different perturbation schemes answer different questions.
- Use `"bootstrap"` when you want a broad sense of ordinary sampling variability.
- Use `"subsample"` when you want to know whether the result depends heavily on which observations happen to be included.
- Use `"noise"` when you want to stress-test sensitivity to small measurement error or recording noise in numeric predictors.
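The three schemes can be sketched in base R. The `perturb()` helper below is hypothetical, written only to make the distinctions concrete; its name, arguments, and defaults are not ReproStat's API:

```r
# Hypothetical helper illustrating the three perturbation ideas on a data
# frame `d`. Not ReproStat's API; names and defaults are assumptions.
perturb <- function(d, scheme, frac = 0.8, sd_frac = 0.02) {
  n <- nrow(d)
  switch(scheme,
    bootstrap = d[sample(n, replace = TRUE), ],          # resample rows with replacement
    subsample = d[sample(n, size = floor(frac * n)), ],  # keep a random subset of rows
    noise     = {                                        # jitter numeric columns
      num <- vapply(d, is.numeric, logical(1))
      # In practice you would restrict the jitter to predictors, not the response.
      d[num] <- lapply(d[num], function(x) x + rnorm(n, sd = sd_frac * sd(x)))
      d
    })
}

perturb(mtcars, "bootstrap")  # e.g. one bootstrap draw of the data
```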
A low RI is not a failure of the package. It is a result. It can tell you that the model's key outputs depend on details of the particular sample, or on small amounts of noise, in ways that a single fit would not reveal.
For applied work, a compact reporting pattern is to report the RI together with its component scores and the perturbation scheme used to compute them.

After reading this article, the most useful follow-up pages are:
- `vignette("ReproStat-intro")` for the basic workflow