| Type: | Package |
| Title: | Value-Price Analysis with Bayesian and Panel Data Methods |
| Version: | 0.1.0 |
| Description: | Provides tools for analyzing the relationship between direct prices (based on labor values) and prices of production using Bayesian generalized linear models, panel data methods, partial least squares regression, canonical correlation analysis, and panel vector autoregression. Includes functions for model comparison, out-of-sample validation, and structural break detection. Here, methods use raw accounting data with explicit temporal structure, following Gomez Julian (2023) <doi:10.17605/OSF.IO/7J8KF> and standard econometric techniques for panel data analysis. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | stats, utils, Metrics |
| Suggests: | rstanarm, loo, plm, lme4, pls, vars, panelvar, strucchange, lmtest, sandwich, dplyr, tidyr, tibble, testthat (≥ 3.0.0), knitr, rmarkdown |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| URL: | https://github.com/isadorenabi/valueprhr |
| BugReports: | https://github.com/isadorenabi/valueprhr/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-12-02 21:29:21 UTC; ROG |
| Author: | Jose Mauricio Gomez Julian
|
| Maintainer: | Jose Mauricio Gomez Julian <isadore.nabi@pm.me> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-09 08:10:02 UTC |
valueprhr: Value-Price Analysis with Bayesian and Panel Data Methods
Description
This package provides tools for analyzing the relationship between direct prices (based on labor values) and prices of production using various econometric and statistical methods.
Main Functions
prepare_panel_dataConvert wide-format price matrices to panel data
fit_bayesian_glm_sectorsFit Bayesian GLM for each sector
fit_twoway_feFit two-way fixed effects panel model
fit_mundlak_creFit Mundlak correlated random effects model
fit_pls_multivariateFit PLS regression with CV selection
run_cca_var_analysisRun sparse CCA and panel VAR analysis
rolling_window_cvPerform rolling window cross-validation
test_structural_breaksTest for structural breaks
compare_modelsGenerate model comparison table
Author(s)
Maintainer: Jose Mauricio Gomez Julian isadore.nabi@pm.me (ORCID)
See Also
Useful links:
Report bugs at https://github.com/isadorenabi/valueprhr/issues
Create Time-Series Aggregated Data
Description
Aggregates panel data to time series by computing means across sectors.
Usage
aggregate_to_timeseries(panel_data, vars = c("log_direct", "log_production"))
Arguments
panel_data |
Data frame with panel data. |
vars |
Character vector of variables to aggregate. Default c("log_direct", "log_production"). |
Value
Data frame with one row per year containing aggregated values.
Examples
set.seed(123)
panel <- data.frame(
year = rep(2000:2005, 3),
sector = rep(c("A", "B", "C"), each = 6),
log_direct = rnorm(18, mean = 5),
log_production = rnorm(18, mean = 5)
)
ts_agg <- aggregate_to_timeseries(panel)
head(ts_agg)
Bayesian Generalized Linear Models for Sector Analysis
Description
Functions for fitting Bayesian GLMs sector by sector.
Calculate Mode Using Kernel Density Estimation
Description
Estimates the mode of a numeric vector using kernel density estimation. Falls back to median if density estimation fails.
Usage
calculate_mode(x)
Arguments
x |
A numeric vector. |
Value
A single numeric value representing the estimated mode.
Examples
set.seed(123)
x <- rnorm(100, mean = 5, sd = 1)
calculate_mode(x)
Canonical Correlation Analysis and Panel VAR
Description
Functions for sparse CCA and panel vector autoregression.
Check if Required Package is Available
Description
Checks for package availability and provides informative error message.
Usage
check_package(pkg, reason = "this functionality")
Arguments
pkg |
Character string with package name. |
reason |
Character string explaining why the package is needed. |
Value
TRUE invisibly if package is available, otherwise stops with error.
Examples
check_package("stats", "basic statistics")
Choose Number of Principal Components
Description
Internal function to select PCs based on variance threshold.
Usage
choose_n_pcs(pca_obj, min_k, max_k, threshold)
Arguments
pca_obj |
Result from prcomp. |
min_k |
Minimum components. |
max_k |
Maximum components. |
threshold |
Variance threshold. |
Value
Integer number of components.
Compare All Models
Description
Generates a comprehensive comparison table of all fitted models.
Usage
compare_models(
twoway_result = NULL,
mundlak_result = NULL,
bayes_result = NULL,
pls_result = NULL
)
Arguments
twoway_result |
Result from fit_twoway_fe (or NULL). |
mundlak_result |
Result from fit_mundlak_cre (or NULL). |
bayes_result |
Result from fit_bayesian_hierarchical (or NULL). |
pls_result |
Result from fit_pls_multivariate (or NULL). |
Value
A data frame with model comparison metrics.
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2019, 5),
sector = rep(LETTERS[1:5], each = 20),
log_direct = rnorm(100, 5, 0.5),
log_production = rnorm(100, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(100, 0, 0.1)
twoway <- fit_twoway_fe(panel)
mundlak <- fit_mundlak_cre(panel)
comparison <- compare_models(
twoway_result = twoway,
mundlak_result = mundlak
)
print(comparison)
}
Compute Multivariate MAE
Description
Compute Multivariate MAE
Usage
compute_multivariate_mae(Y_actual, Y_predicted)
Arguments
Y_actual |
Actual Y matrix. |
Y_predicted |
Predicted Y matrix. |
Value
Numeric MAE value.
Compute Multivariate Metrics
Description
Internal function to compute all metrics for multivariate predictions.
Usage
compute_multivariate_metrics(Y_actual, Y_predicted)
Arguments
Y_actual |
Actual Y matrix. |
Y_predicted |
Predicted Y matrix. |
Value
List of metrics.
Compute Multivariate R-squared
Description
Compute Multivariate R-squared
Usage
compute_multivariate_r2(Y_actual, Y_predicted)
Arguments
Y_actual |
Actual Y matrix. |
Y_predicted |
Predicted Y matrix. |
Value
Numeric R-squared value.
Compute Multivariate RMSE
Description
Compute Multivariate RMSE
Usage
compute_multivariate_rmse(Y_actual, Y_predicted)
Arguments
Y_actual |
Actual Y matrix. |
Y_predicted |
Predicted Y matrix. |
Value
Numeric RMSE value.
Compute OOS Degradation
Description
Compares out-of-sample metrics to in-sample metrics.
Usage
compute_oos_degradation(insample_metrics, cv_results)
Arguments
insample_metrics |
Named list with in-sample metrics. |
cv_results |
Data frame from rolling_window_cv. |
Value
Data frame with degradation percentages.
Examples
insample <- list(rmse_fe = 0.05, rmse_m = 0.04)
cv_res <- data.frame(
rmse_fe_all = c(0.06, 0.055, 0.058),
rmse_m_all = c(0.045, 0.042, 0.044)
)
degradation <- compute_oos_degradation(insample, cv_res)
print(degradation)
Compute PLS Cross-Validation Metrics
Description
Internal function to compute CV metrics for each component number.
Usage
compute_pls_cv_metrics(Y_actual, cv_predictions, ncomp)
Arguments
Y_actual |
Actual Y matrix. |
cv_predictions |
3D array of CV predictions [obs, vars, components]. |
ncomp |
Number of components fitted. |
Value
Data frame with CV metrics by component.
Compute BIC for Panel VAR Model
Description
Internal function to compute approximate BIC.
Usage
compute_pvar_bic(pvar_obj)
Arguments
pvar_obj |
Fitted pvarfeols object. |
Value
Numeric BIC value.
Compute In-Sample R-squared
Description
Calculates the coefficient of determination (R-squared) for predictions.
Usage
compute_r2(actual, predicted)
Arguments
actual |
Numeric vector of actual values. |
predicted |
Numeric vector of predicted values. |
Value
A single numeric R-squared value, or NA if not computable.
Examples
actual <- c(1, 2, 3, 4, 5)
predicted <- c(1.1, 1.9, 3.1, 4.0, 4.9)
compute_r2(actual, predicted)
Create Mundlak-Transformed Panel Data
Description
Adds sector-level means and within-deviations for Mundlak/CRE estimation.
Usage
create_mundlak_data(panel_data, x_var = "log_direct")
Arguments
panel_data |
Data frame with panel data. |
x_var |
Character string. Name of the explanatory variable. Default "log_direct". |
Value
Panel data with additional columns:
- x_mean_sector
Sector-level mean of x_var
- x_within
Within-sector deviation (x - sector mean)
Examples
set.seed(123)
panel <- data.frame(
year = rep(2000:2002, 3),
sector = rep(c("A", "B", "C"), each = 3),
log_direct = rnorm(9, mean = 5),
log_production = rnorm(9, mean = 5)
)
panel_mundlak <- create_mundlak_data(panel)
head(panel_mundlak)
Create Summary Table from Sector Results
Description
Internal function to create a data frame summary from sector results.
Usage
create_sector_summary_table(results_list)
Arguments
results_list |
List of successful sector results. |
Value
Data frame with summary statistics.
Data Preparation Functions
Description
Functions for preparing and transforming price data for analysis.
Evaluate In-Sample Model Performance
Description
Computes various error metrics comparing predictions to actual values.
Usage
evaluate_insample(predicted, actual)
Arguments
predicted |
Numeric vector of predicted values (log scale). |
actual |
Numeric vector of actual values (log scale). |
Value
A list containing:
- mae_log
MAE in log scale
- rmse_log
RMSE in log scale
- mae_orig
MAE in original scale (after exp transformation)
- rmse_orig
RMSE in original scale
- mae_rel_range
MAE as percentage of range
Examples
set.seed(123)
actual <- log(runif(50, 100, 200))
predicted <- actual + rnorm(50, 0, 0.1)
evaluate_insample(predicted, actual)
Export Results to CSV
Description
Exports analysis results to CSV files.
Usage
export_results_csv(
comparison_table = NULL,
cv_results = NULL,
sector_summary = NULL,
output_dir = tempdir(),
prefix = "valueprhr"
)
Arguments
comparison_table |
Data frame from compare_models. |
cv_results |
Data frame from rolling_window_cv. |
sector_summary |
Data frame from fit_bayesian_glm_sectors. |
output_dir |
Directory for output files. Default tempdir(). |
prefix |
Filename prefix. Default "valueprhr". |
Value
Character vector of created file paths.
Examples
comparison <- data.frame(
model = c("Model A", "Model B"),
R2 = c(0.95, 0.92)
)
files <- export_results_csv(comparison_table = comparison)
print(files)
Extract Estimated Breakpoints
Description
Internal function to estimate breakpoint locations.
Usage
extract_breakpoints(ts_data)
Arguments
ts_data |
Aggregated time series data. |
Value
Vector of estimated breakpoint years, or NULL.
Extract Top CCA Loadings
Description
Extracts the variables with highest absolute loadings for each CCA component.
Usage
extract_cca_loadings(
cca_result,
n_top = 5L,
which_matrix = c("both", "X", "Y")
)
Arguments
cca_result |
Result from run_sparse_cca. |
n_top |
Number of top variables to show. Default 5. |
which_matrix |
Character. "X", "Y", or "both". Default "both". |
Value
Data frame with top loadings.
Examples
set.seed(123)
n <- 50
p <- 20
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% matrix(rnorm(p * 5), p, 5) + matrix(rnorm(n * 5, 0, 0.5), n, 5)
colnames(X) <- paste0("X", 1:p)
colnames(Y) <- paste0("Y", 1:5)
cca_res <- run_sparse_cca(X, Y, n_components = 2)
top_loads <- extract_cca_loadings(cca_res)
print(top_loads)
Extract PLS Variable Importance
Description
Extracts variable importance scores from a fitted PLS model.
Usage
extract_pls_importance(pls_result, ncomp = NULL)
Arguments
pls_result |
Result from fit_pls_multivariate. |
ncomp |
Number of components to use. Default uses optimal. |
Value
Data frame with variable names and importance scores.
Examples
if (requireNamespace("pls", quietly = TRUE)) {
set.seed(123)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
Y <- X[, 1:3] %*% diag(c(1, 0.5, 0.3)) + matrix(rnorm(n * 3, 0, 0.5), n, 3)
colnames(Y) <- paste0("Y", 1:3)
result <- fit_pls_multivariate(X, Y, max_components = 5)
importance <- extract_pls_importance(result)
print(head(importance))
}
Extract Random Effects Variance Components
Description
Internal function to extract variance components from plm summary.
Usage
extract_re_variances(summary_obj)
Arguments
summary_obj |
Summary object from plm random effects model. |
Value
A list with id (between) and idios (within) variance components.
Extract Sector Coefficients
Description
Extracts intercepts and slopes from all sector models.
Usage
extract_sector_coefficients(sector_results)
Arguments
sector_results |
List of sector results from fit_bayesian_glm_sectors. |
Value
Data frame with sector names, intercepts, and slopes.
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {
set.seed(123)
years <- 2000:2010
direct <- data.frame(
Year = years,
A = 100 + cumsum(rnorm(11)),
B = 120 + cumsum(rnorm(11))
)
production <- data.frame(
Year = years,
A = 102 + cumsum(rnorm(11)),
B = 118 + cumsum(rnorm(11))
)
results <- fit_bayesian_glm_sectors(direct, production,
chains = 2, iter = 1000)
coefs <- extract_sector_coefficients(results$results)
print(coefs)
}
Fit Aggregated VAR Model
Description
Fits a VAR model on time-aggregated (mean across sectors) data.
Usage
fit_aggregated_var(panel_data, max_lags = 6L, difference = TRUE)
Arguments
panel_data |
Data frame in panel format. |
max_lags |
Maximum lag order. Default 6. |
difference |
Logical. Apply first differencing. Default TRUE. |
Value
A list containing:
- model
The fitted vars::VAR model
- selected_lag
Lag selected by information criteria
- irf
Impulse response functions (if computed)
- fevd
Forecast error variance decomposition (if computed)
Examples
if (requireNamespace("vars", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2019, 5),
sector = rep(LETTERS[1:5], each = 20),
log_direct = rnorm(100, 5, 0.5),
log_production = rnorm(100, 5, 0.5)
)
result <- fit_aggregated_var(panel)
print(result$selected_lag)
}
Fit Bayesian GLM for Each Sector
Description
Fits separate Bayesian generalized linear models for each sector, regressing production prices on direct prices.
Usage
fit_bayesian_glm_sectors(
direct_prices,
production_prices,
chains = 4L,
iter = 4000L,
seed = 12345L,
verbose = TRUE
)
Arguments
direct_prices |
Data frame with direct prices. First column must be 'Year', remaining columns are sector values. |
production_prices |
Data frame with prices of production. Must have same structure as direct_prices. |
chains |
Number of MCMC chains. Default 4. |
iter |
Number of iterations per chain. Default 4000. |
seed |
Random seed for reproducibility. Default 12345. |
verbose |
Logical. Print progress messages. Default TRUE. |
Details
This function requires the 'rstanarm' and 'loo' packages to be installed. Each sector model uses a Gaussian family with identity link and weakly informative priors.
Value
A list with two elements:
- results
List of results for each sector
- summary_table
Data frame with summary statistics for all sectors
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {
set.seed(123)
years <- 2000:2010
direct <- data.frame(
Year = years,
Agriculture = 100 + cumsum(rnorm(11, 2, 1)),
Manufacturing = 120 + cumsum(rnorm(11, 2, 1))
)
production <- data.frame(
Year = years,
Agriculture = 102 + cumsum(rnorm(11, 2, 1)),
Manufacturing = 118 + cumsum(rnorm(11, 2, 1))
)
results <- fit_bayesian_glm_sectors(
direct, production,
chains = 2, iter = 1000
)
print(results$summary_table)
}
Fit Bayesian Hierarchical Panel Model
Description
Fits a Bayesian mixed effects model with random slopes by sector.
Usage
fit_bayesian_hierarchical(
panel_data,
include_time = TRUE,
chains = 4L,
iter = 4000L,
seed = 12345L
)
Arguments
panel_data |
Data frame in panel format. |
include_time |
Logical. Include time trend. Default TRUE. |
chains |
Number of MCMC chains. Default 4. |
iter |
Number of iterations. Default 4000. |
seed |
Random seed. Default 12345. |
Value
A list containing:
- model
The fitted rstanarm model object
- r2_bayes
Bayesian R-squared (mean)
- summary
Model summary for fixed effects
- metrics
In-sample evaluation metrics
Examples
## Not run:
if (requireNamespace("rstanarm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2009, 5),
sector = rep(LETTERS[1:5], each = 10),
time = rep(1:10, 5),
log_direct = rnorm(50, 5, 0.5),
log_production = rnorm(50, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(50, 0, 0.1)
result <- fit_bayesian_hierarchical(panel, chains = 2, iter = 1000)
print(result$r2_bayes)
}
## End(Not run)
Fit Mundlak Correlated Random Effects Model
Description
Fits a Mundlak (CRE) model that decomposes effects into within and between components, allowing for correlation between unit effects and regressors.
Usage
fit_mundlak_cre(panel_data, include_time_fe = TRUE, robust_se = TRUE)
Arguments
panel_data |
Data frame in panel format. |
include_time_fe |
Logical. Include time fixed effects. Default TRUE. |
robust_se |
Logical. Compute robust standard errors. Default TRUE. |
Details
The Mundlak transformation adds sector-level means of the regressors to a random effects model, allowing consistent estimation even when the random effects are correlated with the regressors.
Value
A list containing:
- model
The fitted plm model object
- summary
Model summary
- panel_data_augmented
Panel data with Mundlak transformations
- coeftest_robust
Robust coefficient tests
- variance_components
Random effects variance components
- metrics
In-sample evaluation metrics
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2009, 5),
sector = rep(LETTERS[1:5], each = 10),
log_direct = rnorm(50, 5, 0.5),
log_production = rnorm(50, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(50, 0, 0.1)
result <- fit_mundlak_cre(panel)
print(result$variance_components)
}
Fit Panel VAR Model
Description
Fits a panel vector autoregression model with first-difference transformation.
Usage
fit_panel_var(panel_data, max_lags = 2L, verbose = TRUE)
Arguments
panel_data |
Data frame in panel format. |
max_lags |
Maximum lag order to consider. Default 2. |
verbose |
Logical. Print progress. Default TRUE. |
Value
A list containing:
- model
The fitted panelvar model
- best_lag
Selected lag order
- bic_values
BIC for each lag order tested
Examples
if (requireNamespace("panelvar", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2019, 5),
sector = rep(LETTERS[1:5], each = 20),
log_direct = rnorm(100, 5, 0.5),
log_production = rnorm(100, 5, 0.5)
)
result <- fit_panel_var(panel)
print(result$best_lag)
}
Fit PLS Regression with Cross-Validation Component Selection
Description
Fits a partial least squares regression model with automatic selection of the optimal number of components via cross-validation.
Usage
fit_pls_multivariate(
X_matrix,
Y_matrix,
max_components = NULL,
cv_segments = 10L,
scale = TRUE,
center = TRUE
)
Arguments
X_matrix |
Numeric matrix of predictor variables (direct prices). |
Y_matrix |
Numeric matrix of response variables (production prices). |
max_components |
Maximum number of components to consider. Default NULL uses min(ncol(X), nrow(X)-1, ncol(Y), 25). |
cv_segments |
Number of cross-validation segments. Default 10. |
scale |
Logical. Scale variables before fitting. Default TRUE. |
center |
Logical. Center variables before fitting. Default TRUE. |
Details
This function uses the pls package for PLS regression. Component selection is based on minimizing cross-validated RMSE. The function handles log-transformed data and reports metrics in both log and original scales.
Value
A list containing:
- model
The fitted pls model object
- optimal_ncomp
Optimal number of components by CV-RMSE
- cv_table
Data frame with CV metrics by number of components
- metrics_cv
CV metrics at optimal component number
- metrics_insample
In-sample metrics at optimal component number
Examples
if (requireNamespace("pls", quietly = TRUE)) {
set.seed(123)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
Y <- X[, 1:3] %*% diag(c(1, 0.5, 0.3)) + matrix(rnorm(n * 3, 0, 0.5), n, 3)
colnames(Y) <- paste0("Y", 1:3)
result <- fit_pls_multivariate(X, Y, max_components = 8)
print(result$optimal_ncomp)
print(result$cv_table)
}
Fit and Predict with FE Sector Model
Description
Internal function to fit FE model and generate predictions.
Usage
fit_predict_fe_sector(df_train, df_test)
Arguments
df_train |
Training data. |
df_test |
Test data. |
Value
Numeric vector of predictions.
Fit and Predict with Mundlak Model
Description
Internal function to fit Mundlak model and generate predictions.
Usage
fit_predict_mundlak(df_train, df_test)
Arguments
df_train |
Training data. |
df_test |
Test data. |
Value
Numeric vector of predictions.
Fit Single Sector Bayesian GLM (Internal)
Description
Internal function to fit a Bayesian GLM for one sector.
Usage
fit_single_sector_glm(x_values, y_values, sector_name, chains, iter, seed)
Arguments
x_values |
Numeric vector of direct price values. |
y_values |
Numeric vector of production price values. |
sector_name |
Character string with sector name. |
chains |
Number of MCMC chains. |
iter |
Number of iterations. |
seed |
Random seed. |
Value
A list with model results and diagnostics.
Fit Two-Way Fixed Effects Panel Model
Description
Fits a two-way fixed effects model with sector and time effects, regressing log production prices on log direct prices.
Usage
fit_twoway_fe(
panel_data,
robust_se = TRUE,
cluster_type = c("group", "time", "twoway")
)
Arguments
panel_data |
Data frame in panel format with columns: year, sector, log_direct, log_production. |
robust_se |
Logical. Compute robust standard errors. Default TRUE. |
cluster_type |
Character. Type of cluster for robust SE. One of "group", "time", or "twoway". Default "group". |
Details
This function requires the 'plm' package. The model specification is: log_production ~ log_direct with two-way (sector and time) fixed effects.
Value
A list containing:
- model
The fitted plm model object
- summary
Model summary
- r2_within
Within R-squared
- coeftest_robust
Coefficient test with robust SE (if robust_se=TRUE)
- metrics
In-sample evaluation metrics
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2009, 5),
sector = rep(LETTERS[1:5], each = 10),
log_direct = rnorm(50, 5, 0.5),
log_production = rnorm(50, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(50, 0, 0.1)
result <- fit_twoway_fe(panel)
print(result$r2_within)
}
Format Structural Break Results
Description
Creates a formatted summary of structural break test results.
Usage
format_break_results(break_results, alpha = 0.05)
Arguments
break_results |
Result from test_structural_breaks. |
alpha |
Significance level for interpretation. Default 0.05. |
Value
A data frame with formatted test summaries.
Examples
if (requireNamespace("strucchange", quietly = TRUE)) {
set.seed(123)
years <- 1980:2020
panel <- data.frame(
year = rep(years, 5),
sector = rep(LETTERS[1:5], each = length(years)),
log_direct = rnorm(length(years) * 5, 5, 0.5),
log_production = rnorm(length(years) * 5, 5, 0.5)
)
break_tests <- test_structural_breaks(panel)
summary_df <- format_break_results(break_tests)
print(summary_df)
}
Generate Comprehensive Analysis Summary
Description
Creates a complete summary of all analysis results including models, validation, and structural break tests.
Usage
generate_analysis_summary(
comparison_table,
cv_summary = NULL,
break_results = NULL,
cca_results = NULL,
granger_results = NULL
)
Arguments
comparison_table |
Data frame from compare_models. |
cv_summary |
Data frame from summarize_cv_results (or NULL). |
break_results |
Result from test_structural_breaks (or NULL). |
cca_results |
Result from run_sparse_cca (or NULL). |
granger_results |
Data frame from panel_granger_test (or NULL). |
Value
A list with formatted summary components.
Examples
comparison <- data.frame(
model = c("Model A", "Model B"),
R2 = c(0.95, 0.92),
RMSE_log = c(0.05, 0.06)
)
summary_list <- generate_analysis_summary(comparison)
print(summary_list$best_model)
Extract R-squared Values from rstanarm Model
Description
Attempts to extract R-squared using loo_R2, falling back to bayes_R2.
Usage
get_r2_values(fit, verbose = FALSE)
Arguments
fit |
A fitted rstanarm model object. |
verbose |
Logical. Print messages about extraction method. Default FALSE. |
Value
A named numeric vector with mean, median, and mode R-squared values.
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {
data(mtcars)
fit <- rstanarm::stan_glm(mpg ~ wt, data = mtcars,
chains = 2, iter = 1000, refresh = 0)
get_r2_values(fit)
}
Interpret Break Test Results
Description
Provides textual interpretation of structural break tests.
Usage
interpret_break_tests(break_results, alpha = 0.05)
Arguments
break_results |
Result from test_structural_breaks. |
alpha |
Significance level. Default 0.05. |
Value
Character string with interpretation.
Examples
if (requireNamespace("strucchange", quietly = TRUE)) {
set.seed(123)
years <- 1980:2020
panel <- data.frame(
year = rep(years, 5),
sector = rep(LETTERS[1:5], each = length(years)),
log_direct = rnorm(length(years) * 5, 5, 0.5),
log_production = rnorm(length(years) * 5, 5, 0.5)
)
break_tests <- test_structural_breaks(panel)
interpretation <- interpret_break_tests(break_tests)
cat(interpretation)
}
Leave-One-Sector-Out Cross-Validation
Description
Performs LOSO CV, leaving out each sector in turn as the test set.
Usage
leave_one_sector_out(panel_data, verbose = TRUE)
Arguments
panel_data |
Data frame in panel format. |
verbose |
Logical. Print progress. Default TRUE. |
Value
A data frame with RMSE and MAE for each held-out sector.
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2019, 5),
sector = rep(LETTERS[1:5], each = 20),
log_direct = rnorm(100, 5, 0.5),
log_production = rnorm(100, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(100, 0, 0.1)
loso_results <- leave_one_sector_out(panel)
print(loso_results)
}
Panel Granger Causality Test (Dumitrescu-Hurlin)
Description
Performs panel Granger causality tests between direct and production prices.
Usage
panel_granger_test(panel_data, lags = c(1L, 2L))
Arguments
panel_data |
Data frame in panel format. |
lags |
Integer vector of lag orders to test. Default c(1, 2). |
Details
Tests both directions: direct -> production and production -> direct.
Value
A data frame with test results for each direction and lag.
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2019, 5),
sector = rep(LETTERS[1:5], each = 20),
log_direct = rnorm(100, 5, 0.5),
log_production = rnorm(100, 5, 0.5)
)
granger_results <- panel_granger_test(panel)
print(granger_results)
}
Panel Data Models for Value-Price Analysis
Description
Functions for fitting two-way fixed effects and Mundlak CRE models.
Partial Least Squares Regression Analysis
Description
Functions for multivariate PLS regression with cross-validation.
Predict from PLS Model
Description
Generate predictions from a fitted PLS model.
Usage
predict_pls(pls_result, newdata = NULL, ncomp = NULL)
Arguments
pls_result |
Result from fit_pls_multivariate. |
newdata |
Optional new data matrix for prediction. |
ncomp |
Number of components to use. Default uses optimal. |
Value
Matrix of predictions.
Examples
if (requireNamespace("pls", quietly = TRUE)) {
set.seed(123)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("X", 1:p)
Y <- X[, 1:3] %*% diag(c(1, 0.5, 0.3)) + matrix(rnorm(n * 3, 0, 0.5), n, 3)
colnames(Y) <- paste0("Y", 1:3)
result <- fit_pls_multivariate(X, Y, max_components = 5)
preds <- predict_pls(result)
dim(preds)
}
Prepare Log-Transformed Matrices
Description
Extracts numeric columns from price data frames and applies log transform.
Usage
prepare_log_matrices(
direct_prices,
production_prices,
exclude_cols = c("Year")
)
Arguments
direct_prices |
Data frame with direct prices. |
production_prices |
Data frame with prices of production. |
exclude_cols |
Character vector of columns to exclude. Default c("Year"). |
Value
A list containing:
- X_log
Log-transformed matrix of direct prices
- Y_log
Log-transformed matrix of production prices
- complete_cases
Logical vector indicating complete cases
- X_clean
Subset of X_log with complete cases
- Y_clean
Subset of Y_log with complete cases
Examples
set.seed(123)
direct <- data.frame(
Year = 2000:2005,
A = runif(6, 100, 200),
B = runif(6, 100, 200)
)
production <- data.frame(
Year = 2000:2005,
A = runif(6, 100, 200),
B = runif(6, 100, 200)
)
matrices <- prepare_log_matrices(direct, production)
str(matrices)
Prepare Panel Data from Wide Format Matrices
Description
Converts wide-format price matrices (with Year as first column and sectors as subsequent columns) into long-format panel data suitable for panel regression analysis.
Usage
prepare_panel_data(direct_prices, production_prices, log_transform = TRUE)
Arguments
direct_prices |
Data frame with direct prices (labor value-based). First column must be 'Year', remaining columns are sector values. |
production_prices |
Data frame with prices of production. Must have same structure as direct_prices. |
log_transform |
Logical. Apply natural log transformation. Default TRUE. |
Value
A data frame in panel (long) format with columns:
- year
Year of observation
- sector
Sector identifier
- sector_id
Numeric sector identifier
- time
Time index (year minus minimum year plus 1)
- direct_price
Direct price value
- production_price
Price of production value
- log_direct
Log of direct price (if log_transform = TRUE)
- log_production
Log of production price (if log_transform = TRUE)
Examples
set.seed(123)
years <- 2000:2010
sectors <- c("Agriculture", "Manufacturing", "Services")
direct <- data.frame(
Year = years,
Agriculture = 100 + cumsum(rnorm(11)),
Manufacturing = 120 + cumsum(rnorm(11)),
Services = 90 + cumsum(rnorm(11))
)
production <- data.frame(
Year = years,
Agriculture = 102 + cumsum(rnorm(11)),
Manufacturing = 118 + cumsum(rnorm(11)),
Services = 92 + cumsum(rnorm(11))
)
panel <- prepare_panel_data(direct, production)
head(panel)
Print Analysis Summary
Description
Prints a formatted summary of analysis results to console.
Usage
print_analysis_summary(summary_list, verbose = TRUE)
Arguments
summary_list |
Result from generate_analysis_summary. |
verbose |
Logical. Print detailed output. Default TRUE. |
Value
Invisible NULL.
Examples
comparison <- data.frame(
model = c("Two-Way FE", "Mundlak CRE"),
R2 = c(0.95, 0.92),
RMSE_log = c(0.05, 0.06)
)
summary_list <- generate_analysis_summary(comparison)
print_analysis_summary(summary_list)
Robust Summary Statistics with Bootstrap Confidence Intervals
Description
Computes mean, standard deviation, median, MAD, trimmed mean, and bootstrap confidence intervals for a numeric vector.
Usage
robust_summary(x, bootstrap_reps = 200L, trim_proportion = 0.1)
Arguments
x |
Numeric vector. |
bootstrap_reps |
Number of bootstrap replications for CI. Default 200. |
trim_proportion |
Proportion to trim for trimmed mean. Default 0.10. |
Value
A list containing:
- mean
Arithmetic mean
- sd
Standard deviation
- median
Median
- mad
Median Absolute Deviation (scaled)
- tmean
Trimmed mean
- ci
Vector of length 2 with 95 percent bootstrap CI bounds
Examples
set.seed(123)
x <- rnorm(50)
robust_summary(x)
Rolling Window Cross-Validation
Description
Performs time-series cross-validation using rolling windows, comparing fixed effects and Mundlak models.
Usage
rolling_window_cv(
panel_data,
window_sizes = c(20L, 30L),
step_size = 2L,
test_horizon = 3L,
verbose = TRUE
)
Arguments
panel_data |
Data frame in panel format. |
window_sizes |
Integer vector of training window sizes. Default c(20, 30). |
step_size |
Integer step between windows. Default 2. |
test_horizon |
Integer number of periods to forecast. Default 3. |
verbose |
Logical. Print progress. Default TRUE. |
Details
For each rolling window, the function fits both a fixed effects model (by sector) and a Mundlak CRE model, then evaluates predictions on the test period. Results are separated by whether test sectors were seen during training (common) or not (new).
Value
A data frame with validation results for each window, including:
- window_size
Training window size
- window_start
Start year of training window
- window_end
End year of training window
- rmse_fe_all
RMSE for FE model on all test observations
- rmse_m_all
RMSE for Mundlak model on all test observations
- n_test_common
Number of test obs from sectors in training
- n_test_new
Number of test obs from new sectors
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2029, 5),
sector = rep(LETTERS[1:5], each = 30),
log_direct = rnorm(150, 5, 0.5),
log_production = rnorm(150, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(150, 0, 0.1)
cv_results <- rolling_window_cv(panel, window_sizes = c(15, 20))
print(head(cv_results))
}
Run Complete CCA and VAR Analysis
Description
Convenience function to run both sparse CCA and panel/aggregated VAR.
Usage
run_cca_var_analysis(
direct_prices,
production_prices,
panel_data,
cca_components = 3L,
verbose = TRUE
)
Arguments
direct_prices |
Data frame with direct prices. |
production_prices |
Data frame with production prices. |
panel_data |
Data frame in panel format. |
cca_components |
Number of CCA components. Default 3. |
verbose |
Logical. Print progress. Default TRUE. |
Value
A list with cca, pvar, agg_var, and granger results.
Examples
set.seed(123)
years <- 2000:2019
sectors <- LETTERS[1:5]
direct <- data.frame(Year = years)
production <- data.frame(Year = years)
for (s in sectors) {
direct[[s]] <- 100 + cumsum(rnorm(20, 2, 1))
production[[s]] <- 102 + cumsum(rnorm(20, 2, 1))
}
panel <- prepare_panel_data(direct, production)
matrices <- prepare_log_matrices(direct, production)
result <- run_cca_var_analysis(
direct, production, panel,
cca_components = 2
)
Run Chow Tests at Candidate Break Years
Description
Internal function to perform Chow tests.
Usage
run_chow_tests(ts_data, break_years, min_segment)
Arguments
ts_data |
Aggregated time series data. |
break_years |
Vector of candidate break years. |
min_segment |
Minimum segment size. |
Value
Data frame with Chow test results.
Run CUSUM Test
Description
Internal function to perform recursive CUSUM test.
Usage
run_cusum_test(ts_data)
Arguments
ts_data |
Aggregated time series data. |
Value
List with test statistic and p-value.
Run Complete Analysis Pipeline
Description
Convenience function to run the full analysis pipeline.
Usage
run_full_analysis(
direct_prices,
production_prices,
run_bayesian = FALSE,
run_cv = TRUE,
run_breaks = TRUE,
verbose = TRUE
)
Arguments
direct_prices |
Data frame with direct prices. |
production_prices |
Data frame with production prices. |
run_bayesian |
Logical. Run Bayesian models. Default FALSE. |
run_cv |
Logical. Run cross-validation. Default TRUE. |
run_breaks |
Logical. Run structural break tests. Default TRUE. |
verbose |
Logical. Print progress. Default TRUE. |
Value
A list with all analysis results.
Examples
set.seed(123)
years <- 2000:2019
sectors <- LETTERS[1:5]
direct <- data.frame(Year = years)
production <- data.frame(Year = years)
for (s in sectors) {
direct[[s]] <- 100 + cumsum(rnorm(20, 2, 1))
production[[s]] <- 102 + cumsum(rnorm(20, 2, 1))
}
if (requireNamespace("plm", quietly = TRUE)) {
results <- run_full_analysis(
direct, production,
run_bayesian = FALSE,
run_cv = FALSE
)
print(results$comparison)
}
Run MOSUM Test
Description
Internal function to perform OLS-MOSUM test.
Usage
run_mosum_test(ts_data)
Arguments
ts_data |
Aggregated time series data. |
Value
List with test statistic and p-value.
Run Sparse CCA with PCA Preprocessing
Description
Performs canonical correlation analysis on PCA-reduced price matrices, with optional sparsity penalties.
Usage
run_sparse_cca(
X_matrix,
Y_matrix,
n_components = 3L,
variance_threshold = 0.9,
min_pcs = 8L,
max_pcs = 12L
)
Arguments
X_matrix |
Numeric matrix of direct prices. |
Y_matrix |
Numeric matrix of production prices. |
n_components |
Number of canonical components to extract. Default 3. |
variance_threshold |
Cumulative variance threshold for PCA. Default 0.90. |
min_pcs |
Minimum number of PCs to retain. Default 8. |
max_pcs |
Maximum number of PCs to retain. Default 12. |
Details
The function first reduces dimensionality using PCA, then applies CCA. Falls back to base R cancor if specialized packages unavailable.
Value
A list containing:
- method
Method used for CCA
- correlations
Canonical correlations
- U_loadings
X loadings in PC space
- V_loadings
Y loadings in PC space
- W_X_original
X loadings projected to original variables
- W_Y_original
Y loadings projected to original variables
- n_pcs_x
Number of PCs used for X
- n_pcs_y
Number of PCs used for Y
Examples
set.seed(123)
n <- 50
p <- 20
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% matrix(rnorm(p * 5), p, 5) + matrix(rnorm(n * 5, 0, 0.5), n, 5)
colnames(X) <- paste0("X", 1:p)
colnames(Y) <- paste0("Y", 1:5)
result <- run_sparse_cca(X, Y, n_components = 2)
print(result$correlations)
Run Supremum F Test
Description
Internal function to perform supF test for unknown break date.
Usage
run_supf_test(ts_data)
Arguments
ts_data |
Aggregated time series data. |
Value
List with test statistic and p-value.
Safe MAE Calculation
Description
Calculates Mean Absolute Error handling non-finite values.
Usage
safe_mae(actual, predicted)
Arguments
actual |
Numeric vector of actual values. |
predicted |
Numeric vector of predicted values. |
Value
A single numeric value for MAE, or NA if calculation not possible.
Examples
actual <- c(1, 2, 3, 4, 5)
predicted <- c(1.1, 2.2, 2.9, 4.1, 5.2)
safe_mae(actual, predicted)
Safe Percentage Calculation
Description
Calculates percentage while handling division by zero and non-finite values.
Usage
safe_pct(numerator, denominator)
Arguments
numerator |
Numeric vector for the numerator. |
denominator |
Numeric vector for the denominator. |
Value
Numeric vector of percentages, with NA for invalid calculations.
Examples
safe_pct(c(10, 20, 30), c(100, 0, 200))
Safe RMSE Calculation
Description
Calculates Root Mean Squared Error handling non-finite values.
Usage
safe_rmse(actual, predicted)
Arguments
actual |
Numeric vector of actual values. |
predicted |
Numeric vector of predicted values. |
Value
A single numeric value for RMSE, or NA if calculation not possible.
Examples
actual <- c(1, 2, 3, 4, 5)
predicted <- c(1.1, 2.2, 2.9, 4.1, 5.2)
safe_rmse(actual, predicted)
Structural Break Tests
Description
Functions for testing structural breaks in the price relationship.
Summarize Rolling Window Results
Description
Computes summary statistics from rolling window cross-validation.
Usage
summarize_cv_results(cv_results, bootstrap_reps = 300L)
Arguments
cv_results |
Data frame from rolling_window_cv. |
bootstrap_reps |
Number of bootstrap replications. Default 300. |
Value
A data frame with summary statistics by model and partition.
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2029, 5),
sector = rep(LETTERS[1:5], each = 30),
log_direct = rnorm(150, 5, 0.5),
log_production = rnorm(150, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(150, 0, 0.1)
cv_results <- rolling_window_cv(panel, window_sizes = c(15, 20))
summary_stats <- summarize_cv_results(cv_results)
print(summary_stats)
}
Model Comparison and Summary Functions
Description
Functions for comparing models and generating summary tables.
Test Mundlak Specification
Description
Performs Wald test on the sector-mean coefficient to test whether fixed effects would be preferred over random effects.
Usage
test_mundlak_specification(mundlak_result)
Arguments
mundlak_result |
Result from fit_mundlak_cre. |
Details
Under the null hypothesis that the sector means coefficient equals zero, random effects would be appropriate. Rejection suggests fixed effects should be used.
Value
A list with test statistic, degrees of freedom, and p-value.
Examples
if (requireNamespace("plm", quietly = TRUE)) {
set.seed(123)
panel <- data.frame(
year = rep(2000:2009, 5),
sector = rep(LETTERS[1:5], each = 10),
log_direct = rnorm(50, 5, 0.5),
log_production = rnorm(50, 5, 0.5)
)
panel$log_production <- panel$log_direct * 0.95 + rnorm(50, 0, 0.1)
mundlak_fit <- fit_mundlak_cre(panel)
test_result <- test_mundlak_specification(mundlak_fit)
print(test_result)
}
Test for Structural Breaks
Description
Performs multiple structural break tests on the aggregated time series relationship between direct and production prices.
Usage
test_structural_breaks(panel_data, chow_years = NULL, min_segment = 10L)
Arguments
panel_data |
Data frame in panel format. |
chow_years |
Integer vector of candidate break years for Chow test. Default NULL uses 1986, 1997, 2001, 2008 if present. |
min_segment |
Integer minimum observations per segment. Default 10. |
Details
This function aggregates panel data to a single time series by taking means across sectors, then applies various structural break tests from the strucchange package.
Value
A list containing:
- chow
Chow test results for candidate years
- cusum
CUSUM test results
- mosum
MOSUM test results
- supf
supremum F test results
- breakpoints
Estimated breakpoint dates
- aggregated_data
The aggregated time series used
Examples
if (requireNamespace("strucchange", quietly = TRUE)) {
set.seed(123)
years <- 1980:2020
panel <- data.frame(
year = rep(years, 5),
sector = rep(LETTERS[1:5], each = length(years)),
log_direct = rnorm(length(years) * 5, 5, 0.5),
log_production = rnorm(length(years) * 5, 5, 0.5)
)
break_tests <- test_structural_breaks(panel)
print(break_tests$cusum)
}
Utility Functions for Value-Price Analysis
Description
Internal utility functions used across the package.
Validate Panel Data Structure
Description
Checks that panel data has required columns and valid structure.
Usage
validate_panel_data(panel_data, require_log = TRUE)
Arguments
panel_data |
Data frame to validate. |
require_log |
Logical. Check for log-transformed columns. Default TRUE. |
Value
TRUE invisibly if valid, otherwise stops with informative error.
Examples
set.seed(123)
panel <- data.frame(
year = rep(2000:2002, 3),
sector = rep(c("A", "B", "C"), each = 3),
log_direct = rnorm(9),
log_production = rnorm(9)
)
validate_panel_data(panel)
Out-of-Sample Validation Functions
Description
Functions for rolling window and leave-one-sector-out validation.