Type: Package
Title: Phase-Function Based Estimation and Inference for Linear Errors-in-Variables (EIV) Models
Version: 0.1.0
Description: Estimation and inference for the coefficients of linear EIV models with symmetric measurement errors. The measurement errors can be homoscedastic or heteroscedastic; in the latter case, replicate measurements must be available for at least some observations. Estimation and asymptotic inference are based on a generalised method of moments (GMM) framework, in which the estimating equations are formed by (1) minimising the distance between the empirical phase function (normalised characteristic function) of the response and that of the linear combination of all the covariates evaluated at the estimates, and (2) minimising a corrected least-squares discrepancy function. Specifically, for a linear EIV model with p error-prone and q error-free covariates, the GMM approach is based on 2(p+q) estimating equations if some replicates are available, and on p+2q estimating equations if no replicates are available. The details of the method are described in Nghiem and Potgieter (2020) <doi:10.1093/biomet/asaa025> and Nghiem and Potgieter (2025) <doi:10.5705/ss.202022.0331>.
License: GPL-2
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 3.5)
Imports: nleqslv
Suggests: extraDistr
NeedsCompilation: no
Packaged: 2026-03-30 01:13:06 UTC; liuchang
Author: Chang Liu [aut, cre], Linh Nghiem [aut]
Maintainer: Chang Liu <leo12345liu@gmail.com>
Repository: CRAN
Date/Publication: 2026-04-02 07:50:17 UTC

Gradient of the phase-function estimating equations

Description

Computes the gradient of the phase-function estimating equations with respect to the regression coefficients.

Usage

PhaseGradient(beta, S_y, C_y, WZ, t, estWeight)

Arguments

beta

Numeric vector of length p + q containing the regression coefficients.

S_y

Numeric vector containing sum_j sin(t * Y_j) evaluated on the grid t.

C_y

Numeric vector containing sum_j cos(t * Y_j) evaluated on the grid t.

WZ

Numeric matrix of dimension n x (p + q) containing all covariates.

t

Numeric vector giving the grid of frequency values.

estWeight

Numeric vector of length n giving observation weights.

Value

Numeric vector of length p + q containing the phase-function gradient evaluated at beta.
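The role of S_y and C_y can be illustrated with a minimal base-R sketch of a phase-function criterion. It assumes a single error-prone covariate, no intercept, normal (hence symmetric) errors, and uniform weight over the t grid; the names phase_fn and phase_dist are hypothetical and not part of the package. The phase function of y is built once from its sine and cosine sums, and the slope is chosen so that it matches the phase function of b * w, which symmetric measurement error leaves unchanged.

```r
# Sketch only: one error-prone covariate, no intercept, uniform t weights.
set.seed(42)
n <- 5000
x <- rgamma(n, shape = 2)               # skewed true covariate
w <- x + rnorm(n, sd = 0.5)             # symmetric measurement error
y <- 2 * x + rnorm(n)                   # symmetric model error, true slope 2

t_grid <- seq(0.02, 1, length.out = 50) # frequency grid

# Empirical phase function from S(t) = sum_j sin(t v_j), C(t) = sum_j cos(t v_j)
phase_fn <- function(v, t) {
  S <- vapply(t, function(tk) sum(sin(tk * v)), numeric(1))
  C <- vapply(t, function(tk) sum(cos(tk * v)), numeric(1))
  complex(real = C, imaginary = S) / sqrt(C^2 + S^2)
}

rho_y <- phase_fn(y, t_grid)            # computed once, like S_y and C_y

# Squared distance between the phase functions of y and b * w,
# averaged over the grid
phase_dist <- function(b) mean(Mod(rho_y - phase_fn(b * w, t_grid))^2)

beta_hat <- optimize(phase_dist, interval = c(0.5, 3.5))$minimum
beta_hat
```

Because the characteristic function of a normal error is real and positive, its phase is identically 1, so the contaminated w can stand in for x in the criterion. The covariate is drawn from a gamma distribution so that its phase function is non-trivial; the identification conditions of the actual method are given in Nghiem and Potgieter (2020).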


Extract Coefficients from an eiv_mlr Object

Description

Extracts the estimated regression coefficients from a fitted "eiv_mlr" object.

Usage

coef(object)

Arguments

object

An object of class "eiv_mlr".

Value

A named numeric vector of estimated regression coefficients.


GMM estimator combining moment-corrected and phase-function equations

Description

Computes Generalized Method of Moments (GMM) estimators by combining moment-corrected estimating equations and phase-function estimating equations. The GMM weighting matrix is estimated via a cluster bootstrap that accounts for uncertainty in estimating the measurement error covariance.

Usage

computing_GMM_estimator(
  W,
  y,
  Zmat,
  weight_method = c("uniform", "minimax", "quasi-likelihood"),
  B = 100,
  t_grid_length = 1000
)

Arguments

W

Numeric array of dimension n x p x J containing replicate measurements of contaminated covariates.

y

Numeric vector of length n containing the response.

Zmat

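Numeric matrix of dimension n x q containing error-free covariates.

weight_method

Observation weighting scheme(s), chosen from "uniform", "minimax", and "quasi-likelihood"; see eiv_mlr.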

B

Integer specifying the number of bootstrap replicates.

t_grid_length

Integer specifying the number of grid points in t used for the numerical integration of the phase-function criterion.

Value

A list with components:

est

Numeric matrix of GMM estimates; one column per weight choice.

CV_boot

Array of bootstrap covariance matrices used in GMM.
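The GMM idea behind this function can be shown on a deliberately simple, hypothetical problem: estimating a normal mean mu from two moment conditions (assuming a known unit variance), with the weighting matrix taken as the inverse of a bootstrap covariance of the averaged moments. This is a sketch of two-step GMM with a bootstrap weighting matrix, not the package's implementation; each observation plays the role of a statistical unit in the resampling.

```r
# Toy over-identified GMM (assumption: Y ~ N(mu, 1) with known variance 1)
set.seed(1)
n <- 500
y <- rnorm(n, mean = 2, sd = 1)

# Stacked moment conditions g_i(mu): two equations, one parameter
moments <- function(mu, y) cbind(y - mu, y^2 - mu^2 - 1)
gbar <- function(mu, y) colMeans(moments(mu, y))

# Step 1: preliminary estimate from the first equation alone
mu0 <- mean(y)

# Step 2: bootstrap the averaged moments at mu0 by resampling units
B <- 200
boot_g <- t(replicate(B, gbar(mu0, y[sample.int(n, n, replace = TRUE)])))
V_boot <- cov(boot_g)                   # 2 x 2 bootstrap covariance

# Step 3: minimise the GMM quadratic form with weight matrix V_boot^{-1}
W_inv <- solve(V_boot)
Q <- function(mu) {
  g <- gbar(mu, y)
  drop(t(g) %*% W_inv %*% g)
}
mu_gmm <- optimize(Q, interval = c(mu0 - 1, mu0 + 1))$minimum
mu_gmm
```

Weighting by the inverse covariance down-weights the noisier second-moment equation; the overall scale of V_boot does not affect the minimiser.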


Compute the corrected estimator via estimating equations

Description

This function computes the corrected estimator for a linear model with measurement error in covariates by solving the corrected estimating equations using a nonlinear root-finding algorithm.

Usage

computing_corrected_estimator(y, Wbar, Zmat, estSigmaU_bar, beta_in = NULL)

Arguments

y

Numeric vector of length n. Response variable.

Wbar

Numeric matrix of dimension n × p. Error-prone covariates averaged over replicates.

Zmat

Numeric matrix of dimension n × q. Error-free covariates (including intercept if applicable).

estSigmaU_bar

Measurement error variance/covariance associated with the averaged covariates: an n × p × p array, or n × 1 in the univariate case.

beta_in

Optional numeric vector of length p + q. Initial value for the estimating equation solver.

Details

The estimator is obtained by solving

E_n(\beta) = 0

where E_n is the corrected estimating equation implemented in estim_eq_corrected().
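For orientation, the standard corrected-least-squares estimating equation for a linear model has the form below. This is an assumption about what estim_eq_corrected() implements (the package's scaling and arrangement may differ): \bar W_i and Z_i are stacked into X_i, and the correction term adds back the attenuation caused by the averaged measurement error, whose covariance is \hat\Sigma_{\bar U,i}.

```latex
E_n(\beta) = \frac{1}{n}\sum_{i=1}^{n}\left\{ X_i\,(y_i - X_i^{\top}\beta) + D_i\,\beta \right\},
\qquad
X_i = \begin{pmatrix} \bar W_i \\ Z_i \end{pmatrix},\quad
D_i = \begin{pmatrix} \hat\Sigma_{\bar U,i} & 0 \\ 0 & 0_{q\times q} \end{pmatrix}.

% Since E_n is linear in beta, its root has the closed form
\hat\beta = \Bigl\{\sum_{i=1}^{n}\bigl(X_i X_i^{\top} - D_i\bigr)\Bigr\}^{-1} \sum_{i=1}^{n} X_i\, y_i .
```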

Value

Numeric vector of length p + q, the corrected estimator.

See Also

estim_eq_corrected


Confidence Intervals for eiv_mlr Coefficients

Description

Computes Wald-type confidence intervals for regression coefficients.

Usage

confint(object, level = 0.95)

Arguments

object

An object of class "eiv_mlr".

level

Confidence level; defaults to 0.95.

Value

A matrix with columns lower and upper.
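A Wald interval is the estimate plus or minus a standard normal quantile times the standard error taken from the diagonal of the variance-covariance matrix. The sketch below uses a hypothetical helper, wald_confint(), with made-up inputs, and assumes this is the construction used by confint(); it returns the same lower/upper layout.

```r
# Wald-type interval: estimate +/- z_{1 - alpha/2} * standard error.
wald_confint <- function(est, vcov_mat, level = 0.95) {
  se <- sqrt(diag(vcov_mat))
  z <- qnorm(1 - (1 - level) / 2)
  out <- cbind(lower = est - z * se, upper = est + z * se)
  rownames(out) <- names(est)
  out
}

# Illustrative inputs (hypothetical numbers, not from a real fit)
est <- c("(Intercept)" = 1.0, W1 = 2.0, Z1 = -0.5)
V <- diag(c(0.04, 0.01, 0.0025))
ci <- wald_confint(est, V)
ci
```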


dietary_white_women

Description

A processed dataset containing repeated 24-hour dietary recalls.

Usage

dietary_white_women

Format

A data frame with 2 rows per individual (one per recall day).

unit

Integer unit identifier

bmi

Body Mass Index (kg/m^2)

energy

Total energy intake (kcal)

protein

Protein intake (g)

fat

Fat intake (g)

age_in_month

Age in months

replicate

Replicate id of each observation

Details

The variables energy, protein, and fat are treated as error-prone covariates with two replicate measurements per individual, and age_in_month is treated as an error-free covariate.

Source

Processed from NHANES dietary recall data.


Linear Regression with Errors-in-Variables Using Replicated Measurements

Description

Fits a linear regression model in the presence of measurement error in covariates using replicated measurements and a combination of phase-function estimation and generalized method of moments (GMM).

Usage

eiv_mlr(
  formula,
  data,
  weight_method = c("uniform", "minimax", "quasi-likelihood"),
  B = 100,
  t_grid_length = 1000
)

Arguments

formula

A symbolic description of the model to be fitted. Error-prone covariates must be wrapped in W() and error-free covariates must be wrapped in Z().

The general form is:

    y ~ W(W1 + W2 + ...) + Z(Z1 + Z2 + ...)
  

An intercept is included automatically unless removed explicitly.

data

A data frame containing the response variable and all covariates appearing in formula. Each row corresponds to one replicate measurement of a statistical unit.

The data frame must contain:

  • A column named unit identifying statistical units.

  • Multiple rows per unit when replicated measurements exist.

  • One column for each error-prone covariate (appearing in W()).

  • One column for each error-free covariate (appearing in Z()).

Replicated measurements are represented by multiple rows sharing the same unit identifier. Error-free covariates and the response should be constant within each unit.

weight_method

Character string specifying the observation weighting method used in estimation. One of:

"uniform"

Uniform weights across observations.

"minimax"

Minimax optimal weights.

"quasi-likelihood"

Quasi-likelihood-based weights (recommended).

B

Integer specifying the number of bootstrap replications used to estimate the GMM weighting matrix. Defaults to 100.

t_grid_length

Integer specifying the number of frequency grid points used in phase-function integration. Larger values improve numerical accuracy at the cost of computation time.

Details

The function provides an lm-like interface while internally handling replicated error-prone covariates, measurement error correction, and robust variance estimation.

This function implements a measurement-error-corrected linear regression estimator for models with replicated error-prone covariates. When fewer than two units contain replicated measurements, the function automatically falls back to a quadratic (identity-weight) estimator. This ensures the model remains estimable even in the absence of replication.

The estimation procedure:

  1. Aggregates replicated measurements into an n x p x J array (units x error-prone covariates x replicates).

  2. Uses phase-function estimating equations to correct for unknown measurement error distributions.

  3. Combines moment conditions via GMM when sufficient replication information is available.

  4. Automatically switches to a quadratic (identity-weight) estimator when fewer than two statistical units contain replicated measurements.
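Step 1 can be illustrated with a hypothetical base-R helper, long_to_array(), that turns the long layout expected by eiv_mlr() (one row per replicate, units identified by a unit column) into an n x p x J array, padding units with fewer replicates with NA, as estimating_weight() expects. It is a sketch under that assumption, not the package's internal code.

```r
# Hypothetical helper: reshape long data (one row per replicate) into an
# n x p x J array; units with fewer than J replicates are padded with NA.
long_to_array <- function(data, unit_col, w_cols) {
  units <- unique(data[[unit_col]])
  # Replicate index within each unit: 1, 2, ... in order of appearance
  rep_id <- ave(seq_len(nrow(data)), data[[unit_col]], FUN = seq_along)
  J <- max(rep_id)
  W <- array(NA_real_, dim = c(length(units), length(w_cols), J),
             dimnames = list(as.character(units), w_cols, NULL))
  row_idx <- match(data[[unit_col]], units)
  for (k in seq_along(w_cols)) {
    # Matrix indexing fills (unit, covariate, replicate) cells in one step
    W[cbind(row_idx, k, rep_id)] <- data[[w_cols[k]]]
  }
  W
}

toy <- data.frame(unit = c(1, 1, 2, 3, 3),
                  W1 = c(0.9, 1.1, 2.0, 3.2, 2.8),
                  W2 = c(5, 6, 7, 8, 9))
W <- long_to_array(toy, "unit", c("W1", "W2"))
dim(W)
```

Unit 2 has a single replicate, so its second replicate slice is NA.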

Variance estimation is performed using a sandwich estimator, with the GMM weighting matrix estimated via a cluster bootstrap over statistical units.
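As a reminder of the sandwich form A^{-1} B A^{-1} / n, the sketch below computes it for ordinary least squares viewed as an M-estimator, on a heteroscedastic toy model. It is only illustrative: it omits the cluster-bootstrap adjustment for the estimated measurement error covariance that the package applies.

```r
# Sandwich variance for an M-estimator solving sum_i psi_i(b) = 0,
# illustrated with least squares, psi_i(b) = x_i (y_i - x_i' b).
set.seed(3)
n <- 400
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(1, 2)) + rnorm(n, sd = 1 + abs(X[, 2]))  # heteroscedastic

b_hat <- solve(crossprod(X), crossprod(X, y))
psi <- X * drop(y - X %*% b_hat)        # row i is psi_i(b_hat)

A <- crossprod(X) / n                   # -(1/n) sum of d psi_i / d b
Bmat <- crossprod(psi) / n              # (1/n) sum of psi_i psi_i'
V_sand <- solve(A) %*% Bmat %*% solve(A) / n
sqrt(diag(V_sand))                      # sandwich standard errors
```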

Value

An object of class "eiv_mlr" containing:

coef

Estimated regression coefficients.

vcov

Estimated variance-covariance matrix.

se

Standard errors of the estimates.

zvalue

Z-statistics for hypothesis testing.

pvalue

Two-sided p-values.

fitted

Fitted values at the unit level.

method

Estimation method used (GMM or quadratic fallback).

n

Number of statistical units.

Standard methods such as summary(), coef(), vcov(), confint(), predict(), and residuals() are available for objects of this class.

Examples

## ------------------------------------------
## Small reproducible example (for speed, deliberately too few bootstrap samples)
## ------------------------------------------

set.seed(1)

n  <- 30
J  <- 2

unit <- rep(1:n, each = J)

W_true <- rnorm(n)
W_obs  <- rep(W_true, each = J) + rnorm(n * J, sd = 0.5)

Z_unit <- rnorm(n)
Z1 <- rep(Z_unit, each = J)
# Generate the response once per unit so it is constant across replicates
y  <- rep(1 + 2 * W_true - 0.5 * Z_unit + rnorm(n), each = J)

sim_data <- data.frame(
  unit = unit,
  y = y,
  W1 = W_obs,
  Z1 = Z1
)

# For speed reasons, we use a very small number of bootstrap samples
fit <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_data,
  B = 10,
  t_grid_length = 20
)

coef(fit)
summary(fit)


## ------------------------------------------
## Additional examples (not run during checks)
## ------------------------------------------


## ------------------------------------------
## Example using included dataset
## ------------------------------------------

fit <- eiv_mlr(
  bmi ~ W(energy + protein + fat) + Z(age_in_month),
  data = dietary_white_women,
  weight_method = "minimax",
  B = 100,
  t_grid_length = 200
)

summary(fit)
confint(fit)


## ------------------------------------------
## Simulated example with replication
## ------------------------------------------

set.seed(1)

n  <- 200
J  <- 2

unit <- rep(1:n, each = J)

W_true <- rnorm(n)
W_obs  <- rep(W_true, each = J) + rnorm(n * J, sd = 0.5)

Z_unit <- rnorm(n)
Z1 <- rep(Z_unit, each = J)
# Generate the response once per unit so it is constant across replicates
y  <- rep(1 + 2 * W_true - 0.5 * Z_unit + rnorm(n), each = J)

sim_data <- data.frame(
  unit = unit,
  y = y,
  W1 = W_obs,
  Z1 = Z1
)

fit_rep <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_data,
  B = 20
)

summary(fit_rep)


## ------------------------------------------
## Simulated example without replication
## ------------------------------------------

sim_norep <- sim_data[!duplicated(sim_data$unit), ]

fit_norep <- eiv_mlr(
  y ~ W(W1) + Z(Z1),
  data = sim_norep,
  B = 20
)

summary(fit_norep)



Moment-corrected estimating equations for measurement error regression

Description

Constructs the moment-corrected estimating equations for a linear regression model with averaged error-prone covariates and error-free covariates.

Usage

estim_eq_corrected(y, Wbar, Zmat, b_in, estSigmaU_bar)

Arguments

y

Numeric vector of length n containing the response.

Wbar

Numeric matrix of dimension n x p containing averaged error-prone covariates.

Zmat

Numeric matrix of dimension n x q containing error-free covariates (including an intercept if required).

b_in

Numeric vector of length p + q containing the regression coefficients at which the estimating equations are evaluated.

estSigmaU_bar

Numeric array of dimension n x p x p (or numeric vector in the univariate case) containing measurement error covariance estimates.

Value

Numeric vector of length p + q containing the stacked estimating equations.
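The corrected equations can be sketched in base R for the simplest case: a single error-prone covariate with a common (homoscedastic) error variance estimated from J = 2 replicates. The helper est_eq is hypothetical, and the form of the correction (adding n * D %*% b, with D nonzero only on the error-prone block) is assumed to match estim_eq_corrected() up to scaling; the package itself allows unit-specific n x p x p covariances and solves with nleqslv.

```r
# Simulation: true slope 2 on x, which is observed twice with error (J = 2)
set.seed(7)
n <- 2000; J <- 2
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x - 0.5 * z + rnorm(n)
W <- matrix(x, n, J) + matrix(rnorm(n * J, sd = 0.5), n, J)

Wbar <- rowMeans(W)
sigma2_u <- mean(apply(W, 1, var))      # per-replicate error variance
sigma2_ubar <- sigma2_u / J             # variance of the averaged error

Xmat <- cbind(Wbar, 1, z)               # error-prone block first, then Z with intercept
p <- 1
D <- diag(c(rep(sigma2_ubar, p), 0, 0)) # correction only on the W block

# Corrected estimating equations evaluated at b (assumed form)
est_eq <- function(b) drop(crossprod(Xmat, y - Xmat %*% b) + n * D %*% b) / n

# The equations are linear in b, so the root has a closed form
b_corr <- drop(solve(crossprod(Xmat) - n * D, crossprod(Xmat, y)))
b_naive <- drop(solve(crossprod(Xmat), crossprod(Xmat, y)))
rbind(naive = b_naive, corrected = b_corr)
```

The naive least-squares slope is attenuated towards zero (roughly by the factor 1 / (1 + 0.125) in this simulation), while the corrected slope should be close to 2.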


Combined estimating equations: moment correction and phase function

Description

Constructs a stacked system of estimating equations combining moment-corrected score equations and phase-function score equations.

Usage

estim_eq_correctedAndPhase(
  Wbar,
  Zmat,
  y,
  S_y,
  C_y,
  b_in,
  estWeight,
  estSigmaU_bar,
  t
)

Arguments

Wbar

Numeric matrix of dimension n x p containing averaged error-prone covariates.

Zmat

Numeric matrix of dimension n x q containing error-free covariates (including an intercept if required).

y

Numeric vector of length n containing the response.

S_y

Numeric vector containing precomputed sine terms for the phase function.

C_y

Numeric vector containing precomputed cosine terms for the phase function.

b_in

Numeric vector of length p + q at which the estimating equations are evaluated.

estWeight

Numeric vector of weights used in the phase-function component.

estSigmaU_bar

Numeric array of dimension n x p x p containing measurement error covariance estimates.

t

Numeric vector of frequency values used in the phase function.

Value

Numeric vector containing the stacked estimating equations of length 2 * (p + q).


Estimate observation weights for measurement error models

Description

Computes observation-specific weights using uniform, minimax, and quasi-likelihood weighting schemes based on replicate measurements of contaminated covariates.

Usage

estimating_weight(
  W,
  method = c("uniform", "minimax", "quasi-likelihood"),
  gamma = NULL
)

Arguments

W

Numeric array of dimension n x p x J containing replicate measurements of contaminated covariates. Missing replicates should be coded as NA.

method

Character string specifying which weights to return. One of "all", "uniform", "minimax", or "quasi-likelihood".

gamma

Optional non-negative regularisation parameter used in the quasi-likelihood weights. If NULL (the default), log(n) is used.

Value

A numeric matrix with n rows. Columns correspond to the requested weighting schemes.


Predictions from an Errors-in-Variables Linear Model

Description

Generates fitted values or confidence intervals for new observations using plug-in estimates. Measurement error is not corrected for new data.

Usage

## S3 method for class 'eiv_mlr'
predict(
  object,
  newdata = NULL,
  interval = c("none", "confidence"),
  level = 0.95,
  ...
)

Arguments

object

An object of class "eiv_mlr".

newdata

Optional data frame containing covariates. If omitted, predictions are returned for the training data.

interval

Type of interval to compute: "none" or "confidence".

level

Confidence level for intervals.

...

Not used.

Value

A numeric vector of predictions, or a matrix with columns fit, lwr, and upr if intervals are requested.
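Plug-in confidence intervals for the mean response typically take the form fit ± z * sqrt(x' V x), where V is the coefficient variance-covariance matrix. The sketch below assumes predict() follows this Wald construction; the coefficients, covariance matrix, and new covariate rows are hypothetical.

```r
# Plug-in prediction with a Wald confidence interval for the mean response
b <- c(1.0, 2.0, -0.5)                  # intercept, W1, Z1 (hypothetical)
V <- diag(c(0.04, 0.01, 0.0025))        # hypothetical vcov
newX <- cbind(1, W1 = c(0, 1), Z1 = c(0, 2))

fit <- drop(newX %*% b)
se_fit <- sqrt(rowSums((newX %*% V) * newX))   # diag(newX V newX')
z <- qnorm(0.975)
pred <- cbind(fit = fit, lwr = fit - z * se_fit, upr = fit + z * se_fit)
pred
```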


Print Method for eiv_mlr Objects

Description

Displays a concise summary of an errors-in-variables linear model fit.

Usage

## S3 method for class 'eiv_mlr'
print(x, ...)

Arguments

x

An object of class "eiv_mlr".

...

Not used.

Value

Invisibly returns the input object x of class "eiv_mlr". This function is called for its side effect of printing a summary of the model.


Residuals from an Errors-in-Variables Linear Model

Description

Computes plug-in residuals based on observed covariates and estimated coefficients. Measurement error is not corrected in residuals.

Usage

residuals(object)

Arguments

object

An object of class "eiv_mlr".

Value

A numeric vector of residuals.


Summary of an Errors-in-Variables Linear Model

Description

Produces a coefficient table including standard errors, z-statistics, p-values, and significance stars.

Usage

summary(object)

Arguments

object

An object of class "eiv_mlr".

Value

An object of class "summary" containing a coefficient table and model information.
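The coefficient table can be reproduced from the estimates and standard errors: z = estimate / SE and the two-sided p-value is 2 * Phi(-|z|). The star thresholds below follow R's usual convention (0.001, 0.01, 0.05, 0.1), which is an assumption for this package; the inputs are hypothetical.

```r
# Coefficient table with z-statistics, two-sided p-values, and stars
est <- c("(Intercept)" = 1.02, W1 = 1.95, Z1 = -0.08)   # hypothetical values
se  <- c(0.20, 0.10, 0.10)

z <- est / se
p <- 2 * pnorm(-abs(z))
stars <- as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                          labels = c("***", "**", "*", ".", " "),
                          include.lowest = TRUE))
tab <- data.frame(Estimate = est, Std.Error = se, z.value = z,
                  p.value = p, Signif = stars)
tab
```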


Variance-Covariance Matrix for eiv_mlr Objects

Description

Extracts the estimated variance-covariance matrix of the regression coefficients from a fitted "eiv_mlr" object.

Usage

vcov(object)

Arguments

object

An object of class "eiv_mlr".

Value

A numeric variance-covariance matrix of the parameter estimates.