| Title: | LUCID with Multiple Omics Data |
| Version: | 3.1.0 |
| Description: | Implements Latent Unknown Clusters By Integrating Multi-omics Data (LUCID; Peng (2019) <doi:10.1093/bioinformatics/btz667>) for integrative clustering with exposures, multi-omics data, and health outcomes. Supports three integration strategies: early, parallel, and serial. Provides model fitting and tuning, lasso-type regularization for exposure and omics feature selection, handling of missing data, including both sporadic and complete-case patterns, prediction, and g-computation for estimating causal effects of exposures, bootstrap inference for uncertainty estimation, and S3 summary and plot methods. For the multi-omics integration framework, see Jia (2024) https://journal.r-project.org/articles/RJ-2024-012/RJ-2024-012.pdf. For the missing-data imputation mechanism, see Jia (2024) <doi:10.1093/bioadv/vbae123>. |
| Depends: | R (≥ 3.6.0) |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| LazyData: | true |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Imports: | mclust, nnet, boot, jsonlite, networkD3, progress, stats, utils, glasso, glmnet |
| URL: | https://journal.r-project.org/articles/RJ-2024-012/RJ-2024-012.pdf, https://doi.org/10.1093/bioadv/vbae123 |
| NeedsCompilation: | no |
| Packaged: | 2026-03-11 00:36:46 UTC; qiranjia19961112 |
| Author: | Qiran Jia |
| Maintainer: | Qiran Jia <qiranjia@usc.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-11 08:10:16 UTC |
I-step of LUCID
Description
Impute missing data in Z by maximizing the likelihood given fixed parameters
Usage
Istep_Z(Z, p, mu, sigma, index, lucid_model)
Arguments
Z |
Matrix or list of matrices for omics data |
p |
Probability matrix/vector or list for parallel model |
mu |
Mean matrix or list of matrices |
sigma |
Covariance array or list of arrays |
index |
Missing value indicators |
lucid_model |
Model type |
Value
Imputed dataset
Analyze missing data patterns in detail
Description
Analyze missing data patterns in detail
Usage
analyze_missing_pattern(Z)
Arguments
Z |
Matrix of data to analyze for missing patterns |
Value
List containing detailed missing data analysis
Inference of LUCID model based on bootstrap resampling
Description
Generate R bootstrap replicates of LUCID parameters and
derive confidence interval (CI) based on bootstrap. Bootstrap replicates are
generated by nonparametric resampling, implemented with the ordinary
method of boot::boot. Supports lucid_model = "early",
lucid_model = "parallel", and lucid_model = "serial".
Usage
boot_lucid(
G,
Z,
Y,
lucid_model = c("early", "parallel", "serial"),
CoG = NULL,
CoY = NULL,
model,
conf = 0.95,
R = 100,
verbose = FALSE
)
Arguments
G |
Exposures, a numeric vector, matrix, or data frame. Categorical variable should be transformed into dummy variables. If a matrix or data frame, rows represent observations and columns correspond to variables. |
Z |
Omics data: for LUCID early integration, a numeric matrix/data frame; for LUCID in parallel, a list of numeric matrices/data frames. Rows correspond to observations and columns correspond to variables. |
Y |
Outcome, a numeric vector. Categorical variable is not allowed. Binary outcome should be coded as 0 and 1. |
lucid_model |
Specifying LUCID model, "early" for early integration, "parallel" for LUCID in parallel, "serial" for LUCID in serial. Bootstrap inference is implemented for all three model types. |
CoG |
Optional, covariates to be adjusted for estimating the latent cluster. A numeric vector, matrix or data frame. Categorical variable should be transformed into dummy variables. |
CoY |
Optional, covariates to be adjusted for estimating the association between latent cluster and the outcome. A numeric vector, matrix or data frame. Categorical variable should be transformed into dummy variables. |
model |
A LUCID model fitted by |
conf |
A numeric scalar between 0 and 1 to specify confidence level(s) of the required interval(s). |
R |
An integer to specify number of bootstrap replicates for LUCID model. If feasible, it is recommended to set R >= 1000. |
verbose |
A flag indicates whether detailed information is printed in console. Default is FALSE. |
Value
A list containing:
beta |
Bootstrap CI table(s) for G-to-X effects. For
|
mu |
Bootstrap CI table(s) for cluster-specific means of omics features.
For |
gamma |
Bootstrap CI table for X-to-Y parameters. |
stage |
For |
bootstrap |
The |
Examples
# use simulated data
G <- sim_data$G[1:300, , drop = FALSE]
Z <- sim_data$Z[1:300, , drop = FALSE]
Y_normal <- sim_data$Y_normal[1:300]
# fit lucid model
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early",
family = "normal", K = 2,
seed = 1008)
# conduct bootstrap resampling
boot1 <- suppressWarnings(
boot_lucid(G = G, Z = Z, Y = Y_normal,
lucid_model = "early", model = fit1, R = 5)
)
# Use 90% CI
boot2 <- suppressWarnings(
boot_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early",
model = fit1, R = 5, conf = 0.9)
)
Check matrix condition and stabilize if needed
Description
Check matrix condition and stabilize if needed
Usage
check_and_stabilize_sigma(sigma, threshold = 1e+10, epsilon = 1e-06)
Arguments
sigma |
Covariance matrix to check |
threshold |
Condition number threshold (default 1e10) |
epsilon |
Small constant for regularization (default 1e-6) |
Value
Stabilized covariance matrix
Check convergence with both absolute and relative criteria
Description
Check convergence with both absolute and relative criteria
Usage
check_convergence(old_val, new_val, abs_tol = 1e-06, rel_tol = 1e-06)
Arguments
old_val |
Previous value |
new_val |
New value |
abs_tol |
Absolute tolerance |
rel_tol |
Relative tolerance |
Value
Boolean indicating convergence
Check quality of imputed data
Description
Check quality of imputed data
Usage
check_imputation_quality(original, imputed)
Arguments
original |
Original data matrix with missing values |
imputed |
Imputed data matrix |
Value
List containing imputation quality metrics and overall assessment
Check missing patterns in omics data
Description
Check missing patterns in omics data
Usage
check_na(Z, lucid_model = c("early", "parallel"))
Arguments
Z |
A data matrix or list of matrices for parallel model |
lucid_model |
Model type ("early" or "parallel") |
Value
List containing missing data analysis
Fit LUCID models with one or multiple omics layers
Description
EM algorithm to estimate LUCID with one or multiple omics layers
Usage
estimate_lucid(
lucid_model = c("early", "parallel", "serial"),
G,
Z,
Y,
CoG = NULL,
CoY = NULL,
K,
init_omic.data.model = "EEV",
useY = TRUE,
tol = 0.001,
max_itr = 1000,
max_tot.itr = 10000,
Rho_G = 0,
Rho_Z_Mu = 0,
Rho_Z_Cov = 0,
family = c("normal", "binary"),
seed = 123,
init_impute = c("mix", "lod"),
init_par = c("mclust", "random"),
verbose = FALSE
)
Arguments
lucid_model |
Specifying LUCID model, "early" for early integration, "parallel" for lucid in parallel, "serial" for lucid in serial |
G |
an N by P matrix representing exposures |
Z |
Omics data, if "early", an N by M matrix; If "parallel", a list, each element i is a matrix with N rows and P_i features; If "serial", a list, each element i is a matrix with N rows and p_i features or a list with two or more matrices with N rows and a certain number of features |
Y |
a length N vector |
CoG |
an N by V matrix representing covariates to be adjusted for G -> X |
CoY |
an N by K matrix representing covariates to be adjusted for X -> Y |
K |
Number of latent clusters. If "early", an integer greater or equal to 2; If "parallel", an integer vector, same length as Z, with each element being an integer greater or equal to 2; If "serial", a list, each element is either an integer like that for "early" or an list of integers like that for "parallel", same length as Z |
init_omic.data.model |
a vector of strings specifies the geometric model of omics data. If NULL, See more in ?mclust::mclustModelNames |
useY |
logical, if TRUE, EM algorithm fits a supervised LUCID; otherwise unsupervised LUCID. |
tol |
stopping criterion for the EM algorithm |
max_itr |
Maximum iterations of the EM algorithm. If the EM algorithm iterates more than max_itr without converging, the EM algorithm is forced to stop. |
max_tot.itr |
Max number of total iterations for |
Rho_G |
A scalar. This parameter is the LASSO penalty to regularize
exposure coefficients in the G-to-X model. |
Rho_Z_Mu |
A scalar. This parameter is the LASSO penalty to
regularize cluster-specific means for omics data (Z). If user wants to tune the
penalty, use the wrapper function |
Rho_Z_Cov |
A scalar. This parameter is the graphical LASSO
penalty to estimate sparse cluster-specific variance-covariance matrices for omics
data (Z). If user wants to tune the penalty, use the wrapper function |
family |
The distribution of the outcome |
seed |
Random seed to initialize the EM algorithm |
init_impute |
Method to initialize the imputation of missing values in
LUCID. |
init_par |
For "early", an interface to initialize EM algorithm, if mclust,
initiate the parameters using the |
verbose |
Logging level for fitting progress. If |
Value
A list contains the object below:
res_Beta: estimation for G->X associations
res_Mu: estimation for the mu of the X->Z associations
res_Sigma: estimation for the sigma of the X->Z associations
res_Gamma: estimation for X->Y associations
inclusion.p: inclusion probability of cluster assignment for each observation
K: number of latent clusters for "early"/list of numbers of latent clusters for "parallel" and "serial"
var.names: names for the G, Z, Y variables
init_omic.data.model: pre-specified geometric model of multi-omics data
likelihood: converged LUCID model log likelihood
family: the distribution of the outcome
select: for "early" and "parallel", feature-selection indicators. For "parallel",
select$selectGis the exposure-wise union across layers (selected in at least one layer) andselect$selectG_layerstores per-layer exposure selection.useY: whether this LUCID model is supervised
Z: multi-omics data
init_impute: pre-specified imputation method
init_par: pre-specified parameter initialization method
Rho: for "early" and "parallel", pre-specified regularity tuning parameters
N: number of observations
submodel: for LUCID in serial only, storing all the submodels
em_control: EM stopping controls used for fitting (
tol,max_itr,max_tot.itr); used by bootstrap refits
Examples
i <- 1008
set.seed(i)
G <- matrix(rnorm(500), nrow = 100)
Z1 <- matrix(rnorm(1000),nrow = 100)
Z2 <- matrix(rnorm(1000), nrow = 100)
Z3 <- matrix(rnorm(1000), nrow = 100)
Z4 <- matrix(rnorm(1000), nrow = 100)
Z5 <- matrix(rnorm(1000), nrow = 100)
Z <- list(Z1 = Z1, Z2 = Z2, Z3 = Z3, Z4 = Z4, Z5 = Z5)
Y <- rnorm(100)
CoY <- matrix(rnorm(200), nrow = 100)
CoG <- matrix(rnorm(200), nrow = 100)
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y, K = list(2,2,2,2,2),
lucid_model = "serial",
family = "normal",
seed = i,
CoG = CoG, CoY = CoY,
useY = TRUE)
Impute missing data by optimizing the likelihood function
Description
Impute missing data by optimizing the likelihood function
Usage
fill_data(obs, mu, sigma, p, index, lucid_model)
Arguments
obs |
a vector of length M |
mu |
a matrix of size K x M for parallel model, M x K for early model |
sigma |
a matrix of size M x M x K |
p |
a vector of length K |
index |
a vector of length M, indicating whether a value is missing |
lucid_model |
Specifying LUCID model |
Value
an observation with updated imputed values
generate bootstrp ci (normal, basic and percentile)
Description
generate bootstrp ci (normal, basic and percentile)
Usage
gen_ci(x, conf = 0.95)
Arguments
x |
an object return by boot function |
conf |
A numeric scalar between 0 and 1 to specify confidence level(s) of the required interval(s). |
Value
a matrix, the first column is the point estimate from original model
Fit a lucid model for integrated analysis on exposure, outcome and multi-omics data, allowing for tuning
Description
Fit a lucid model for integrated analysis on exposure, outcome and multi-omics data, allowing for tuning
Usage
lucid(
G,
Z,
Y,
CoG = NULL,
CoY = NULL,
family = c("normal", "binary"),
K = 2,
lucid_model = c("early", "parallel", "serial"),
Rho_G = 0,
Rho_Z_Mu = 0,
Rho_Z_Cov = 0,
verbose_tune = FALSE,
...
)
Arguments
G |
Exposures, a numeric vector, matrix, or data frame. Categorical variable should be transformed into dummy variables. If a matrix or data frame, rows represent observations and columns correspond to variables. |
Z |
Omics data. If "early", an N by M matrix. If "parallel", a list, each element i is a matrix with N rows and P_i features. If "serial", a list, each element i is either a matrix with N rows and p_i features, or a list with two or more matrices with N rows. |
Y |
Outcome, a numeric vector. Categorical variable is not allowed. Binary outcome should be coded as 0 and 1. |
CoG |
Optional, covariates to be adjusted for estimating the latent cluster. A numeric vector, matrix or data frame. Categorical variable should be transformed into dummy variables. |
CoY |
Optional, covariates to be adjusted for estimating the association between latent cluster and the outcome. A numeric vector, matrix or data frame. Categorical variable should be transformed into dummy variables. |
family |
Distribution of outcome. For continuous outcome, use "normal"; for binary outcome, use "binary". Default is "normal". |
K |
Number of latent clusters to be tuned. For lucid_model = "early", number of latent clusters (should be greater or equal than 2). Either an integer or a vector of integer. If K is a vector, model selection on K is performed. For lucid_model = "parallel",a list with vectors of integers or just integers, same length as Z, if the element itself is a vector, model selection on K is performed; For lucid_model = "serial", a list, each element is either an integer or an list of integers, same length as Z, if the smallest element (integer) itself is a vector, model selection on K is performed |
lucid_model |
Specifying LUCID model, "early" for early integration, "parallel" for lucid in parallel, "serial" for lucid in serial |
Rho_G |
A scalar or a vector. This parameter is the LASSO penalty to regularize
exposure coefficients in the G-to-X model; |
Rho_Z_Mu |
A scalar or a vector. This parameter is the LASSO penalty to
regularize cluster-specific means for omics data (Z). If it is a vector,
|
Rho_Z_Cov |
A scalar or a vector. This parameter is the graphical LASSO
penalty to estimate sparse cluster-specific variance-covariance matrices for omics
data (Z). If it is a vector, |
verbose_tune |
A flag to print details of tuning process. |
... |
Other parameters passed to |
Value
An optimal LUCID model
res_Beta: estimation for G->X associations
res_Mu: estimation for the mu of the X->Z associations
res_Sigma: estimation for the sigma of the X->Z associations
res_Gamma: estimation for X->Y associations
inclusion.p: inclusion probability of cluster assignment for each observation
K: number of latent clusters for "early"/list of numbers of latent clusters for "parallel" and "serial"
var.names: names for the G, Z, Y variables
init_omic.data.model: pre-specified geometric model of multi-omics data
likelihood: converged LUCID model log likelihood
family: the distribution of the outcome
select: for "early" and "parallel", feature-selection indicators. For "parallel",
select$selectGis the exposure-wise union across layers andselect$selectG_layerstores per-layer exposure selection.useY: whether this LUCID model is supervised
Z: multi-omics data
init_impute: pre-specified imputation method
init_par: pre-specified parameter initialization method
Rho: for "early" and "parallel", pre-specified regularity tuning parameters
N: number of observations
submodel: for LUCID in serial only, storing all the submodels
Examples
# LUCID early integration (quick smoke example)
G <- sim_data$G[1:80, , drop = FALSE]
Z <- sim_data$Z[1:80, , drop = FALSE]
Y <- sim_data$Y_normal[1:80]
fit_early <- lucid(
G = G, Z = Z, Y = Y,
lucid_model = "early", family = "normal", K = 2,
max_itr = 30, max_tot.itr = 60, seed = 1008
)
# LUCID in parallel (two layers)
i <- 1008
set.seed(i)
G <- matrix(rnorm(240), nrow = 80)
Z1 <- matrix(rnorm(320), nrow = 80)
Z2 <- matrix(rnorm(320), nrow = 80)
Z <- list(Z1 = Z1, Z2 = Z2)
CoY <- matrix(rnorm(160), nrow = 80)
CoG <- matrix(rnorm(160), nrow = 80)
Y <- rnorm(80)
fit_parallel <- lucid(
G = G, Z = Z, Y = Y, K = list(2, 2),
CoG = CoG, CoY = CoY, lucid_model = "parallel",
family = "normal", seed = i,
max_itr = 30, max_tot.itr = 60
)
Visualize LUCID model through a Sankey diagram
Description
In the Sankey diagram, each node either represents a variable (exposure, omics or outcome) or a latent cluster. Each line represents an association. The color of the node represents variable type, either exposure, omics or outcome. The width of the line represents the effect size of a certain association; the color of the line represents the direction of a certain association. Only work for LUCID early for now.
Usage
plot(x, ...)
Arguments
x |
A LUCID model fitted by |
... |
Additional arguments to specify colors and fontsize |
Value
A DAG graph created by sankeyNetwork
Examples
# prepare data
G <- sim_data$G
Z <- sim_data$Z
Y_normal <- sim_data$Y_normal
Y_binary <- sim_data$Y_binary
cov <- sim_data$Covariate
# plot lucid model
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early",
CoY = NULL, family = "normal", K = 2, seed = 1008)
plot(fit1)
# change node color
plot(fit1, G_color = "yellow")
plot(fit1, Z_color = "red")
# change link color
plot(fit1, pos_link_color = "red", neg_link_color = "green")
Predict Cluster Assignment and Outcome From a Fitted LUCID Model
Description
Predict cluster assignment and outcome using new data on G, Z, and optional Y.
If g_computation = TRUE, prediction uses only the G-to-X path from the
fitted model and returns counterfactual-style predictions under modified G.
This function can also be used to extract latent cluster assignments when using
the training data as input.
Usage
predict_lucid(
model,
lucid_model = c("early", "parallel", "serial"),
G,
Z = NULL,
Y = NULL,
CoG = NULL,
CoY = NULL,
response = TRUE,
g_computation = FALSE,
verbose = FALSE
)
Arguments
model |
A model fitted and returned by |
lucid_model |
Specifying LUCID model, "early" for early integration, "parallel" for LUCID in parallel, and "serial" for LUCID in serial. |
G |
Exposures, a numeric vector, matrix, or data frame. Categorical variable should be transformed into dummy variables. If a matrix or data frame, rows represent observations and columns correspond to variables. |
Z |
Omics data. If "early", an N by M matrix. If "parallel", a list, each
element i is a matrix with N rows and P_i features. If "serial", a list, each
element i is a matrix with N rows and p_i features (or a list with two or more
matrices with N rows and a certain number of features). For
|
Y |
Outcome, a numeric vector. Categorical variable is not allowed. Binary outcome should be coded as 0 and 1. |
CoG |
Optional, covariates to be adjusted for estimating the latent cluster. A numeric vector, matrix or data frame. Categorical variable should be transformed into dummy variables. |
CoY |
Optional, covariates to be adjusted for estimating the association between latent cluster and the outcome. A numeric vector, matrix or data frame. Categorical variable should be transformed into dummy variables. |
response |
If |
g_computation |
If |
verbose |
A flag indicates whether detailed information is printed in console. Default is FALSE. |
Value
A list containing:
inclusion.p |
Posterior inclusion probabilities for latent clusters (a matrix for "early"; a list by layer for "parallel" and "serial"). |
pred.x |
Predicted latent-cluster labels (a numeric vector for "early"; a list by layer for "parallel" and "serial"). |
pred.y |
Predicted outcome values. For binary outcomes, this is class
labels when |
pred.z |
Predicted omics means under g-computation mode
( |
Examples
# prepare data
G <- sim_data$G
Z <- sim_data$Z
Y_normal <- sim_data$Y_normal
# fit lucid model
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early", K = 2, family = "normal")
# prediction on training set
pred1 <- predict_lucid(model = fit1, G = G, Z = Z, Y = Y_normal, lucid_model = "early")
pred2 <- predict_lucid(model = fit1, G = G, Z = Z, lucid_model = "early")
# g-computation style prediction using only G
pred_g <- predict_lucid(model = fit1, G = G, Z = NULL, g_computation = TRUE, lucid_model = "early")
Print the output of LUCID in a nicer table
Description
Print the output of LUCID in a nicer table
Usage
## S3 method for class 'sumlucid_early'
print(x, ...)
Arguments
x |
An object returned by |
... |
Other parameters to be passed to |
Value
Prints a structured model summary, including model specification, missing-data
profile, feature-selection overview, model fit statistics, regularization settings,
and detailed parameter estimates. If boot.se is provided in summary(),
bootstrap CI tables are shown for sections (1) Y, (2) Z, and (3) E.
Examples
# use simulated data
G <- sim_data$G[1:300, , drop = FALSE]
Z <- sim_data$Z[1:300, , drop = FALSE]
Y_normal <- sim_data$Y_normal[1:300]
# fit lucid model
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early", family = "normal", K = 2,
seed = 1008)
# conduct bootstrap resampling
boot1 <- suppressWarnings(
boot_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early", model = fit1, R = 5)
)
# print the summary of the lucid model in a table
temp <- summary(fit1)
print(temp)
Print the output of LUCID in a nicer table
Description
Print the output of LUCID in a nicer table
Usage
## S3 method for class 'sumlucid_parallel'
print(x, ...)
Arguments
x |
An object returned by |
... |
Other parameters to be passed to |
Value
Prints a structured parallel-model summary, including per-layer missing-data
profile, overall and per-layer feature-selection overview, model fit statistics,
regularization settings, and detailed parameter estimates for Y, Z, and E.
If boot.se is provided in summary(), bootstrap CI tables are
shown for sections (1) Y, (2) Z, and (3) E.
Print the output of LUCID in a nicer table
Description
Print the output of LUCID in a nicer table
Usage
## S3 method for class 'sumlucid_serial'
print(x, ...)
Arguments
x |
An object returned by |
... |
Other parameters to be passed to |
Value
A nice table/several nice tables of the summary of the LUCID model
Safe imputation for edge cases
Description
Safe imputation for edge cases
Usage
safe_impute(Z, method = c("mean", "median", "lod"))
Arguments
Z |
Matrix with missing values |
method |
Imputation method ("mean", "median", "lod") |
Value
Imputed matrix
Safe log-sum-exp computation
Description
Safe log-sum-exp computation
Usage
safe_log_sum_exp(x)
Arguments
x |
Numeric vector |
Value
Numerically stable result
Safe probability normalization
Description
Safe probability normalization
Usage
safe_normalize(x)
Arguments
x |
Vector of probabilities |
Value
Normalized probabilities
Safe matrix inversion with stability checks
Description
Safe matrix inversion with stability checks
Usage
safe_solve(matrix, tol = 1e-06)
Arguments
matrix |
Matrix to invert |
tol |
Tolerance for numerical stabilization |
Value
Inverted matrix or NULL if inversion fails
A simulated dataset for LUCID
Description
This is an example dataset to illustrate LUCID model. It is simulated by assuming there are 2 latent clusters in the data. We assume the exposures are associated with latent cluster which ultimately affects the PFAS concentration and liver injury in children. The latent clusters are also characterized by differential levels of metabolites.
Usage
sim_data
Format
A list with 5 matrices corresponding to exposures (G), omics data (Z), a continuous outcome, a binary outcome and 2 covariates (can be used either as CoX or CoY). Each matrice contains 2000 observations.
- G
10 exposures
- Z
10 metabolites
- Y_normal
Outcome, PFAS concentration in children
- Y_bninary
Bianry outcome, liver injury status
- Covariates
2 continous covariates, can be treated as either CoX or CoY
- X
Latent clusters
A simulated HELIX dataset for LUCID
Description
The Human Early-Life Exposome (HELIX) project is multi-center research project that aims to characterize early-life environmental exposures and associate these with omics biomarkers and child health outcomes (Vrijheid, 2014. doi: 10.1289/ehp.1307204). We used a subset of HELIX data from Exposome Data Challenge 2021 (hold by ISGlobal) as an example to illustrate LUCID model.
Usage
simulated_HELIX_data
Format
A list with 4 matrices corresponding to exposures (G), omics data (Z), outcome (Y) and covariates (CoY), a total of 420 observations
- exposure
1 exposures measuring the maternal exposure to utero mercury.
- omics
10 methylomics, 10 transcriptomics, 10 miRNA
- outcome
A continuous outcome as an indicator of metabolic-dysfunciton-associated fatty liver disease (MAFLD)
- covariate
3 covariates including fish_preg_ter, child sex, maternal age
Summarize missing-data patterns from check_na output
Description
Summarize missing-data patterns from check_na output
Usage
summarize_missing_stats(na_pattern, lucid_model = c("early", "parallel"))
Arguments
na_pattern |
Output list from check_na |
lucid_model |
Model type ("early" or "parallel") |
Value
Missing-data summary list
Summarize results of the parallel LUCID model
Description
Summarize results of the parallel LUCID model
Usage
## S3 method for class 'lucid_parallel'
summary(object, ...)
Arguments
object |
A LUCID model fitted by |
... |
Additional argument |
Value
A list containing model information, fit statistics, feature-selection summaries (overall + by layer), detailed parameter estimates (by layer), missing-data summaries (by layer), and optional bootstrap CI tables.
Summarize results of the serial LUCID model
Description
Summarize results of the serial LUCID model
Usage
## S3 method for class 'lucid_serial'
summary(object, ...)
Arguments
object |
A LUCID model fitted by |
... |
Additional argument of |
Value
A list, containing the extracted key parameters from the LUCID model that can be used to print the summary table
Summarize results of the early LUCID model
Description
Summarize results of the early LUCID model
Usage
summary_lucid(object, ...)
## S3 method for class 'early_lucid'
summary(object, ...)
Arguments
object |
A LUCID model fitted by |
... |
Additional argument |
Value
A list containing model information, fit statistics, feature-selection summaries, detailed parameter estimates, missing-data summaries, and optional bootstrap CI tables.
Examples
# use simulated data
G <- sim_data$G[1:300, , drop = FALSE]
Z <- sim_data$Z[1:300, , drop = FALSE]
Y_normal <- sim_data$Y_normal[1:300]
# fit lucid model
fit1 <- estimate_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early", family = "normal", K = 2,
seed = 1008)
# conduct bootstrap resampling
boot1 <- suppressWarnings(
boot_lucid(G = G, Z = Z, Y = Y_normal, lucid_model = "early", model = fit1, R = 5)
)
# summarize lucid model
summary(fit1)
# summarize lucid model with bootstrap CIs
summary(fit1, boot.se = boot1)
Wrapper for LUCID Model and Penalty Tuning
Description
Fit a grid of LUCID models over candidate numbers of latent
clusters K and (optionally) L1 penalties Rho_G,
Rho_Z_Mu, and Rho_Z_Cov. The input format for K differs
by lucid_model. For "early", use an integer vector (for example,
2:4). For "parallel", use a list of vectors/integers, one per layer
(for example, list(2:3, 2:3, 2)). For "serial", use a nested list as
required by the serial model.
Usage
tune_lucid(
G,
Z,
Y,
CoG = NULL,
CoY = NULL,
family = c("normal", "binary"),
K,
lucid_model = c("early", "parallel", "serial"),
Rho_G = 0,
Rho_Z_Mu = 0,
Rho_Z_Cov = 0,
verbose_tune = FALSE,
...
)
Arguments
G |
Exposures, a numeric vector, matrix, or data frame. Categorical variables should be transformed into dummy variables. |
Z |
Omics data. If "early", an N by M matrix. If "parallel", a list of matrices (same N). If "serial", a list matching the serial model structure. |
Y |
Outcome, a numeric vector. Binary outcomes should be coded as 0/1. |
CoG |
Optional covariates for the G-to-X model. |
CoY |
Optional covariates for the X-to-Y model. |
family |
Outcome family: "normal" or "binary". |
K |
Candidate latent-cluster values in model-specific format. |
lucid_model |
LUCID model type: "early", "parallel", or "serial". |
Rho_G |
Scalar or vector penalty for exposure coefficients in the
G-to-X model. |
Rho_Z_Mu |
Scalar or vector penalty for cluster-specific Z means. Vector tuning is supported for "early" and "parallel". For "serial", only scalar inputs are supported. |
Rho_Z_Cov |
Scalar or vector penalty for cluster-specific Z covariance matrices. Vector tuning is supported for "early" and "parallel". For "serial", only scalar inputs are supported. |
verbose_tune |
Logical; print tuning progress if |
... |
Additional arguments passed to |
Value
A list containing the tuning table, fitted models, and the selected optimal model. Returned element names differ slightly by model:
"early":
best_model,tune_list,res_model"parallel"/"serial":
model_opt,tune_K,model_list
Examples
## Not run:
G <- sim_data$G
Z <- sim_data$Z
Y <- sim_data$Y_normal
tune_early <- tune_lucid(G = G, Z = Z, Y = Y, lucid_model = "early", K = 2:3)
tune_rho <- tune_lucid(
G = G, Z = Z, Y = Y, lucid_model = "early", K = 2,
Rho_G = c(0, 0.1), Rho_Z_Mu = c(0, 5), Rho_Z_Cov = c(0, 0.1)
)
## End(Not run)