Estimating causal effects from high-dimensional, network-structured exposures is a core challenge in neuroscience, genomics, and environmental science. Standard penalized regression recovers associations, not causal effects: when unmeasured confounders jointly influence both the exposures and the outcome, ordinary regression coefficients are biased. Instrumental variables (IVs) break this bias by exploiting external perturbations — such as genetic variants in brain-imaging studies — that shift exposures without directly affecting the outcome.
The ivgls package combines IV identification with graph-structured regularization. When the exposures are nodes in a known network (e.g., brain regions linked by white-matter tracts, genes linked by a pathway graph), borrowing strength across connected nodes improves support recovery. In high-dimensional genetic applications, some candidate instruments may violate the exclusion restriction through horizontal pleiotropy; ivgls corrects for this via a sisVIVE-style alternating update.
The package provides three estimators:
| Estimator | Graph penalty | Invalid-IV correction | Requires glmgraph |
|---|---|---|---|
iv_lasso |
No | No | No |
ivgl |
Yes | No | Yes |
ivgl_s |
Yes | Yes | Yes |
iv_lasso is a pure-glmnet two-stage LASSO
baseline. ivgl adds a graph-Laplacian penalty in the second
stage via glmgraph. ivgl_s further estimates
and removes direct instrument-to-outcome effects.
# Step 1: install glmgraph (not on CRAN; required for ivgl and ivgl_s)
devtools::install_github("cran/glmgraph")
# Step 2: install ivgls
install.packages("ivgls")iv_lasso() and all graph construction, simulation, and
evaluation utilities work without glmgraph. Calling
ivgl() or ivgl_s() without it installed
produces an informative error with the install command above.
generate_beta() places \(s_0\) active nodes on the graph according
to one of three signal patterns. The smooth pattern
places active nodes in a spatially contiguous cluster with
similarly-signed effects.
set.seed(1)
bobj <- generate_beta(A, s2 = 4, signal = 2, pattern = "smooth")
cat("Active nodes:", sort(bobj$active_set), "\n")
#> Active nodes: 23 24 25 26
cat("True effects at active nodes:\n")
#> True effects at active nodes:
print(round(bobj$beta_true[bobj$active_set], 3))
#> [1] 2.037 1.833 2.319 2.066generate_data() draws instruments \(Z \in \mathbb{R}^{n \times q}\), constructs
exposures with latent confounding, and introduces s_alpha
invalid instruments whose direct effects on \(Y\) violate the exclusion restriction.
dat <- generate_data(
n = 100,
p = 30,
q = 100,
s_alpha = 5,
alpha_strength = 3,
beta_true = bobj$beta_true
)
cat("Y:", length(dat$Y),
"| X:", nrow(dat$X), "x", ncol(dat$X),
"| Z:", nrow(dat$Z), "x", ncol(dat$Z), "\n")
#> Y: 100 | X: 100 x 30 | Z: 100 x 100
cat("Invalid instrument indices:", which(dat$alpha_true != 0), "\n")
#> Invalid instrument indices: 1 2 3 4 5beta_ivl <- iv_lasso(dat$Y, dat$X, dat$Z)
cat("IV-LASSO selected nodes:", which(abs(beta_ivl) > 1e-4), "\n")
#> IV-LASSO selected nodes: 2 3 4 5 6 8 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28
cat("IV-LASSO MCC:",
round(get_mcc(bobj$active_set, which(abs(beta_ivl) > 1e-4), p = 30), 3), "\n")
#> IV-LASSO MCC: 0.196ivgl() and ivgl_s() require the
glmgraph package. Install it with:
devtools::install_github("cran/glmgraph").
if (requireNamespace("glmgraph", quietly = TRUE)) {
beta_ivgl <- ivgl(dat$Y, dat$X, dat$Z, L)
fit_s <- ivgl_s(dat$Y, dat$X, dat$Z, L, max_iter = 10)
cat("IVGL selected nodes:", which(abs(beta_ivgl) > 1e-4), "\n")
cat("IVGL-S selected nodes:", which(abs(fit_s$beta) > 1e-4), "\n")
cat("IVGL MCC:",
round(get_mcc(bobj$active_set, which(abs(beta_ivgl) > 1e-4), p = 30), 3), "\n")
cat("IVGL-S MCC:",
round(get_mcc(bobj$active_set, which(abs(fit_s$beta) > 1e-4), p = 30), 3), "\n")
cat("Detected invalid instruments:", which(abs(fit_s$alpha) > 1e-4), "\n")
} else {
message(
"Package 'glmgraph' is not installed. ",
"ivgl() and ivgl_s() are unavailable.\n",
"Install with: devtools::install_github(\"cran/glmgraph\")"
)
}
#> IVGL selected nodes: 2 3 4 5 6 16 18 20 23 24 25 26 28
#> IVGL-S selected nodes: 23 24 25 26 28
#> IVGL MCC: 0.449
#> IVGL-S MCC: 0.877
#> Detected invalid instruments: 1 2 3 4 5 6 32 37 42 57 59 60 78 79 84 97eval_support() returns MCC, TPR, FPR, and number of
selected variables.
corrupt_graph() randomly swaps a fraction of edges,
useful for sensitivity analysis when the graph is estimated rather than
known exactly.
run_simulation() runs \(B\) independent replicates and returns a
long data frame with one row per method per replicate.
if (requireNamespace("glmgraph", quietly = TRUE)) {
res <- run_simulation(
B = 5,
n = 100,
p = 30,
q = 100,
graph_type = "chain",
signal_pattern = "smooth",
s2 = 4,
signal = 2,
s_alpha = 5,
alpha_strength = 3
)
aggregate(cbind(MCC, MSE, TPR, FPR) ~ Method, data = res, FUN = mean)
} else {
message(
"Package 'glmgraph' is not installed. ",
"run_simulation() is unavailable.\n",
"Install with: devtools::install_github(\"cran/glmgraph\")"
)
}
#> Method MCC MSE TPR FPR
#> 1 IV_LASSO 0.4268386 0.119422153 1 0.4692308
#> 2 IVGL 0.5894756 0.082105051 1 0.2461538
#> 3 IVGL_S 0.7632763 0.003185131 1 0.1153846The methods in this package were developed for neuroimaging-genetics
studies using data from the Alzheimer’s Disease Neuroimaging Initiative
(ADNI). Brain region volumes from T1-weighted MRI serve as the
network-structured exposures; SNPs from genome-wide genotyping serve as
candidate instruments; and MMSE score serves as the cognitive outcome.
The DKT atlas defines a graph over 62 cortical and subcortical regions
of interest. In that setting, ivgl_s() identifies a
conservative set of eight causal ROIs robust to pleiotropic SNP effects,
while iv_lasso() and ivgl() select broader
sets that include confounded associations. See the companion paper for
full details.