| Title: | Network-Aware IV Regression with Graph-Fused Lasso |
|---|---|
| Description: | Implements network-aware instrumental variable regression for causal node discovery in high-dimensional settings with graph-structured exposures. Provides IVGL and IVGL-S estimators combining graph-Laplacian penalization with IV-based identification, including correction for invalid instruments via a sisVIVE-style update. Methods are described in Pal and Ghosh (2026) <doi:10.48550/arXiv.2604.24969>. The 'glmgraph' package, required for the main estimators, is available at the additional repository <https://djghosh1123.r-universe.dev>. |
| Authors: | Dhrubajyoti Ghosh [aut, cre] (ORCID: <https://orcid.org/0000-0002-3360-3786>), Samhita Pal [aut] (ORCID: <https://orcid.org/0009-0001-4930-916X>) |
| Maintainer: | Dhrubajyoti Ghosh <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-24 13:27:50 UTC |
| Source: | https://github.com/cran/ivgls |
Corrupt a graph by random edge swaps
corrupt_graph(A, corruption_rate = 0.3)

| Argument | Description |
|---|---|
| A | Symmetric p x p binary adjacency matrix. |
| corruption_rate | Proportion of edges to remove and replace. |

A corrupted adjacency matrix.
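
The following base-R sketch illustrates one way a "remove and replace" corruption could work: a fraction of existing edges is deleted and the same number of previously absent edges is added, so the edge count stays fixed. The helper name `corrupt_sketch` is hypothetical, and the package's own sampling scheme may differ.

```r
# Hypothetical helper, NOT the package's implementation of corrupt_graph().
corrupt_sketch <- function(A, corruption_rate = 0.3) {
  ut <- upper.tri(A)
  edges <- which(ut & A == 1)       # linear indices of existing edges
  non_edges <- which(ut & A == 0)   # linear indices of absent edges
  k <- min(round(corruption_rate * length(edges)), length(non_edges))
  out <- A
  if (k > 0) {
    drop_idx <- edges[sample.int(length(edges), k)]
    add_idx <- non_edges[sample.int(length(non_edges), k)]
    out[drop_idx] <- 0
    out[add_idx] <- 1
    # Mirror the upper triangle so the result stays symmetric
    out[lower.tri(out)] <- t(out)[lower.tri(out)]
  }
  out
}

# Usage on a 6-node ring
set.seed(1)
A_ring <- matrix(0, 6, 6)
for (i in 1:6) A_ring[i, i %% 6 + 1] <- 1
A_ring <- A_ring + t(A_ring)
A_bad <- corrupt_sketch(A_ring, corruption_rate = 0.5)
```

With a rate of 0.5 on six edges, three edges are removed and three new ones added, so `A_bad` has the same number of edges as `A_ring`.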
Compute performance metrics for support recovery
eval_support(true_support, estimated_support, p)

| Argument | Description |
|---|---|
| true_support | Integer vector of true active indices. |
| estimated_support | Integer vector of estimated active indices. |
| p | Total number of predictors. |

Named numeric vector with MCC, TPR, FPR, and Selected.
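
As a rough illustration of what these four quantities mean, the sketch below computes them in base R from the confusion counts. The function name `support_metrics_sketch` is hypothetical, and returning an MCC of 0 when the denominator vanishes is an assumption, not documented package behaviour.

```r
# Hypothetical helper, NOT the package's implementation of eval_support().
support_metrics_sketch <- function(true_support, estimated_support, p) {
  truth <- seq_len(p) %in% true_support
  est <- seq_len(p) %in% estimated_support
  # Confusion counts, converted to double to avoid integer overflow in products
  tp <- as.numeric(sum(truth & est))
  fp <- as.numeric(sum(!truth & est))
  fn <- as.numeric(sum(truth & !est))
  tn <- as.numeric(sum(!truth & !est))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  mcc <- if (denom == 0) 0 else (tp * tn - fp * fn) / denom  # assumed convention
  c(MCC = mcc,
    TPR = tp / (tp + fn),   # share of truly active indices that were selected
    FPR = fp / (fp + tn),   # share of inactive indices that were selected
    Selected = sum(est))
}

m <- support_metrics_sketch(true_support = c(1, 2, 3),
                            estimated_support = c(2, 3, 4, 5),
                            p = 10)
```

In this toy call two of the three true indices are recovered and two of the seven inactive indices are wrongly selected.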
Generate a sparse true coefficient vector on a graph
generate_beta(A, s2 = 5, signal = 3, pattern = c("smooth", "nonsmooth", "community"), smooth_noise = 0.2)

| Argument | Description |
|---|---|
| A | Symmetric p x p adjacency matrix. |
| s2 | Number of active nodes. |
| signal | Causal effect magnitude. |
| pattern | One of "smooth", "nonsmooth", or "community". |
| smooth_noise | SD of noise added around the base signal. |

A list with beta_true and active_set.
Simulate data for graph-IV regression
generate_data(n = 100, p = 70, q = 500, s1 = 0.1, s_alpha = 10, alpha_strength = 5, beta_true)

| Argument | Description |
|---|---|
| n | Sample size. |
| p | Number of exposures. |
| q | Number of instruments. |
| s1 | Fraction of instruments relevant for each exposure. |
| s_alpha | Number of invalid instruments. |
| alpha_strength | Direct-effect magnitude of invalid instruments. |
| beta_true | Numeric vector of length p of true causal effects. |

A list with Y, X, Z, A_true, and alpha_true.
Compute the unnormalised graph Laplacian
get_laplacian(A)

| Argument | Description |
|---|---|
| A | Symmetric p x p binary adjacency matrix. |

A p x p Laplacian matrix.
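
For reference, the unnormalised Laplacian of a graph is the degree matrix minus the adjacency matrix, L = D - A. The base-R sketch below (the helper name `laplacian_sketch` is hypothetical, not the package function) builds it for a 4-node chain and checks the standard identity that the quadratic form of L sums squared differences across edges, which is why Laplacians are commonly used to encourage smoothness over a graph.

```r
# Hypothetical helper, NOT the package's implementation of get_laplacian().
laplacian_sketch <- function(A) {
  # nrow = guards against diag() treating a length-1 vector as a dimension
  diag(rowSums(A), nrow = nrow(A)) - A
}

# 4-node chain: 1 - 2 - 3 - 4
A_chain <- matrix(0, 4, 4)
A_chain[cbind(1:3, 2:4)] <- 1
A_chain <- A_chain + t(A_chain)
L_chain <- laplacian_sketch(A_chain)

# Quadratic form equals the sum over edges of (beta_i - beta_j)^2
beta <- c(1, 1, 3, 0)
quad <- as.numeric(t(beta) %*% L_chain %*% beta)
edge_sum <- (beta[1] - beta[2])^2 + (beta[2] - beta[3])^2 + (beta[3] - beta[4])^2
```

Both `quad` and `edge_sum` evaluate to 13 here, and every row of `L_chain` sums to zero.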
Matthews Correlation Coefficient for support recovery
get_mcc(true_support, estimated_support, p)

| Argument | Description |
|---|---|
| true_support | Integer vector of truly active indices. |
| estimated_support | Integer vector of estimated active indices. |
| p | Total number of predictors. |

A scalar between -1 and 1.
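
For orientation, the standard definition of the Matthews correlation coefficient is given below, where TP, FP, TN, and FN count the true positives, false positives, true negatives, and false negatives among the p predictors. How the package handles a zero denominator is not stated in this manual.

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$

A value of 1 indicates perfect support recovery, 0 is no better than random selection, and -1 indicates complete disagreement.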
IV-LASSO: Two-stage LASSO without graph structure
iv_lasso(Y, X, Z)

| Argument | Description |
|---|---|
| Y | Numeric vector of length n. Outcome. |
| X | Numeric n x p matrix of endogenous exposures. |
| Z | Numeric n x q matrix of instruments. |

Numeric vector of length p of estimated causal effects.
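
To make the "two-stage" idea concrete, here is a minimal sketch of a generic two-stage lasso, assuming the 'glmnet' package is installed. It is not the package's implementation of iv_lasso(): the helper name `two_stage_lasso_sketch`, the use of cross-validated `lambda.min`, and the toy data-generating process are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the package's implementation of iv_lasso().
two_stage_lasso_sketch <- function(Y, X, Z) {
  # Stage 1: lasso of each exposure on the instruments; keep fitted values
  X_hat <- vapply(seq_len(ncol(X)), function(j) {
    fit <- glmnet::cv.glmnet(Z, X[, j])
    as.numeric(predict(fit, newx = Z, s = "lambda.min"))
  }, numeric(nrow(X)))
  # Stage 2: lasso of the outcome on the fitted exposures
  fit2 <- glmnet::cv.glmnet(X_hat, Y)
  unname(as.matrix(coef(fit2, s = "lambda.min"))[-1, 1])  # drop the intercept
}

# Toy data (assumed design): instrument j drives exposure j, u confounds X and Y
set.seed(1)
n <- 100; q <- 20; p <- 5
Z <- matrix(rnorm(n * q), n, q)
Gamma <- matrix(0, q, p)
Gamma[cbind(1:p, 1:p)] <- 1
u <- rnorm(n)
X <- Z %*% Gamma + u + matrix(rnorm(n * p, sd = 0.5), n, p)
Y <- as.numeric(X %*% c(2, 0, 0, -1, 0) + u + rnorm(n))

beta_hat <- two_stage_lasso_sketch(Y, X, Z)
```

Replacing X with its instrument-based fitted values in the second stage is what removes the part of X that is correlated with the unobserved confounder.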
IVGL: IV regression with graph-fused Lasso
ivgl(Y, X, Z, L)

| Argument | Description |
|---|---|
| Y | Numeric vector of length n. Outcome. |
| X | Numeric n x p matrix of endogenous exposures. |
| Z | Numeric n x q matrix of instruments. |
| L | Numeric p x p graph Laplacian (see |

Numeric vector of length p of estimated causal effects.
Extends IVGL with an alternating sisVIVE-style update to handle partially invalid instruments that violate the exclusion restriction.
ivgl_s(Y, X, Z, L, max_iter = 20, verbose = FALSE)

| Argument | Description |
|---|---|
| Y | Numeric vector of length n. Outcome. |
| X | Numeric n x p matrix of endogenous exposures. |
| Z | Numeric n x q matrix of instruments. |
| L | Numeric p x p graph Laplacian. |
| max_iter | Maximum number of alternating iterations. Default 20. |
| verbose | Print CV loss at each iteration. Default FALSE. |

A list with beta (length-p vector of causal effects) and alpha (length-q vector of direct IV-outcome effects).
Construct a graph adjacency matrix
make_graph(p = 70, type = c("proximity", "ring", "chain", "community", "disconnected"))
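
| Argument | Description |
|---|---|
| p | Number of nodes. |
| type | One of "proximity", "ring", "chain", "community", or "disconnected". |
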
A symmetric p x p binary adjacency matrix.
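
As an illustration of two of the simpler topologies, the sketch below builds adjacency matrices under the assumption that "chain" means a path graph and "ring" means a cycle graph. The helper `ring_chain_sketch` is hypothetical, and the remaining types ("proximity", "community", "disconnected") are not sketched because this manual does not describe how they are constructed.

```r
# Hypothetical helper, NOT the package's implementation of make_graph().
ring_chain_sketch <- function(p, type = c("ring", "chain")) {
  type <- match.arg(type)
  A <- matrix(0, p, p)
  if (p >= 2) {
    idx <- seq_len(p - 1)
    A[cbind(idx, idx + 1)] <- 1                 # link node i to node i + 1
    if (type == "ring" && p >= 3) A[1, p] <- 1  # close the cycle
  }
  A + t(A)                                      # symmetrise
}

A_ring5 <- ring_chain_sketch(5, "ring")
A_chain5 <- ring_chain_sketch(5, "chain")
```

Every node in `A_ring5` has degree 2, while the two end nodes of `A_chain5` have degree 1.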
Run a single simulation replicate
run_one_replicate(n = 100, p = 70, q = 500, graph_type = "proximity", signal_pattern = "smooth", fit_graph_type = NULL, graph_corruption = 0, s2 = 5, signal = 3, s_alpha = 10, alpha_strength = 5, smooth_noise = 0.2, threshold = 1e-04)

| Argument | Description |
|---|---|
| n | Sample size. |
| p | Number of exposures. |
| q | Number of instruments. |
| graph_type | Graph topology passed to |
| signal_pattern | One of |
| fit_graph_type | Graph supplied to estimators. If NULL uses |
| graph_corruption | Proportion of edges to corrupt. Default 0. |
| s2 | Number of active nodes. |
| signal | Causal effect magnitude. |
| s_alpha | Number of invalid instruments. |
| alpha_strength | Invalid-IV direct-effect magnitude. |
| smooth_noise | Noise on the smooth signal pattern. |
| threshold | Coefficients below this are treated as zero. |

A data.frame with one row per method and columns Method, MSE, MCC, TPR, FPR, Selected.
Run a simulation study with multiple replicates
run_simulation(B = 100, n = 100, p = 70, q = 500, graph_type = "proximity", signal_pattern = "smooth", fit_graph_type = NULL, graph_corruption = 0, s2 = 5, signal = 3, s_alpha = 10, alpha_strength = 5, smooth_noise = 0.2, threshold = 1e-04)

| Argument | Description |
|---|---|
| B | Number of Monte Carlo replicates. |
| n | Sample size. |
| p | Number of exposures. |
| q | Number of instruments. |
| graph_type | Graph topology passed to |
| signal_pattern | One of |
| fit_graph_type | Graph supplied to estimators. If NULL uses |
| graph_corruption | Proportion of edges to corrupt. Default 0. |
| s2 | Number of active nodes. |
| signal | Causal effect magnitude. |
| s_alpha | Number of invalid instruments. |
| alpha_strength | Invalid-IV direct-effect magnitude. |
| smooth_noise | Noise on the smooth signal pattern. |
| threshold | Coefficients below this are treated as zero. |

A data.frame with 3*B rows, one per method per replicate.
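
A long-format result of this shape is typically summarised by averaging each metric within method. The sketch below shows one way to do that with base R's `aggregate()`. The data frame is a stand-in with the documented column names but placeholder random numbers (not real results), and the method labels are hypothetical.

```r
# Stand-in for the output of run_simulation(); values are PLACEHOLDERS only.
set.seed(1)
B <- 4
method_labels <- c("IV-LASSO", "IVGL", "IVGL-S")  # hypothetical labels
res <- data.frame(
  Method = rep(method_labels, times = B),
  MSE = runif(3 * B),
  MCC = runif(3 * B, min = -1, max = 1)
)

# Monte Carlo average of each metric, one row per method
summary_tab <- aggregate(cbind(MSE, MCC) ~ Method, data = res, FUN = mean)
```

The same pattern extends to TPR, FPR, and Selected by adding them inside `cbind()`.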