| Title: | Privacy-Preserving Meta-Analysis via Low-Rank Basis Hunting |
|---|---|
| Description: | Tools for privacy-preserving meta-analysis of function-valued quantities across heterogeneous studies. Implements the 'MetaHunt' pipeline, including the denoised functional Successive Projection Algorithm (d-fSPA) for basis hunting, constrained weight estimation, Dirichlet regression of weights on study-level covariates, target prediction, and split/cross conformal prediction intervals. Operates on aggregate-level function evaluations, so individual-level data from source studies are not required. Methodology described in Shi, Imai, and Zhang (2026) <doi:10.48550/arXiv.2604.23847>. |
| Authors: | Wenqi Shi [aut, cre], Kosuke Imai [aut], Yi Zhang [aut] |
| Maintainer: | Wenqi Shi <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-12 23:04:19 UTC |
| Source: | https://github.com/cran/MetaHunt |
Many downstream quantities of interest (average treatment effect,
pointwise predictions, or other functionals of the predicted function) are scalar
summaries of that function. apply_wrapper() applies any
user-supplied reduction to each row of a function matrix, with a default
of the weighted mean with respect to grid_weights (which coincides with
the integral of the function against the reference measure when grid_weights represents that measure).
    apply_wrapper(F_mat, wrapper = NULL, grid_weights = NULL)
F_mat |
An |
wrapper |
Either |
grid_weights |
Optional length- |
A length-n numeric vector of scalar summaries.
    F_mat <- matrix(1:12, nrow = 3, byrow = TRUE)     # 3 "functions" on a 4-point grid
    apply_wrapper(F_mat)                               # row means (uniform grid weights)
    apply_wrapper(F_mat, wrapper = max)                # row maxes
    apply_wrapper(F_mat, wrapper = function(f) f[2])   # point evaluation at grid idx 2
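The default reduction described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name weighted_row_mean is hypothetical, and normalising the weights to sum to one is an assumption.

```r
# Hypothetical helper (not part of the package): weighted mean of each row.
weighted_row_mean <- function(F_mat, grid_weights = NULL) {
  G <- ncol(F_mat)
  if (is.null(grid_weights)) grid_weights <- rep(1 / G, G)  # uniform default
  w <- grid_weights / sum(grid_weights)                     # assumed normalisation
  as.numeric(F_mat %*% w)                                   # one scalar per row
}

F_mat <- matrix(1:12, nrow = 3, byrow = TRUE)
uniform_means <- weighted_row_mean(F_mat)
first_point   <- weighted_row_mean(F_mat, grid_weights = c(1, 0, 0, 0))
```

With uniform weights the result matches rowMeans(F_mat); putting all weight on one grid point reduces to a point evaluation.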
Constructs a data frame of grid points suitable for f_hat_from_models()
from any reference patient-level dataset. This is convenient when the
patient-level covariate space is multidimensional and there is no
obvious one-dimensional grid.
    build_grid(reference_data, n_grid = NULL, seed = NULL)
reference_data |
A data frame (or matrix) of reference patient-level covariates. May be a held-out target-population sample, the pooled source covariates, or any plausible reference distribution. |
n_grid |
Optional integer giving the desired grid size. If |
seed |
Optional integer seed for reproducibility (used only when sub-sampling). |
If n_grid is NULL or at least nrow(reference_data), the reference
data is returned unchanged. Otherwise n_grid rows are sampled uniformly
at random (without replacement). The reference data should be on the
same scale and have the same columns as the data each centre's model
was fitted on.
The empirical distribution of the returned grid implicitly defines the
measure used downstream. Pass uniform grid_weights (the
default) to weight each grid point equally; pass non-uniform
grid_weights to weight by an external reference distribution.
A data frame of grid points.
    set.seed(1)
    ref <- data.frame(age = rnorm(500, 60, 10),
                      bp  = rnorm(500, 130, 15),
                      sex = sample(c("F", "M"), 500, replace = TRUE))
    grid <- build_grid(ref, n_grid = 50, seed = 1)
    nrow(grid)
    head(grid)
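The sub-sampling rule described for build_grid() (return the data unchanged when n_grid is NULL or at least the number of rows, otherwise sample rows uniformly without replacement) can be sketched as follows. The helper subsample_grid is hypothetical and only mirrors that description; how the package handles the seed internally is an assumption.

```r
# Hypothetical helper mirroring the documented rule; not the package's code.
subsample_grid <- function(reference_data, n_grid = NULL, seed = NULL) {
  n <- nrow(reference_data)
  if (is.null(n_grid) || n_grid >= n) return(reference_data)  # nothing to sub-sample
  if (!is.null(seed)) set.seed(seed)                          # assumed seeding behaviour
  idx <- sample.int(n, size = n_grid, replace = FALSE)        # uniform, no replacement
  reference_data[idx, , drop = FALSE]
}

set.seed(1)
ref <- data.frame(age = rnorm(200, 60, 10), bp = rnorm(200, 130, 15))
grid_small <- subsample_grid(ref, n_grid = 50, seed = 1)
grid_full  <- subsample_grid(ref, n_grid = 500)
```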
Returns the regression coefficients from the underlying weight-model fit.
For the default "dirichlet" method this delegates to
DirichletReg::DirichReg()'s coef() method.
    ## S3 method for class 'metahunt_weight_model'
    coef(object, ...)
object |
A fitted |
... |
Passed through to |
The coefficient vector / matrix returned by
DirichletReg::DirichReg()'s coef() method (numeric vector or matrix
depending on the parametrisation used by the underlying fit).
    set.seed(1)
    m <- 60; K <- 3
    W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
    eta <- cbind(0.5 * W$w1, -0.3 * W$w2, rep(0, m))
    pi_true <- exp(eta) / rowSums(exp(eta))
    pi_hat <- pi_true + matrix(rnorm(m * K, sd = 0.01), m, K)
    pi_hat <- pmax(pi_hat, 0); pi_hat <- pi_hat / rowSums(pi_hat)
    model <- fit_weight_model(pi_hat, W)
    coef(model)
Lower-level entry point that builds split conformal intervals for new
target covariates using an already-fitted d-fSPA basis decomposition,
an already-fitted weight model, and a user-supplied calibration set.
Use this when you have independently tuned K or want to reuse a
pipeline fit; otherwise the high-level split_conformal() is usually
more convenient.
    conformal_from_fit(
      dfspa_fit,
      weight_model,
      F_cal,
      W_cal,
      W_new,
      alpha = 0.05,
      wrapper = NULL,
      grid_weights = NULL
    )
dfspa_fit |
A |
weight_model |
A |
F_cal |
An |
W_cal |
A matrix or data frame of study-level covariates for the
calibration set, with the same columns used to fit |
W_new |
A matrix or data frame of study-level covariates for new target studies. |
alpha |
Miscoverage level; default 0.05. |
wrapper |
Optional reduction function (see |
grid_weights |
Optional length- |
An object of class "metahunt_conformal"; see split_conformal()
for a description of its fields.
See also: split_conformal() for the high-level version that splits and
fits internally, cross_conformal() for the K-fold variant.
    set.seed(1)
    G <- 30; m <- 80
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    W <- data.frame(w1 = rnorm(m))
    eta <- cbind(0.8 * W$w1, -0.4 * W$w1, rep(0, m))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
    # user-controlled split and fit
    tr <- 1:50; cal <- 51:70; new <- 71:80
    fit <- dfspa(F_hat[tr, ], K = 3)
    pih <- project_to_simplex(F_hat[tr, ], fit$bases)
    wm <- fit_weight_model(pih, W[tr, , drop = FALSE])
    res <- conformal_from_fit(
      fit, wm,
      F_cal = F_hat[cal, ],
      W_cal = W[cal, , drop = FALSE],
      W_new = W[new, , drop = FALSE],
      wrapper = mean
    )
    res
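For readers unfamiliar with split conformal prediction, the generic recipe behind a pointwise band can be sketched in base R. This is a textbook sketch under stated assumptions, not the package's code: the conformity score (absolute residual per grid point) and the finite-sample quantile rule are assumptions, and pred_cal, pred_new and the simulated data are illustrative stand-ins for pipeline predictions.

```r
set.seed(1)
n_cal <- 40; n_new <- 3; G <- 5; alpha <- 0.1

# Illustrative stand-ins for calibration predictions, observed calibration
# functions, and predictions for new targets.
pred_cal <- matrix(rnorm(n_cal * G), n_cal, G)
F_cal    <- pred_cal + matrix(rnorm(n_cal * G, sd = 0.2), n_cal, G)
pred_new <- matrix(rnorm(n_new * G), n_new, G)

# Assumed conformity score: absolute residual at each grid point.
scores <- abs(F_cal - pred_cal)

# Finite-sample rule: the ceiling((n + 1) * (1 - alpha))-th smallest score,
# or Inf when that rank exceeds the calibration size.
k <- ceiling((n_cal + 1) * (1 - alpha))
q_hat <- apply(scores, 2, function(s) if (k > n_cal) Inf else sort(s)[k])

# Band for each new target: prediction plus/minus the per-grid-point quantile.
lower <- sweep(pred_new, 2, q_hat, "-")
upper <- sweep(pred_new, 2, q_hat, "+")
```

A scalar analogue would reduce each row to a single number first and use one quantile instead of one per grid point.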
Computes empirical coverage indicators of a fitted
"metahunt_conformal" object against held-out observed study-level
functions. The held-out studies must correspond positionally to the
targets used to build object (i.e. F_obs[i, ] is the observed
function for the same target whose prediction is in
object$prediction[i, ] or object$prediction[i]).
    coverage(object, F_obs, grid_weights = NULL)
object |
A |
F_obs |
An |
grid_weights |
Optional length- |
In pointwise mode each entry (i, g) of F_obs is compared against
[object$lower[i, g], object$upper[i, g]]. In scalar mode F_obs
is first reduced to a length-n_target vector via
apply_wrapper() using object$wrapper and the supplied
grid_weights, and then compared against
[object$lower, object$upper].
Coverage is a finite-sample diagnostic. Nominal coverage is
1 - object$alpha; empirical coverage will fluctuate around this
value due to sampling.
A list. In pointwise mode the list contains
pointwise: n_target-by-G_grid logical matrix of coverage indicators.
per_target: Length-n_target numeric vector of mean coverage across the grid for each target.
per_grid_point: Length-G_grid numeric vector of mean coverage across targets at each grid point.
overall: Scalar mean coverage across all entries.
nominal: Nominal coverage 1 - object$alpha.
In scalar mode the list contains
pointwise: Length-n_target logical vector of coverage indicators.
overall: Scalar mean coverage.
nominal: Nominal coverage 1 - object$alpha.
    set.seed(1)
    G <- 25; m <- 80
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    W <- data.frame(w1 = rnorm(m))
    eta <- cbind(0.6 * W$w1, -0.3 * W$w1, rep(0, m))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
    # held-out test set: same data-generating process, same W
    test_idx <- 1:10
    train_idx <- setdiff(seq_len(m), test_idx)
    res <- split_conformal(F_hat[train_idx, ],
                           W[train_idx, , drop = FALSE],
                           W[test_idx, , drop = FALSE],
                           K = 3,
                           dfspa_args = list(denoise = FALSE),
                           seed = 1)
    cov <- coverage(res, F_obs = F_hat[test_idx, , drop = FALSE])
    cov$overall
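The pointwise-mode summaries listed in the value section amount to simple means of a logical matrix. A minimal base-R sketch, using simulated stand-ins for the lower, upper and F_obs matrices (the band half-width of 1 is arbitrary and purely illustrative):

```r
set.seed(1)
n_target <- 4; G <- 6

# Illustrative band and held-out observations.
prediction <- matrix(rnorm(n_target * G), n_target, G)
lower <- prediction - 1
upper <- prediction + 1
F_obs <- prediction + matrix(rnorm(n_target * G, sd = 0.8), n_target, G)

covered        <- F_obs >= lower & F_obs <= upper  # n_target-by-G logical matrix
per_target     <- rowMeans(covered)                # mean coverage per target
per_grid_point <- colMeans(covered)                # mean coverage per grid point
overall        <- mean(covered)                    # mean over all entries
```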
Computes K-fold split conformal intervals in which the calibration scores
are pooled across folds, while point predictions for new targets are
produced from a final pipeline fit on all studies. Equivalent to running
split_conformal() n_folds times with different calibration sets and
pooling all conformity scores into a single empirical distribution.
    cross_conformal(
      F_hat,
      W,
      W_new,
      K,
      alpha = 0.05,
      n_folds = 5L,
      wrapper = NULL,
      grid_weights = NULL,
      dfspa_args = list(),
      weight_model_args = list(),
      seed = NULL
    )
F_hat |
An |
W |
An |
W_new |
A matrix or data frame of new target covariates. Must
contain columns matching |
K |
Integer number of basis functions. |
alpha |
Miscoverage level; interval has nominal coverage
1 - alpha. Default 0.05. |
n_folds |
Integer number of folds (>= 2). Default 5L. |
wrapper |
Optional reduction function (see |
grid_weights |
Optional length- |
dfspa_args, weight_model_args
|
Named lists passed to |
seed |
Optional integer seed for reproducible train/calibration splits. |
For each fold, the MetaHunt pipeline is fit on the out-of-fold studies,
and conformity scores are computed on the in-fold studies. After all
folds complete, the pooled scores yield a single
conformal quantile at nominal level 1 - alpha (or one per grid point). Point predictions for
W_new use a pipeline refit on the full dataset. The interval at grid
point g for target j is
centred on the point prediction for target j at g, with half-width given by the pooled quantile (or the
scalar analogue when wrapper is supplied).
This differs from Vovk's original cross-conformal predictor for classification. For regression, pooling scores across folds is a common practical extension of split conformal and reduces the variance due to the single calibration split. Exact finite-sample coverage is not guaranteed; see Barber et al. (2021, Jackknife+) for more conservative alternatives.
An object of class "metahunt_conformal" (see
split_conformal() for fields). method is "cross" and n_cal is
the number of pooled scores.
    set.seed(1)
    G <- 30; m <- 60; K_true <- 3
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    W <- data.frame(w1 = rnorm(m))
    eta <- cbind(0.8 * W$w1, -0.3 * W$w1, rep(0, m))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
    W_new <- data.frame(w1 = c(0, 1))
    res <- cross_conformal(F_hat, W, W_new, K = 3, n_folds = 4,
                           dfspa_args = list(denoise = FALSE), seed = 1)
    res
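The fold-pooling idea (fit on out-of-fold studies, score in-fold studies, pool all scores, refit on everything for the point prediction) can be illustrated on a scalar toy problem. This sketch deliberately swaps the MetaHunt pipeline for a plain lm() fit so that it runs in base R; the absolute-residual score and the 1 - alpha empirical quantile are assumptions rather than the package's confirmed choices.

```r
set.seed(1)
m <- 60; n_folds <- 4; alpha <- 0.1
dat <- data.frame(w = rnorm(m))
dat$y <- 1 + 2 * dat$w + rnorm(m, sd = 0.3)

fold <- sample(rep(seq_len(n_folds), length.out = m))  # random fold labels
scores <- numeric(m)
for (k in seq_len(n_folds)) {
  in_fold <- fold == k
  fit_k <- lm(y ~ w, data = dat[!in_fold, ])           # fit on out-of-fold rows
  pred_k <- predict(fit_k, newdata = dat[in_fold, ])
  scores[in_fold] <- abs(dat$y[in_fold] - pred_k)      # assumed conformity score
}

# Pool all m scores into one empirical distribution and take one quantile.
q_hat <- unname(quantile(scores, probs = 1 - alpha, type = 1))

# Point predictions for new targets come from a refit on all rows.
fit_all <- lm(y ~ w, data = dat)
pred <- unname(predict(fit_all, newdata = data.frame(w = c(0, 1))))
interval <- cbind(lower = pred - q_hat, upper = pred + q_hat)
```

Because every study contributes exactly one score, the pooled calibration set has m scores rather than the smaller number a single split would leave.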
For each candidate K, perform k-fold cross-validation at the study
level. Within each fold, the full MetaHunt pipeline (d-fSPA + constrained
projection + weight model) is refit on the training studies, and the
held-out studies' functions are predicted from their study-level
covariates. The prediction error is
the grid-weighted distance between each held-out function and its prediction, averaged over
held-out studies and then over folds.
    cv_error_curve(
      F_hat,
      W,
      K_range = NULL,
      n_folds = 5L,
      grid_weights = NULL,
      dfspa_args = list(),
      weight_model_args = list(),
      seed = NULL
    )
F_hat |
An |
W |
An |
K_range |
Integer vector of candidate |
n_folds |
Integer number of CV folds (default 5L). |
grid_weights |
Optional length- |
dfspa_args |
Named list of extra arguments for |
weight_model_args |
Named list of extra arguments for
|
seed |
Optional integer seed for reproducible fold assignment; if
|
This is the supervised rank-selection criterion of Section 3.2 of the paper. Each held-out study is excluded from both basis hunting and weight-model fitting.
A data frame with columns K, cv_error (mean over folds),
and cv_se (standard error across folds). The per-fold error matrix
is attached as the attribute "fold_errors"
(length(K_range)-by-n_folds). Folds where the pipeline fails
contribute NA and are summarised in a single warning.
    set.seed(1)
    G <- 40; m <- 80; K_true <- 3
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
    eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.02), m, G)
    cv <- cv_error_curve(F_hat, W, K_range = 2:5, n_folds = 4, seed = 1)
    cv
Recovers a set of K latent basis functions from a collection of
study-level function estimates under the low-rank cross-study heterogeneity
assumption of Shi, Imai, and Zhang. Implements Algorithm 1 of the paper
("The d-fSPA Algorithm for basis hunting").
    dfspa(F_hat, K, grid_weights = NULL, N = NULL, Delta = NULL, denoise = TRUE)
F_hat |
An |
K |
Integer number of basis functions to recover. Must satisfy
|
grid_weights |
Optional length- |
N, Delta
|
Optional numeric tuning parameters controlling denoising. See Details. |
denoise |
Logical; if |
Each study-level function is represented by its evaluations on a shared
grid of G points. The (weighted) inner product is
$\langle f, g \rangle = \sum_{j=1}^{G} w_j \, f(x_j) \, g(x_j)$, where the
grid_weights w_j are proportional to the underlying measure on the grid. If not
supplied, uniform weights 1 / G are used.
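Denoising follows Jin (2024): for each study i, let
$\mathcal{B}_i = \{ j : \| \hat f_j - \hat f_i \| \le \Delta \}$ be the set of studies whose functions lie within distance Delta of $\hat f_i$. If $\mathcal{B}_i$ contains fewer than N studies, study i is discarded;
otherwise $\hat f_i$ is replaced by the average of the functions
in $\mathcal{B}_i$.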
After denoising, the functional SPA step iteratively selects, at each of
the K iterations, the remaining function with the largest norm after
projecting out the span of previously selected bases.
Default tuning parameters follow the heuristics of the paper:
N = 0.5 * log(m), with Delta set by the paper's corresponding rule.
An object of class "dfspa": a list containing
bases: A K-by-G matrix whose rows are the recovered basis functions evaluated on the grid (denoised, if applicable).
selected: Length-K integer vector of the selected row indices into the post-denoising function matrix.
original_indices: Length-K integer vector of the selected study indices in the original input F_hat (before any rows were dropped by denoising).
kept: Integer vector of row indices of F_hat that survived denoising.
F_denoised: The post-denoising function matrix (length(kept)-by-G).
grid_weights: Grid weights used.
N, Delta: Tuning parameters actually used (or NA when denoise = FALSE).
K: Number of bases requested.
call: The matched call.
    set.seed(1)
    G <- 50
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)   # 3 true bases
    pi_mat <- rbind(diag(3),                      # 3 pure studies
                    c(0.5, 0.3, 0.2),
                    c(0.2, 0.5, 0.3),
                    c(0.3, 0.3, 0.4))
    F_hat <- pi_mat %*% basis                     # m = 6, G = 50
    fit <- dfspa(F_hat, K = 3, denoise = FALSE)
    fit$original_indices                          # should be a permutation of 1, 2, 3
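The selection step described in the details (pick the function with the largest norm, project it out, repeat K times) can be sketched in base R. This is a plain successive-projection sketch under a grid-weighted inner product, with no denoising; the helper spa_rows is hypothetical and is not the package's dfspa() implementation.

```r
# Hypothetical helper: successive projection on the rows of F_mat.
spa_rows <- function(F_mat, K, grid_weights = NULL) {
  G <- ncol(F_mat)
  if (is.null(grid_weights)) grid_weights <- rep(1 / G, G)
  R <- F_mat                                           # residual functions
  selected <- integer(K)
  for (k in seq_len(K)) {
    norms2 <- as.numeric((R^2) %*% grid_weights)       # squared weighted norms
    selected[k] <- which.max(norms2)                   # largest remaining norm
    u <- R[selected[k], ] / sqrt(norms2[selected[k]])  # unit-norm direction
    coefs <- as.numeric(R %*% (grid_weights * u))      # weighted inner products with u
    R <- R - outer(coefs, u)                           # project out the selected direction
  }
  selected
}

x <- seq(0, 1, length.out = 50)
basis <- rbind(sin(pi * x), cos(pi * x), x)
pi_mat <- rbind(diag(3), c(0.5, 0.3, 0.2), c(0.2, 0.5, 0.3), c(0.3, 0.3, 0.4))
F_hat <- pi_mat %*% basis
picked <- spa_rows(F_hat, K = 3)
```

In this noiseless setting the three pure studies are the vertices of the convex hull, so the sketch selects rows 1 to 3 in some order.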
F_hat matrix from a list of fitted study-level models
Many users arrive at MetaHunt with one fitted model per study (e.g. a
ranger::ranger() random forest or a grf::causal_forest()) and a
chosen evaluation grid. f_hat_from_models() evaluates each model on
the shared grid and stacks the predictions into the m-by-G_grid
matrix the rest of the package expects.
    f_hat_from_models(models, grid, predict_fn = NULL)
models |
A non-empty list of fitted model objects, one per study. |
grid |
A data frame (or matrix) of grid points at which to evaluate each model. Same columns as the data each model was fitted on. |
predict_fn |
Optional |
By default the function dispatches on the model class:
ranger objects are evaluated as predict(model, data = grid)$predictions.
Objects inheriting from causal_forest or grf are evaluated as
predict(model, newdata = grid)$predictions.
All other classes fall through to
as.numeric(predict(model, newdata = grid)), which works for lm,
glm, randomForest, and most other R model objects.
Override the dispatch with predict_fn = function(model, grid) ... if
your models need bespoke handling. The function must return a length-G_grid
numeric vector for each model.
All rows of the returned matrix must have the same length (G_grid)
and contain no NA values; the function errors otherwise.
A length(models)-by-nrow(grid) numeric matrix; row i is
model i evaluated at every row of grid.
See also: build_grid() to construct grid from a reference dataset.
    # Toy example: each "centre" fits a polynomial regression
    set.seed(1)
    make_centre_data <- function(slope) {
      x <- runif(60)
      data.frame(x = x, y = slope * x + rnorm(60, sd = 0.1))
    }
    models <- lapply(c(-1, 0, 1, 0.5, -0.5), function(s)
      stats::lm(y ~ poly(x, 2), data = make_centre_data(s)))
    grid <- data.frame(x = seq(0, 1, length.out = 30))
    F_hat <- f_hat_from_models(models, grid)
    dim(F_hat)   # 5 x 30
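The generic fall-through described above, as.numeric(predict(model, newdata = grid)) applied to each model and stacked by row, can be written directly in base R. This is a minimal sketch of the stacking step only, using lm fits; make_study is an illustrative helper and the class-specific dispatch is not reproduced.

```r
set.seed(1)
make_study <- function(slope) {
  x <- runif(60)
  data.frame(x = x, y = slope * x + rnorm(60, sd = 0.1))
}
models <- lapply(c(-1, 0, 1), function(s) lm(y ~ x, data = make_study(s)))
grid <- data.frame(x = seq(0, 1, length.out = 30))

# One length-nrow(grid) prediction vector per model; vapply() enforces the
# common length, and t() puts studies in rows.
F_hat <- t(vapply(
  models,
  function(model) as.numeric(predict(model, newdata = grid)),
  numeric(nrow(grid))
))
stopifnot(!anyNA(F_hat))   # same no-NA requirement as documented above
```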
Given a matrix of simplex-valued weights pi_hat
(e.g. from project_to_simplex()) and associated study-level covariates
W, fit a model
for the weights as a function of the covariates.
The default method is Dirichlet regression via the DirichletReg package.
    fit_weight_model(
      pi_hat,
      W,
      method = c("dirichlet"),
      boundary_eps = 1e-04,
      formula = NULL,
      ...
    )
pi_hat |
An |
W |
An |
method |
Weight-model method. Currently only |
boundary_eps |
Small positive scalar used to shrink weights away
from the simplex boundary before Dirichlet fitting. Defaults to 1e-04. |
formula |
Optional RHS-only formula (e.g. |
... |
Passed through to |
Dirichlet regression cannot handle weights exactly at the simplex boundary
(0 or 1), which frequently arise after constrained projection. Before
fitting, rows of pi_hat are shrunk toward the barycenter (the uniform weight
vector with entries 1 / K), with the amount of shrinkage
set by boundary_eps.
An object of class "metahunt_weight_model": a list with the
fitted model, formula, method, K, and training covariate names.
    set.seed(1)
    m <- 80; K <- 3; p <- 2
    W <- matrix(rnorm(m * p), m, p); colnames(W) <- c("w1", "w2")
    # generate simplex weights driven by W
    eta <- cbind(0.5 * W[, 1], -0.3 * W[, 2], rep(0, m))
    pi_true <- exp(eta) / rowSums(exp(eta))
    pi_hat <- pi_true + matrix(rnorm(m * K, sd = 0.01), m, K)
    pi_hat <- pmax(pi_hat, 0); pi_hat <- pi_hat / rowSums(pi_hat)
    model <- fit_weight_model(pi_hat, W)
    predict(model,
            newdata = matrix(c(0, 0), 1, 2, dimnames = list(NULL, c("w1", "w2"))))
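One common way to shrink simplex weights toward the barycenter is a convex combination with the uniform vector. The sketch below assumes that form; the exact formula used by fit_weight_model() is not reproduced here, and shrink_to_barycenter is a hypothetical helper name.

```r
# Assumed form of the shrinkage: (1 - eps) * pi + eps / K for each row.
shrink_to_barycenter <- function(pi_hat, eps = 1e-4) {
  K <- ncol(pi_hat)
  (1 - eps) * pi_hat + eps / K
}

pi_hat <- rbind(c(1, 0, 0), c(0.5, 0.5, 0), c(0.2, 0.3, 0.5))
pi_shrunk <- shrink_to_barycenter(pi_hat)
```

Under this form every entry becomes strictly positive and strictly below 1, while each row still sums to 1, which is what a Dirichlet likelihood requires.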
End-to-end convenience wrapper that runs the three training-time steps
of the MetaHunt pipeline in sequence:
(1) dfspa() for basis hunting,
(2) project_to_simplex() for per-study weight recovery, and
(3) fit_weight_model() for modelling the weight-to-covariate map.
The result supports predict.metahunt() for generating target-function
predictions on new study-level covariates.
    metahunt(
      F_hat,
      W,
      K,
      grid_weights = NULL,
      dfspa_args = list(),
      weight_model_args = list()
    )
F_hat |
An |
W |
An |
K |
Integer number of basis functions. |
grid_weights |
Optional length- |
dfspa_args |
Named list of extra arguments for |
weight_model_args |
Named list of extra arguments for
|
For uncertainty quantification, pair a "metahunt" fit with
conformal_from_fit() (requires a separate calibration set). The
high-level split_conformal() and cross_conformal() functions
perform their own fitting and do not consume a pre-fit "metahunt"
object.
An object of class "metahunt": a list with the dfspa_fit,
weight_model, training pi_hat, K, and a stored copy of
grid_weights.
See also: predict.metahunt(), split_conformal(),
cross_conformal(), conformal_from_fit(),
reconstruction_error_curve(), cv_error_curve().
    set.seed(1)
    G <- 40; m <- 80
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
    eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
    fit <- metahunt(F_hat, W, K = 3)
    fit
    f_pred <- predict(fit, newdata = W[1:3, ])
    dim(f_pred)                                       # 3 x G: predicted functions
    predict(fit, newdata = W[1:3, ], wrapper = mean)  # scalar summaries
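Implements the minimax-regret estimator of Zhang, Huang, and Imai
(arXiv:2412.11136) for
aggregating site-level function estimates. Given the m study-level functions
in F_hat, the estimator solves a simplex-constrained quadratic program for source weights q,
yielding the predicted target function
as the q-weighted combination of the source functions.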
Unlike metahunt(), this method does not use study-level covariates;
the target is the worst-case-regret aggregator over the convex hull of
source functions.
    minmax_regret(F_hat, grid_weights = NULL, ridge = 1e-10, wrapper = NULL)
F_hat |
An |
grid_weights |
Optional length- |
ridge |
Non-negative scalar; replaces |
wrapper |
Optional reduction function (see |
The simplex-constrained QP is solved with quadprog::solve.QP().
A small ridge is added to the Gram matrix Gamma for numerical stability when
source functions are highly collinear. The resulting q is clipped to
be non-negative and renormalised to sum to 1 to absorb floating-point
drift.
An object of class "minmax_regret": a list with
prediction: Predicted target. Length-G_grid vector when wrapper = NULL; scalar otherwise.
q: Length-m simplex weights from the minimax-regret QP.
Gamma: The m-by-m Gram matrix used (post-ridge).
d: Length-m linear coefficient vector.
grid_weights: Grid weights used.
ridge: Ridge value used.
wrapper: Wrapper function or NULL.
Zhang, Y., Huang, M., and Imai, K. (2024). Minimax regret estimation for generalizing heterogeneous treatment effects with multisite data. arXiv:2412.11136.
    set.seed(1)
    G <- 30; m <- 6
    x <- seq(0, 1, length.out = G)
    F_hat <- rbind(
      sin(pi * x),
      cos(pi * x),
      x,
      0.5 * sin(pi * x) + 0.5 * x,
      0.3 * cos(pi * x) + 0.7 * x,
      0.4 * sin(pi * x) + 0.4 * cos(pi * x) + 0.2 * x
    )
    fit <- minmax_regret(F_hat)
    fit$q                    # simplex weights over sources
    length(fit$prediction)   # G: predicted target function
    # ATE-style scalar via wrapper
    minmax_regret(F_hat, wrapper = mean)$prediction
Plot recovered basis functions from a MetaHunt fit
    ## S3 method for class 'metahunt'
    plot(x, x_axis = NULL, ...)
x |
A |
x_axis |
Optional numeric vector of length |
... |
Passed to |
Invisibly returns x.
    set.seed(1)
    G <- 25; m <- 40
    x_grid <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x_grid), cos(pi * x_grid), x_grid)
    W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
    eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
    fit <- metahunt(F_hat, W, K = 3, dfspa_args = list(denoise = FALSE))
    plot(fit)
    plot(fit, x_axis = x_grid)
For pointwise objects (no wrapper was used), draws the predicted
function for one target study together with its pointwise band.
For scalar objects, draws point predictions with whisker error bars
over all targets.
    ## S3 method for class 'metahunt_conformal'
    plot(
      x,
      target_idx = 1L,
      x_axis = NULL,
      fill = grDevices::adjustcolor("steelblue", 0.2),
      line_col = "steelblue",
      ...
    )
x |
A |
target_idx |
For pointwise objects, the integer index of the
target study to plot (default 1L). |
x_axis |
Optional numeric vector of length |
fill |
Polygon fill colour for the band. Default semi-transparent steel blue. |
line_col |
Line colour for the predicted function (or points in scalar mode). Default steel blue. |
... |
Additional graphical parameters passed to the underlying plotting calls. |
Invisibly returns x.
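Given a fitted d-fSPA basis decomposition and a fitted weight model, compute the predicted target function on the shared grid as the combination of the recovered basis functions, weighted by the simplex weights predicted at W_new.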
    predict_target(dfspa_fit, weight_model, W_new)
dfspa_fit |
A |
weight_model |
A |
W_new |
A matrix or data frame of study-level covariates for the new
target studies, with columns matching those used to fit |
An nrow(W_new)-by-G_grid numeric matrix; row j is the
predicted target function on the grid for the j-th new study.
See also: apply_wrapper() to reduce predicted functions to scalars.
    set.seed(1)
    G <- 40; m <- 60; K_true <- 3
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    # generate study-level covariates and softmax weights
    W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
    beta <- cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
    eta <- as.matrix(W) %*% beta
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.02), m, G)
    fit <- dfspa(F_hat, K = K_true)
    pi_hat <- project_to_simplex(F_hat, fit$bases)
    wm <- fit_weight_model(pi_hat, W)
    W_new <- data.frame(w1 = c(0, 1), w2 = c(0, -1))
    f_pred <- predict_target(fit, wm, W_new)
    dim(f_pred)   # 2 x G
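Under the low-rank model, a predicted target function is a simplex-weighted combination of the basis functions, which is a single matrix product once the weights are available. The sketch below uses hand-picked stand-in weights (pi_new) instead of weight-model predictions; it illustrates the combination step and is not the package's predict_target() code.

```r
x <- seq(0, 1, length.out = 40)
bases  <- rbind(sin(pi * x), cos(pi * x), x)           # K-by-G basis matrix
pi_new <- rbind(c(0.6, 0.3, 0.1), c(0.2, 0.2, 0.6))    # stand-in predicted weights
f_pred <- pi_new %*% bases                             # n_new-by-G predicted functions
```

Row j of f_pred is the mixture of the K basis rows with the weights in row j of pi_new.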
Predict target functions (or scalar summaries) from a MetaHunt fit
    ## S3 method for class 'metahunt'
    predict(object, newdata, wrapper = NULL, grid_weights = NULL, ...)
object |
A |
newdata |
A matrix or data frame of new study-level covariates. |
wrapper |
Optional reduction function. If |
grid_weights |
Optional length- |
... |
Ignored. |
Either an nrow(newdata)-by-G_grid matrix of predicted
functions (when wrapper = NULL) or a length-nrow(newdata) numeric
vector of scalar summaries.
    set.seed(1)
    G <- 25; m <- 40
    x <- seq(0, 1, length.out = G)
    basis <- rbind(sin(pi * x), cos(pi * x), x)
    W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
    eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
    pi_true <- exp(eta) / rowSums(exp(eta))
    F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
    fit <- metahunt(F_hat, W, K = 3, dfspa_args = list(denoise = FALSE))
    f_pred <- predict(fit, newdata = W[1:3, ])
    dim(f_pred)                                       # 3 x G
    predict(fit, newdata = W[1:3, ], wrapper = mean)  # scalar summaries
Predict simplex weights for new study-level covariates
## S3 method for class 'metahunt_weight_model'
predict(object, newdata, ...)
object |
A fitted |
newdata |
A matrix or data frame of new study-level covariates with the same columns used at fitting. |
... |
Ignored. |
An nrow(newdata)-by-K numeric matrix of predicted simplex
weights (component means); rows sum to 1.
set.seed(1)
m <- 40; K <- 3
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
eta <- cbind(0.5 * W$w1, -0.3 * W$w2, rep(0, m))
pi_true <- exp(eta) / rowSums(exp(eta))
pi_hat <- pi_true + matrix(rnorm(m * K, sd = 0.01), m, K)
pi_hat <- pmax(pi_hat, 0); pi_hat <- pi_hat / rowSums(pi_hat)
model <- fit_weight_model(pi_hat, W)
predict(model, newdata = data.frame(w1 = c(0, 1), w2 = c(0, -1)))
Print method for d-fSPA denoising parameter search results
## S3 method for class 'metahunt_denoising_search'
print(x, ...)
x |
An object of class |
... |
Unused; present for S3 generic compatibility. |
Invisibly returns x.
Print a summary.metahunt object
## S3 method for class 'summary.metahunt'
print(x, ...)
x |
A |
... |
Ignored. |
Invisibly returns x.
set.seed(1)
G <- 25; m <- 40
x_grid <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x_grid), cos(pi * x_grid), x_grid)
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
pi_true <- exp(eta) / rowSums(exp(eta))
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
fit <- metahunt(F_hat, W, K = 3, dfspa_args = list(denoise = FALSE))
print(summary(fit))
For each study $i$, solves the constrained projection
$$\hat\pi_i = \arg\min_{\pi \in \Delta^{K-1}} \Big\| \hat f_i - \sum_{k=1}^{K} \pi_k \hat g_k \Big\|^2,$$
where $\hat f_i$ is the i-th row of F_hat, $\hat g_k$ is the k-th row of bases, $\Delta^{K-1}$ is the probability simplex, and the norm is the weighted norm defined by grid_weights.
This is Equation (3) of the paper and yields the study-specific weights
$\hat\pi_i$ used downstream for weight-model fitting and prediction.
project_to_simplex(F_hat, bases, grid_weights = NULL, ridge = 1e-10)
F_hat |
An |
bases |
A |
grid_weights |
Optional length- |
ridge |
Small non-negative scalar added to the diagonal of the QP
Hessian for numerical stability. Defaults to |
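The projection reduces to the quadratic program
$$\min_{\pi \in \mathbb{R}^K} \ \tfrac{1}{2}\, \pi^\top D \pi - d^\top \pi \quad \text{subject to} \quad \pi \ge 0, \ \mathbf{1}^\top \pi = 1,$$
with $D = B \Lambda B^\top$ and $d = B \Lambda \hat f_i$, where
$B$ is the K-by-G_grid basis matrix, $\Lambda$ is the diagonal matrix of grid_weights,
and $\hat f_i$ is the i-th row of F_hat. Solved via
quadprog::solve.QP(). A tiny ridge is added to D for numerical stability.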
An m-by-K numeric matrix of simplex weights; rows sum to 1
and entries are non-negative.
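The quadratic program can be sketched for a single study as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `project_one` is a hypothetical helper, uniform grid weights are assumed when none are given, and the constraint layout follows the documented `quadprog::solve.QP()` convention `t(Amat) %*% b >= bvec`, with the first `meq` constraints treated as equalities.

```r
# Hypothetical helper (not part of MetaHunt): project one function f onto the
# convex hull of the rows of B (a K-by-G basis matrix).
project_one <- function(f, B, w = rep(1 / ncol(B), ncol(B)), ridge = 1e-10) {
  K  <- nrow(B)
  BW <- sweep(B, 2, w, `*`)              # same as B %*% diag(w)
  D  <- BW %*% t(B) + ridge * diag(K)    # QP Hessian plus a tiny ridge
  d  <- as.vector(BW %*% f)              # linear term
  Amat <- cbind(rep(1, K), diag(K))      # column 1: sum(pi) = 1; rest: pi_k >= 0
  bvec <- c(1, rep(0, K))
  sol <- quadprog::solve.QP(Dmat = D, dvec = d, Amat = Amat, bvec = bvec, meq = 1)
  pmax(sol$solution, 0)                  # clip round-off below zero
}

x <- seq(0, 1, length.out = 40)
B <- rbind(sin(pi * x), cos(pi * x), x)
f <- as.vector(c(0.4, 0.3, 0.3) %*% B)   # noiseless mixture of the three bases

if (requireNamespace("quadprog", quietly = TRUE)) {
  pi_one <- project_one(f, B)
  print(round(pi_one, 3))
}
```

Because `f` lies exactly in the convex hull of linearly independent bases, the recovered weights should match the mixing weights up to numerical tolerance.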
set.seed(1)
G <- 40
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
true_pi <- rbind(diag(3), c(0.4, 0.3, 0.3), c(0.1, 0.7, 0.2))
F_hat <- true_pi %*% basis
fit <- dfspa(F_hat, K = 3, denoise = FALSE)
pi_hat <- project_to_simplex(F_hat, fit$bases)
round(pi_hat, 3)
For each candidate number of bases K, run dfspa() followed by
project_to_simplex() and report the average projection residual across studies.
Plotting error against K typically shows an elbow.
reconstruction_error_curve(
  F_hat,
  K_range = NULL,
  grid_weights = NULL,
  dfspa_args = list()
)
F_hat |
An |
K_range |
Integer vector of candidate |
grid_weights |
Optional length- |
dfspa_args |
Named list of extra arguments passed to |
This is the unsupervised rank-selection criterion of Section 3.2 of the
paper (the equation defining the reconstruction error). It does not require
study-level covariates.
A data frame with columns K (integer) and error (numeric).
Rows where dfspa() or the projection fails are reported with
error = NA and a single warning summarising the failures.
set.seed(1)
G <- 40
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
m <- 50
pi_mat <- matrix(stats::rgamma(m * 3, shape = 0.5), m, 3)
pi_mat <- pi_mat / rowSums(pi_mat)
F_hat <- pi_mat %*% basis + matrix(stats::rnorm(m * G, sd = 0.02), m, G)
elbow <- reconstruction_error_curve(F_hat, K_range = 2:6)
elbow
At a fixed K, performs k-fold cross-validation over a grid of denoising
parameter pairs (N, Delta) for dfspa(). For each candidate pair and
each fold, the full MetaHunt pipeline is fit on the studies outside that fold
and used to predict the held-out studies' functions. The pair with the lowest
average prediction error is selected.
select_denoising_params(
  F_hat,
  W,
  K,
  N_grid = NULL,
  Delta_grid = NULL,
  n_folds = 5L,
  grid_weights = NULL,
  dfspa_args = list(),
  weight_model_args = list(),
  seed = NULL
)
F_hat |
An |
W |
An |
K |
Integer number of basis functions (fixed during this search). |
N_grid |
Optional numeric vector of candidate |
Delta_grid |
Optional numeric vector of candidate |
n_folds |
Integer number of folds (default |
grid_weights |
Optional length- |
dfspa_args |
Named list of additional arguments for |
weight_model_args |
Named list of additional arguments for
|
seed |
Optional integer seed for reproducible fold assignment. |
This is the cross-validated tuning of the denoising parameters discussed
in Section 3.1 of the paper. Joint tuning over (K, N, Delta) is not
supported because it scales poorly; if you also want to choose K, do
it first via cv_error_curve() and then call this function at the
selected K.
Default candidate grids:
N_grid = m * c(NA, NA, NA) resolved at runtime to
c(0.2, 0.5, 1.0) * log(m).
Delta_grid = max_pairwise_dist * c(0.05, 0.10, 0.20, 0.30).
Pass your own N_grid / Delta_grid (in original units) to override.
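The search space and the fold assignment described above can be sketched in base R. This is an illustration only: it assumes `max_pairwise_dist` is the largest Euclidean distance between rows of `F_hat`, and the balanced random fold assignment is one plausible scheme rather than a description of the package internals. The names `cand` and `fold_id` are hypothetical.

```r
set.seed(1)
m <- 80; G <- 30
F_hat <- matrix(rnorm(m * G), m, G)   # placeholder function evaluations

# Assumption: largest Euclidean distance between any two studies' functions.
max_pairwise_dist <- max(dist(F_hat))

# Default candidate grids as stated in the text.
N_grid     <- c(0.2, 0.5, 1.0) * log(m)
Delta_grid <- max_pairwise_dist * c(0.05, 0.10, 0.20, 0.30)

# One row per (N, Delta) pair to be scored by cross-validation.
cand <- expand.grid(N = N_grid, Delta = Delta_grid)
nrow(cand)   # 12 candidate pairs

# Balanced random assignment of the m studies to folds.
n_folds <- 5L
fold_id <- sample(rep_len(seq_len(n_folds), m))
table(fold_id)
```

With the default grids the search fits the pipeline once per pair and fold, so 12 pairs and 5 folds mean 60 pipeline fits at the chosen K.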
An object of class metahunt_denoising_search: a list with
grid: A data frame with one row per (N, Delta) pair,
columns N, Delta, cv_error, cv_se, n_folds_ok.
best: A list with the (N, Delta) minimising cv_error.
K, n_folds, grid_weights: Inputs echoed back for traceability.
See also cv_error_curve() for selecting K and dfspa() for the
underlying basis-hunting algorithm.
set.seed(1)
G <- 30; m <- 80
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
eta <- as.matrix(W) %*% cbind(c(1, -0.5), c(-0.4, 1), c(0, 0))
pi_true <- exp(eta) / rowSums(exp(eta))
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
tune <- select_denoising_params(F_hat, W, K = 3, n_folds = 4, seed = 1)
tune$grid
tune$best
Implements Algorithm 2 of the paper (split conformal prediction) over the MetaHunt pipeline. Studies are partitioned into training and calibration sets. The training set is used to fit d-fSPA, the constrained projection, and the weight model; the calibration set supplies conformity scores, which determine the width of the intervals.
split_conformal(
  F_hat,
  W,
  W_new,
  K,
  alpha = 0.05,
  cal_frac = 0.3,
  wrapper = NULL,
  grid_weights = NULL,
  calibration_idx = NULL,
  dfspa_args = list(),
  weight_model_args = list(),
  seed = NULL
)
F_hat |
An |
W |
An |
W_new |
A matrix or data frame of new target covariates. Must
contain columns matching |
K |
Integer number of basis functions. |
alpha |
Miscoverage level; interval has nominal coverage
|
cal_frac |
Numeric in |
wrapper |
Optional reduction function (see |
grid_weights |
Optional length- |
calibration_idx |
Optional integer vector of row indices in |
dfspa_args, weight_model_args
|
Named lists passed to |
seed |
Optional integer seed for reproducible train/calibration splits. |
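Given a target function, one can either construct intervals pointwise
at every grid point (when wrapper = NULL) or for a scalar summary
of the target function (when wrapper is a function). Below, $\hat f_i$
denotes the i-th row of F_hat and $\tilde f_i$ the pipeline's prediction
for study or target i.
Pointwise (wrapper = NULL): for each grid point g the
conformity score is $R_{i,g} = |\hat f_i(x_g) - \tilde f_i(x_g)|$ across calibration studies i. A separate
$(1 - \alpha)$-quantile $q_g$ is computed per grid point, and
the interval at grid point g for target j is
$[\tilde f_j(x_g) - q_g,\ \tilde f_j(x_g) + q_g]$.
Scalar (wrapper supplied): conformity scores are
$R_i = |s(\hat f_i) - s(\tilde f_i)|$ with
s = wrapper, and the interval for each target is
$[s(\tilde f_j) - q,\ s(\tilde f_j) + q]$ with a single
shared quantile $q$.
The finite-sample quantile is
$q = R_{(k)}$, the k-th smallest calibration score, with $k = \lceil (1 - \alpha)(n_{\mathrm{cal}} + 1) \rceil$;
if $k > n_{\mathrm{cal}}$, q = Inf and intervals are
$(-\infty, \infty)$.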
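The finite-sample quantile rule can be sketched in base R. The sketch assumes the standard split-conformal convention (absolute-residual scores, the k-th smallest score with k = ceiling((1 - alpha) * (n_cal + 1)), and an infinite quantile when k exceeds n_cal). `conformal_quantile` is a hypothetical helper, and the residuals are simulated placeholders rather than package output.

```r
# Hypothetical helper (not part of MetaHunt): finite-sample conformal quantile.
conformal_quantile <- function(scores, alpha = 0.05) {
  n_cal <- length(scores)
  k <- ceiling((1 - alpha) * (n_cal + 1))
  if (k > n_cal) return(Inf)        # too few calibration studies
  sort(scores)[k]                   # k-th smallest conformity score
}

set.seed(1)
n_cal <- 24; G <- 5
resid_cal <- matrix(rnorm(n_cal * G, sd = 0.1), n_cal, G)  # placeholder residuals
scores <- abs(resid_cal)                                   # absolute-residual scores

# Pointwise case: one quantile per grid point, symmetric interval around a
# placeholder prediction for a single target.
q_grid <- apply(scores, 2, conformal_quantile, alpha = 0.05)
pred   <- rep(0.5, G)
lower  <- pred - q_grid
upper  <- pred + q_grid

# With only 10 calibration studies and alpha = 0.05, k = 11 > 10.
q_small <- conformal_quantile(scores[1:10, 1], alpha = 0.05)
q_small   # Inf
```

With 24 calibration studies and alpha = 0.05, k equals 24, so each quantile is the largest calibration score at that grid point. The infinite quantile in the last line is what produces unbounded intervals when the calibration set is too small for the requested coverage.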
An object of class "metahunt_conformal": a list with
prediction: Point predictions for W_new. A numeric vector
of length nrow(W_new) in the scalar case, or an
nrow(W_new)-by-G_grid matrix in the pointwise case.
lower, upper: Interval endpoints, same shape as
prediction.
alpha: Miscoverage level used.
method: "split".
n_cal: Calibration sample size.
quantile: The conformal quantile: a scalar (scalar case)
or a length-G_grid vector (pointwise case).
wrapper: The wrapper used, or NULL.
set.seed(1)
G <- 40; m <- 80; K_true <- 3
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
pi_true <- exp(eta) / rowSums(exp(eta))
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
W_new <- data.frame(w1 = c(0, 1), w2 = c(0, -1))
# pointwise intervals at every grid point
pi_grid <- split_conformal(F_hat, W, W_new, K = 3, seed = 1)
dim(pi_grid$lower) # 2 x 40
# scalar intervals for the grid-weighted mean (ATE-style)
pi_ate <- split_conformal(F_hat, W, W_new, K = 3, wrapper = mean, seed = 1)
pi_ate$prediction
Produces a compact summary of a "metahunt" object, including study/grid
sizes, the weight-model method, per-basis summary statistics for the
training simplex weights pi_hat, and denoising bookkeeping from the
underlying dfspa() fit.
## S3 method for class 'metahunt'
summary(object, ...)
object |
A |
... |
Ignored. |
An object of class "summary.metahunt": a list with components
m: Number of studies.
G_grid: Grid size.
K: Number of basis functions.
weight_method: Method used by the weight model.
predictor_names: Character vector of covariate names.
pi_summary: A K-by-5 numeric matrix; each row gives min,
mean, median, max, and sd of the corresponding column of
object$pi_hat.
n_kept: Number of studies retained after denoising.
n_dropped: Number of studies dropped (m - n_kept).
denoising: List with N and Delta from the dfspa fit
(both NA when denoise = FALSE).
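The per-basis statistics in pi_summary can be reproduced by hand from a weight matrix. The base-R sketch below is illustrative only: `pi_hat_demo` is simulated placeholder data standing in for `object$pi_hat`, and the column order follows the description above.

```r
set.seed(1)
m <- 40; K <- 3

# Placeholder simplex weights: non-negative rows normalised to sum to 1.
pi_hat_demo <- matrix(rgamma(m * K, shape = 1), m, K)
pi_hat_demo <- pi_hat_demo / rowSums(pi_hat_demo)

# One row per basis, five statistics per row.
pi_summary_demo <- t(apply(pi_hat_demo, 2, function(p) {
  c(min = min(p), mean = mean(p), median = median(p), max = max(p), sd = sd(p))
}))

dim(pi_summary_demo)   # 3 x 5
round(pi_summary_demo, 3)
```

Since every row of the weight matrix sums to one, the `mean` column also sums to one across bases, which gives a quick sanity check on a fitted object.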
set.seed(1)
G <- 25; m <- 40
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
W <- data.frame(w1 = rnorm(m), w2 = rnorm(m))
eta <- as.matrix(W) %*% cbind(c(1, -0.8), c(-0.5, 1.2), c(0, 0))
pi_true <- exp(eta) / rowSums(exp(eta))
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
fit <- metahunt(F_hat, W, K = 3, dfspa_args = list(denoise = FALSE))
summary(fit)
Produces a small list of descriptive statistics about a
"metahunt_conformal" object: interval widths, quantile summaries, and
calibration diagnostics. Returns an object of class
"summary.metahunt_conformal" with a matching print method.
## S3 method for class 'metahunt_conformal'
summary(object, ...)
object |
A |
... |
Unused; present for S3 generic consistency. |
A list of class "summary.metahunt_conformal". In pointwise mode
(no wrapper) the list contains n_targets, G_grid, n_cal, alpha,
method, mean_interval_width, frac_finite_quantile,
quantile_summary, and wrapper. In scalar mode (wrapper supplied)
the list contains n_targets, n_cal, alpha, method,
mean_interval_width, quantile, quantile_finite, and wrapper.
set.seed(1)
G <- 25; m <- 60
x <- seq(0, 1, length.out = G)
basis <- rbind(sin(pi * x), cos(pi * x), x)
W <- data.frame(w1 = rnorm(m))
eta <- cbind(0.6 * W$w1, -0.3 * W$w1, rep(0, m))
pi_true <- exp(eta) / rowSums(exp(eta))
F_hat <- pi_true %*% basis + matrix(rnorm(m * G, sd = 0.05), m, G)
W_new <- data.frame(w1 = c(0, 1))
res <- split_conformal(F_hat, W, W_new, K = 3,
                       dfspa_args = list(denoise = FALSE), seed = 1)
summary(res)