| Title: | Partial Transfer Learning for Causal Estimation |
|---|---|
| Description: | Implements partial transfer learning (PTL) for causal effect estimation using source and target data, with bootstrap-based source detection. Provides data generating processes and nuisance functions for simulation. |
| Authors: | Xinhao Qu [aut, cre] |
| Maintainer: | Xinhao Qu <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.0 |
| Built: | 2026-05-23 09:34:15 UTC |
| Source: | https://github.com/cran/PartialTL |
Implements partial transfer learning (PTL) and heterogeneous partial transfer learning (HPTL) for causal effect estimation using source and target data. Uses double machine learning for nuisance estimation and cross-fitting. Provides bootstrap-based source detection to identify which sources are transferable to the target. Includes data generating processes and nuisance functions for simulation.
Main functions:
fit_PTL: PTL causal estimate (single source).
fit_HPTL: Heterogeneous PTL with multiple sources and covariate modules.
boot_detection: Bootstrap-based source detection.
run_bootstrap_E_hat: Bootstrap for nuisance function estimation.
DGP: Data generating process for simulations.
RMSE: Root mean squared error for evaluation.
f_0, g_0, f_k, g_k: Nuisance functions for target and source models.
Run demo(package = "PartialTL") for demo scripts.
Author [email protected]
Identifies which sources are transferable to the target by comparing bootstrap estimates of the nuisance function on the target and each source. A source is detected as transferable if the difference is below a threshold proportional to the combined standard error.
boot_detection(D_t, X_t, Y_t, D_s_all, X_s_all, Y_s_all, source_sizes, B, ml_f, ml_g)

| Argument | Description |
|---|---|
| D_t | Target treatment; |
| X_t | Target design matrix; |
| Y_t | Target outcome; |
| D_s_all | Source treatments concatenated by row; rows split by |
| X_s_all | Source design matrices concatenated by row. |
| Y_s_all | Source outcomes concatenated by row. |
| source_sizes | Integer vector of length K: sample size of each source. |
| B | Number of bootstrap replications. |
| ml_f | Outcome learner for DoubleML. |
| ml_g | Treatment learner for DoubleML. |

A data frame with columns Source (label) and detected.source
(list of logical vectors of length B per source).
run_bootstrap_E_hat, fit_PTL, fit_HPTL
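The detection rule described for boot_detection can be sketched in base R. This is only an illustration under stated assumptions: the helper name `detect_transferable`, the multiplier `c_thresh`, and the choice of the square root of the summed bootstrap variances as the combined standard error are all hypothetical and are not taken from the package source. The sketch also returns a single decision per source, whereas the package returns one logical per bootstrap replication.

```r
# Hypothetical helper, not part of PartialTL: flags a source as transferable
# when its mean bootstrap estimate is close to the target's, relative to a
# combined standard error.
detect_transferable <- function(E_target, E_source, c_thresh = 1.96) {
  # E_target, E_source: numeric vectors of bootstrap estimates
  gap <- abs(mean(E_target) - mean(E_source))
  se_combined <- sqrt(var(E_target) + var(E_source))  # assumed form of the combined SE
  gap < c_thresh * se_combined
}

set.seed(1)
E_t      <- rnorm(200, mean = 1.0, sd = 0.1)  # simulated target bootstrap draws
E_s_near <- rnorm(200, mean = 1.0, sd = 0.1)  # source similar to the target
E_s_far  <- rnorm(200, mean = 2.0, sd = 0.1)  # source far from the target

near_ok <- detect_transferable(E_t, E_s_near)
far_ok  <- detect_transferable(E_t, E_s_far)
```

Under these simulated draws the similar source should be flagged as transferable and the distant one should not.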
Generates data for simulations: a multivariate normal X with
AR(1)-type covariance, a treatment D from a linear confounding model plus
noise, and an outcome Y = rho*D + f(X) + error.
DGP(n, q, p, p.nonzero, rho, Beta, Gamma, mu = rep(10, p), sigma = 0.5, f_func = f_0, g_func = g_0, seed = NULL)

| Argument | Description |
|---|---|
| n | Sample size. |
| q | Causal dimension (kept for interface; unused in current body). |
| p | Nuisance dimension (number of covariates). |
| p.nonzero | Number of non-zero coefficients for Beta and Gamma. |
| rho | Causal effect (scalar). |
| Beta | Nuisance coefficient vector for outcome; |
| Gamma | Nuisance coefficient vector for treatment; |
| mu | Mean vector of X; length |
| sigma | Base for AR(1)-type covariance: |
| f_func | Function |
| g_func | Function |
| seed | Optional random seed. |

A list with components D (treatment), X (design matrix),
Y (outcome), and data (cbind(Y, D, X)).
f_0, g_0, f_k, g_k,
fit_PTL, RMSE
set.seed(1)
n <- 50
p <- 10
p.nz <- 3
Beta <- matrix(c(rep(0.5, p.nz), rep(0, p - p.nz)))
Gamma <- matrix(c(rep(0.3, p.nz), rep(0, p - p.nz)))
dat <- DGP(n, q = 1, p, p.nz, rho = 0.5, Beta, Gamma, seed = 1)
str(dat)
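To make the data generating process concrete, here is a minimal base R sketch of a partially linear model of the kind DGP describes. It is not the package's implementation: the function name `simulate_plr` is hypothetical, the covariance is assumed to be sigma^|i - j|, the noise terms are assumed to be standard normal, and the nuisance functions are assumed to be plain linear maps.

```r
# Hypothetical re-implementation for illustration; not the package's DGP().
simulate_plr <- function(n, p, rho, Beta, Gamma, mu = rep(10, p), sigma = 0.5) {
  # Assumed AR(1)-type covariance: entry (i, j) equals sigma^|i - j|
  Sigma <- sigma^abs(outer(seq_len(p), seq_len(p), "-"))
  Z <- matrix(rnorm(n * p), n, p)
  # chol() returns upper-triangular R with t(R) %*% R == Sigma,
  # so the rows of Z %*% R have covariance Sigma
  X <- sweep(Z %*% chol(Sigma), 2, mu, "+")
  D <- as.numeric(X %*% Gamma) + rnorm(n)           # linear confounding plus noise
  Y <- rho * D + as.numeric(X %*% Beta) + rnorm(n)  # Y = rho*D + f(X) + error
  list(D = D, X = X, Y = Y, data = cbind(Y, D, X))
}

set.seed(1)
p <- 5
sim <- simulate_plr(n = 100, p = p, rho = 0.5,
                    Beta = c(0.5, 0.5, 0, 0, 0),
                    Gamma = c(0.3, 0.3, 0, 0, 0))
```

The returned list mirrors the components documented for DGP (D, X, Y, and data), which makes it easy to compare against the package output.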
Outcome nuisance function for the target model, linear in X with coefficient vector Beta.
f_0(X, Beta)

| Argument | Description |
|---|---|
| X | Design matrix; |
| Beta | Coefficient vector (column matrix); |

Numeric vector of outcome nuisance values; length n.
Outcome nuisance function for source models, linear in X with coefficient vector Beta.
Same form as f_0; used in DGP for source data.
f_k(X, Beta)

| Argument | Description |
|---|---|
| X | Design matrix; |
| Beta | Coefficient vector (column matrix); |

Numeric vector of outcome nuisance values; length n.
Fits heterogeneous partial transfer learning with multiple sources and covariate modules. Each source has its own module of covariates; PTL is applied per source and results are combined for the target causal estimate.
fit_HPTL(D_t, X_t, Y_t, D_s_all, X_s_all, Y_s_all, source_sizes, module_sizes, ml_f, ml_g, fold = 5)

| Argument | Description |
|---|---|
| D_t | Target treatment; |
| X_t | Target design matrix; |
| Y_t | Target outcome; |
| D_s_all | Source treatments concatenated by row; dimension is (sum of |
| X_s_all | Source design matrices concatenated by row; rows split by |
| Y_s_all | Source outcomes concatenated by row; rows split by |
| source_sizes | Integer vector of length K: sample size of each source. Must sum to |
| module_sizes | Integer vector of length K: covariate module sizes. The k-th source uses columns |
| ml_f | Outcome learner for DoubleML. |
| ml_g | Treatment learner for DoubleML. |
| fold | Number of folds for cross-fitting (default 5). |

A list with component hat_rho_HPTL: HPTL causal estimate on target.
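Because the source inputs are concatenated by row, it can help to see how per-source blocks might be recovered from them. The sketch below assumes the rows are stacked in source order and that source_sizes gives the length of each block; that reading is an assumption, and the toy data and the names `source_id`, `row_idx`, `X_by_source`, and `Y_by_source` are hypothetical.

```r
# Illustrative only: recover per-source blocks from row-concatenated inputs.
source_sizes <- c(3L, 2L, 4L)  # K = 3 hypothetical sources
X_s_all <- matrix(seq_len(sum(source_sizes) * 2), ncol = 2)
Y_s_all <- seq_len(sum(source_sizes))

# Label each row with the index of the source it belongs to
source_id <- rep(seq_along(source_sizes), times = source_sizes)
row_idx   <- split(seq_len(nrow(X_s_all)), source_id)

# One design matrix and one outcome vector per source
X_by_source <- lapply(row_idx, function(i) X_s_all[i, , drop = FALSE])
Y_by_source <- lapply(row_idx, function(i) Y_s_all[i])
```

Using `drop = FALSE` keeps each block a matrix even when a source contributes a single row.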
Fits partial transfer learning with cross-fitting and outcome reconstruction. Uses double machine learning on the source and SCAD on the outcome nuisance, then transfers to the target via the partial transfer formula.
fit_PTL(D_t, X_t, Y_t, D_s, X_s, Y_s, ml_f, ml_g, fold = 5L)

| Argument | Description |
|---|---|
| D_t | Target treatment; |
| X_t | Target design matrix; |
| Y_t | Target outcome; |
| D_s | Source treatment; |
| X_s | Source design matrix; |
| Y_s | Source outcome; |
| ml_f | Outcome learner for DoubleML (e.g. |
| ml_g | Treatment learner for DoubleML (same type). |
| fold | Number of folds for cross-fitting (default 5). |

A list with components:

| Component | Description |
|---|---|
| hat_rho_s | Source causal estimate. |
| beta_hat_s | Source nuisance coefficient estimate. |
| E_s | Estimated |
| hat_rho_PTL | PTL causal estimate on target. |

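The cross-fitting step that fit_PTL relies on can be illustrated with a small base R sketch of a cross-fitted partialling-out estimator for a partially linear model. It is not the package's method: it omits the SCAD step and the partial transfer formula, it uses plain `lm` in place of a DoubleML learner, and the function name `crossfit_plr` is hypothetical.

```r
# Illustrative cross-fitted partialling-out estimator; not the package's fit_PTL().
crossfit_plr <- function(D, X, Y, fold = 5L) {
  n <- length(Y)
  X <- as.matrix(X)
  colnames(X) <- paste0("x", seq_len(ncol(X)))
  df <- data.frame(Y = Y, D = D, X)
  x_names <- colnames(X)
  fold_id <- sample(rep(seq_len(fold), length.out = n))  # random fold assignment
  res_Y <- numeric(n)
  res_D <- numeric(n)
  for (k in seq_len(fold)) {
    test <- fold_id == k
    # Fit both nuisance regressions on the other folds
    # (plain lm stands in for a machine learning learner)
    fit_Y <- lm(reformulate(x_names, response = "Y"), data = df[!test, ])
    fit_D <- lm(reformulate(x_names, response = "D"), data = df[!test, ])
    # Out-of-fold residuals for the held-out fold
    res_Y[test] <- df$Y[test] - predict(fit_Y, newdata = df[test, ])
    res_D[test] <- df$D[test] - predict(fit_D, newdata = df[test, ])
  }
  sum(res_D * res_Y) / sum(res_D^2)  # residual-on-residual slope
}

set.seed(42)
n <- 500
X <- matrix(rnorm(n * 3), n, 3)
D <- as.numeric(X %*% c(0.3, 0.3, 0)) + rnorm(n)
Y <- 0.5 * D + as.numeric(X %*% c(0.5, 0.5, 0)) + rnorm(n)
rho_hat <- crossfit_plr(D, X, Y, fold = 5L)
```

With a true effect of 0.5 in the simulated data, `rho_hat` should land close to 0.5; each observation's residuals come from models that never saw that observation, which is the point of cross-fitting.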
Treatment/confounding function for the target model, linear in X with coefficient vector Gamma.
g_0(X, Gamma)

| Argument | Description |
|---|---|
| X | Design matrix; |
| Gamma | Coefficient vector (column matrix); |

Numeric vector of treatment equation values; length n.
Treatment/confounding function for source models, linear in X with coefficient vector Gamma.
Same form as g_0; used in DGP for source data.
g_k(X, Gamma)

| Argument | Description |
|---|---|
| X | Design matrix; |
| Gamma | Coefficient vector (column matrix); |

Numeric vector of treatment equation values; length n.
Computes the root mean squared error of parameter estimates across
simulations, separately for each parameter component.
RMSE(Theta, Theta.hat)

| Argument | Description |
|---|---|
| Theta | True parameter vector; |
| Theta.hat | Matrix of estimates; |

Numeric vector of length p: RMSE per parameter component.
th <- c(1, 2)
th_hat <- matrix(c(1.1, 2.2, 0.9, 1.8), nrow = 2)
RMSE(th, th_hat)
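For reference, the usual per-component definition of root mean squared error is the square root of the average squared deviation of the estimates from the true value. The sketch below applies that definition to the same toy inputs. It assumes, as the values in the example suggest, that rows of Theta.hat index parameter components and columns index simulation replicates; the function name `rmse_by_component` is hypothetical and this is not the package's source code.

```r
# Hypothetical stand-in for RMSE(): rows of Theta.hat are components,
# columns are simulation replicates (assumed layout).
rmse_by_component <- function(Theta, Theta.hat) {
  # Theta is recycled down each column, so every replicate is compared
  # with the same true parameter vector
  sqrt(rowMeans((Theta.hat - Theta)^2))
}

th <- c(1, 2)
th_hat <- matrix(c(1.1, 2.2, 0.9, 1.8), nrow = 2)
out <- rmse_by_component(th, th_hat)
```

Here the first component has errors of 0.1 and -0.1 and the second has errors of 0.2 and -0.2, so `out` is approximately c(0.1, 0.2).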
Runs B bootstrap replications to estimate the nuisance quantity
(returned as E_hat) using double machine learning, and returns the
bootstrap estimates together with their mean and variance.
run_bootstrap_E_hat(D, X, Y, B, ml_f, ml_g)

| Argument | Description |
|---|---|
| D | Treatment variable(s); matrix. |
| X | Design matrix. |
| Y | Outcome variable. |
| B | Number of bootstrap replications. |
| ml_f | Outcome learner for DoubleML. |
| ml_g | Treatment learner for DoubleML. |

A list with components:

| Component | Description |
|---|---|
| E_hat | Numeric vector of length B: bootstrap estimates. |
| E_hat_mean | Mean of |
| E_hat_var | Variance of |
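
The shape of the returned list can be illustrated with a generic nonparametric bootstrap in base R. This is a sketch under assumptions, not the package's procedure: row resampling with replacement is assumed, the function name `bootstrap_stat` is hypothetical, and the `estimator` argument is a hypothetical user-supplied function standing in for the DML-based nuisance estimate.

```r
# Illustrative nonparametric bootstrap; `estimator` is a hypothetical
# user-supplied function standing in for the DML-based nuisance estimate.
bootstrap_stat <- function(D, X, Y, B, estimator) {
  n <- length(Y)
  E_hat <- vapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    estimator(D[idx], X[idx, , drop = FALSE], Y[idx])
  }, numeric(1))
  list(E_hat = E_hat, E_hat_mean = mean(E_hat), E_hat_var = var(E_hat))
}

set.seed(7)
n <- 200
X <- matrix(rnorm(n * 2), n, 2)
D <- rnorm(n)
Y <- 0.5 * D + rnorm(n)

# Toy estimator: OLS coefficient on D, used only to exercise the bootstrap loop
ols_slope <- function(D, X, Y) unname(coef(lm(Y ~ D + X))["D"])
boot <- bootstrap_stat(D, X, Y, B = 50, estimator = ols_slope)
```

The result has the same three components documented above: a length-B vector of estimates plus its mean and variance.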