---
title: "Introduction to HOIF: Higher-Order Influence Function Estimators for the ATE"
author: "Xingyu Chen"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Introduction to HOIF: Higher-Order Influence Function Estimators for the ATE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
# Code chunks are not evaluated when this vignette is built: the default
# computation backend requires a Python runtime with u-stats / numpy /
# torch, which is not available (and should not be downloaded) on the
# machines that build this vignette.
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  eval = FALSE
)
```

# Introduction

The **HOIF** package implements Higher-Order Influence Function (HOIF) estimators for Average Treatment Effect (ATE) estimation in causal inference. This methodology extends the doubly robust (DR) estimator—also known as the Augmented Inverse Propensity Weighted (AIPW) or Double Machine Learning (DML) estimator—by systematically accounting for higher-order bias terms.

## Background

The HOIF methodology was developed through a series of seminal works by James M. Robins and collaborators:

- Robins et al. (2008): *Higher order influence functions and minimax estimation of nonlinear functionals*
- Robins et al. (2017): *Minimax estimation of a functional on a structured high-dimensional model*
- Liu et al. (2017): *Semiparametric efficient empirical higher order influence function estimators*
- Liu & Li (2023): *New √n-consistent, numerically stable higher-order influence function estimators*

The key idea is to represent the bias of the standard DR estimator as a sum of higher-order influence functions, which can be estimated and used to debias the original estimator.

The computation of the higher-order U-statistics that arise in HOIF estimators is delegated to the [`ustats`](https://github.com/cxy0714/U-Statistics-R) package; the algorithm and its computational complexity are analyzed in:

- Chen, Zhang & Liu (2025): *On computing and the complexity of computing higher-order U-statistics, exactly*

## Key Features

- **Flexible basis expansion**: Supports B-splines, Fourier basis, or raw covariates
- **Multiple inversion methods**: Direct, nonlinear shrinkage, or corpcor regularization
- **Sample splitting support**: Optional K-fold cross-fitting between the
  estimation of the inverse weighted Gram matrix and the computation of the
  U-statistics (eHOIF; see the Sample Splitting section)
- **Dual computational backends**: Fast Python backend via the `ustats` package, or a pure R fallback
- **Any order with the Python backend** (pure R mode supports orders up to 6)

# Mathematical Background

## The HOIF Framework

Let $\widehat{\psi}_a^{\mathrm{AIPW}}$ denote the standard first-order doubly
robust (AIPW/DML) estimator of $\psi_a = E[Y(a)]$, and
$\widehat{\mathrm{ATE}}^{\mathrm{AIPW}} = \widehat{\psi}_1^{\mathrm{AIPW}} -
\widehat{\psi}_0^{\mathrm{AIPW}}$. Conditional on the estimated nuisance
functions, this estimator is biased; HOIF estimates the *estimable part* of
that bias.

For each treatment arm $a \in \{0,1\}$, the $m$-th order HOIF
**bias-correction term** is

$$
\mathbb{HOIF}^a_m(\hat{\Omega}^a) = \sum_{j=2}^m \mathbb{IF}^a_j(\hat{\Omega}^a) = \sum_{i=2}^m \sum_{j=2}^{i} \binom{i-2}{i-j} \mathbb{U}^a_j(\hat{\Omega}^a),
$$

and the corresponding correction for the ATE is their difference:

$$
\mathbb{BC}^{\mathrm{ATE}}_m = \mathbb{HOIF}^1_m(\hat{\Omega}^1) - \mathbb{HOIF}^0_m(\hat{\Omega}^0).
$$

$\mathbb{BC}^{\mathrm{ATE}}_m$ is the $m$-th order HOIF estimator of the
estimable bias of the AIPW estimator of the ATE — it is **not** itself an
estimator of the ATE. The sign convention is such that *adding* it to the
AIPW estimate removes that bias:

$$
\widehat{\mathrm{ATE}}_m = \widehat{\mathrm{ATE}}^{\mathrm{AIPW}} + \mathbb{BC}^{\mathrm{ATE}}_m .
$$

In the output of `hoif_ate()`, the components `HOIF1`, `HOIF0` and `ATE`
correspond to $\mathbb{HOIF}^1_m$, $\mathbb{HOIF}^0_m$ and
$\mathbb{BC}^{\mathrm{ATE}}_m$ for $m = 2, \ldots$ (the component name `ATE`
is kept for backward compatibility — it holds the ATE *bias correction*, not
the ATE itself).

## U-Statistics Formulation

The core computational building block is the U-statistic:

$$
\mathbb{U}^a_m(\hat{\Omega}^a) = (-1)^m \frac{(n-m)!}{n!} \sum_{\substack{(i_1, \ldots, i_m) : \\ i_1 \neq \cdots \neq i_m}} r^a_{i_1} \prod_{s=1}^{m-1} \{Z_{i_s}^\top \hat{\Omega}^a s^a_{i_{s+1}} Z_{i_{s+1}}\} R^a_{i_m}
$$

where:

- $s^a_i = A_i^a (1-A_i)^{1-a}$ is the treatment indicator
- $r^a_i = 1 - s^a_i / \pi^a(X_i)$ is the inverse propensity weight residual
- $R^a_i = Y_i - \hat{\mu}(a, X_i)$ is the outcome residual
- $Z_i$ are the transformed covariates
- $\hat{\Omega}^a$ is the inverse weighted Gram matrix

The full algorithmic workflow, mathematical formulas, and all parameters are
documented in the PDF shipped with the package
(`system.file("extdoc", "HOIF.pdf", package = "HOIF")`, also available
[on GitHub](https://github.com/cxy0714/HOIF/blob/master/inst/extdoc/HOIF.pdf)).

## Sample Splitting

Conceptually, HOIF estimation involves **three** separate estimation tasks,
and ideally each would use its own, independent part of the data:

1. **The nuisance functions**: estimating $\hat{\mu}(1, X)$,
   $\hat{\mu}(0, X)$ and $\hat{\pi}(X)$;
2. **The inverse weighted Gram matrix** $\hat{\Omega}^a$;
3. **The higher-order U-statistics** built from $\hat{\Omega}^a$.

This package deliberately does **not** implement task 1: `hoif_ate()` only
takes the *predicted values* `mu1`, `mu0`, `pi` as inputs, and how (and on
which sample) the nuisance functions are estimated is entirely up to the
user. Accordingly, the full three-way cross-fitting is **not** performed by
the package; the `sample_split` argument only controls the split between
tasks 2 and 3:

- **`sample_split = TRUE` (eHOIF)**: the sample passed to `hoif_ate()` is
  split into `n_folds` folds $(I_1, \ldots, I_K)$. For each fold $j$,
  $\hat{\Omega}^a$ is estimated on the observations **not** in $I_j$, the
  U-statistics are computed on the observations **in** $I_j$, and the
  results are averaged across the folds.
- **`sample_split = FALSE` (sHOIF)**: $\hat{\Omega}^a$ and the U-statistics
  are computed on the same sample, without distinction.

The Quick Start example below follows this structure: the nuisance functions
are fitted on one half of the data, and `hoif_ate()` is run on the other
half.

# Installation

```{r}
# From CRAN (once accepted):
# install.packages("HOIF")

# Development version from GitHub (also installs the ustats R package):
# install.packages("devtools")
devtools::install_github("cxy0714/HOIF")
```

## Setting up the Python backend

The default backend computes the higher-order U-statistics in Python via the
`ustats` package. The Python dependencies (`u-stats`, `numpy`, `torch`) are
**provisioned automatically**: with `reticulate` (>= 1.41) installed, the
first call that needs Python (e.g. the first `hoif_ate()` call) downloads a
private Python together with the required packages into a cached environment
and reuses it in later sessions. No manual setup is needed in most cases:

```{r}
library(HOIF)
# First call provisions the Python environment automatically:
# results <- hoif_ate(...)
```

If you prefer a persistent, dedicated environment, or want to control how
PyTorch is installed (CPU-only build vs. CUDA), use the helpers from
`ustats`:

```{r}
ustats::setup_ustats()             # CPU-only PyTorch (small download)
ustats::setup_ustats(gpu = TRUE)   # default PyPI PyTorch (CUDA on Linux)
ustats::check_ustats_setup()       # verify the environment
```

You can also point `reticulate` to your own Python/conda environment (with
`u-stats` installed via `pip install u-stats`) **before** Python initializes:

```{r}
reticulate::use_condaenv("your_env_name", required = TRUE)
# or set the RETICULATE_PYTHON environment variable
```

See `vignette("ustats", package = "ustats")` for a complete installation
guide and troubleshooting tips.

## No Python at all?

Set `pure_R_code = TRUE` in `hoif_ate()`: the U-statistics are then computed
with a pure R implementation (orders up to 6) and no Python runtime is
required.

# Quick Start Example

Let's demonstrate the recommended workflow with a simulated example,
following the three-task structure described in the Sample Splitting section
above: the nuisance functions are fitted on one half of the data (the
*nuisance sample*), `hoif_ate()` is run on the other half (the *estimation
sample*), and within the estimation sample we compute both the eHOIF and the
sHOIF estimators.

To make the role of the HOIF correction visible, the nuisance models are
**deliberately misspecified** (they use only 3 of the 10 confounders), so
the first-order AIPW estimator carries a clear bias — which the HOIF terms
then estimate and remove.

## Generate Simulated Data

```{r}
library(HOIF)

set.seed(123)
n <- 2000
p <- 10

# Covariates (all of them are confounders)
X <- matrix(rnorm(n * p), ncol = p)

# True propensity score and outcome regressions: linear, loading on
# ALL covariates
beta_pi <- c(0.3, -0.2, 0.2, rep(0.25, 7))
beta1   <- c(0.5,  0.4, 0.3, rep(0.4, 7))
beta0   <- c(0.3,  0.2, 0.1, rep(0.3, 7))

true_pi <- plogis(as.vector(X %*% beta_pi))
A <- rbinom(n, 1, true_pi)

mu1_true <- as.vector(1 + X %*% beta1)
mu0_true <- as.vector(X %*% beta0)
Y <- A * mu1_true + (1 - A) * mu0_true + rnorm(n, 0, 0.2)

# True targets
psi1_true <- mean(mu1_true)
psi0_true <- mean(mu0_true)
true_ate  <- psi1_true - psi0_true
cat("True E[Y(1)]:", round(psi1_true, 4), "\n")
#> True E[Y(1)]: 0.9799
cat("True E[Y(0)]:", round(psi0_true, 4), "\n")
#> True E[Y(0)]: -0.016
cat("True ATE:    ", round(true_ate, 4), "\n")
#> True ATE:     0.9958
```

## Split the Sample

The nuisance sample is used **only** to fit the nuisance functions; the
estimation sample is used **only** by `hoif_ate()`. This keeps the nuisance
predictions independent of the sample on which the HOIF estimator is
computed.

```{r}
idx_nuis <- sample(n, n / 2)
idx_est  <- setdiff(seq_len(n), idx_nuis)

X_nuis <- X[idx_nuis, , drop = FALSE]
A_nuis <- A[idx_nuis]
Y_nuis <- Y[idx_nuis]

X_est <- X[idx_est, , drop = FALSE]
A_est <- A[idx_est]
Y_est <- Y[idx_est]
```

## Estimate (Misspecified) Nuisance Functions on the Nuisance Sample

The package **only consumes the predicted values** of the nuisance
functions; in practice, any flexible machine learning method can be used to
produce them. Here, to mimic misspecification, the working models use
**only the first 3 covariates** and ignore the remaining confounders:

```{r}
S <- 1:3  # the working models ignore covariates 4..10

# Outcome regressions, fitted on the nuisance sample only
fit_mu1 <- lm(Y_nuis ~ X_nuis[, S], subset = A_nuis == 1)
fit_mu0 <- lm(Y_nuis ~ X_nuis[, S], subset = A_nuis == 0)

# Propensity score, fitted on the nuisance sample only
fit_pi <- glm(A_nuis ~ X_nuis[, S], family = binomial)

# Predict all nuisance functions on the estimation sample
mu1_hat <- as.vector(cbind(1, X_est[, S]) %*% coef(fit_mu1))
mu0_hat <- as.vector(cbind(1, X_est[, S]) %*% coef(fit_mu0))
pi_hat  <- as.vector(plogis(cbind(1, X_est[, S]) %*% coef(fit_pi)))

# Ensure propensity scores are bounded away from 0 and 1
pi_hat <- pmax(pmin(pi_hat, 0.95), 0.05)
```

## The First-Order AIPW Estimator and its Bias

The HOIF terms estimate the **estimable bias** of the standard first-order
doubly robust (AIPW/DML) estimator, so let us compute that estimator first,
together with its actual error:

```{r}
psi1_aipw <- mean(mu1_hat + A_est / pi_hat * (Y_est - mu1_hat))
psi0_aipw <- mean(mu0_hat + (1 - A_est) / (1 - pi_hat) * (Y_est - mu0_hat))
ate_aipw  <- psi1_aipw - psi0_aipw

cat("AIPW estimates: E[Y(1)] =", round(psi1_aipw, 4),
    "  E[Y(0)] =", round(psi0_aipw, 4),
    "  ATE =", round(ate_aipw, 4), "\n")
#> AIPW estimates: E[Y(1)] = 1.1749   E[Y(0)] = -0.2676   ATE = 1.4425
cat("AIPW errors:    E[Y(1)] =", round(psi1_aipw - psi1_true, 4),
    "  E[Y(0)] =", round(psi0_aipw - psi0_true, 4),
    "  ATE =", round(ate_aipw - true_ate, 4), "\n")
#> AIPW errors:    E[Y(1)] = 0.195   E[Y(0)] = -0.2516   ATE = 0.4466
```

Because of the misspecified nuisance models, the AIPW estimator misses the
true ATE by about 0.45 — a bias that no amount of data will remove. We now
estimate this bias with HOIF.

## Compute the eHOIF Estimator (with sample splitting)

Within the estimation sample, `hoif_ate()` cross-fits the inverse weighted
Gram matrix against the U-statistics over `n_folds` folds:

```{r}
results_ehoif <- hoif_ate(
  X = X_est,
  A = A_est,
  Y = Y_est,
  mu1 = mu1_hat,
  mu0 = mu0_hat,
  pi = pi_hat,
  transform_method = "none",  # Use raw covariates
  inverse_method = "direct",
  m = 7,                      # Compute up to 7th order
  sample_split = TRUE,
  n_folds = 2,
  seed = 42,
  backend = "torch"           # Use Python backend
)

print(results_ehoif)
#> HOIF Estimators for Average Treatment Effect
#> =============================================
#>
#> Higher-order correction terms by order:
#>   Order     ATE   HOIF1  HOIF0
#> 1     2 -0.4621 -0.2615 0.2006
#> 2     3 -0.4050 -0.2341 0.1709
#> 3     4 -0.4335 -0.2480 0.1855
#> 4     5 -0.4272 -0.2443 0.1829
#> 5     6 -0.4289 -0.2456 0.1833
#> 6     7 -0.4293 -0.2455 0.1838
#>
#> Estimated AIPW bias correction for the ATE (highest order): -0.4293
#> (add this value to the first-order AIPW/DR estimate of the ATE to debias it)
```

## Compute the sHOIF Estimator (without sample splitting)

Still using the independent nuisance predictions from above; only the split
between the Gram matrix and the U-statistics is dropped:

```{r}
results_shoif <- hoif_ate(
  X = X_est,
  A = A_est,
  Y = Y_est,
  mu1 = mu1_hat,
  mu0 = mu0_hat,
  pi = pi_hat,
  transform_method = "none",
  inverse_method = "direct",
  m = 7,
  sample_split = FALSE,
  backend = "torch"
)

print(results_shoif)
#> HOIF Estimators for Average Treatment Effect
#> =============================================
#>
#> Higher-order correction terms by order:
#>   Order     ATE   HOIF1  HOIF0
#> 1     2 -0.4318 -0.2496 0.1821
#> 2     3 -0.4458 -0.2568 0.1890
#> 3     4 -0.4373 -0.2520 0.1853
#> 4     5 -0.4362 -0.2515 0.1848
#> 5     6 -0.4366 -0.2517 0.1849
#> 6     7 -0.4367 -0.2517 0.1849
#>
#> Estimated AIPW bias correction for the ATE (highest order): -0.4367
#> (add this value to the first-order AIPW/DR estimate of the ATE to debias it)
```

## Debias the AIPW Estimator

The reported `HOIF1`/`HOIF0`/`ATE` values are the higher-order influence
function terms of orders 2 to $m$ — they are *not* by themselves point
estimates of $E[Y(1)]$, $E[Y(0)]$ or the ATE. They estimate the estimable
bias of the AIPW estimator, with the sign convention that **adding** them to
the AIPW estimates removes it:

```{r}
psi1_ehoif <- psi1_aipw + tail(results_ehoif$HOIF1, 1)
psi0_ehoif <- psi0_aipw + tail(results_ehoif$HOIF0, 1)
psi1_shoif <- psi1_aipw + tail(results_shoif$HOIF1, 1)
psi0_shoif <- psi0_aipw + tail(results_shoif$HOIF0, 1)

comparison <- data.frame(
  row.names = c("E[Y(1)]", "E[Y(0)]", "ATE"),
  Truth = c(psi1_true, psi0_true, true_ate),
  AIPW  = c(psi1_aipw, psi0_aipw, ate_aipw),
  eHOIF = c(psi1_ehoif, psi0_ehoif, psi1_ehoif - psi0_ehoif),
  sHOIF = c(psi1_shoif, psi0_shoif, psi1_shoif - psi0_shoif)
)
round(comparison, 4)
#>           Truth    AIPW   eHOIF   sHOIF
#> E[Y(1)]  0.9799  1.1749  0.9294  0.9232
#> E[Y(0)] -0.0160 -0.2676 -0.0838 -0.0826
#> ATE      0.9958  1.4425  1.0132  1.0058

# Errors relative to the truth
round(sweep(comparison[, -1], 1, comparison$Truth, "-"), 4)
#>            AIPW   eHOIF   sHOIF
#> E[Y(1)]  0.1950 -0.0505 -0.0567
#> E[Y(0)] -0.2516 -0.0678 -0.0666
#> ATE      0.4466  0.0174  0.0100
```

The HOIF correction removes essentially all of the AIPW bias for the ATE
(error $0.4466 \rightarrow$ about $0.01$) and the bulk of it for each
treatment arm.

## Visualize Convergence

```{r}
plot(results_ehoif)
```

# Main Function: `hoif_ate()`

The primary interface for computing HOIF estimators.

## Arguments

- **X**: Covariate matrix ($n \times p$)
- **A**: Treatment vector (0 or 1)
- **Y**: Outcome vector
- **mu1, mu0**: Predicted outcomes under treatment and control
- **pi**: Predicted propensity scores
- **transform_method**: Covariate transformation method
  - `"none"`: No transformation (use raw covariates)
  - `"splines"`: B-splines basis expansion
  - `"fourier"`: Fourier basis expansion
- **basis_dim**: Number of basis functions (required if using splines or Fourier)
- **inverse_method**: Method for Gram matrix inversion
  - `"direct"`: Cholesky-based inversion (Moore-Penrose fallback)
  - `"nlshrink"`: Nonlinear shrinkage (Ledoit-Wolf)
  - `"corpcor"`: Shrinkage via corpcor package
- **m**: Maximum order (any order for the Python backend, 2-6 for pure R)
- **sample_split**: Logical; whether to cross-fit the Gram matrix against
  the U-statistics (eHOIF vs sHOIF, see the Sample Splitting section)
- **n_folds**: Number of folds for sample splitting
- **backend**: `"torch"` (default) or `"numpy"` for the Python backend
- **dtype**: Numeric precision of the Python backend (`"float32"`,
  `"float64"`, or `NULL` for automatic selection)
- **pure_R_code**: Use pure R implementation (limited to order 6)
- **seed**: Random seed for reproducibility

## Return Value

A list of class `"hoif_ate"` containing:

- **ATE**: Vector of ATE estimates for orders 2 to m
- **HOIF1**: Vector of treatment arm HOIF estimates
- **HOIF0**: Vector of control arm HOIF estimates
- **IIFF1**: Vector of treatment arm incremental influence function terms
- **IIFF0**: Vector of control arm incremental influence function terms
- **orders**: Vector of orders (2:m)
- **convergence_data**: Data frame for plotting

# Advanced Usage

## Using Basis Expansion

For more flexible modeling of covariate relationships:

```{r}
# B-splines basis
results_splines <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  transform_method = "splines",
  basis_dim = 5,        # 5 basis functions per covariate
  degree = 3,           # Cubic splines
  m = 5,
  sample_split = FALSE
)

# Fourier basis
results_fourier <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  transform_method = "fourier",
  basis_dim = 4,        # 4 Fourier components per covariate
  period = 1,
  m = 5,
  sample_split = FALSE
)
```

## Sample Splitting (Cross-Fitting)

To cross-fit the estimation of the inverse weighted Gram matrix against the
computation of the U-statistics — the eHOIF estimator (see the Sample
Splitting section under Mathematical Background for what exactly is split):

```{r}
results_cf <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  transform_method = "none",
  m = 5,
  sample_split = TRUE,
  n_folds = 5,          # 5-fold cross-fitting
  seed = 42
)
```

## Regularized Gram Matrix Inversion

For high-dimensional settings or when facing numerical instability:

```{r}
# Nonlinear shrinkage
results_nlshrink <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  inverse_method = "nlshrink",
  m = 5
)

# corpcor shrinkage
results_corpcor <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  inverse_method = "corpcor",
  m = 5
)
```

## Pure R Backend

If the Python environment is not available or for smaller problems:

```{r}
results_pure_r <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  pure_R_code = TRUE,
  m = 6  # Maximum 6 for pure R
)
```

# Computational Details

## Python Backend (ustats)

The default backend uses the [`ustats`](https://github.com/cxy0714/U-Statistics-R) R package, an interface to the Python package [`u-stats`](https://pypi.org/project/u-stats/), for efficient U-statistic computation:

- Supports PyTorch and NumPy backends (GPU acceleration with CUDA-enabled PyTorch)
- Handles arbitrary order U-statistics
- Optimized for memory and speed via Einstein summation

For HOIF estimators of the ATE, a key takeaway from Chen, Zhang & Liu (2025) is:

- when the order $m \le 7$, the computational complexity is
  $O(n^3 + nk^2 + k^3 + n^2 k)$
- when the order $m > 7$, the computational complexity exceeds
  $O(n^4 + nk^2 + k^3 + n^2 k)$

where $n$ is the sample size and $k$ is the dimension of the transformed covariates.

## Pure R Implementation

The pure R fallback (`pure_R_code = TRUE`):

- Implements orders 2-6 using recursive matrix operations
- Uses the `SMUT` package for fast matrix multiplication
- Removes diagonal contributions to ensure proper degenerate U-statistics
- Suitable for smaller datasets or when Python is unavailable

## Performance Considerations

- **Order**: Higher orders increase computation time
- **Basis dimension**: Larger basis increases memory and computation
- **Cross-fitting**: Multiplies computation by number of folds
- **Precision**: On a CUDA GPU the Python backend defaults to `float32`;
  pass `dtype = "float64"` for maximum numerical precision

# Practical Recommendations

## Choosing the Transformation Method

- **"none"**: Use the covariates as provided (e.g. when you have already
  constructed your own basis)
- **"splines"**: Good for smooth nonlinear relationships
- **"fourier"**: Suitable for periodic patterns or high-frequency features

## Choosing the Order

For orders $m \le 7$ the computational complexity does not exceed that of
ordinary matrix computations, $O(n^3 + nk^2 + k^3 + n^2 k)$ (Chen, Zhang &
Liu, 2025), so going up to order 7 is cheap. A practical default is
therefore to compute all orders up to $m = 7$ (the package default) and
inspect the convergence plot. Beyond order 7 the complexity exceeds
$O(n^4 + nk^2 + k^3 + n^2 k)$, so expect substantially longer run times.

## Use a GPU if Available

With `backend = "torch"` and a CUDA-enabled PyTorch (e.g. installed via
`ustats::setup_ustats(gpu = TRUE)`), the U-statistics are computed on the
GPU automatically — by far the most effective speedup for large $n$ or $k$.
On a GPU the default precision is `float32`; pass `dtype = "float64"` if
you need maximum numerical precision.

## Sample Splitting

See the Sample Splitting section under Mathematical Background: the package
only cross-fits the inverse weighted Gram matrix against the U-statistics
(`sample_split = TRUE`, eHOIF). The nuisance functions are inputs —
estimating them, ideally on a separate, independent sample, is the user's
responsibility (as demonstrated in the Quick Start example).

# Troubleshooting

## Python Backend Issues

Verify the Python environment used by `ustats`:

```{r}
ustats::check_ustats_setup()
```

If the environment cannot be repaired easily, switch to the pure R implementation:

```{r}
results <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  pure_R_code = TRUE,
  m = 6  # Limited to order 6
)
```

## Numerical Instability

If you see warnings about matrix inversion:

```{r}
# Try shrinkage methods
results <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  inverse_method = "nlshrink"  # or "corpcor"
)
```

## Extreme Propensity Scores

Always trim propensity scores to avoid numerical issues:

```{r}
# Trim to [0.05, 0.95]
pi_hat <- pmax(pmin(pi_hat, 0.95), 0.05)
```

# References

1. Robins, J. M., Li, L., Tchetgen Tchetgen, E., & van der Vaart, A. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. *Probability and Statistics: Essays in Honor of David A. Freedman*, 335-421. <doi:10.1214/193940307000000527>

2. Robins, J. M., Li, L., Mukherjee, R., Tchetgen Tchetgen, E., & van der Vaart, A. (2017). Minimax estimation of a functional on a structured high-dimensional model. *Annals of Statistics*, 45(5), 1951-1987.

3. Liu, L., Mukherjee, R., Newey, W. K., & Robins, J. M. (2017). Semiparametric efficient empirical higher order influence function estimators. *arXiv preprint* <doi:10.48550/arXiv.1705.07577>

4. Liu, L., & Li, C. (2023). New √n-consistent, numerically stable higher-order influence function estimators. *arXiv preprint* <doi:10.48550/arXiv.2302.08097>

5. Chen, X., Zhang, R., & Liu, L. (2025). On computing and the complexity of computing higher-order U-statistics, exactly. *arXiv preprint* <doi:10.48550/arXiv.2508.12627>