| Title: | Semiparametric Ecological Inference |
|---|---|
| Description: | Efficient and user-friendly routines for modern ecological inference. Implements the methods described in McCartan & Kuriwaki (2025+) <doi:10.48550/arXiv.2509.20194>, which generalize ecological regression as introduced by Goodman (1953) <doi:10.2307/2088121>. Includes routines for preprocessing, synthetic data generation, double/debiased machine learning (DML) estimation, partial identification bounds, and sensitivity analysis. |
| Authors: | Cory McCartan [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-6251-669X>), Shiro Kuriwaki [aut, cph] (ORCID: <https://orcid.org/0000-0002-5687-2647>) |
| Maintainer: | Cory McCartan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-06-08 20:29:32 UTC |
| Source: | https://github.com/cran/seine |
Produces a table of benchmark values for c_outcome and c_predictor in
ei_sens() for each covariate, following the methodology of Chernozhukov
et al. (2024).
ei_bench(spec, subset = NULL, contrast = NULL)ei_bench(spec, subset = NULL, contrast = NULL)
spec |
An ei_spec object. |
subset |
< |
contrast |
If provided, a list containing entries |
A data frame of benchmark covariate R-squared values.
Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., & Syrgkanis, V. (2024). Long story short: Omitted variable bias in causal machine learning (No. w30302). National Bureau of Economic Research.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(educ_elem, pop_urban, farm)) ei_bench(spec) # with preprocessed covariates spec = ei_spec( data = elec_1968, predictors = vap_white:vap_other, outcome = pres_ind_wal, total = pres_total, covariates = c(educ_elem, pop_urban, farm), preproc = ~ model.matrix(~ .^2 - 1, data = .x) ) ei_bench(spec) ei_bench(spec, subset = pop_urban > 0.5) # with contrasts spec = ei_spec(elec_1968, vap_white:vap_other, pres_rep_nix:pres_ind_wal, total = pres_total, covariates = c(educ_elem, pop_urban, farm)) ei_bench(spec, contrast = list(predictor = c(1, -1, 0))) ei_bench(spec, contrast = list(outcome = c(1, -1)))data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(educ_elem, pop_urban, farm)) ei_bench(spec) # with preprocessed covariates spec = ei_spec( data = elec_1968, predictors = vap_white:vap_other, outcome = pres_ind_wal, total = pres_total, covariates = c(educ_elem, pop_urban, farm), preproc = ~ model.matrix(~ .^2 - 1, data = .x) ) ei_bench(spec) ei_bench(spec, subset = pop_urban > 0.5) # with contrasts spec = ei_spec(elec_1968, vap_white:vap_other, pres_rep_nix:pres_ind_wal, total = pres_total, covariates = c(educ_elem, pop_urban, farm)) ei_bench(spec, contrast = list(predictor = c(1, -1, 0))) ei_bench(spec, contrast = list(outcome = c(1, -1)))
For each observation, computes the minimum and maximum value of each local estimand that is consistent with the accounting identity and the bounds on the outcome. Optionally aggregate the computed bounds across units.
ei_bounds( x, ..., total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE ) ## S3 method for class 'ei_spec' ei_bounds( x, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## S3 method for class 'formula' ei_bounds( formula, data, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## S3 method for class 'data.frame' ei_bounds( x, y, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## S3 method for class 'matrix' ei_bounds( x, y, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## Default S3 method: ei_bounds(x, ...) ## S3 method for class 'ei_bounds' as.array(x, ...)ei_bounds( x, ..., total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE ) ## S3 method for class 'ei_spec' ei_bounds( x, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## S3 method for class 'formula' ei_bounds( formula, data, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## S3 method for class 'data.frame' ei_bounds( x, y, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## S3 method for class 'matrix' ei_bounds( x, y, total, contrast = NULL, bounds = c(0, 1), sum_one = NULL, subset = NULL, global = FALSE, ... ) ## Default S3 method: ei_bounds(x, ...) ## S3 method for class 'ei_bounds' as.array(x, ...)
x |
An object of class |
... |
Additional arguments (ignored) |
total |
< |
contrast |
If provided, a list containing entries |
bounds |
A vector |
sum_one |
If |
subset |
< |
global |
If |
formula |
A formula such as |
data |
When a formula is used, |
y |
When
When the outcome is a proportion, you can use |
A data frame with bounds. The .row column in the output
corresponds to the observation index in the input. The min and max
columns contain the minimum and maximum values for each local estimand.
The weight column contains the product of the predictor variable and total
for each observation, where applicable. Taking a weighted average of the
bounds against this column will produce global bounds. It has class ei_bounds.
as.array(ei_bounds): Format bounds as an array with dimensions
<rows>*<predictors>*<outcomes>*2. Does not work if the object has been sorted.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(state, pop_urban, farm)) ei_bounds(spec, bounds = c(0, 1)) ei_bounds(spec, bounds = c(0, 1), global = TRUE) # Infer bounds ei_bounds(pres_ind_wal ~ vap_white, data = elec_1968, total = pres_total, bounds = NULL) # With contrast ei_bounds( spec, bounds = c(0, 1), contrast = list(predictor = c(1, -1, 0), outcome = c(1, -1, 0, 0)) ) # manually aggregate min/max # easier with dplyr: # summarize(across(min:max, ~ weighted.mean(.x, weight)), .by=c(predictor, outcome)) grp_units = split(ei_bounds(spec, bounds = c(0, 1)), ~ predictor + outcome) do.call(rbind, lapply(grp_units, function(b) { tibble::tibble( predictor = b$predictor[1], outcome = b$outcome[1], min = weighted.mean(b$min, b$weight), max = weighted.mean(b$max, b$weight) ) }))data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(state, pop_urban, farm)) ei_bounds(spec, bounds = c(0, 1)) ei_bounds(spec, bounds = c(0, 1), global = TRUE) # Infer bounds ei_bounds(pres_ind_wal ~ vap_white, data = elec_1968, total = pres_total, bounds = NULL) # With contrast ei_bounds( spec, bounds = c(0, 1), contrast = list(predictor = c(1, -1, 0), outcome = c(1, -1, 0, 0)) ) # manually aggregate min/max # easier with dplyr: # summarize(across(min:max, ~ weighted.mean(.x, weight)), .by=c(predictor, outcome)) grp_units = split(ei_bounds(spec, bounds = c(0, 1)), ~ predictor + outcome) do.call(rbind, lapply(grp_units, function(b) { tibble::tibble( predictor = b$predictor[1], outcome = b$outcome[1], min = weighted.mean(b$min, b$weight), max = weighted.mean(b$max, b$weight) ) }))
Produces estimates of overall conditional means from a fitted ecological inference model or Riesz representer. If both a regression model and a Riesz representer are provided, a debiased machine learning (DML) estimate is produced.
ei_est( regr = NULL, riesz = NULL, data, total, subset = NULL, contrast = NULL, outcome = NULL, conf_level = 0.95, use_student = TRUE ) ## S3 method for class 'ei_est' as.matrix(x, which = "estimate", ...) ## S3 method for class 'ei_est' vcov(object, ...) ## S3 method for class 'ei_est' nobs(object, ...)ei_est( regr = NULL, riesz = NULL, data, total, subset = NULL, contrast = NULL, outcome = NULL, conf_level = 0.95, use_student = TRUE ) ## S3 method for class 'ei_est' as.matrix(x, which = "estimate", ...) ## S3 method for class 'ei_est' vcov(object, ...) ## S3 method for class 'ei_est' nobs(object, ...)
regr |
A fitted regression model, from |
riesz |
A fitted Riesz representer, from |
data |
The data frame, matrix, or ei_spec object that was used to fit the regression or Riesz representer. |
total |
< |
subset |
< |
contrast |
If provided, a list containing entries |
outcome |
< |
conf_level |
A numeric specifying the level for confidence intervals.
If |
use_student |
If |
x, object
|
An object of class |
which |
Which column of |
... |
Additional arguments (ignored) |
A data frame with estimates. It has class ei_est, supporting
several methods, and two additional attributes: vcov, containing the
estimated covariance matrix for the estimates, and n, containing the
number of aggregate units used in estimation (the number of rows in
data).
as.matrix(ei_est): Format estimates, standard errors, or other columns as a matrix.
vcov(ei_est): Extract full covariance matrix of estimates
nobs(ei_est): Extract number of units covered by estimates
McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) ei_est(regr = m, data = spec, conf_level = 0.95) # Plug-in estimate ei_est(riesz = rr, data = spec) # Weighted (Riesz) estimate est = ei_est(regr = m, riesz = rr, data = spec) # Double/debiased ML estimate # Working with the output as.matrix(est) as.matrix(est, "std.error") vcov(est)[1:4, 1:4] # Contrasts ei_est(regr = m, riesz = rr, data = spec, contrast = list(predictor = c(1, -1, 0))) ei_est(regr = m, riesz = rr, data = spec, contrast = list(predictor = c(-1, 1, 0), outcome = c(1, -1, 0, 0))) # Subsetting est = ei_est(m, rr, data = spec, subset = (state == "Alabama")) as.matrix(est) nobs(est)data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) ei_est(regr = m, data = spec, conf_level = 0.95) # Plug-in estimate ei_est(riesz = rr, data = spec) # Weighted (Riesz) estimate est = ei_est(regr = m, riesz = rr, data = spec) # Double/debiased ML estimate # Working with the output as.matrix(est) as.matrix(est, "std.error") vcov(est)[1:4, 1:4] # Contrasts ei_est(regr = m, riesz = rr, data = spec, contrast = list(predictor = c(1, -1, 0))) ei_est(regr = m, riesz = rr, data = spec, contrast = list(predictor = c(-1, 1, 0), outcome = c(1, -1, 0, 0))) # Subsetting est = ei_est(m, rr, data = spec, subset = (state == "Alabama")) as.matrix(est) nobs(est)
Projects predictions from a fitted regression model onto the accounting constraint using a provided residual covariance matrix. This ensures that each set of local estimates satisfies the accounting identity. Local estimates may be truncated to variable bounds.
ei_est_local( regr, data, total, b_cov, contrast = NULL, bounds = regr$blueprint$bounds, sum_one = NULL, subset = NULL, sample = FALSE, conf_level = 0.95, regr_var = TRUE, unimodal = TRUE, gaussian = FALSE ) ## S3 method for class 'ei_est_local' as.array(x, ...)ei_est_local( regr, data, total, b_cov, contrast = NULL, bounds = regr$blueprint$bounds, sum_one = NULL, subset = NULL, sample = FALSE, conf_level = 0.95, regr_var = TRUE, unimodal = TRUE, gaussian = FALSE ) ## S3 method for class 'ei_est_local' as.array(x, ...)
regr |
A fitted regression model, from |
data |
The data frame, matrix, or ei_spec object that was used to fit the regression. |
total |
< |
b_cov |
A covariance matrix to use in projecting the
local estimates |
contrast |
If provided, a list containing entries |
bounds |
A vector |
sum_one |
If |
subset |
< |
sample |
If |
conf_level |
A numeric specifying the level for confidence intervals.
If |
regr_var |
If |
unimodal |
If |
gaussian |
If |
x |
An object of class |
... |
Additional arguments (ignored) |
Local estimates are produced jointly across outcome variables. When bounds
are applied, unless sum_one = TRUE, the estimates for each observation may
not satisfy logical constraints, including the accounting identity.
Projections are done obliquely in accordance with b_cov via quadratic
programming. Occasionally, the quadratic program may be infeasible due to
the specific data, features of b_cov, or numerical errors. Various
relaxations of the accounting identity and b_cov are attempted in these cases;
indices where relaxations of b_cov were used are stored in the proj_relax
attribute of the output, and indices of infeasible projections are stored in
the proj_misses attribute.
A data frame with estimates. The .row column in the output
corresponds to the observation index in the input. The weight column contains
the product of the predictor variable and total for each observation.
Taking a weighted average of the estimate against this column will produce
a global estimate. It has class ei_est_local.
as.array(ei_est_local): Format estimates an array with dimensions
<rows>*<predictors>*<outcomes>. Does not work if the object has been sorted.
McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) ei_est_local(m, spec, b_cov = 0, bounds = c(0, 1), sum_one = TRUE, conf_level = 0.99) b_cov = ei_local_cov(m, spec) e_orth = ei_est_local(m, spec, b_cov = 0, bounds = c(0, 1), sum_one = TRUE) e_nbhd = ei_est_local(m, spec, b_cov = 0.95, bounds = c(0, 1), sum_one = TRUE) e_rcov = ei_est_local(m, spec, b_cov = b_cov, bounds = c(0, 1), sum_one = TRUE) # average interval width c( e_orth = mean(e_orth$conf.high - e_orth$conf.low), e_nbhd = mean(e_nbhd$conf.high - e_nbhd$conf.low), e_rcov = mean(e_rcov$conf.high - e_rcov$conf.low) )data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) ei_est_local(m, spec, b_cov = 0, bounds = c(0, 1), sum_one = TRUE, conf_level = 0.99) b_cov = ei_local_cov(m, spec) e_orth = ei_est_local(m, spec, b_cov = 0, bounds = c(0, 1), sum_one = TRUE) e_nbhd = ei_est_local(m, spec, b_cov = 0.95, bounds = c(0, 1), sum_one = TRUE) e_rcov = ei_est_local(m, spec, b_cov = b_cov, bounds = c(0, 1), sum_one = TRUE) # average interval width c( e_orth = mean(e_orth$conf.high - e_orth$conf.low), e_nbhd = mean(e_nbhd$conf.high - e_nbhd$conf.low), e_rcov = mean(e_rcov$conf.high - e_rcov$conf.low) )
Under a slightly stronger coarsening at random assumption (applying to
second moments), and an assumption of homoskedasticity in the covariates,
this function estimates the covariance matrix of the local estimands
around their local mean.
See the reference for more detail.
ei_local_cov(regr, data, subset = NULL, prior_obs = 10)ei_local_cov(regr, data, subset = NULL, prior_obs = 10)
regr |
A fitted regression model, from |
data |
The data frame, matrix, or ei_spec object that was used to fit the regression or Riesz representer. |
subset |
< |
prior_obs |
The effective sample size of the inverse-Wishart conjugate prior, which shrinks the estimate towards the covariance of the regression residuals. Smaller values mean less shrinkage. |
Homoskedasticity in the covariates implies that the variance of the residuals
depends linearly on the entries of . This function
fits an auto-tuned ridge regression of the empirical second moments of the
residuals on these predictors, and uses the polarization identity discussed
in the references to estimate the covariance for each local estimand. When
the estimated covariance is not positive semidefinite, it is projected onto
the cone of positive semidefinite matrices.
A small amount of shrinkage is applied towards a naive estimator (the
covariance of the regression residuals) under an inverse-Wishart conjugate prior,
whose effective sample size is given by prior_obs.
A covariance matrix. The variables are ordered by predictor within outcome, e.g. (Y1|X1, Y1|X2, ..., Y2|X1, Y2|X2, ...).
McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.
Divides counts in specified columns by a specified total or by their sum, possibly storing the result in a new column. Also checks for out-of-bounds and missing values, and can create a column containing the remainder amount so that all proportions sum to 1.
ei_proportions(data, ..., .total = ".total", .other = ".other", clamp = 0.001)ei_proportions(data, ..., .total = ".total", .other = ".other", clamp = 0.001)
data |
A data frame. |
... |
< |
.total |
< |
.other |
< |
clamp |
Proportions that are |
A modified data frame. Unselected columns are unmodified.
data(elec_1968) # Make a data frame with counts d_unnorm = with(head(elec_1968, 10), data.frame( vap = vap, vap_white = vap * vap_white, vap_black = vap * vap_black, vap_other = vap * vap_other )) ei_proportions(d_unnorm, vap_white:vap_black, .total=vap) # `.other` column created ei_proportions(d_unnorm, vap_white:vap_other) # no total provided # renaming allowed ei_proportions(d_unnorm, white=vap_white, black=vap_black, .total=c(total=vap), .other="vap_other")data(elec_1968) # Make a data frame with counts d_unnorm = with(head(elec_1968, 10), data.frame( vap = vap, vap_white = vap * vap_white, vap_black = vap * vap_black, vap_other = vap * vap_other )) ei_proportions(d_unnorm, vap_white:vap_black, .total=vap) # `.other` column created ei_proportions(d_unnorm, vap_white:vap_other) # no total provided # renaming allowed ei_proportions(d_unnorm, white=vap_white, black=vap_black, .total=c(total=vap), .other="vap_other")
Fits a penalized regression model for ecological inference, allowing for
overall and unit-level estimates of conditional means using ei_est().
ei_ridge( x, ..., weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE ) ## S3 method for class 'formula' ei_ridge( formula, data, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## S3 method for class 'ei_spec' ei_ridge( x, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## S3 method for class 'data.frame' ei_ridge( x, y, z, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## S3 method for class 'matrix' ei_ridge( x, y, z, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## Default S3 method: ei_ridge(x, ...)ei_ridge( x, ..., weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE ) ## S3 method for class 'formula' ei_ridge( formula, data, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## S3 method for class 'ei_spec' ei_ridge( x, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## S3 method for class 'data.frame' ei_ridge( x, y, z, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## S3 method for class 'matrix' ei_ridge( x, y, z, weights, bounds = FALSE, sum_one = FALSE, penalty = NULL, scale = TRUE, vcov = TRUE, ... ) ## Default S3 method: ei_ridge(x, ...)
x |
Depending on the context:
Predictors must be proportions that sum to 1 across rows.
You can use |
... |
Not currently used, but required for extensibility. |
weights |
< |
bounds |
A vector |
sum_one |
If |
penalty |
The ridge penalty (a non-negative scalar). Set to
where |
scale |
If |
vcov |
If |
formula |
A formula such as |
data |
When a formula is used, |
y |
When
When the outcome is a proportion, you can use |
z |
When
These are shifted to have mean zero. If |
The regression is calculated using the singular value decomposition, which
allows for efficient recalculation under different penalty values as part
of leave-one-out cross-validation.
When bounds are provided, the regression is calculated via quadratic
programming, as there is no closed-form solution. The unbounded regression
is run to select the penalty automatically in this case, if it is not
provided. Estimation is still efficient, though somewhat slower than in the
unbounded case. The covariance matrix of the estimates is not available when
bounds are applied.
Note that supplying bounds does not guarantee those bounds will be respected
by ei_est(), and in general it is not necessarily recommended to use bounds
in ei_ridge() any time the outcomes are bounded. Bounds can be enforced
on local estimates in ei_est_local().
An ei_ridge object, which supports various ridge-methods.
The weakest identification result for ecological inference makes no assumption about the number of observations per aggregate unit (the totals). It requires, however, weighting the estimation steps according to the totals. This may reduce efficiency when the totals are variable and a slightly stronger condition holds.
Specifically, if the totals are conditionally mean-independent of the missing data (the aggregation-unit level means of the outcome within each predictor level), given covariates, then it is appropriate to use uniform weights in estimation, or any fixed set of weights.
In general, estimation efficiency is improved when units with larger variance
in the outcome receive less weight. Various built-in options are provided by
the helper functions in ei_wgt().
McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_urban, farm)) ei_ridge(spec) ei_ridge(pres_dem_hum + pres_rep_nix + pres_ind_wal + pres_abs ~ vap_white + vap_black + vap_other | pop_urban + farm, data = elec_1968) # bounds inferred all.equal( fitted(ei_ridge(spec, bounds = NULL)), fitted(ei_ridge(spec, bounds = 0:1)) ) # bounds enforced min(fitted(ei_ridge(spec))) min(fitted(ei_ridge(spec, bounds = 0:1)))data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_urban, farm)) ei_ridge(spec) ei_ridge(pres_dem_hum + pres_rep_nix + pres_ind_wal + pres_abs ~ vap_white + vap_black + vap_other | pop_urban + farm, data = elec_1968) # bounds inferred all.equal( fitted(ei_ridge(spec, bounds = NULL)), fitted(ei_ridge(spec, bounds = 0:1)) ) # bounds enforced min(fitted(ei_ridge(spec))) min(fitted(ei_ridge(spec, bounds = 0:1)))
ei_ridge() and ei_riesz()
No checks are performed on the inputs.
Use of ei_ridge() and ei_riesz() is strongly recommended unless many
regressions must be fit, e.g., within a tight loop.
Only works for a single outcome, i.e., y must be a vector, not a matrix.
ei_ridge_impl( x, y, z, weights = rep(1, nrow(x)), bounds = c(-Inf, Inf), sum_one = NULL, penalty = NULL, vcov = TRUE ) ei_riesz_impl(x, z, total, weights = rep(1, nrow(x)), penalty)ei_ridge_impl( x, y, z, weights = rep(1, nrow(x)), bounds = c(-Inf, Inf), sum_one = NULL, penalty = NULL, vcov = TRUE ) ei_riesz_impl(x, z, total, weights = rep(1, nrow(x)), penalty)
x |
A matrix of predictors |
y |
A vector of outcomes |
z |
A matrix of covariates |
weights |
A vector of estimation weights |
bounds |
A vector |
sum_one |
If |
penalty |
The ridge penalty (a non-negative scalar), which must be
specified for |
vcov |
If |
total |
A vector of total observations per unit. |
A list with model components.
Fits a penalized Riesz regression for ecological inference, allowing for
overall estimates of conditional means using ei_est().
ei_riesz(x, ..., weights, penalty, scale = TRUE) ## S3 method for class 'formula' ei_riesz(formula, data, total, weights, penalty, scale = TRUE, ...) ## S3 method for class 'ei_spec' ei_riesz(x, weights, penalty, scale = TRUE, ...) ## S3 method for class 'data.frame' ei_riesz(x, z, total, weights, penalty, scale = TRUE, ...) ## S3 method for class 'matrix' ei_riesz(x, z, total, weights, penalty, scale = TRUE, ...) ## Default S3 method: ei_riesz(x, ...)ei_riesz(x, ..., weights, penalty, scale = TRUE) ## S3 method for class 'formula' ei_riesz(formula, data, total, weights, penalty, scale = TRUE, ...) ## S3 method for class 'ei_spec' ei_riesz(x, weights, penalty, scale = TRUE, ...) ## S3 method for class 'data.frame' ei_riesz(x, z, total, weights, penalty, scale = TRUE, ...) ## S3 method for class 'matrix' ei_riesz(x, z, total, weights, penalty, scale = TRUE, ...) ## Default S3 method: ei_riesz(x, ...)
x |
Depending on the context:
Predictors must be proportions that sum to 1 across rows.
You can use |
... |
Not currently used, but required for extensibility. |
weights |
< |
penalty |
The ridge penalty (a non-negative scalar), which must be
specified. Recommended value is the same penalty used in |
scale |
If |
formula |
A formula such as |
data |
When a formula is used, |
total |
< |
z |
When
These are shifted to have mean zero. If |
The regression is calculated using the singular value decomposition.
An ei_riesz object.
The weakest identification result for ecological inference makes no assumption about the number of observations per aggregate unit (the totals). It requires, however, weighting the estimation steps according to the totals. This may reduce efficiency when the totals are variable and a slightly stronger condition holds.
Specifically, if the totals are conditionally mean-independent of the missing data (the aggregation-unit level means of the outcome within each predictor level), given covariates, then it is appropriate to use uniform weights in estimation, or any fixed set of weights.
In general, estimation efficiency is improved when units with larger variance
in the outcome receive less weight. Various built-in options are provided by
the helper functions in ei_wgt().
McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.
data(elec_1968) # Recommended: get ridge penalty from ei_ridge() spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_urban, farm)) m = ei_ridge(spec) ei_riesz(spec, penalty = m$penalty) rr = ei_riesz(~ vap_white + vap_black + vap_other | pop_urban + farm, data = elec_1968, total = pres_total, penalty = m$penalty) summary(rr) # Examine the weights and check they have mean 1 head(weights(rr, group = "vap_black")) colMeans(weights(rr)) mean(elec_1968$pres_ind_wal * weights(rr, "vap_white"))data(elec_1968) # Recommended: get ridge penalty from ei_ridge() spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_urban, farm)) m = ei_ridge(spec) ei_riesz(spec, penalty = m$penalty) rr = ei_riesz(~ vap_white + vap_black + vap_other | pop_urban + farm, data = elec_1968, total = pres_total, penalty = m$penalty) summary(rr) # Examine the weights and check they have mean 1 head(weights(rr, group = "vap_black")) colMeans(weights(rr)) mean(elec_1968$pres_ind_wal * weights(rr, "vap_white"))
Relates confounding of an omitted variable with predictor or outcome to bias in ecological estimates, using the nonparametric sensitivity analysis of Chernozhukov et al. (2024).
ei_sens( est, c_outcome = seq(0, 1, 0.01)^2, c_predictor = seq(0, 1, 0.01)^2, bias_bound = NULL, confounding = 1, expand_ci = TRUE )ei_sens( est, c_outcome = seq(0, 1, 0.01)^2, c_predictor = seq(0, 1, 0.01)^2, bias_bound = NULL, confounding = 1, expand_ci = TRUE )
est |
A set of estimates from |
c_outcome |
The (nonparametric) partial |
c_predictor |
How much variation latent variables create in the Riesz
representer, i.e. |
bias_bound |
If provided, overrides |
confounding |
The confounding parameter ( |
expand_ci |
If |
The parameter c_predictor equals , where
is the true Riesz representer and is the Riesz
representer with the observed covariates. The RR can be equivalently
expressed as
where is the weighting term, is the unobserved
confounder, and is the conditional density.
The corresponding c_predictor is then
The bounds here are plug-in estimates and do not incorporate sampling uncertainty. As such, they may fail to cover the true value in finite samples, even under large enough sensitivity parameters; see Section 5 of Chernozhukov et al. (2024).
A data frame of the same format as est, but with additional
columns: c_outcome and c_predictor, matching all combinations of those
arguments, and bias_bound, containing the bound on the amount of bias.
The data frame has additional class ei_sens, which supports a
plot.ei_sens() method.
Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., & Syrgkanis, V. (2024). Long story short: Omitted variable bias in causal machine learning (No. w30302). National Bureau of Economic Research.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) est = ei_est(m, rr, spec) ei_sens(est, c_outcome=0.2) # How much variation would the regression residual need to explain of # Riesz representer to cause bias of 0.4? ei_sens(est, c_outcome=1, bias_bound=0.4) # Update confidence intervals and extract as matrix est = ei_est(m, rr, spec, conf_level=0.95) sens = ei_sens(est, c_outcome=0.5, c_predictor=0.2) as.matrix(sens, "conf.high") # Works for contrasts as well est = ei_est(m, rr, spec, contrast = list(predictor=c(1, -1, 0))) ei_sens(est, c_outcome=0.5, c_predictor=0.5)data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) est = ei_est(m, rr, spec) ei_sens(est, c_outcome=0.2) # How much variation would the regression residual need to explain of # Riesz representer to cause bias of 0.4? ei_sens(est, c_outcome=1, bias_bound=0.4) # Update confidence intervals and extract as matrix est = ei_est(m, rr, spec, conf_level=0.95) sens = ei_sens(est, c_outcome=0.5, c_predictor=0.2) as.matrix(sens, "conf.high") # Works for contrasts as well est = ei_est(m, rr, spec, contrast = list(predictor=c(1, -1, 0))) ei_sens(est, c_outcome=0.5, c_predictor=0.5)
The robustness value is the minimum bound for both c_outcome and
c_predictor in ei_sens() such that the bias bound is a certain value.
For example, if the provided bias bound is 0.5, meaning a bias of magnitude
0.5 would be considered problematic, then the robustness value is the minimum
amount of confounding of outcome and predictor (more specifically, the Riesz
representer) that would lead to bias of magnitude 0.5.
ei_sens_rv(est, bias_bound, confounding = 1)ei_sens_rv(est, bias_bound, confounding = 1)
est |
A set of estimates from |
bias_bound |
< |
confounding |
The confounding parameter ( |
A data frame of the same format as est, but with a new rv column
containing the robustness values.
Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., & Syrgkanis, V. (2024). Long story short: Omitted variable bias in causal machine learning (No. w30302). National Bureau of Economic Research.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) est = ei_est(m, rr, spec) ei_sens_rv(est, 0.1) # how much confounding for bias of 0.1 ei_sens_rv(est, 2 * std.error) # how much confounding for bias of 2 SE # How much confounding to equalize all estimates (no polarization) y_avg = weighted.mean(elec_1968$pres_ind_wal, elec_1968$pres_total) ei_sens_rv(est, estimate - y_avg) # Extract as matrix as.matrix(ei_sens_rv(est, 0.2), "rv") # Works for contrasts as well est = ei_est(m, rr, spec, contrast = list(predictor=c(1, -1, 0))) ei_sens_rv(est, estimate) # how much to eliminate disparitydata(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) est = ei_est(m, rr, spec) ei_sens_rv(est, 0.1) # how much confounding for bias of 0.1 ei_sens_rv(est, 2 * std.error) # how much confounding for bias of 2 SE # How much confounding to equalize all estimates (no polarization) y_avg = weighted.mean(elec_1968$pres_ind_wal, elec_1968$pres_total) ei_sens_rv(est, estimate - y_avg) # Extract as matrix as.matrix(ei_sens_rv(est, 0.2), "rv") # Works for contrasts as well est = ei_est(m, rr, spec, contrast = list(predictor=c(1, -1, 0))) ei_sens_rv(est, estimate) # how much to eliminate disparity
Uses tidy-select syntax to specify outcomes, predictors, and covariates.
The result of this function can be passed directly into ei_ridge()
or ei_riesz(), or plotted with plot().
ei_spec( data, predictors, outcome, total, covariates = NULL, preproc = NULL, strip = FALSE ) ## S3 method for class 'ei_spec' weights(object, normalize = TRUE, ...)ei_spec( data, predictors, outcome, total, covariates = NULL, preproc = NULL, strip = FALSE ) ## S3 method for class 'ei_spec' weights(object, normalize = TRUE, ...)
data |
A data frame. |
predictors |
< |
outcome |
< |
total |
< |
covariates |
< |
preproc |
An optional function which takes in a data frame of covariates
and returns a modeling-ready numeric matrix of covariates.
Useful to apply any preprocessing, such as a basis transformation, as part
of the estimation process. Passed to |
strip |
Whether to strip common prefixes from column names within each group.
For example, columns named |
object |
An ei_spec object. |
normalize |
If |
... |
Additional arguments (ignored). |
The function is lightweight and does not perform any checking of the
arguments, bounds, sum constraints, etc. All of these checks are performed
by functions that use ei_spec objects.
An ei_spec object, which is a data frame with additional
attributes recording predictors, outcomes, total, and covariates.
weights(ei_spec): Extract the totals from a specification
data(elec_1968) ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, pres_total) # basis expansion if (requireNamespace("bases", quietly = TRUE)) { spec = ei_spec( data = elec_1968, predictors = vap_white:vap_other, outcome = pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_city:pop_rural, farm:educ_coll, starts_with("inc_")), preproc = ~ bases::b_bart(.x, trees = 500) ) }data(elec_1968) ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, pres_total) # basis expansion if (requireNamespace("bases", quietly = TRUE)) { spec = ei_spec( data = elec_1968, predictors = vap_white:vap_other, outcome = pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_city:pop_rural, farm:educ_coll, starts_with("inc_")), preproc = ~ bases::b_bart(.x, trees = 500) ) }
Samples data from a following truncated Normal ecological model. The data can
be generated completely at random, or can be generated conditional on
provided predictors x and/or covariates z.
ei_synthetic( n, p = 0, n_x = 2, x = n_x:1, z = 0.25 * exp(-(seq_len(p) - 1)/2), r2_xz = 0.5, r2_bz = 0.5, b_loc = NULL, b_cov = NULL )ei_synthetic( n, p = 0, n_x = 2, x = n_x:1, z = 0.25 * exp(-(seq_len(p) - 1)/2), r2_xz = 0.5, r2_bz = 0.5, b_loc = NULL, b_cov = NULL )
n |
The number of rows (geographies) to generate. Defaults to the number
of rows in |
p |
The number of covariates. Defaults to the number of columns in |
n_x |
The number of predictor variables. Defaults to the number of
columns in |
x |
Either a matrix or data frame containing the predictor percentages in each row, or a vector containing Dirichlet parameters to use in sampling predictor percentages. |
z |
A matrix or data frame containing geography-level covariates, or a vector of values to form a Toeplitz covariance matrix for the random covariates. |
r2_xz |
The approximate |
r2_bz |
The approximate |
b_loc |
The center of the distribution of geography-level parameters. Defaults to a linearly spaced sequence across groups from 0.5 to 0.9. Because of the truncation, this will not exactly be the mean of the geography-level parameters. |
b_cov |
The residual covariance matrix for geography-level parameters.
Defaults to |
This function samples data from the following truncated Normal ecological model:
where and are the mean and covariance of the
Normal approximation to a Dirichlet distribution with parameters supplied by
the x argument below, and , , and are
matrices sampled to have certain properties, as described below.
The subscripts on indicate truncation; i.e., both the
predictors x and the unit-level parameters b are truncated to the
n_x-dimensional hypercube.
The matrix is a symmetric Toeplitz matrix with diagonals provided by
the z argument. Generally, a decreasing set of nonnegative values will be
sufficient for a positive definite .
The matrices and are initially filled with
independent samples from a standard Normal distribution. is then
projected so that its rows sum to zero, preserving the sum-to-1 requirement
on x, and so that its columns are scaled to produce the correct
value matching r2_xz. The matrix is likewise scaled to
produce the correct value matching r2_bz. Due to the truncation
in the sampling of x and b, the in-sample values will generally
be slightly smaller than the provided arguments.
Aspects of the model can be replaced with data provided to the function.
If x or z is provided as a matrix or data frame, then the other value is
sampled from its marginal distribution. If both are provided, then the first
row of the model is skipped.
An ei_spec object with additional attributes:
b_loc and b_cov
Lambda with the coefficients of z
eta, the linear predictor for b
est_true, the mean of the geography-level parameters, formatted
similarly to the output from ei_est()
r2_xz_act and r2_bz_act, containing the actual (sample)
values for x and z, and b and z, respectively.
ei_synthetic(n = 10) ei_synthetic(n = 10, p = 2, n_x = 3) # Manual hyperparameters: x2 dominant and z1, z2 very correlated ei_synthetic(n = 10, x = c(1, 95, 4), z = c(10, 9.999)) # Condition on provided x but not z data(elec_1968) ei_synthetic( x = cbind(elec_1968$pop_white, 1 - elec_1968$pop_white), p = 5, b_loc = c(0.3, 0.9), b_cov = matrix(c(0.02, 0.016, 0.016, 0.2), nrow=2) )ei_synthetic(n = 10) ei_synthetic(n = 10, p = 2, n_x = 3) # Manual hyperparameters: x2 dominant and z1, z2 very correlated ei_synthetic(n = 10, x = c(1, 95, 4), z = c(10, 9.999)) # Condition on provided x but not z data(elec_1968) ei_synthetic( x = cbind(elec_1968$pop_white, 1 - elec_1968$pop_white), p = 5, b_loc = c(0.3, 0.9), b_cov = matrix(c(0.02, 0.016, 0.016, 0.2), nrow=2) )
The CAR assumption, which is required for accurate estimates with ei_est(),
implies that the conditional expectation function (CEF) of the aggregate
outcomes will take a certain partially linear form. This function tests that
implication by comparing a fully nonparametric estimate of the CEF to one
in the partially linear form implied by CAR, and comparing their goodness-of-fit.
It is very important that preproc be used in the spec to perform a
rich basis transformation of the covariates and predictors; a missing
preproc will lead to a warning.
ei_test_car( spec, weights, iter = 1000, undersmooth = 1.5, use_chisq = nrow(spec) >= 2000, use_hc = FALSE )ei_test_car( spec, weights, iter = 1000, undersmooth = 1.5, use_chisq = nrow(spec) >= 2000, use_hc = FALSE )
spec |
An |
weights |
< |
iter |
The number of permutations to use when estimating the null
distribution, including the original identity permutation.
Ignored when |
undersmooth |
A value to divide the estimated ridge penalty by when partialling out the partially linear component of the model. A larger value will smooth the partially linear component less, which may improve Type I error control in finite samples at the cost of power. |
use_chisq |
If |
use_hc |
If |
The test is a Kennedy-Cade (1996) style permutation test on a Wald statistic
for the coefficients not included in the "reduced" model that would be fit
by ei_ridge(). The test statistic is asymptotically chi-squared under the
null and may be anti-conservative in small samples, especially when the
dimensionality of the basis expansion is large.
A tibble with one row per outcome variable and columns describing
the test results. The p.value column contains the p-values for the test.
P-values are not adjusted by default; pass them to stats::p.adjust() if
desired.
Helwig, N. E. (2022). Robust Permutation Tests for Penalized Splines. Stats, 5(3), 916-933.
Kennedy, P. E., & Cade, B. S. (1996). Randomization tests for multiple regression. Communications in Statistics-Simulation and Computation, 25(4), 923-936.
McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.
data(elec_1968) # basis expansion: poly() with degree=2 not recommended in practice preproc = if (requireNamespace("bases", quietly = TRUE)) { ~ bases::b_bart(.x, trees = 100) } else { ~ poly(as.matrix(.x), degree=2, simple=TRUE) } spec = ei_spec( data = elec_1968, predictors = vap_white:vap_other, outcome = pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_city:pop_rural, farm:educ_coll, starts_with("inc_")), preproc = preproc ) ei_test_car(spec, iter=20) # use a larger number in practicedata(elec_1968) # basis expansion: poly() with degree=2 not recommended in practice preproc = if (requireNamespace("bases", quietly = TRUE)) { ~ bases::b_bart(.x, trees = 100) } else { ~ poly(as.matrix(.x), degree=2, simple=TRUE) } spec = ei_spec( data = elec_1968, predictors = vap_white:vap_other, outcome = pres_dem_hum:pres_abs, total = pres_total, covariates = c(pop_city:pop_rural, farm:educ_coll, starts_with("inc_")), preproc = preproc ) ei_test_car(spec, iter=20) # use a larger number in practice
Several built-in helper functions to generate estimation weights from a
vector of unit totals, or an existing ei_spec() object.
ei_wgt_unif(x) ei_wgt_prop(x) ei_wgt_sqrt(x)ei_wgt_unif(x) ei_wgt_prop(x) ei_wgt_sqrt(x)
x |
A numeric vector of unit totals, or an existing |
A numeric vector of estimation weights with the same number of
observations as x. These will have mean 1.
ei_wgt_unif(): Uniform weights across units with any population.
Appropriate if the unit-level variance is constant, i.e., homoskedastic.
ei_wgt_prop(): Weights proportional to the totals. Appropriate if the
unit-level variance is inversely proportional to the number of
observations.
ei_wgt_sqrt(): Weights proportional to the square root of the totals.
Appropriate if the unit-level variance is inversely proportional to the
square root of the number of observations.
data(elec_1968) ei_wgt_unif(head(elec_1968$pres_total)) spec = ei_spec(head(elec_1968), predictors = vap_white:vap_other, outcome = pres_ind_wal, total = pres_total) ei_wgt_prop(spec) ei_wgt_sqrt(spec)data(elec_1968) ei_wgt_unif(head(elec_1968$pres_total)) spec = ei_spec(head(elec_1968), predictors = vap_white:vap_other, outcome = pres_ind_wal, total = pres_total) ei_wgt_prop(spec) ei_wgt_sqrt(spec)
ei_est
Stores additional data and attributes on a generic model class so that it
can be used as the regr argument to ei_est(). Given the wide variety of
model classes, there is no guarantee this function will work. However, most
model classes supporting a fitted() and predict() method will work as long
as there is no transformation of the predictor variables as part of the model
formula or fitting.
ei_wrap_model(x, data, predictors = NULL, outcome = NULL, ...)ei_wrap_model(x, data, predictors = NULL, outcome = NULL, ...)
x |
|
data |
A data frame or matrix containing the data used to fit the model,
or an |
predictors |
< |
outcome |
< |
... |
Additional arguments passed to the |
An ei_wrapped object, which has the information required to use
the provided x with ei_est().
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, pres_total, covariates = c(pop_urban, farm)) # Note: this is not a model recommended for valid ecological inference! m = suppressWarnings( glm(pres_ind_wal ~ 0 + vap_white + vap_black + vap_other + pop_urban + farm, data = spec, family = "binomial") ) m_wrap = ei_wrap_model(m, spec, type = "response") print(m_wrap) ei_est(m_wrap, data = spec) # notice all estimates nonnegativedata(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, pres_total, covariates = c(pop_urban, farm)) # Note: this is not a model recommended for valid ecological inference! m = suppressWarnings( glm(pres_ind_wal ~ 0 + vap_white + vap_black + vap_other + pop_urban + farm, data = spec, family = "binomial") ) m_wrap = ei_wrap_model(m, spec, type = "response") print(m_wrap) ei_est(m_wrap, data = spec) # notice all estimates nonnegative
County-level dataset containing election results and demographics from the 1968 U.S. presidential election in the states of Virginia, North Carolina, South Carolina, Georgia, Florida, Alabama, Mississippi, Louisiana, Arkansas, Tennessee, and Texas.
elec_1968elec_1968
A data frame with 1,143 rows and 41 variables:
fipsCounty FIPS code
stateState name
abbrState abbreviation
regionCensus region
divisionCensus division
countyCounty name
popTotal population at the 1970 census
pop_city, pop_urban, pop_rural
Proportion of the population in city, urban and rural areas
pop_white, pop_black, pop_aian, pop_asian, pop_hisp
Proportion of the population in each racial group.
The first four columns sum to 1, but pop_hisp should be considered
separately.
vapVoting-age population at the 1970 census
vap_white, vap_black, vap_other
Proportion of the voting-age population in each racial group
farm, nonfarm
Proportion of the population in farm and non-farm households
educ_elem, educ_hsch, educ_coll
Proportion of the population with elementary, high school, and college education
cvapCitizen voting-age population at the 1970 census
cvap_white, cvap_black, cvap_other
Proportion of the citizen voting-age population in each racial group
inc_00_03k, inc_03_08k, inc_08_25k, inc_25_99k
Proportion of the population in households with income in each bracket. Median household income in 1970 was $9,870
pres_dem_hum, pres_rep_nix, pres_ind_wal, pres_abs, pres_oth
Proportion of votes for Humphrey, Nixon, and Wallace, and other candidates.
pres_abs is calculated as one minus the Humphrey, Nixon, and Wallace vote shares.
pres_totalTotal number of votes cast for president.
pres_turnProportion of the voting-age population that turned out to vote.
pres_turn_icpsrProportion of the voting-age population that turned out to vote using ICPSR data in the denominator.
adjA FIPS-indexed adjacency list capturing the geographic (rook)
adjacency between counties. To convert to a 1-indexed adjacency list,
run lapply(elec_1968$adj, function(x) match(x, elec_1968$fips)).
Census data: Steven Manson, Jonathan Schroeder, David Van Riper, Katherine Knowles, Tracy Kugler, Finn Roberts, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 19.0. 1970 Decennial Census. Minneapolis, MN: IPUMS. 2024. doi:10.18128/D050.V19.0
Presidential election data: Clubb, Jerome M., Flanigan, William H., and Zingale, Nancy H. Electoral Data for Counties in the United States: Presidential and Congressional Races, 1840-1972. Inter-university Consortium for Political and Social Research (distributor), 2006-11-13. doi:10.3886/ICPSR08611.v1
Displays bias bound as a function of c_outcome and c_predictor in
ei_sens() on a contour plot. Bounds on the outcome, and standard errors of
the point estimate, can be overlaid as contours on the plot to aid in
interpretation. Benchmarked values of c_outcome and c_predictor based on
the observed covariates can also be overlaid.
## S3 method for class 'ei_sens' plot( x, y = NULL, predictor = NULL, bounds = NULL, bench = NULL, plot_se = 1:3, contour_exp = -2:-1, ..., lwd = 1, pch = 8, cex = 1 )## S3 method for class 'ei_sens' plot( x, y = NULL, predictor = NULL, bounds = NULL, bench = NULL, plot_se = 1:3, contour_exp = -2:-1, ..., lwd = 1, pch = 8, cex = 1 )
x |
An ei_sens object |
y |
An outcome variable, as a character vector. Defaults to first. |
predictor |
A predictor variable to plot, as a character vector. Defaults to first. |
bounds |
A vector |
bench |
A data frame of benchmark values, from |
plot_se |
A vector of multiples of the standard error to plot as contours. |
contour_exp |
Powers of 10 for which to plot contours of the bias bound. |
... |
Additional arguments passed on to |
lwd |
Scaling factor for the contour line widths |
pch |
The point type (see |
cex |
Scaling factor for the benchmark points and labels, if provided |
None, called for side effects (plotting).
Chernozhukov, V., Cinelli, C., Newey, W., Sharma, A., & Syrgkanis, V. (2024). Long story short: Omitted variable bias in causal machine learning (No. w30302). National Bureau of Economic Research.
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) est = ei_est(m, rr, spec) sens = ei_sens(est) plot(sens) plot(sens, bench = ei_bench(spec), plot_se=NULL)data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_ind_wal, total = pres_total, covariates = c(state, pop_urban, farm)) m = ei_ridge(spec) rr = ei_riesz(spec, penalty = m$penalty) est = ei_est(m, rr, spec) sens = ei_sens(est) plot(sens) plot(sens, bench = ei_bench(spec), plot_se=NULL)
Useful for quickly visualizing scatterplots of outcome versus predictor variables.
## S3 method for class 'ei_spec' plot(x, ..., pch = 16, cex = 0.2)## S3 method for class 'ei_spec' plot(x, ..., pch = 16, cex = 0.2)
x |
An ei_spec object. |
... |
Additional arguments passed to |
pch, cex
|
As in |
None, called for side effects (plotting).
data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, pres_total) plot(spec)data(elec_1968) spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs, pres_total) plot(spec)
ei_ridge modelsModels fitted with ei_ridge() support various generic methods.
## S3 method for class 'ei_ridge' predict(object, new_data, type = "numeric", ...) ## S3 method for class 'ei_ridge' fitted(object, ...) ## S3 method for class 'ei_ridge' residuals(object, ...) ## S3 method for class 'ei_ridge' vcov(object, ...) ## S3 method for class 'ei_ridge' summary(object, ...) ## S3 method for class 'ei_ridge' weights(object, normalize = TRUE, ...)## S3 method for class 'ei_ridge' predict(object, new_data, type = "numeric", ...) ## S3 method for class 'ei_ridge' fitted(object, ...) ## S3 method for class 'ei_ridge' residuals(object, ...) ## S3 method for class 'ei_ridge' vcov(object, ...) ## S3 method for class 'ei_ridge' summary(object, ...) ## S3 method for class 'ei_ridge' weights(object, normalize = TRUE, ...)
object |
A fitted ei_ridge model |
new_data |
A data frame, matrix, or ei_spec of new predictors. |
type |
The type of predictions to generate; only |
... |
Additional arguments (ignored) |
normalize |
If |
See individual methods.
predict(ei_ridge): Predict from an ei_ridge model.
fitted(ei_ridge): Extract fitted values.
residuals(ei_ridge): Extract residuals.
vcov(ei_ridge): Extract unscaled covariance of coefficient estimates.
Covariance estimate is not currently heteroskedasticity-robust.
Multiply by sigma2 from the fitted model to get the covariance matrix for
a particular outcome variable.
summary(ei_ridge): Summarize the model's fitted values and .
weights(ei_ridge): Extract estimation weights from a fitted model.
Extracts a single set of Riesz representer weights from an ei_riesz object,
for a selected group.
## S3 method for class 'ei_riesz' weights(object, group = TRUE, loo = FALSE, ...)## S3 method for class 'ei_riesz' weights(object, group = TRUE, loo = FALSE, ...)
object |
An |
group |
The group for which to extract the weights, as a numeric index
or a character column name. The special (default) value |
loo |
If |
... |
Additional arguments (ignored) |
A numeric vector of weights