--- title: "Variance estimation" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Variance estimation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") library(weightflow) has_survey <- requireNamespace("survey", quietly = TRUE) has_srvyr <- requireNamespace("srvyr", quietly = TRUE) && requireNamespace("dplyr", quietly = TRUE) ``` weightflow computes weights and also estimates their variances. This vignette shows two ways to obtain standard errors from a weightflow recipe, and how they relate. ## Why the adjustments matter for variance A weighting recipe rarely stops at the design weight. It redistributes unknown eligibility, drops out-of-scope units, adjusts for nonresponse and calibrates to known totals. Each of those stages is *estimated from the sample*, so each one adds (or, for calibration, often removes) variability. A linearization that takes the final weights as fixed and applies the ultimate-cluster formula ignores that the nonresponse and calibration steps were themselves estimated. The cleanest way to account for them is to **re-run the whole recipe on each replicate**, so the replicate weights carry the variability of every stage. ## Method 1: a PSU bootstrap that re-applies the recipe `bootstrap_weights()` resamples primary sampling units (PSUs) with replacement within strata and re-runs the recipe on each replicate. Pass the **inert** recipe (do not call `prep()` first): the bootstrap preps it once per replicate. ```{r recipe} dat <- sample_one dat$age_grp <- cut(dat$age, c(0, 30, 45, 60, Inf), labels = c("18-30", "31-45", "46-60", "60+")) spec <- weighting_spec(dat, base_weights = pw) |> step_unknown_eligibility(unknown = unknown_elig, by = "region") |> step_drop_ineligible(ineligible = ineligible) |> step_nonresponse(respondent = hh_responded, method = "weighting_class", by = "region") |> step_select_within(prob = p_within) |> step_nonresponse(respondent = responded, method = "weighting_class", by = c("region", "sex", "age_grp")) |> step_calibrate(method = "raking", margins = list(region = c(table(population$region)), sex = c(table(population$sex)))) boot <- bootstrap_weights(spec, replicates = 200, strata = "region", psu = "psu", seed = 2024, progress = FALSE) boot ``` The multiplier is the **Rao-Wu rescaling bootstrap**. Within a stratum with `n` PSUs, `m` PSUs are drawn with replacement (by default `m = n - 1`), and a unit whose PSU was drawn `t` times is rescaled by ``` lambda = 1 - sqrt(m / (n - 1)) + sqrt(m / (n - 1)) * (n / m) * t ``` which has expectation one and never turns negative. Whole PSUs are kept together (every unit in a drawn PSU is retained), as the design's clustering requires. ### Estimates with bootstrap standard errors The bootstrap variance is `(1 / B) * sum((theta_b - theta_hat)^2)`. ```{r estimates} boot_mean(boot, "income") # mean income boot_total(boot, "employed") # total employed boot_mean(boot, "employed") # employment rate ``` For any other statistic, pass a function of the weights and the data to `bootstrap_estimate()`: ```{r custom} bootstrap_estimate(boot, function(w, d) { ok <- !is.na(d$income) & w > 0 stats::median(rep(d$income[ok], times = round(w[ok]))) # weighted median (approx.) }) ``` ## Method 2: hand the weights to the survey package `as_svydesign()` builds an ultimate-cluster linearization design from a prepped recipe. It is fast, but treats the calibration as fixed. ```{r survey, eval = has_survey} fitted <- prep(spec) des <- as_svydesign(fitted, ids = "psu", strata = "region") survey::svymean(~income, des, na.rm = TRUE) ``` To keep the recipe's adjustments in the variance while still using survey, feed it the bootstrap replicate weights from method 1: ```{r svrep, eval = has_survey} rep_des <- as_svrepdesign(boot) survey::svymean(~income, rep_des, na.rm = TRUE) ``` This matches `boot_mean(boot, "income")` exactly, because `as_svrepdesign()` sets `scale = 1 / B`, `rscales = 1` and `mse = TRUE`. ## Replicate weights for a tidyverse workflow `collect_replicate_weights()` attaches the point weight (`.weight`) and the replicate weights (`rep_1` ... `rep_B`) to the active respondents, ready for srvyr. ```{r srvyr, eval = has_srvyr} df <- collect_replicate_weights(boot) d_rep <- srvyr::as_survey_rep(df, weights = .weight, repweights = dplyr::starts_with("rep_"), type = "bootstrap", combined.weights = TRUE, scale = 1 / attr(df, "R"), rscales = 1, mse = TRUE) srvyr::summarise(d_rep, mean_income = srvyr::survey_mean(income, na.rm = TRUE)) ``` ## Which one to use Use the **recipe-aware bootstrap** (method 1, in any of its three forms) when the nonresponse and calibration steps are a meaningful part of the design and you want their uncertainty reflected; it is the more honest variance. Use the **linearization** (method 2) for a quick, well-understood standard error when the adjustments are minor or you only need the design-and-clustering part. A few practical notes. More replicates give a more stable bootstrap SE; 200 is fine for exploration, 500-1000 for final figures. Each stratum needs at least two PSUs to be resampled (single-PSU strata are left untouched, with a warning). If a replicate leaves a calibration or weighting-class cell empty it is dropped with a warning; coarser `by` cells make the bootstrap more robust.