The goal of surveysd is to combine all steps necessary to
use calibrated bootstrapping with custom estimation functions. This
vignette covers the usage of the most important functions. For
insights into the theory used in this package, refer to
vignette("methodology").
A test data set based on
data(eusilc, package = "laeken") can be created with the function
demo.eusilc().
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
eusilc[1:5, .(year, povertyRisk, gender, pWeight)]
## year povertyRisk gender pWeight
## <num> <lgcl> <fctr> <num>
## 1: 2010 FALSE female 504.5696
## 2: 2010 FALSE male 504.5696
## 3: 2010 FALSE male 504.5696
## 4: 2010 FALSE female 493.3824
## 5: 2010 FALSE male 493.3824
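Use stratified resampling without replacement to generate 10 bootstrap samples. These samples are consistent with respect to the reference periods: a household's replicate weights are carried forward to all later periods in which it stays in the survey.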
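The object dat_boot used in the calibration step below is created by surveysd's draw.bootstrap(). The following is a sketch of such a call, assuming household IDs are stored in the column hid of the prettyNames = TRUE demo data and that resampling is stratified by region; these argument values are assumptions chosen to match the rest of this vignette. It repeats the data setup so that it runs on its own.

```r
library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)

# Draw 10 bootstrap replicates, stratified by region (assumption), resampling
# whole households identified by hid (assumption) and keeping them
# consistent across the periods in "year"
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")

# The replicate weights are added as new columns w1, ..., w10
dat_boot[1:5, .(year, hid, pWeight, w1, w2)]
```

All members of a household receive the same replicate weight, which is why the first three rows in the calibrated output below share identical values in w1 to w4.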
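Calibrate each replicate according to the distribution of
gender (at the person level) and region (at the
household level). The arguments epsP and epsH set the convergence
limits for the person-level and household-level constraints.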
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]
## year povertyRisk gender pWeight w1 w2 w3
## <num> <lgcl> <fctr> <num> <num> <num> <num>
## 1: 2010 FALSE female 504.5696 1005.6993 1008.6905620 0.4519513
## 2: 2010 FALSE male 504.5696 1005.6993 1008.6905620 0.4519513
## 3: 2010 FALSE male 504.5696 1005.6993 1008.6905620 0.4519513
## 4: 2010 FALSE female 493.3824 983.0797 0.4387304 0.4419026
## 5: 2010 FALSE male 493.3824 983.0797 0.4387304 0.4419026
## w4
## <num>
## 1: 0.4486785
## 2: 0.4486785
## 3: 0.4486785
## 4: 986.3259754
## 5: 986.3259754
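recalib() adjusts each replicate weight column with an iterative proportional fitting procedure (surveysd also exports the underlying routine as ipf()). The base R sketch below only illustrates the basic idea of raking on six hypothetical persons with two made-up person-level margins. It ignores households entirely, whereas recalib() handles household-level constraints (conH.var) separately, so it is an illustration of the technique, not surveysd's implementation.

```r
# Toy sample (hypothetical): six persons, equal design weights
gender <- c("female", "male", "female", "male", "female", "male")
region <- c("A", "A", "B", "B", "B", "A")
w <- rep(10, 6)

# Hypothetical person-level margins to calibrate to (both sum to 70)
target_gender <- c(female = 40, male = 30)
target_region <- c(A = 35, B = 35)

# Raking: alternately rescale the weights so that each margin hits its target
for (iter in 1:100) {
  w <- w * unname(target_gender[gender]) / ave(w, gender, FUN = sum)
  w <- w * unname(target_region[region]) / ave(w, region, FUN = sum)
}

# Both margins now match their targets up to numerical precision
tapply(w, gender, sum)
tapply(w, region, sum)
```

A fixed number of iterations keeps the sketch short; a real implementation would instead stop once the relative deviation from every target falls below a tolerance, which is the role epsP and epsH play in recalib().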
Estimate the share of persons at risk of poverty per period and
gender.
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
## Key: <year, n, N, gender, estimate_type>
## year n N gender estimate_type val_povertyRisk stE_povertyRisk
## <num> <int> <num> <fctr> <char> <num> <num>
## 1: 2010 7267 3979572 male direct 12.02660 0.5924341
## 2: 2010 7560 4202650 female direct 16.73351 0.8298591
## 3: 2010 14827 8182222 <NA> direct 14.44422 0.6448369
## 4: 2011 7267 3979572 male direct 12.81921 0.4973061
## 5: 2011 7560 4202650 female direct 16.62488 0.8255612
## 6: 2011 14827 8182222 <NA> direct 14.77393 0.6445203
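The output contains the estimates (val_povertyRisk) as well
as their standard errors (stE_povertyRisk), both measured in percent.
The columns n and N give the number of sampled persons and the
estimated population size (sum of pWeight) of each group. The rows
with gender = NA denote the aggregate over all genders for the
corresponding year.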
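To make the two output columns more concrete, the base R sketch below recomputes both quantities on hypothetical toy data. weighted_share() is a simplified stand-in for weightedRatio() (NA handling omitted), the replicate weights are made-up numbers, and taking the standard deviation across replicate estimates as the standard error is an assumption here; vignette("methodology") describes the procedure calc.stError() actually uses.

```r
# Hypothetical toy data: poverty flag, gender, design weights, and three
# made-up replicate weight columns standing in for w1, w2, w3
poor   <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
gender <- c("female", "female", "male", "male", "male", "female")
w      <- c(10, 12, 8, 9, 11, 10)
reps   <- cbind(w1 = c(19, 0.5, 15, 0.4, 21, 20),
                w2 = c(0.5, 23, 0.4, 17, 22, 0.5),
                w3 = c(20, 24, 16, 0.4, 0.5, 19))

# Weighted share of TRUE values in percent (simplified stand-in for weightedRatio)
weighted_share <- function(x, w) 100 * sum(w[x]) / sum(w)

# Share per gender plus a "total" entry playing the role of the gender = NA row
share_all <- function(wt) {
  by_gender <- sapply(split(seq_along(poor), gender),
                      function(i) weighted_share(poor[i], wt[i]))
  c(by_gender, total = weighted_share(poor, wt))
}

# Point estimates use the design weights
est <- share_all(w)

# Re-estimate with every replicate column (rows = groups, columns = replicates)
# and use the spread across replicates as the standard error (assumption)
rep_est <- apply(reps, 2, share_all)
se <- apply(rep_est, 1, sd)

round(cbind(estimate = est, stE = se), 2)
```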
Estimate the share of persons at risk of poverty per period for
each region, each gender, and each combination of
both.
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
## Key: <year, n, N, gender, region, estimate_type>
## year n N gender region estimate_type val_povertyRisk
## <num> <int> <num> <fctr> <fctr> <char> <num>
## 1: 2010 261 122741.8 male Burgenland direct 17.414524
## 2: 2010 288 137822.2 female Burgenland direct 21.432598
## 3: 2010 359 182732.9 male Vorarlberg direct 12.973259
## 4: 2010 374 194622.1 female Vorarlberg direct 19.883637
## 5: 2010 440 253143.7 male Salzburg direct 9.156964
## 6: 2010 484 282307.3 female Salzburg direct 17.939382
## stE_povertyRisk
## <num>
## 1: 4.414071
## 2: 3.523587
## 3: 3.403444
## 4: 1.950587
## 5: 1.840554
## 6: 2.043186
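With several groupings, all results end up in one table, and an NA in a grouping column means the estimate is aggregated over that variable. A sketch of how the rows for each grouping could be separated with data.table syntax follows; it repeats the pipeline from this vignette (with the same assumed draw.bootstrap() arguments as above) so that it runs on its own, and the object names by_gender, by_region, by_both, and overall are illustrative.

```r
library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)

group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk",
                        fun = weightedRatio, group = group)
est <- err.est$Estimates

# NA in a grouping column means "aggregated over this variable"
by_gender <- est[!is.na(gender) & is.na(region)]
by_region <- est[is.na(gender) & !is.na(region)]
by_both   <- est[!is.na(gender) & !is.na(region)]
overall   <- est[is.na(gender) & is.na(region)]

by_both[, .(year, gender, region, val_povertyRisk, stE_povertyRisk)]
```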