Title: | Robust Survey Statistics Estimation |
---|---|
Description: | Robust (outlier-resistant) estimators of finite population characteristics like of means, totals, ratios, regression, etc. Available methods are M- and GM-estimators of regression, weight reduction, trimming, and winsorization. The package extends the 'survey' <https://CRAN.R-project.org/package=survey> package. |
Authors: | Beat Hulliger [aut] , Tobias Schoch [aut, cre] , Martin Sterchi [ctr, com], R-core [ctb, cph] (for zeroin2.c) |
Maintainer: | Tobias Schoch <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7 |
Built: | 2024-12-21 06:54:33 UTC |
Source: | CRAN |
A key design pattern of the package is that the majority of the estimating methods is available in two "flavors":
bare-bone methods
survey methods
Bare-bone methods are stripped-down versions of the survey methods in terms of functionality and informativeness. These functions may serve users and package developers as building blocks. In particular, bare-bone functions cannot compute variances.
The survey methods are much more capable and depend, for variance estimation, on the survey package.
Bare-bone methods: weighted_mean_trimmed
and
weighted_total_trimmed
Survey methods: svymean_trimmed
and
svytotal_trimmed
Bare-bone methods:
Survey methods:
Bare-bone methods: weighted_mean_dalen
and
weighted_total_dalen
Survey methods: svymean_dalen
and
svytotal_dalen
Bare-bone methods:
huber2
(weighted Huber Proposal 2
estimator)
Survey methods:
mer
(minimum estimated risk estimator)
Regression M-estimators: svyreg_huberM
and
svyreg_tukeyM
Regression GM-estimators (Mallows and Schweppe):
svyreg_huberGM
and svyreg_tukeyGM
Ratio M-estimators:
svyratio_huber
and svyratio_tukey
Note: The functions svyreg_huber
and
svyreg_tukey
are deprecated, use instead
svyreg_huberM
and svyreg_tukeyM
, respectively;
see also robsurvey-deprecated.
Regression predictors: svymean_reg
and
svytotal_reg
Ratio predictors: svymean_ratio
and
svytotal_ratio
Methods and utility functions for objects of class svyreg_rob
.
## S3 method for class 'svyreg_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' summary(object, mode = c("design", "model", "compound"), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' coef(object, ...) ## S3 method for class 'svyreg_rob' vcov(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' SE(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' residuals(object, ...) ## S3 method for class 'svyreg_rob' fitted(object, ...) ## S3 method for class 'svyreg_rob' robweights(object) ## S3 method for class 'svyreg_rob' plot(x, which = 1L:4L, hex = FALSE, caption = c("Standardized residuals vs. Fitted Values", "Normal Q-Q", "Response vs. Fitted values", "Sqrt of abs(Residuals) vs. Fitted Values"), panel = if (add.smooth) function(x, y, ...) panel.smooth(x, y, iter = iter.smooth, ...) else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, add.smooth = getOption("add.smooth"), iter.smooth = 3, label.pos = c(4, 2), cex.caption = 1, cex.oma.main = 1.25)
## S3 method for class 'svyreg_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' summary(object, mode = c("design", "model", "compound"), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' coef(object, ...) ## S3 method for class 'svyreg_rob' vcov(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' SE(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' residuals(object, ...) ## S3 method for class 'svyreg_rob' fitted(object, ...) ## S3 method for class 'svyreg_rob' robweights(object) ## S3 method for class 'svyreg_rob' plot(x, which = 1L:4L, hex = FALSE, caption = c("Standardized residuals vs. Fitted Values", "Normal Q-Q", "Response vs. Fitted values", "Sqrt of abs(Residuals) vs. Fitted Values"), panel = if (add.smooth) function(x, y, ...) panel.smooth(x, y, iter = iter.smooth, ...) else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, add.smooth = getOption("add.smooth"), iter.smooth = 3, label.pos = c(4, 2), cex.caption = 1, cex.oma.main = 1.25)
x |
object of class |
digits |
|
... |
additional arguments passed to the method. |
object |
object of class |
mode |
|
which |
|
hex |
|
caption |
|
panel |
panel function. The useful alternative to
|
sub.caption |
|
main |
|
ask |
|
id.n |
|
labels.id |
|
cex.id |
|
qqline |
|
add.smooth |
|
iter.smooth |
|
label.pos |
|
cex.caption |
|
cex.oma.main |
|
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
For variance estimation (summary
, vcov
, and
SE
) three modes are available:
"design"
: design-based variance estimator using
linearization; see Binder (1983)
"model"
: model-based weighted variance estimator
(the sampling design is ignored)
"compound"
: design-model-based variance
estimator; see Rubin-Bleuer and Schiopu-Kratina (2005)
and Binder and Roberts (2009)
The following utility functions are available:
summary
gives a summary of the estimation
properties
plot
shows diagnostic plots for the estimated
regression model
robweights
extracts the robustness weights
(if available)
coef
extracts the estimated regression coefficients
vcov
extracts the (estimated) covariance matrix
residuals
extracts the residuals
fitted
extracts the fitted values
Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review 51, 279–292. doi:10.2307/1402588
Binder, D. A. and Roberts, G. (2009). Design- and Model-Based Inference for Model Parameters. In: Sample Surveys: Inference and Analysis ed. by Pfeffermann, D. and Rao, C. R. Volume 29B of Handbook of Statistics, Amsterdam: Elsevier, Chap. 24, 33–54 doi:10.1016/S0169-7161(09)00224-7
Rubin-Bleuer, S. and Schiopu-Kratina, I. (2005). On the Two-phase framework for joint model and design-based inference. The Annals of Statistics 33, 2789–2810. doi:10.1214/009053605000000651
Weighted least squares: svyreg
; robust weighted regression
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
head(workplace) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 14) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Utility functions summary(m) coef(m) SE(m) vcov(m) residuals(m) fitted(m) robweights(m)
head(workplace) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 14) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Utility functions summary(m) coef(m) SE(m) vcov(m) residuals(m) fitted(m) robweights(m)
Methods and utility functions for objects of class svystat_rob
.
mse(object, ...) ## S3 method for class 'svystat_rob' mse(object, ...) ## S3 method for class 'svystat' mse(object, ...) ## S3 method for class 'svystat_rob' summary(object, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svystat_rob' coef(object, ...) ## S3 method for class 'svystat_rob' SE(object, ...) ## S3 method for class 'svystat_rob' vcov(object, ...) ## S3 method for class 'svystat_rob' scale(x, ...) ## S3 method for class 'svystat_rob' residuals(object, ...) ## S3 method for class 'svystat_rob' fitted(object, ...) robweights(object) ## S3 method for class 'svystat_rob' robweights(object) ## S3 method for class 'svystat_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...)
mse(object, ...) ## S3 method for class 'svystat_rob' mse(object, ...) ## S3 method for class 'svystat' mse(object, ...) ## S3 method for class 'svystat_rob' summary(object, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svystat_rob' coef(object, ...) ## S3 method for class 'svystat_rob' SE(object, ...) ## S3 method for class 'svystat_rob' vcov(object, ...) ## S3 method for class 'svystat_rob' scale(x, ...) ## S3 method for class 'svystat_rob' residuals(object, ...) ## S3 method for class 'svystat_rob' fitted(object, ...) robweights(object) ## S3 method for class 'svystat_rob' robweights(object) ## S3 method for class 'svystat_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...)
object |
object of class |
digits |
|
... |
additional arguments passed to the method. |
x |
object of class |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
Utility functions:
mse
computes the estimated risk (mean square
error) in presence of representative outliers; see also
mer
summary
gives a summary of the estimation properties
robweights
extracts the robustness weights
coef
extracts the estimate of location
SE
extracts the (estimated) standard error
vcov
extracts the (estimated) covariance matrix
residuals
extracts the residuals
fitted
extracts the fitted values
svymean_dalen
, svymean_huber
,
svymean_ratio
, svymean_reg
,
svymean_tukey
, svymean_trimmed
,
svymean_winsorized
svytotal_dalen
, svytotal_huber
,
svytotal_ratio
, svytotal_reg
,
svytotal_tukey
, svytotal_trimmed
,
svytotal_winsorized
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated one-sided k winsorized population total (i.e., k = 2 observations # are winsorized at the top of the distribution) wtot <- svytotal_k_winsorized(~employment, dn, k = 2) # Show summary statistic of the estimated total summary(wtot) # Estimated mean square error (MSE) mse(wtot) # Estimate, std. err., variance, and the residuals coef(wtot) SE(wtot) vcov(wtot) residuals(wtot) # M-estimate of the total (Huber psi-function; tuning constant k = 3) mtot <- svytotal_huber(~employment, dn, k = 45) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(mtot), robweights(mtot))
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated one-sided k winsorized population total (i.e., k = 2 observations # are winsorized at the top of the distribution) wtot <- svytotal_k_winsorized(~employment, dn, k = 2) # Show summary statistic of the estimated total summary(wtot) # Estimated mean square error (MSE) mse(wtot) # Estimate, std. err., variance, and the residuals coef(wtot) SE(wtot) vcov(wtot) residuals(wtot) # M-estimate of the total (Huber psi-function; tuning constant k = 3) mtot <- svytotal_huber(~employment, dn, k = 45) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(mtot), robweights(mtot))
Data from a simple random sample (without replacement) of 100 of the 3141 counties in the United Stated (U.S. Bureau of the Census, 1994).
data(counties)
data(counties)
A data.frame
with 100 observations on the following variables:
state
state, [character]
.
county
county, [character]
.
landarea
land area, 1990 (square miles),
[double]
.
totpop
population total, 1992, [double]
.
unemp
number of unemployed persons, 1991,
[double]
.
farmpop
farm population, 1990, [double]
.
numfarm
number of farms, 1987, [double]
.
farmacre
acreage in farms, 1987, [double]
.
weights
sampling weight, [double]
.
fpc
finite population corretion, [double]
.
The data (and 10 additional variables) are published in Lohr (1999, Appendix C).
Lohr, S. L. (1999). Sampling: Design and Analysis, Pacific Grove (CA): Duxbury Press.
head(counties) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties) }
head(counties) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties) }
Measurement of copper content in wholemeal flour (measured in parts per million).
data(flour)
data(flour)
A data.frame
with 24 observations (sorted in ascending order)
on the following variables:
copper
copper content [double]
.
weight
weight [double]
.
The data are published in Maronna et al. (2019, p. 2).
Maronna, R. A., Martin, R. D., Yohai, V. J. and Salibián-Barrera, M. (2019). Robust Statistics: Theory and Methods (with R), Hoboken (NJ): John Wiley and Sons, 2nd edition. doi:10.1002/9781119214656
head(flour)
head(flour)
Weighted Huber Proposal 2 estimator of location and scatter.
huber2(x, w, k = 1.5, na.rm = FALSE, maxit = 50, tol = 1e-04, info = FALSE, k_Inf = 1e6, df_cor = TRUE)
huber2(x, w, k = 1.5, na.rm = FALSE, maxit = 50, tol = 1e-04, info = FALSE, k_Inf = 1e6, df_cor = TRUE)
x |
|
w |
|
k |
|
na.rm |
|
maxit |
|
tol |
|
info |
|
k_Inf |
|
df_cor |
|
Function huber2
computes the weighted Huber (1964) Proposal 2
estimates of location and scale.
The method is initialized by the weighted median (location) and the weighted interquartile range (scale).
The return value depends on info
:
info = FALSE
:estimate of mean or total [double]
info = TRUE
:a [list]
with items:
characteristic
[character]
,
estimator
[character]
,
estimate
[double]
,
variance
(default: NA
),
robust
[list]
,
residuals
[numeric vector]
,
model
[list]
,
design
(default: NA
),
[call]
The huber2
estimator is initialized by the weighted median and
the weighted (scaled) interquartile range. For unweighted data, this
estimator differs from hubers
in MASS,
which is initialized by mad
.
The difference between the estimators is usually negligible (for
sufficiently small values of tol
). See examples.
Huber, P. J. (1964). Robust Estimation of a Location Parameter. Annals of Mathematical Statistics 35, 73–101. doi:10.1214/aoms/1177703732
head(workplace) # Weighted "Proposal 2" estimator of the mean huber2(workplace$employment, workplace$weight, k = 8) # More information on the estimate, i.e., info = TRUE m <- huber2(workplace$employment, workplace$weight, k = 8, info = TRUE) # Estimate of scale m$scale # Comparison with MASS::hubers (without weights). We make a copy of MASS::hubers library(MASS) hubers_mod <- hubers # Then we replace mad by the (scaled) IQR as initial scale estimator body(hubers_mod)[[7]][[3]][[2]] <- substitute(s0 <- IQR(y, type = 2) * 0.7413) # Define the numerical tolerance TOLERANCE <- 1e-8 # Comparison m1 <- huber2(workplace$payroll, rep(1, 142), tol = TOLERANCE) m2 <- hubers_mod(workplace$payroll, tol = TOLERANCE)$mu m1 / m2 - 1 # The absolute relative difference is < 4.0-09 (smaller than TOLERANCE)
head(workplace) # Weighted "Proposal 2" estimator of the mean huber2(workplace$employment, workplace$weight, k = 8) # More information on the estimate, i.e., info = TRUE m <- huber2(workplace$employment, workplace$weight, k = 8, info = TRUE) # Estimate of scale m$scale # Comparison with MASS::hubers (without weights). We make a copy of MASS::hubers library(MASS) hubers_mod <- hubers # Then we replace mad by the (scaled) IQR as initial scale estimator body(hubers_mod)[[7]][[3]][[2]] <- substitute(s0 <- IQR(y, type = 2) * 0.7413) # Define the numerical tolerance TOLERANCE <- 1e-8 # Comparison m1 <- huber2(workplace$payroll, rep(1, 142), tol = TOLERANCE) m2 <- hubers_mod(workplace$payroll, tol = TOLERANCE)$mu m1 / m2 - 1 # The absolute relative difference is < 4.0-09 (smaller than TOLERANCE)
A simple random sample of 70 patients in inpatient hospital treatment.
data(losdata)
data(losdata)
A data.frame
with data on the following variables:
los
length of stay (days) [integer]
.
weight
sampling weight [double]
.
fpc
finite population correction [double]
.
The losdata
are a simple random sample without replacement (SRSWOR)
of size patients from the (fictive) population of
patients in inpatient hospital treatment. We have
constructed the
losdata
as a showcase; though, the LOS
measurements are real data that we have taken from the 201 observations
in Ruffieux et al. (2000). The original LOS data of Ruffieux et al.
(2000) are available in the R package robustbase; see
robustbase::data(los)
. Our losdata
are a SRSWOR of
size from the 201 original observations.
Ruffieux et al. (2000) and data.frame los
in the R
package robustbase.
Ruffieux, C., Paccaud, F. and Marazzi, A. (2000). Comparing rules for truncating hospital length of stay. Casemix Quarterly 2.
head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) }
head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) }
mer
is an adaptive M-estimator of the weighted mean or total. It
is defined as the estimator that minimizes the estimated mean square error,
mse
, of the estimator under consideration.
mer(object, verbose = TRUE, max_k = 10, init = 1, method = "Brent", optim_args = list())
mer(object, verbose = TRUE, max_k = 10, init = 1, method = "Brent", optim_args = list())
object |
an object of class |
verbose |
|
init |
|
method |
|
max_k |
|
optim_args |
|
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
MER-estimators are available for the methods svymean_huber
,
svytotal_huber
, svymean_tukey
and
svytotal_tukey
.
Object of class svystat_rob
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
Overview (of all implemented functions)
head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) } # M-estimator of the total with tuning constant k = 8 m <- svymean_huber(~los, dn, type = "rhj", k = 8) # MER estimator mer(m)
head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) } # M-estimator of the total with tuning constant k = 8 m <- svymean_huber(~los, dn, type = "rhj", k = 8) # MER estimator mer(m)
Probability-proportional-to-size sample (PPS) without replacement of municipalities from the MU284 population in Särndal et al. (1992). The sample inclusion probabilities are proportional to the population size in 1975 (variable P75).
data(MU284pps)
data(MU284pps)
A data.frame
with 60 observations on the following variables:
LABEL
identifier variable, [integer]
.
P85
1985 population size (in thousands),
[double]
.
P75
1975 population size (in thousands),
[double]
.
RMT85
Revenues from the 1985 municipal taxation
(in millions of kronor), [double]
.
CS82
number of Conservative seats in municipal council,
[double]
.
SS82
number of Social-Democrat seats in municipal
council (1982), [double]
.
S82
total number of seats in municipal council (1982),
[double]
.
ME84
number of municipal employees in 1984,
[double]
.
REV84
real estate values according to 1984 assessment
(in millions of kronor), [double]
.
REG
geographic region indicator, [integer]
.
CL
cluster indicator (a cluster consists of a set of
neighbouring municipalities), [integer]
.
weights
sampling weights, [double]
.
pi
sample inclusion probability, [double]
.
The MU284 population of Särndal et al. (1992, Appendix B) is a
dataset with observations on the 284 municipalities in Sweden in the
late 1970s and early 1980s. The MU284
population data
are available in the sampling package of Tillé and Matei (2021).
The data frame MU284pps
is a probability-proportional-to-size
sample (PPS) without replacement from the MU284 population.
The sample inclusion probabilities are proportional to the
population size in 1975 (variable P75). The sample has been
selected by Brewer’s method; see Tillé (2006, Chap. 7).
The sampling weight (inclusion probabilities) are calibrated to
the population size and the population total of P75.
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Tillé, Y. and Matei, A. (2021). sampling: Survey Sampling. R package version 2.9. https://CRAN.R-project.org/package=sampling
Tillé, Y. (2006). Sampling Algorithms. New York: Springer-Verlag.
head(MU284pps) library(survey) # Survey design with inclusion probabilities proportional to size dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer", calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer") }
head(MU284pps) library(survey) # Survey design with inclusion probabilities proportional to size dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer", calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer") }
Stratified simple random sample (without replacement) of municipalities from the MU284 population in Särndal et al. (1992). Stratification is by geographic region and a take-all stratum (by 1975 population size), which includes the big cities Stockholm, Göteborg, and Malmö.
data(MU284strat)
data(MU284strat)
A data.frame
with 60 observations on the following variables:
LABEL
identifier variable, [integer]
.
P85
1985 population size (in thousands),
[double]
.
P75
1975 population size (in thousands),
[double]
.
RMT85
Revenues from the 1985 municipal taxation
(in millions of kronor), [double]
.
CS82
number of Conservative seats in municipal council,
[double]
.
SS82
number of Social-Democrat seats in municipal
council (1982), [double]
.
S82
total number of seats in municipal council (1982),
[double]
.
ME84
number of municipal employees in 1984,
[double]
.
REV84
real estate values according to 1984 assessment
(in millions of kronor), [double]
.
CL
cluster indicator (a cluster consists of a set of
neighbouring municipalities), [integer]
.
REG
geographic region indicator, [integer]
.
Stratum
stratum indicator, [integer]
.
weights
sampling weights, [double]
.
fpc
finite population correction, [double]
.
The MU284 population of Särndal et al. (1992, Appendix B) is a
dataset with observations on the 284 municipalities in Sweden in the
late 1970s and early 1980s. The MU284
population data
are available in the sampling package of Tillé and Matei (2021).
The population is divided into two parts based on 1975 population
size (P75
):
the MU281 population, which consists of the 281 smallest municipalities;
the MU3 population of the three biggest municipalities/ cities in Sweden (Stockholm, Göteborg, and Malmö).
The three biggest cities take exceedingly large values (representative
outliers) on almost all of the variables. To account for this, a stratified
sample has been drawn from the MU284 population using a take-all stratum.
The sample data, MU284strat
, (of size ) consists of
a stratified simple random sample (without replacement)
from the MU281 population, where stratification is by
geographic region (REG
) with proportional sample
size allocation;
a take-all stratum that includes the three biggest cities/ municipalities (population M3).
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Tillé, Y. and Matei, A. (2021). sampling: Survey Sampling. R package version 2.9. https://CRAN.R-project.org/package=sampling
head(MU284strat) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat, calibrate.formula = ~-1 + Stratum) } else { # legacy mode svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat) }
head(MU284strat) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat, calibrate.formula = ~-1 + Stratum) } else { # legacy mode svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat) }
Methods to compute the first-order sample inclusion probabilities (given a measure of size) and sampling mechanisms to draw samples with probabilities proportional to size (pps).
pps_probabilities(size, n) pps_draw(x, method = "brewer", sort = TRUE) ## S3 method for class 'prob_pps' print(x, ...)
pps_probabilities(size, n) pps_draw(x, method = "brewer", sort = TRUE) ## S3 method for class 'prob_pps' print(x, ...)
size |
|
n |
|
x |
object of class |
method |
|
sort |
|
... |
additional arguments. |
Function pps_probabilities
computes the first-order sample inclusion
probabilities for a given sample size n
; see e.g., Särndal et
al., 1992 (p. 90). The probabilities (and additional attributes) are
returned as a vector, more precisely as an object of class prob_pps
.
For an object of class prob_pps
(inclusion probabilities and
additional attributes), function pps_draw
draws a pps sample
without replacement and returns the indexes of the population elements.
Only the method of Brewer (1963, 1975) is currently implemented.
Function pps_probabilities
returns the probabilities (an object
of class (prob_pps
).
Function pps_draw
returns a pps sample of indexes from the
population elements.
Brewer, K. W. R. (1963). A Model of Systematic Sampling with Unequal Probabilities. Australian Journal of Statistics 5, 93–105. doi:10.1111/j.1467-842X.1963.tb00132.x
Brewer, K. W. R. (1975). A simple procedure for pswor,
Australian Journal of Statistics 17, 166–172.
doi:10.1111/j.1467-842X.1975.tb00954.x
Särndal, C.-E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
# We are going to pretend that the workplace sample is our population. head(workplace) # The population size is N = 142. We want to draw a pps sample (without # replacement) of size n = 10, where the variable employment is the measure of # size. The first-order sample inclusion probabilities are calculated as # follows p <- pps_probabilities(workplace$employment, n = 10) # Now, we draw a pps sample using Brewer's method. pps_draw(p, method = "brewer")
# We are going to pretend that the workplace sample is our population. head(workplace) # The population size is N = 142. We want to draw a pps sample (without # replacement) of size n = 10, where the variable employment is the measure of # size. The first-order sample inclusion probabilities are calculated as # follows p <- pps_probabilities(workplace$employment, n = 10) # Now, we draw a pps sample using Brewer's method. pps_draw(p, method = "brewer")
These functions are provided for compatibility with older versions of the package only, and may be defunct as soon as the next release.
Use instead:
Internal function to call the robust survey regression GM-estimator; this function is only intended for internal use. The function does not check or validate the arguments. In particular, missing values in the data may make the function crash.
robsvyreg(x, y, w, k, psi, type, xwgt, var = NULL, verbose = TRUE, ...) svyreg_control(tol = 1e-5, maxit = 100, k_Inf = 1e6, init = NULL, mad_center = TRUE, ...)
robsvyreg(x, y, w, k, psi, type, xwgt, var = NULL, verbose = TRUE, ...) svyreg_control(tol = 1e-5, maxit = 100, k_Inf = 1e6, init = NULL, mad_center = TRUE, ...)
x |
|
y |
|
w |
|
k |
|
psi |
|
type |
|
xwgt |
|
var |
|
verbose |
|
tol |
|
maxit |
|
k_Inf |
|
init |
either |
mad_center |
|
... |
additional arguments passed to the method
(see |
Not documented
[list]
Dalen's estimators Z2 and Z3 of the population mean and total; see
weighted_mean_dalen
for further details.
svymean_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...) svytotal_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...)
svymean_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...) svytotal_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...)
x |
a one-sided |
design |
an object of class |
censoring |
|
type |
|
na.rm |
|
verbose |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
type = "Z2"
or type = "Z3"
; see
weighted_mean_dalen
for more details.
See weighted_mean_dalen
and
weighted_total_dalen
.
Object of class svystat_rob
Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations. R & D Report U/STM 1987:32, Statistics Sweden, Stockholm.
Overview (of all implemented functions)
svymean_trimmed
, svytotal_trimmed
,
svymean_winsorized
, svytotal_winsorized
,
svymean_huber
and svytotal_huber
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Dalen's estimator Z3 of the population total svytotal_dalen(~employment, dn, censoring = 20000, type = "Z3") # Dalen's estimator Z3 of the population mean m <- svymean_dalen(~employment, dn, censoring = 20000, type = "Z3") # Summarize summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Dalen's estimator Z3 of the population total svytotal_dalen(~employment, dn, censoring = 20000, type = "Z3") # Dalen's estimator Z3 of the population mean m <- svymean_dalen(~employment, dn, censoring = 20000, type = "Z3") # Summarize summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m)
Robust ratio predictor (M-estimator) of the population mean and total with Huber and Tukey biweight (bisquare) psi-function.
svytotal_ratio(object, total, variance = "wu", keep_object = TRUE, ...) svymean_ratio(object, total, N = NULL, variance = "wu", keep_object = TRUE, N_unknown = FALSE, ...)
svytotal_ratio(object, total, variance = "wu", keep_object = TRUE, ...) svymean_ratio(object, total, N = NULL, variance = "wu", keep_object = TRUE, N_unknown = FALSE, ...)
object |
an object of class |
total |
|
N |
|
variance |
|
keep_object |
|
N_unknown |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
The (robust) ratio predictor of the population total or mean is computed in two steps.
Step 1: Fit the ratio model associated with the predictor
by one of the functions svyratio_huber
or svyratio_tukey
. The fitted model is called
object
.
Step 2: Based on the fitted model obtained in the first step,
we predict the population total and mean, respectively, by
the predictors svytotal_ratio
and svymean_ratio
,
where object
is the fitted ratio model.
Two types of auxiliary variables are distinguished: (1)
population size and (2) the population total of the
auxiliary variable (denominator) used in the ratio model.
The option N_unknown = TRUE
can be used in the predictor
of the population mean if is unknown.
Three variance estimators are implemented (argument
variance
): "base"
, "wu"
, and "hajek"
.
These estimators correspond to the estimators v0
, v1
,
and v2
in Wu (1982).
The return value is an object of class svystat_rob
.
Thus, the utility functions summary
,
coef
,
SE
,
vcov
,
residuals
,
fitted
, and
robweights
are available.
Object of class svystat_rob
Wu, C.-F. (1982). Estimation of Variance of the Ratio Estimator. Biometrika 69, 183–189.
Overview (of all implemented functions)
svymean_reg
and svytotal_reg
for (robust) GREG
regression predictors
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
for robust
regression - and
-estimators
svymean_huber
, svytotal_huber
,
svymean_tukey
and svytotal_tukey
for
-estimators
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust ratio M-estimator with Huber psi-function rat <- svyratio_huber(~payroll, ~ employment, dn, k = 5) # Summary of the ratio estimate summary(rat) # Diagnostic plots of the ration/regression M-estimate (e.g., # standardized residuals against fitted values) plot(rat, which = 1L) # Plot of the robustness weights of the ratio/regression M-estimate # against its residuals plot(residuals(rat), robweights(rat)) # Robust ratio predictor of the population mean m <- svymean_ratio(rat, total = 1001233, N = 90840) m # Summary of the ratio estimate of the population mean summary(m) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust ratio M-estimator with Huber psi-function rat <- svyratio_huber(~payroll, ~ employment, dn, k = 5) # Summary of the ratio estimate summary(rat) # Diagnostic plots of the ration/regression M-estimate (e.g., # standardized residuals against fitted values) plot(rat, which = 1L) # Plot of the robustness weights of the ratio/regression M-estimate # against its residuals plot(residuals(rat), robweights(rat)) # Robust ratio predictor of the population mean m <- svymean_ratio(rat, total = 1001233, N = 90840) m # Summary of the ratio estimate of the population mean summary(m) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)
Generalized regression estimator (GREG) predictor of the mean and total,
and robust GREG -estimator predictor
svytotal_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, ...) svymean_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, N_unknown = FALSE, ...)
svytotal_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, ...) svymean_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, N_unknown = FALSE, ...)
object |
an object of class |
totals |
|
N |
|
type |
|
k |
|
check.names |
|
keep_object |
|
N_unknown |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
The (robust) GREG predictor of the population total or mean is computed in two steps.
Step 1: Fit the regression model associated with the GREG
predictor by one of the functions svyreg
,
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
or svyreg_tukeyGM
.
The fitted model is called object
.
Step 2: Based on the fitted model obtained in the first step,
we predict the population total and mean, respectively, by
the predictors svytotal_reg
and svymean_reg
,
where object
is the fitted regression model.
The following GREG predictors are available:
k = NULL
)The following non-robust GREG predictors are available:
type = "projective"
ignores the bias correction
term of the GREG predictor; see Särndal and Wright (1984).
type = "ADU"
is the "standard" GREG,
which is an asymptotically design unbiased (ADU)
predictor; see Särndal et al.(1992, Chapter 6).
If the fitted regression model (object
) does include
a regression intercept, the predictor types "projective"
and "ADU"
are identical because the bias correction
of the GREG is zero by design.
The following robust GREG predictors are available:
type = "huber"
and type = "tukey"
are,
respectively, the robust GREG predictors with Huber
and Tukey bisquare (biweight) psi-function. The tuning
constant must satisfy 0 < k <= Inf
.
We can use the Huber-type GREG predictor although the
model has been fitted by the regression estimator
with Tukey psi-function (and vice versa).
type = "BR"
is the bias-corrected robust GREG
predictor of Beaumont and Rivest (2009), which is
inspired by the bias-corrected robust predictor of
Chambers (1986). The tuning constant must satisfy
0 < k <= Inf
.
type = "lee"
is the bias-corrected predictor
of Lee (1991; 1992). Tthe tuning constant k
must
satisfy 0 <= k <= 1
.
type = "duchesne"
is the bias-corrected,
calibration-type estimator/ predictor of Duchesne (1999).
The tuning constant k
must be specified as a
vector k = c(a, b)
, where a
and b
are the tuning constants of Duchesne's modified Huber
psi-function (default values: a = 9
and
b = 0.25
).
Two types of auxiliary variables are distinguished: (1)
population size and (2) population totals of the
auxiliary variables used in the regression model (i.e.,
non-constant explanatory variables).
The option N_unknown = TRUE
can be used in the predictor
of the population mean if is unknown.
The names of the entries of totals
are checked against
the names of the regression fit (object
), unless we specify
check.names = FALSE
.
The return value is an object of class svystat_rob
.
Thus, the utility functions summary
,
coef
,
SE
,
vcov
,
residuals
,
fitted
, and
robweights
are available.
Object of class svystat_rob
Beaumont, J.-F. and Rivest, L.-P. (2009). Dealing with outliers in survey data. In: Sample Surveys: Theory, Methods and Inference ed. by Pfeffermann, D. and Rao, C. R. Volume 29A of Handbook of Statistics, Amsterdam: Elsevier, Chap. 11, 247–280. doi:10.1016/S0169-7161(08)00011-4
Chambers, R. (1986). Outlier Robust Finite Population Estimation. Journal of the American Statistical Association 81, 1063–1069. doi:10.1080/01621459.1986.10478374
Duchesne, P. (1999). Robust calibration estimators, Survey Methodology 25, 43–56.
Gwet, J.-P. and Rivest, L.-P. (1992). Outlier Resistant Alternatives to the Ratio Estimator. Journal of the American Statistical Association 87, 1174–1182. doi:10.1080/01621459.1992.10476275
Lee, H. (1991). Model-Based Estimators That Are Robust to Outliers, in Proceedings of the 1991 Annual Research Conference, Bureau of the Census, 178–202. Washington, DC, Department of Commerce.
Lee, H. (1995). Outliers in business surveys. In: Business survey methods ed. by Cox, B. G., Binder, D. A., Chinnappa, B. N., Christianson, A., Colledge, M. J. and Kott, P. S. New York: John Wiley and Sons, Chap. 26, 503–526. doi:10.1002/9781118150504.ch26
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer.
Särndal, C.-E. and Wright, R. L. (1984). Cosmetic Form of Estimators in Survey Sampling. Scandinavian Journal of Statistics 11, 146–156.
Overview (of all implemented functions)
svymean_ratio
and svytotal_ratio
for (robust)
ratio predictors
svymean_huber
, svytotal_huber
,
svymean_tukey
and svytotal_tukey
for
-estimators
svyreg
, svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
for robust
regression - and
-estimators
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust regression M-estimator with Huber psi-function reg <- svyreg_huberM(payroll ~ employment, dn, k = 3) # Summary of the regression M-estimate summary(reg) # Diagnostic plots of the regression M-estimate (e.g., standardized # residuals against fitted values) plot(reg, which = 1L) # Plot of the robustness weights of the regression M-estimate against # its residuals plot(residuals(reg), robweights(reg)) # ADU (asymptotically design unbiased) estimator m <- svytotal_reg(reg, totals = 1001233, 90840, type = "ADU") m # Robust GREG estimator of the mean; the population means of the auxiliary # variables are from a register m <- svymean_reg(reg, totals = 1001233, 90840, type = "huber", k = 20) m # Summary of the robust GREG estimate summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m) # Approximation of the estimated mean square error mse(m)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust regression M-estimator with Huber psi-function reg <- svyreg_huberM(payroll ~ employment, dn, k = 3) # Summary of the regression M-estimate summary(reg) # Diagnostic plots of the regression M-estimate (e.g., standardized # residuals against fitted values) plot(reg, which = 1L) # Plot of the robustness weights of the regression M-estimate against # its residuals plot(residuals(reg), robweights(reg)) # ADU (asymptotically design unbiased) estimator m <- svytotal_reg(reg, totals = 1001233, 90840, type = "ADU") m # Robust GREG estimator of the mean; the population means of the auxiliary # variables are from a register m <- svymean_reg(reg, totals = 1001233, 90840, type = "huber", k = 20) m # Summary of the robust GREG estimate summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m) # Approximation of the estimated mean square error mse(m)
Weighted trimmed population mean and total.
svymean_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...) svytotal_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...)
svymean_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...) svytotal_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...)
x |
a one-sided |
design |
an object of class |
LB |
|
UB |
|
na.rm |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
Population mean or total. Let denote
the estimated trimmed population mean; then, the estimated trimmed
total is given by
with
, where
summation is over all observations in the sample.
The methods trims the LB
of the smallest observations and the (1 -
UB
)
of the largest observations from the data.
Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994).
Object of class svystat_rob
Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697
Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 976–967. doi:10.1214/aos/1176325505
Overview (of all implemented functions)
weighted_mean_trimmed
and weighted_total_trimmed
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated trimmed population total (5% symmetric trimming) svytotal_trimmed(~employment, dn, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) svymean_trimmed(~employment, dn, UB = 0.95)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated trimmed population total (5% symmetric trimming) svytotal_trimmed(~employment, dn, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) svymean_trimmed(~employment, dn, UB = 0.95)
Weighted winsorized mean and total
svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...) svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)
svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...) svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)
x |
a one-sided |
design |
an object of class |
LB |
|
UB |
|
na.rm |
|
trim_var |
|
k |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
Population mean or total. Let
denote the estimated winsorized population mean; then, the
estimated winsorized total is given by
with
, where summation
is over all observations in the sample.
The amount of winsorization can be specified in relative or absolute terms:
Relative: By specifying LB
and UB
,
the method winsorizes the LB
of the smallest observations and the
(1 -
UB
) of the largest
observations from the data.
Absolute: By specifying argument k
in the
functions with the "infix" _k_
in their name (e.g.,
svymean_k_winsorized
), the
largest observations are winsorized,
,
where
denotes the sample size. E.g.,
k = 2
implies that the largest and the second largest observation
are winsorized.
Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994). Two estimators are available:
simple_var = FALSE
Variance estimator of
the winsorized mean/ total. The estimator depends on
the estimated probability density function evaluated at
the winsorization thresholds, which can be – depending
on the context – numerically unstable. As a remedy,
a simplified variance estimator is available by
setting simple_var = TRUE
.
simple_var = TRUE
Variance is approximated using the variance estimator of the trimmed mean/ total.
See:
Object of class svystat_rob
Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697
Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 976–967. doi:10.1214/aos/1176325505
Overview (of all implemented functions)
weighted_mean_winsorized
,
weighted_mean_k_winsorized
,
weighted_total_winsorized
and
weighted_total_k_winsorized
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated winsorized population mean (5% symmetric winsorization) svymean_winsorized(~employment, dn, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) svytotal_k_winsorized(~employment, dn, k = 2)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated winsorized population mean (5% symmetric winsorization) svymean_winsorized(~employment, dn, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) svytotal_k_winsorized(~employment, dn, k = 2)
Weighted Huber and Tukey M-estimator of the population mean and total (robust Horvitz-Thompson estimator)
svymean_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svytotal_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svymean_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...) svytotal_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...)
svymean_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svytotal_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svymean_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...) svytotal_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...)
x |
a one-sided |
design |
an object of class |
k |
|
type |
|
asym |
|
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
type = "rht"
or type = "rwm"
; see
weighted_mean_huber
or
weighted_mean_tukey
for more details.
Taylor linearization (residual variance estimator).
See weighted_mean_huber
weighted_mean_tukey
,
weighted_total_huber
, and
weighted_total_tukey
.
Object of class svystat_rob
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
Overview (of all implemented functions)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust Horvitz-Thompson M-estimator of the population total svytotal_huber(~employment, dn, k = 9, type = "rht") # Robust weighted M-estimator of the population mean m <- svymean_huber(~employment, dn, k = 12, type = "rwm") # Summary statistic summary(m) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust Horvitz-Thompson M-estimator of the population total svytotal_huber(~employment, dn, k = 9, type = "rht") # Robust weighted M-estimator of the population mean m <- svymean_huber(~employment, dn, k = 12, type = "rwm") # Summary statistic summary(m) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)
svyratio_huber
and svyratio_tukey
compute the robust
-estimator of the ratio of two variables with, respectively,
Huber and Tukey biweight (bisquare) psi-function.
svyratio_huber(numerator, denominator, design, k, var = denominator, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyratio_tukey(numerator, denominator, design, k, var = denominator, na.rm = FALSE, verbose = TRUE, ...)
svyratio_huber(numerator, denominator, design, k, var = denominator, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyratio_tukey(numerator, denominator, design, k, var = denominator, na.rm = FALSE, verbose = TRUE, ...)
numerator |
a one-sided |
denominator |
a one-sided |
design |
an object of class |
k |
|
var |
a |
na.rm |
|
asym |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
The functions svyratio_huber
and svyratio_tukey
are
implemented as wrapper functions of the regression estimators
svyreg_huberM
and svyreg_tukeyM
. See
the help files of these functions (e.g., on how additional
parameters can be passed via ...
or on the usage of the
var
argument).
Object of class svyreg.rob
and ratio
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
for robust
regression estimators
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyratio_huber(~payroll, ~employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract estimated standard error SE(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyratio_huber(~payroll, ~employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract estimated standard error SE(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
Weighted least squares estimator of regression
svyreg(formula, design, var = NULL, na.rm = FALSE, ...)
svyreg(formula, design, var = NULL, na.rm = FALSE, ...)
formula |
a |
design |
an object of class |
var |
a one-sided |
na.rm |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
svyreg
computes the regression coefficients by weighted least
squares.
Models for svyreg_rob
are specified symbolically. A typical
model has the form response ~ terms
where response
is
the (numeric) response vector and terms
is a series of terms
which specifies a linear predictor for response
; see
formula
and lm
.
A formula has an implied intercept term. To remove this use either
y ~ x - 1
or y ~ 0 + x
; see formula
for more
details of allowed formulae.
Object of class svyreg_rob
.
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods
Robust estimating methods svyreg_huberM
,
svyreg_huberGM
, svyreg_tukeyM
and
svyreg_tukeyGM
.
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute the regression estimate (weighted least squares) m <- svyreg(payroll ~ employment, dn) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., Normal Q-Q-plot) plot(m, which = 2L)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute the regression estimate (weighted least squares) m <- svyreg(payroll ~ employment, dn) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., Normal Q-Q-plot) plot(m, which = 2L)
The function svyreg_huber
is deprecated; use instead
svyreg_huberM
.
svyreg_huber(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
svyreg_huber(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
asym |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
See svyreg_huberM
.
Object of class svyreg.rob
svyreg_huberM
and svyreg_huberGM
compute, respectively,
a survey weighted M- and GM-estimator of regression using
the Huber psi-function.
svyreg_huberM(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyreg_huberGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
svyreg_huberM(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyreg_huberGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
asym |
|
verbose |
|
type |
|
xwgt |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
svyreg_huberM
and svyreg_huberGM
compute, respectively,
M- and GM-estimates of regression by iteratively re-weighted
least squares (IRWLS). The estimate of regression scale is (by default)
computed as the (normalized) weighted median of absolute deviations
from the weighted median (MAD; see weighted_mad
) for
each IRWLS iteration. If the weighted MAD is zero (or nearly so),
the scale is computed as the (normalized) weighted interquartile
range (IQR).
The regression M-estimator svyreg_huberM
is robust against
residual outliers (granted that the tuning constant k
is
chosen appropriately).
Function svyreg_huberGM
implements the Mallows and Schweppe
regression GM-estimator (see argument type
).
The regression GM-estimators are robust against residual outliers
and outliers in the model's design space (leverage
observations; see argument xwgt
).
See svyreg_control
.
Models for svyreg_rob
are specified symbolically. A typical
model has the form response ~ terms
, where response
is the (numeric) response vector and terms
is a series of
terms which specifies a linear predictor for response
; see
formula
and lm
.
A formula has an implied intercept term. To remove this use
either y ~ x - 1
or y ~ 0 + x
; see
formula
for more details of allowed formulae.
Object of class svyreg.rob
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
Overview (of all implemented functions)
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods
Other robust estimating methods svyreg_tukeyM
and
svyreg_tukeyGM
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
The function svyreg_tukey
is deprecated; use instead
svyreg_tukeyM
.
svyreg_tukey(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
svyreg_tukey(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
See svyreg_tukeyM
.
Object of class svyreg.rob
svyreg_tukeyM
and svyreg_tukeyGM
compute, respectively,
a survey weighted M- and GM-estimator of regression using
the biweight Tukey psi-function.
svyreg_tukeyM(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...) svyreg_tukeyGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
svyreg_tukeyM(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...) svyreg_tukeyGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
verbose |
|
type |
|
xwgt |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library
or require
).
svyreg_tukeyM
and svyreg_tukeyGM
compute, respectively,
M- and GM-estimates of regression by iteratively re-weighted least
squares (IRWLS). The estimate of regression scale is (by default)
computed as the (normalized) weighted median of absolute deviations
from the weighted median (MAD; see weighted_mad
) for
each IRWLS iteration. If the weighted MAD is zero (or nearly so),
the scale is computed as the (normalized) weighted
interquartile range (IQR).
The regression M-estimator svyreg_tukeyM
is robust against
residual outliers (granted that the tuning constant k
is
chosen appropriately).
Function svyreg_huberGM
implements the Mallows and Schweppe
regression GM-estimator (see argument type
).
The regression GM-estimators are robust against residual outliers
and outliers in the model's design space (leverage
observations; see argument xwgt
).
See svyreg_control
.
Models for svyreg_rob
are specified symbolically. A typical
model has the form response ~ terms
, where response
is the (numeric) response vector and terms
is a series of
terms which specifies a linear predictor for response
; see
formula
and lm
.
A formula has an implied intercept term. To remove this use
either y ~ x - 1
or y ~ 0 + x
; see
formula
for more details of allowed formulae.
Object of class svyreg.rob
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
Overview (of all implemented functions)
summary
, coef
,
residuals
, fitted
,
SE
and vcov
plot
for regression diagnostic plot methods.
Other robust estimating methods svyreg_huberM
and
svyreg_huberGM
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Tukey bisquare psi-function m <- svyreg_tukeyM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Tukey bisquare psi-function m <- svyreg_tukeyM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
Weighted five-number summary used for survey.design
and
survey.design2
objects (similar to base::summary
for [numeric vectors]
).
svysummary(object, design, na.rm = FALSE, ...)
svysummary(object, design, na.rm = FALSE, ...)
object |
one-sided |
design |
an object of class |
na.rm |
|
... |
additional arguments. |
A weighted five-number summary (numeric variable) or a frequency table (factor variable).
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } svysummary(~payroll, dn)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } svysummary(~payroll, dn)
Weighted (normalized) interquartile range
weighted_IQR(x, w, na.rm = FALSE, constant = 0.7413)
weighted_IQR(x, w, na.rm = FALSE, constant = 0.7413)
x |
|
w |
|
na.rm |
|
constant |
|
By default, the weighted IQR is normalized to be an unbiased estimate of
scale at the Gaussian core model. If normalization is not wanted, put
constant = 1
.
Weighted IQR
Overview (of all implemented functions)
head(workplace) # normalized weighted IQR (default) weighted_IQR(workplace$employment, workplace$weight) # weighted IQR (without normalization) weighted_IQR(workplace$employment, workplace$weight, constant = 1)
head(workplace) # normalized weighted IQR (default) weighted_IQR(workplace$employment, workplace$weight) # weighted IQR (without normalization) weighted_IQR(workplace$employment, workplace$weight, constant = 1)
weighted_line
fits a robust line and allows weights.
weighted_line(x, y = NULL, w, na.rm = FALSE, iter = 1) ## S3 method for class 'medline' print(x, ...) ## S3 method for class 'medline' coef(object, ...) ## S3 method for class 'medline' residuals(object, ...) ## S3 method for class 'medline' fitted(object, ...)
weighted_line(x, y = NULL, w, na.rm = FALSE, iter = 1) ## S3 method for class 'medline' print(x, ...) ## S3 method for class 'medline' coef(object, ...) ## S3 method for class 'medline' residuals(object, ...) ## S3 method for class 'medline' fitted(object, ...)
x |
|
y |
|
w |
|
na.rm |
|
iter |
|
object |
object of class |
... |
additional arguments passed to the method. |
weighted_line
uses different quantiles for splitting the sample
than stats::line()
.
intercept and slope of the fitted line
Overview (of all implemented functions)
head(cars) # compute weighted line weighted_line(cars$speed, cars$dist, w = rep(1, length(cars$speed))) m <- weighted_line(cars$speed, cars$dist, w = rep(1:10, each = 5)) m coef(m) residuals(m) fitted(m)
head(cars) # compute weighted line weighted_line(cars$speed, cars$dist, w = rep(1, length(cars$speed))) m <- weighted_line(cars$speed, cars$dist, w = rep(1:10, each = 5)) m coef(m) residuals(m) fitted(m)
Weighted median of the absolute deviations from the weighted median
weighted_mad(x, w, na.rm = FALSE, constant = 1.482602)
weighted_mad(x, w, na.rm = FALSE, constant = 1.482602)
x |
|
w |
|
na.rm |
|
constant |
|
The weighted MAD is computed as the (normalized) weighted median of the
absolute deviation from the weighted median; see
weighted_median
. The weighted MAD is normalized to be
an unbiased estimate of scale at the Gaussian core model. If
normalization is not wanted, put constant = 1
.
Weighted median absolute deviation from the (weighted) median
Overview (of all implemented functions)
head(workplace) # normalized weighted MAD (default) weighted_mad(workplace$employment, workplace$weight) # weighted MAD (without normalization) weighted_mad(workplace$employment, workplace$weight, constant = 1)
head(workplace) # normalized weighted MAD (default) weighted_mad(workplace$employment, workplace$weight) # weighted MAD (without normalization) weighted_mad(workplace$employment, workplace$weight, constant = 1)
Weighted total and mean (Horvitz-Thompson and Hajek estimators)
weighted_mean(x, w, na.rm = FALSE) weighted_total(x, w, na.rm = FALSE)
weighted_mean(x, w, na.rm = FALSE) weighted_total(x, w, na.rm = FALSE)
x |
|
w |
|
na.rm |
|
weighted_total
and weighted_mean
compute, respectively,
the Horvitz-Thompson estimator of the population total and the Hajek
estimator of the population mean.
Estimated population mean or total
Overview (of all implemented functions)
head(workplace) # Horvitz-Thompson estimator of the total weighted_total(workplace$employment, workplace$weight) # Hajek estimator of the mean weighted_mean(workplace$employment, workplace$weight)
head(workplace) # Horvitz-Thompson estimator of the total weighted_total(workplace$employment, workplace$weight) # Hajek estimator of the mean weighted_mean(workplace$employment, workplace$weight)
Dalén's estimators of the population mean and the population total (bare-bone functions with limited functionality).
weighted_mean_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE) weighted_total_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE)
weighted_mean_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE) weighted_total_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE)
x |
|
w |
|
censoring |
|
type |
|
info |
|
na.rm |
|
verbose |
|
Let denote the expansion
estimator of the
-total (summation is over all elements
in sample
). The estimators Z2 and Z3 of Dalén (1987) are
defined as follows.
The estimator Z2 of the population total sums over
; hence, it
censors the products
to the
censoring constant
(
censoring
). The estimator of
the population -mean is is defined as the total divided
by the population size.
The estimator Z3 of the population total is defined as the sum
over the elements , which is equal to
if
and
otherwise.
The return value depends on info
:
info = FALSE
:estimate of mean or total [double]
info = TRUE
:a [list]
with items:
characteristic
[character]
,
estimator
[character]
,
estimate
[double]
,
variance
(default: NA
),
robust
[list]
,
residuals
[numeric vector]
,
model
[list]
,
design
(default: NA
),
[call]
Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations. R & D Report U/STM 1987:32, Statistics Sweden, Stockholm.
Overview (of all implemented functions)
head(workplace) # Dalen's estimator of the total (with censoring threshold: 100000) weighted_total_dalen(workplace$employment, workplace$weight, 100000)
head(workplace) # Dalen's estimator of the total (with censoring threshold: 100000) weighted_total_dalen(workplace$employment, workplace$weight, 100000)
Weighted trimmed mean and total (bare-bone functions with limited
functionality; see svymean_trimmed
and
svytotal_trimmed
for more capable methods)
weighted_mean_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE)
weighted_mean_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE)
x |
|
w |
|
LB |
|
UB |
|
info |
|
na.rm |
|
Population mean or total. Let denote
the estimated trimmed population mean; then, the estimated trimmed
population total is given by
with
, where
summation is over all observations in the sample.
The methods trims the LB
of the smallest observations and the
(1 -
UB
) of the largest observations
from the data.
See survey methods:
The return value depends on info
:
info = FALSE
:estimate of mean or total [double]
info = TRUE
:a [list]
with items:
characteristic
[character]
,
estimator
[character]
,
estimate
[double]
,
variance
(default: NA
),
robust
[list]
,
residuals
[numeric vector]
,
model
[list]
,
design
(default: NA
),
[call]
Overview (of all implemented functions)
svymean_trimmed
and svytotal_trimmed
head(workplace) # Estimated trimmed population total (5% symmetric trimming) weighted_total_trimmed(workplace$employment, workplace$weight, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) weighted_mean_trimmed(workplace$employment, workplace$weight, UB = 0.95)
head(workplace) # Estimated trimmed population total (5% symmetric trimming) weighted_total_trimmed(workplace$employment, workplace$weight, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) weighted_mean_trimmed(workplace$employment, workplace$weight, UB = 0.95)
Weighted winsorized mean and total (bare-bone functions with limited
functionality; see svymean_winsorized
and
svytotal_winsorized
for more capable methods)
weighted_mean_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_mean_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE) weighted_total_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE)
weighted_mean_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_mean_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE) weighted_total_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE)
x |
|
w |
|
LB |
|
UB |
|
info |
|
na.rm |
|
k |
|
Population mean or total. Let
denote the estimated winsorized population mean; then, the
estimated population total is given by
with
, where summation
is over all observations in the sample.
The amount of winsorization can be specified in relative or absolute terms:
Relative: By specifying LB
and UB
,
the methods winsorizes the LB
of the smallest observations and the
(1 -
UB
) of the largest
observations from the data.
Absolute: By specifying argument k
in the
functions with the "infix" _k_
in their name, the
largest observations are winsorized,
,
where
denotes the sample size. E.g.,
k = 2
implies that the largest and the second largest
observation are winsorized.
See survey methods:
The return value depends on info
:
info = FALSE
:estimate of mean or total [double]
info = TRUE
:a [list]
with items:
characteristic
[character]
,
estimator
[character]
,
estimate
[double]
,
variance
(default: NA
),
robust
[list]
,
residuals
[numeric vector]
,
model
[list]
,
design
(default: NA
),
[call]
Overview (of all implemented functions)
svymean_winsorized
, svymean_k_winsorized
,
svytotal_winsorized
and svytotal_k_winsorized
head(workplace) # Estimated winsorized population mean (5% symmetric winsorization) weighted_mean_winsorized(workplace$employment, workplace$weight, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) weighted_total_k_winsorized(workplace$employment, workplace$weight, k = 2)
head(workplace) # Estimated winsorized population mean (5% symmetric winsorization) weighted_mean_winsorized(workplace$employment, workplace$weight, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) weighted_total_k_winsorized(workplace$employment, workplace$weight, k = 2)
Weighted population median.
weighted_median(x, w, na.rm = FALSE)
weighted_median(x, w, na.rm = FALSE)
x |
|
w |
|
na.rm |
|
Weighted sample median; see weighted_quantile
for more
information.
Weighted estimate of the population median
Overview (of all implemented functions)
head(workplace) weighted_median(workplace$employment, workplace$weight)
head(workplace) weighted_median(workplace$employment, workplace$weight)
Robust simple linear regression based on medians: two methods are available:
"slopes"
and "product"
.
weighted_median_line(x, y = NULL, w, type = "slopes", na.rm = FALSE)
weighted_median_line(x, y = NULL, w, type = "slopes", na.rm = FALSE)
x |
|
y |
|
w |
|
type |
|
na.rm |
|
Robust simple linear regression based on medians
Two methods/ types are available. Let
denote the weighted median of variable
x
with weights
w
:
type = "slopes"
:The slope is computed as
type = "products"
:The slope is computed as
A vector with two components: intercept and slope
Overview (of all implemented functions)
line
, weighted_line
and
weighted_median_ratio
x <- c(1, 2, 4, 5) y <- c(3, 2, 7, 4) weighted_line(y ~ x, w = rep(1, length(x))) weighted_median_line(y ~ x, w = rep(1, length(x))) m <- weighted_median_line(y ~ x, w = rep(1, length(x)), type = "prod") m coef(m) fitted(m) residuals(m) # cars data head(cars) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)))) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)), type = "prod")) # weighted w <- c(rep(1,20), rep(2,20), rep(5, 10)) with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in y cars$dist[49] <- 360 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in x data(cars) cars$speed[49] <- 72 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))
x <- c(1, 2, 4, 5) y <- c(3, 2, 7, 4) weighted_line(y ~ x, w = rep(1, length(x))) weighted_median_line(y ~ x, w = rep(1, length(x))) m <- weighted_median_line(y ~ x, w = rep(1, length(x)), type = "prod") m coef(m) fitted(m) residuals(m) # cars data head(cars) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)))) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)), type = "prod")) # weighted w <- c(rep(1,20), rep(2,20), rep(5, 10)) with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in y cars$dist[49] <- 360 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in x data(cars) cars$speed[49] <- 72 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))
A weighted median of the ratios y/x determines the slope of a regression through the origin.
weighted_median_ratio(x, y = NULL, w, na.rm = FALSE)
weighted_median_ratio(x, y = NULL, w, na.rm = FALSE)
x |
|
y |
|
w |
|
na.rm |
|
A vector with two components: intercept and slope
Overview (of all implemented functions)
line
, weighted_line
and
weighted_median_line
x <- c(1,2,4,5) y <- c(1,0,5,2) m <- weighted_median_ratio(y ~ x, w = rep(1, length(y))) m coef(m) fitted(m) residuals(m)
x <- c(1,2,4,5) y <- c(1,0,5,2) m <- weighted_median_ratio(y ~ x, w = rep(1, length(y))) m coef(m) fitted(m) residuals(m)
Weighted population quantile.
weighted_quantile(x, w, probs, na.rm = FALSE)
weighted_quantile(x, w, probs, na.rm = FALSE)
x |
|
w |
|
probs |
|
na.rm |
|
weighted_quantile
computes the weighted
sample quantiles; argument probs
allows vector inputs.
The function is based on a weighted version of the quickselect/Find algorithm with the Bentley and McIlroy (1993) 3-way partitioning scheme. For very small arrays, we use insertion sort.
For equal weighting, i.e., when all elements in
w
are equal, weighted_quantile
is identical with
type = 2
of stats::quantile
; see also
Hyndman and Fan (1996).
Weighted estimate of the population quantiles
Bentley, J. L. and McIlroy, D. M. (1993). Engineering a Sort Function, Software - Practice and Experience 23, 1249–1265. doi:10.1002/spe.4380231105
Hyndman, R.J. and Fan, Y. (1996). Sample Quantiles in Statistical Packages, The American Statistician 50, 361–365. doi:10.1080/00031305.1996.10473566
Overview (of all implemented functions)
head(workplace) # Weighted 25% quantile (1st quartile) weighted_quantile(workplace$employment, workplace$weight, 0.25)
head(workplace) # Weighted 25% quantile (1st quartile) weighted_quantile(workplace$employment, workplace$weight, 0.25)
Weighted Huber and Tukey M-estimator of the mean and total
(bare-bone function with limited functionality; see
svymean_huber
, svymean_tukey
,
svytotal_huber
, and svytotal_tukey
for more
capable methods)
weighted_mean_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_mean_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...)
weighted_mean_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_mean_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...)
x |
|
w |
|
k |
|
type |
|
asym |
|
info |
|
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g.,
|
Population mean or total. Let
denote the estimated population mean; then, the estimated
total is given by
with
, where
summation is over all observations in the sample.
Two methods/types are available for estimating the
location :
type = "rwm" (default)
:robust weighted
M-estimator of the population mean and total,
respectively. This estimator is recommended for sampling
designs whose inclusion probabilities are not
proportional to some measure of size. [Legacy note: In an
earlier version, the method type = "rwm"
was called
"rhj"
; the type "rhj"
is now silently
converted to "rwm"
]
type = "rht"
:robust Horvitz-Thompson M-estimator of the population mean and total, respectively. This estimator is recommended for proportional-to-size sampling designs.
See the related but more capable functions:
By default, the Huber
or Tukey
psi-function are used in the specification of the M-estimators. For
the Huber estimator, an asymmetric version of the Huber
psi-function can be used by setting the argument
asym = TRUE
in the function call.
The return value depends on info
:
info = FALSE
:estimate of mean or total [double]
info = TRUE
:a [list]
with items:
characteristic
[character]
,
estimator
[character]
,
estimate
[double]
,
variance
(default: NA
),
robust
[list]
,
residuals
[numeric vector]
,
model
[list]
,
design
(default: NA
),
[call]
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05
. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control
.
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
Overview (of all implemented functions)
head(workplace) # Robust Horvitz-Thompson M-estimator of the population total weighted_total_huber(workplace$employment, workplace$weight, k = 9, type = "rht") # Robust weighted M-estimator of the population mean weighted_mean_huber(workplace$employment, workplace$weight, k = 12, type = "rwm")
head(workplace) # Robust Horvitz-Thompson M-estimator of the population total weighted_total_huber(workplace$employment, workplace$weight, k = 9, type = "rht") # Robust weighted M-estimator of the population mean weighted_mean_huber(workplace$employment, workplace$weight, k = 12, type = "rwm")
Weight functions associated with the Huber and the Tukey biweight psi-functions; and the weight function of Simpson et al. (1992) for GM-estimators.
huberWgt(x, k = 1.345) tukeyWgt(x, k = 4.685) simpsonWgt(x, a, b)
huberWgt(x, k = 1.345) tukeyWgt(x, k = 4.685) simpsonWgt(x, a, b)
x |
|
k |
|
a |
|
b |
|
The functions huberWgt
and tukeyWgt
return the weights
associated with the respective psi-function.
The function simpsonWgt
is used (in regression GM-estimators)
to downweight leverage observations (i.e., outliers in the model's design
space). Let denote the (robust) squared Mahalanobis
distance of the i-th observation. The Simpson et al. (1992) type of
weight is defined as
, where
a
and b
are tuning constants.
By default, a = 1
; this choice implies that the weights
are computed on the basis of the robust Mahalanobis distances.
Alternative: a = Inf
implies a weight of zero for all
observations whose (robust) squared Mahalanobis is larger than
b
.
The tuning constants b
is a threshold on the distances.
Numerical vector of weights
Simpson, D. G., Ruppert, D. and Carroll, R.J. (1992). On One-Step GM Estimates and Stability of Inferences in Linear Regression. Journal of the American Statistical Association 87, 439–450. doi:10.2307/2290275
Overview (of all implemented functions)
svyreg_huberM
, svyreg_huberGM
,
svyreg_tukeyM
and svyreg_tukeyGM
head(flour) # standardized distance from median (copper content in wholemeal flour) x <- flour$copper z <- abs(x - median(x)) / mad(x) # plot of weight functions vs. distance plot(z, huberWgt(z, k = 3), ylim = c(0, 1), xlab = "distance", ylab = "weight") points(z, tukeyWgt(z, k = 6), pch = 2, col = 2) points(z, simpsonWgt(z, a = Inf, b = 3), pch = 3, col = 4) legend("topright", c("huberWgt(k = 3)", "tukeyWgt(k = 6)", "simpsonWgt(a = Inf, b = 3)"), pch = 1:3, col = c(1, 2, 4))
head(flour) # standardized distance from median (copper content in wholemeal flour) x <- flour$copper z <- abs(x - median(x)) / mad(x) # plot of weight functions vs. distance plot(z, huberWgt(z, k = 3), ylim = c(0, 1), xlab = "distance", ylab = "weight") points(z, tukeyWgt(z, k = 6), pch = 2, col = 2) points(z, simpsonWgt(z, a = Inf, b = 3), pch = 3, col = 4) legend("topright", c("huberWgt(k = 3)", "tukeyWgt(k = 6)", "simpsonWgt(a = Inf, b = 3)"), pch = 1:3, col = c(1, 2, 4))
The workplace
data are from Fuller (2009, pp. 366–367).
data(workplace)
data(workplace)
A data.frame
with a sample of 142 workplaces on the following
variables
ID
identifier variable [integer]
.
weight
sampling weight [double]
.
employment
employment total [double]
.
payroll
payroll total (1000 dollars)[double]
.
strat
stratum identifier[integer]
.
fpc
finite population correction [integer]
.
The workplace
data represent a sample of workplaces in the
retail sector in a Canadian province. The data are not those
collected by Statistics Canada, but have been generated by Fuller
(2009, Example 3.1.1) to display similar characteristics to the
original 1999 Canadian Workplace and Employee Survey (WES).
The WES target population is defined as all workplaces operating in Canada with paid employees. The sampling frame is stratified by industry, geographic region, and size (size is defined using estimated employment). A sample of workplaces has been drawn independently in each stratum using simple random sample without replacement (the stratum-specific sample sizes are determined by Neyman allocation). Several strata containing very large workplaces were sampled exhaustively; see Patak et al (1998). The original sampling weights were adjusted for nonresponse.
The original weights of WES were about 2200 for the stratum of small workplaces, about 750 for medium-sized, and about 35 for large workspaces.
The data workplace
is from Table 6.3 in Fuller (2009, pp. 366–367).
Fuller, W. A. (2009). Sampling Statistics, Hoboken (NJ): John Wiley and Sons. doi:10.1002/9780470523551
Patak, Z., Hidiroglou, M. and Lavallée, P. (1998). The methodology of the Workplace and Employee Survey. Proceedings of the Survey Research Methods Section, American Statistical Association, 83–91.
head(workplace) library("survey") # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) }
head(workplace) library("survey") # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) }