| Title: | Automatic Optimal Stratification for Survey Sampling |
|---|---|
| Description: | Provides tools for the automatic stratification of survey populations using clustering and optimization techniques. The package assists researchers and survey practitioners in constructing homogeneous strata to improve the efficiency and precision of survey estimates. Functions are provided for generating strata, evaluating stratification quality, summarizing stratified populations, and visualizing stratification results. These tools support the design and implementation of efficient survey sampling strategies. The package utilizes standard statistical methods from survey sampling, clustering, and optimization for automatic stratification. Methods are described in Cochran (1977, ISBN:9780471162405) and Lohr (2021, ISBN:9780367354556). |
| Authors: | Khalid Islam [aut, cre] |
| Maintainer: | Khalid Islam <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.4 |
| Built: | 2026-06-30 21:30:10 UTC |
| Source: | https://github.com/cran/AutoStrataK |
Automatically partitions a population into homogeneous strata using clustering-based methods. The resulting strata can be used in survey sampling designs to improve the precision and efficiency of estimates.
autostrata(data, target, n_strata = 4)
| Argument | Description |
|---|---|
| data | A data frame containing the study variables. |
| target | A target variable used for stratification. |
| n_strata | An integer specifying the desired number of strata. |
A data frame containing the original data along with the assigned
stratum membership. The returned object has class "autostrata".
data <- data.frame(
  x = c(10, 15, 20, 25, 30, 35),
  y = c(5, 8, 12, 16, 20, 24)
)
result <- autostrata(
  data = data,
  target = y,
  n_strata = 2
)
head(result)
Create Optimal Strata
clustering_strata(data, target, n_strata)
| Argument | Description |
|---|---|
| data | Input data frame |
| target | Target variable name |
| n_strata | Number of strata |
Data frame with stratum assignments
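The manual does not say which clustering algorithm clustering_strata() uses. As a rough illustration of clustering-based stratification in general, the sketch below assigns strata with k-means on a single variable using base R. The helper name kmeans_strata_sketch and the output column name stratum are assumptions made for this example, not part of the package.

```r
# Hypothetical helper, NOT part of AutoStrataK: stratify by k-means on one variable.
kmeans_strata_sketch <- function(data, target, n_strata) {
  stopifnot(is.data.frame(data), target %in% names(data))
  set.seed(1)  # k-means starts from random centers; fix the seed for reproducibility
  fit <- stats::kmeans(data[[target]], centers = n_strata, nstart = 10)
  # Relabel clusters so that stratum 1 has the smallest cluster mean
  rank_of_center <- rank(fit$centers[, 1], ties.method = "first")
  data$stratum <- as.integer(rank_of_center[fit$cluster])  # assumed column name
  data
}

df <- data.frame(y = c(5, 8, 12, 16, 20, 24))
out <- kmeans_strata_sketch(df, "y", 2)
table(out$stratum)
```

Relabeling by cluster mean keeps the stratum numbers stable across runs, since the raw cluster labels returned by kmeans() are arbitrary.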
Compares variance under simple random sampling (SRS) with within-stratum variance after stratification.
compare_sampling(data, target)
| Argument | Description |
|---|---|
| data | A stratified data frame containing a variable named |
| target | Character string giving the target variable name. |
A numeric value representing relative efficiency. Values greater than 1 indicate improved efficiency due to stratification.
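The exact formula behind compare_sampling() is not given here. One common way to express such a ratio, shown below as a sketch, divides the overall variance of the target by the size-weighted mean of the within-stratum variances. The helper name relative_efficiency_sketch, the strata_col argument, and the column name stratum are assumptions for illustration only.

```r
# Hypothetical helper, NOT the package's implementation.
# Assumes every stratum has at least two units so its variance is defined.
relative_efficiency_sketch <- function(data, target, strata_col = "stratum") {
  y <- data[[target]]
  g <- factor(data[[strata_col]])
  srs_var <- stats::var(y)                          # variance ignoring strata
  within_var <- as.numeric(tapply(y, g, stats::var))  # one variance per stratum
  weights <- as.numeric(table(g)) / length(y)       # stratum shares N_h / N
  srs_var / sum(weights * within_var)
}

strat <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  stratum = c(1, 1, 2, 2, 3, 3, 4, 4)
)
re <- relative_efficiency_sketch(strat, "income")
re > 1  # TRUE when strata are more homogeneous than the population as a whole
```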
Computes measures of stratification quality, including the overall variance of the target variable, the total within-stratum variance, and a homogeneity index.
evaluate_strata(data, target)
| Argument | Description |
|---|---|
| data | A stratified data frame containing a column named |
| target | Character string specifying the target variable used for stratification. |
A list containing:
Variance of the target variable in the full population.
Sum of within-stratum variances.
Homogeneity index, where larger values indicate more homogeneous strata.
data <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  stratum = c(1, 1, 2, 2, 3, 3, 4, 4)
)
evaluate_strata(
  data = data,
  target = "income"
)
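For readers who want to check the first two quantities by hand, the following base R sketch computes the overall variance and the sum of within-stratum variances for the same data. The helper name strata_quality_sketch and its list element names are invented for this example. The homogeneity index is left out because its formula is not stated in this manual.

```r
# Hypothetical helper, NOT part of AutoStrataK; element names are made up.
strata_quality_sketch <- function(data, target, strata_col = "stratum") {
  y <- data[[target]]
  g <- factor(data[[strata_col]])
  list(
    total_variance = stats::var(y),                            # full population
    within_variance = sum(as.numeric(tapply(y, g, stats::var)))  # summed over strata
  )
}

income_df <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  stratum = c(1, 1, 2, 2, 3, 3, 4, 4)
)
q <- strata_quality_sketch(income_df, "income")
str(q)
```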
Neyman Allocation
optimal_allocation(data, target, total_sample)
| Argument | Description |
|---|---|
| data | Stratified data frame |
| target | Target variable name |
| total_sample | Total sample size |
Allocation table
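Neyman allocation assigns each stratum a share of the sample proportional to N_h * S_h, where N_h is the stratum size and S_h is the within-stratum standard deviation of the target. The sketch below shows that rule in base R. The helper name neyman_allocation_sketch, the strata_col argument, and the output column names are assumptions; the layout of the table returned by optimal_allocation() may differ.

```r
# Hypothetical helper, NOT the package's implementation.
# Assumes every stratum has at least two units so S_h is defined.
neyman_allocation_sketch <- function(data, target, total_sample,
                                     strata_col = "stratum") {
  y <- data[[target]]
  g <- factor(data[[strata_col]])
  N_h <- as.numeric(table(g))                 # stratum sizes
  S_h <- as.numeric(tapply(y, g, stats::sd))  # within-stratum standard deviations
  share <- N_h * S_h / sum(N_h * S_h)         # Neyman shares, sum to 1
  data.frame(
    stratum = levels(g),
    N_h = N_h,
    S_h = S_h,
    share = share,
    n_h = round(total_sample * share)         # rounding may not sum to total_sample
  )
}

pop <- data.frame(
  income = c(15, 18, 20, 25, 30, 35, 40, 50),
  stratum = c(1, 1, 2, 2, 3, 3, 4, 4)
)
alloc <- neyman_allocation_sketch(pop, "income", total_sample = 4)
alloc
```

Because the shares are rounded independently, the n_h values can add up to slightly more or less than total_sample, and in very small populations an n_h can exceed N_h. A production version would need to handle both cases.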
Plot Strata
## S3 method for class 'autostrata'
plot(x, ...)
| Argument | Description |
|---|---|
| x | Object of class autostrata |
| ... | Additional arguments |
A ggplot object
Summary Method
## S3 method for class 'autostrata'
summary(object, ...)
| Argument | Description |
|---|---|
| object | Object of class autostrata |
| ... | Additional arguments |
Printed summary