library(BayesSampling)

Application of the BLE to the Ratio estimator

(From Section 3 of Gonçalves, Moura and Migon, “Bayes linear estimation for finite population with emphasis on categorical data”)

In many practical situations, information is available on an auxiliary variate x_i (correlated with y_i) for all population units, or at least for each unit in the sample together with the population mean X̄. In practice, x_i is often the value of y_i at some previous time when a complete census was taken. This approach is used when the expected value and the variance of y_i are proportional to x_i, so in the BLE setup we replace some hypotheses about the y’s with hypotheses about the first two moments of the rate y_i/x_i. To the best of our knowledge, the new ratio estimator proposed below is a novel contribution to survey sampling theory.

The new ratio estimator is obtained as a particular case of model (2.4), with the hypothesis of exchangeability used in the Bayes linear approach applied to the rate y_i/x_i for all i = 1, ..., N, as described below:

such that: σ² = v − c
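The moment conditions this passage refers to are not shown here. A plausible reconstruction, using the symbols m and v as they are defined in the parameter list of this section and reading c as the common covariance between the rates of two distinct units (an assumption that should be checked against Section 3 of the paper), is:

```latex
E\left(\frac{y_i}{x_i}\right) = m, \qquad
V\left(\frac{y_i}{x_i}\right) = v, \qquad
\mathrm{Cov}\left(\frac{y_i}{x_i}, \frac{y_j}{x_j}\right) = c,
\qquad i, j = 1, \dots, N, \; i \neq j,
\qquad \text{with } \sigma^2 = v - c .
```

Under this reading, σ² is the part of the prior variance that is specific to each unit, and c is the part shared by all units through the common ratio.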

Application

We can apply this with the BLE_Ratio() function, which receives the following parameters:

y_s - either a vector containing the observed values or just the value for the sample mean (σ and n parameters will be required in this case);
x_s - either a vector containing the values for the auxiliary variable of the elements in the sample or just the value for the sample mean;
x_s̄ (x_nots in the code) - a vector containing the values for the auxiliary variable of the elements not in the sample;
m - prior mean for the ratio between Y and X. If NULL, ȳ_s/x̄_s will be used (non-informative prior);
v - prior variance of the ratio between Y and X ( > σ²). If NULL, it will tend to infinity (non-informative prior);
σ - prior estimate of variability (standard deviation) of the ratio within the population. If NULL, sample variance of the ratio will be used;
n - sample size. Necessary only if y_s and x_s represent sample means (will not be used otherwise).

Vague Prior Distribution

Letting v → ∞ and c → ∞, but keeping σ² = v − c fixed, that is, assuming prior ignorance, we recover the ratio-type estimator found in the design-based approach: T̂_ra = NX̄(ȳ_s/x̄_s).

This can be achieved using the BLE_Ratio() function by omitting either the prior mean or the prior variance, that is:

m = NULL - the ratio between sample means will be used as prior mean
v = NULL - prior variance will tend to infinity
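The limit described above can be checked numerically without the package. The sketch below uses base R only and toy values that are purely illustrative. The helper ble_beta() is hypothetical: it assumes a precision-weighted form for the Bayes linear estimate of the ratio, with c = v − σ² as prior variance of the common ratio, and is not taken from the package source.

```r
# Toy data (illustrative values, not taken from BigCity)
ys     <- c(10, 8, 6)          # sampled values of y
xs     <- c(5, 4, 3.1)         # auxiliary values for the sampled units
x_nots <- c(1, 20, 13, 15, 7)  # auxiliary values for the non-sampled units

N     <- length(xs) + length(x_nots)  # population size
X_bar <- mean(c(xs, x_nots))          # population mean of x

# Design-based ratio estimator of the total: N * X_bar * (y_bar_s / x_bar_s)
T_ra <- N * X_bar * mean(ys) / mean(xs)

# Hypothetical helper (assumed form, not the package's code):
# precision-weighted combination of the prior mean m and the sample totals.
ble_beta <- function(ys, xs, m, v, sigma) {
  c_prior <- v - sigma^2  # prior variance of the common ratio
  (m / c_prior + sum(ys) / sigma^2) / (1 / c_prior + sum(xs) / sigma^2)
}

# As v grows with sigma fixed, the estimate approaches mean(ys) / mean(xs)
for (v in c(10, 1e3, 1e6)) {
  cat("v =", v, " beta =", ble_beta(ys, xs, m = 2.5, v = v, sigma = 2), "\n")
}
cat("ratio of sample means:", mean(ys) / mean(xs), "\n")
cat("design-based total   :", T_ra, "\n")
```

Under this assumed form, the prior mean m is weighted by 1/c, so its influence vanishes as v (and hence c) grows, which is why the estimate settles at the ratio of sample means.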

Examples

We will use the BigCity dataset from the TeachingSampling package for this example (we first draw a subsample of 10000 rows from it and treat that as the population, so that R can perform the calculations). Suppose we want to estimate the mean or the total Expenditure of this population, using Income as an auxiliary variable (assume its value is known for every individual, perhaps from a census). After taking a simple random sample of 10 individuals, we want to estimate the expenditure/income ratio and the total expenditure, combining the sample information with an expert’s expectation (the prior mean for the ratio will be 0.85, that is, people in this city spend 85% of their income).

library(TeachingSampling)  # provides the BigCity dataset
data(BigCity)
end <- dim(BigCity)[1]
s <- seq(from = 1, to = end, by = 1)

set.seed(5)
samp <- sample(s, size = 10000, replace = FALSE)
ordered_samp <- sort(samp)
BigCity_red <- BigCity[ordered_samp,]

Expend <- BigCity_red$Expenditure
Income <- BigCity_red$Income

sampl <- sample(seq(1,10000),size=10)
ys <- Expend[sampl]
xs <- Income[sampl]

The quantity we want to estimate is the true ratio between expenditure and income. In this example its value is known:

mean(Expend/Income)
#> [1] 0.807571

Our design-based estimator for the ratio would be the ratio between sample means:

mean(ys)/mean(xs)
#> [1] 0.5978265

Applying the prior information about the ratio we can get a better estimate, especially in cases when only a small sample is available:

x_nots <- BigCity_red$Income[-sampl]

Estimator <- BLE_Ratio(ys, xs, x_nots, m = 0.85, v = 0.24, sigma = sqrt(0.23998))

Estimator$est.beta
#>        Beta
#> 1 0.7723287
Estimator$Vest.beta
#>             V1
#> 1 1.383985e-05
Estimator$est.mean[1:4,]
#> [1]  104.2644  230.4165  826.3917 1241.5184
Estimator$Vest.mean[1:5,1:5]
#>           V1         V2         V3         V4         V5
#> 1 32.6495313  0.5574125   1.999167   3.003421  0.5217451
#> 2  0.5574125 72.8274736   4.418010   6.637338  1.1530181
#> 3  1.9991667  4.4180104 272.623847  23.804893  4.1353134
#> 4  3.0034210  6.6373380  23.804893 421.530808  6.2126320
#> 5  0.5217451  1.1530181   4.135313   6.212632 68.0936545
Estimator$est.tot
#> [1] 4466282

Example from the help page

ys <- c(10,8,6)
xs <- c(5,4,3.1)
x_nots <- c(1,20,13,15,-5)
m <- 2.5
v <- 10
sigma <- 2

Estimator <- BLE_Ratio(ys, xs, x_nots, m, v, sigma)
Estimator
#> $est.beta
#>       Beta
#> 1 2.010444
#> 
#> $Vest.beta
#>          V1
#> 1 0.3133159
#> 
#> $est.mean
#>       y_nots
#> 1   2.010444
#> 2  40.208877
#> 3  26.135770
#> 4  30.156658
#> 5 -10.052219
#> 
#> $Vest.mean
#>          V1         V2         V3         V4        V5
#> 1  4.313316   6.266319   4.073107   4.699739  -1.56658
#> 2  6.266319 205.326371  81.462141  93.994778 -31.33159
#> 3  4.073107  81.462141 104.950392  61.096606 -20.36554
#> 4  4.699739  93.994778  61.096606 130.496084 -23.49869
#> 5 -1.566580 -31.331593 -20.365535 -23.498695 -12.16710
#> 
#> $est.tot
#> [1] 112.4595
#> 
#> $Vest.tot
#> [1] 782.5796
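The numbers printed above can be reproduced by hand in base R, which helps to see how the prior and the sample are combined. The sketch below assumes a precision-weighted form with c = v − σ² as the prior variance of the ratio and a unit-level variance of σ²·x_i. These formulas are an assumption that happens to match the printed output, not a copy of the package's implementation, and all object names are illustrative.

```r
# Same inputs as the help-page example
ys     <- c(10, 8, 6)
xs     <- c(5, 4, 3.1)
x_nots <- c(1, 20, 13, 15, -5)
m <- 2.5; v <- 10; sigma <- 2

c_prior <- v - sigma^2  # prior variance of the common ratio (assumed: c = v - sigma^2)

# Posterior variance and mean of the ratio (precision-weighted combination)
V_beta   <- 1 / (1 / c_prior + sum(xs) / sigma^2)
beta_hat <- V_beta * (m / c_prior + sum(ys) / sigma^2)

# Predictions for the non-sampled units and their variance matrix
est_mean <- x_nots * beta_hat
V_mean   <- V_beta * outer(x_nots, x_nots) + sigma^2 * diag(x_nots)

# Estimated total: observed part plus predicted part, and its variance
est_tot <- sum(ys) + sum(est_mean)
V_tot   <- sum(V_mean)

print(round(c(beta = beta_hat, V_beta = V_beta, est_tot = est_tot, V_tot = V_tot), 4))
```

Under this reading, the negative entry in the last diagonal position of $Vest.mean comes from the negative auxiliary value -5: a variance proportional to x_i is only meaningful for positive x_i.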

Example from the help page, but supplying sample means and sample size instead of sample observations

ys <- mean(c(10,8,6))
xs <- mean(c(5,4,3.1))
n <- 3
x_nots <- c(1,20,13,15,-5)
m <- 2.5
v <- 10
sigma <- 2

Estimator <- BLE_Ratio(ys, xs, x_nots, m, v, sigma, n)
#> sample means informed instead of sample observations, parameters 'n' and 'sigma' will be necessary
Estimator
#> $est.beta
#>       Beta
#> 1 2.010444
#> 
#> $Vest.beta
#>          V1
#> 1 0.3133159
#> 
#> $est.mean
#>       y_nots
#> 1   2.010444
#> 2  40.208877
#> 3  26.135770
#> 4  30.156658
#> 5 -10.052219
#> 
#> $Vest.mean
#>          V1         V2         V3         V4        V5
#> 1  4.313316   6.266319   4.073107   4.699739  -1.56658
#> 2  6.266319 205.326371  81.462141  93.994778 -31.33159
#> 3  4.073107  81.462141 104.950392  61.096606 -20.36554
#> 4  4.699739  93.994778  61.096606 130.496084 -23.49869
#> 5 -1.566580 -31.331593 -20.365535 -23.498695 -12.16710
#> 
#> $est.tot
#> [1] 112.4595
#> 
#> $Vest.tot
#> [1] 782.5796
