Title: | Small Area Estimation |
---|---|
Description: | Functions for small area estimation. |
Authors: | Isabel Molina, Yolanda Marhuenda |
Maintainer: | Yolanda Marhuenda <[email protected]> |
License: | GPL-2 |
Version: | 1.3 |
Built: | 2024-10-29 06:32:31 UTC |
Source: | CRAN |
This package provides a variety of functions for small area estimation, including functions for mean squared error estimation. Basic estimators include direct, poststratified synthetic and sample size dependent. Model-based estimators include the EBLUP based on a Fay-Herriot model and the EBLUP based on a unit level nested error model. Estimators obtained from spatial and spatio-temporal Fay-Herriot models and the EB method based on the unit level nested error model for estimation of general non linear parameters are also included.
This package provides functions for estimation in domains with small sample sizes. For a complete list of functions, see library(help=sae).
Package: | sae |
Type: | Package |
Version: | 1.2 |
Date: | 2018-05-01 |
License: | GPL-2 |
Depends: | stats, lmer |
Isabel Molina <[email protected]> and Yolanda Marhuenda <[email protected]>
- Arora, V. and Lahiri, P. (1997). On the superiority of the Bayesian method over the BLUP in small area estimation problems. Statistica Sinica 7, 1053-1063.
- Battesse, G.E., Harter, R.M. and Fuller, W.A. (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association 83, 28-36.
- Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations. Journal of Royal Statistical Society Series B 26, 211-246.
- Cochran, W.G. (1977). Sampling techniques. Wiley, New York.
- Datta, G.S. and Lahiri, P. (2000). A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica 10, 613-627.
- Datta, G.S., Rao, J.N.K. and Smith D.D. (2005). On measuring the variability of small area estimators under a basic area level model. Biometrika 92, 183-196.
- Drew, D., Singh, M.P. and Choudhry, G.H. (1982). Evaluation of small area estimation techniques for the Canadian Labour Force Survey. Survey Methodology 8, 17-47.
- Fay, R.E. and Herriot, R.A. (1979). Estimation of income from small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association 74, 269-277.
- Gonzalez-Manteiga, W., Lombardia, M., Molina, I., Morales, D. and Santamaria, L. (2008). Analytic and bootstrap approximations of prediction errors under a multivariate Fay-Herriot model. Computational Statistics and Data Analysis 52, 5242-5252.
- Jiang, J. (1996). REML estimation: asymptotic behavior and related topics. Annals of Statistics 24, 255-286.
- Marhuenda, Y., Molina, I. and Morales, D. (2013). Small area estimation with spatio-temporal Fay-Herriot models. Computational Statistics and Data Analysis 58, 308-325.
- Marhuenda, Y., Morales, D. and Pardo, M.C. (2014). Information criteria for Fay-Herriot model selection. Computational Statistics and Data Analysis 70, 268-280.
- Molina, I., Salvati, N. and Pratesi, M. (2009). Bootstrap for estimating the MSE of the Spatial EBLUP. Computational Statistics 24, 441-458.
- Molina, I. and Rao, J.N.K. (2010). Small Area Estimation of Poverty Indicators. The Canadian Journal of Statistics 38, 369-385.
- Petrucci, A. and Salvati, N. (2006). Small area estimation for spatial correlation in watershed erosion assessment. Journal of Agricultural, Biological and Environmental Statistics 11, 169-182.
- Prasad, N. and Rao, J. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association 85, 163-171.
- Pratesi, M. and Salvati, N. (2008). Small area estimation: the EBLUP estimator based on spatially correlated random area effects. Statistical Methods & Applications 17, 113-141.
- Rao, J.N.K. (2003). Small Area Estimation. Wiley, London.
- Sarndal, C.E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag.
- Singh, B., Shukla, G. and Kundu, D. (2005). Spatio-temporal models in small area estimation. Survey Methodology 31, 183-195.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- You, Y. and Chapman, B. (2006). Small area estimation using area level models and estimated sampling variances. Survey Methodology 32, 97-103.
Box-Cox or power transformation or its inverse. For lambda!=0, the Box-Cox transformation of x is (x^lambda-1)/lambda, whereas the regular power transformation is simply x^lambda. When lambda=0, it is log in both cases. The inverse of the Box-Cox and the power transform can also be obtained.
bxcx(x, lambda, InverseQ = FALSE, type = "BoxCox")
bxcx(x, lambda, InverseQ = FALSE, type = "BoxCox")
x |
a vector or time series |
lambda |
power transformation parameter |
InverseQ |
if TRUE, the inverse transformation is done |
type |
either "BoxCox" or "power" |
A vector or time series of the transformed data
A.I. McLeod. R package FitAR
- Box, G.E.P. and Cox, D.R. (1964). An analysis of transformations. Journal of Royal Statistical Society Series B 26, 211-246.
#lambda=0.5 z<-AirPassengers; lambda<-0.5 y<-bxcx(z, lambda) z2<-bxcx(y, lambda, InverseQ=TRUE) sum(abs(z2-z)) # z<-AirPassengers; lambda<-0.0 y<-bxcx(z, lambda) z2<-bxcx(y, lambda, InverseQ=TRUE) sum(abs(z2-z))
#lambda=0.5 z<-AirPassengers; lambda<-0.5 y<-bxcx(z, lambda) z2<-bxcx(y, lambda, InverseQ=TRUE) sum(abs(z2-z)) # z<-AirPassengers; lambda<-0.0 y<-bxcx(z, lambda) z2<-bxcx(y, lambda, InverseQ=TRUE) sum(abs(z2-z))
Survey and satellite data for corn and soy beans in 12 Iowa counties, obtained from the 1978 June Enumerative Survey of the U.S. Department of Agriculture and from land observatory satellites (LANDSAT) during the 1978 growing season.
data(cornsoybean)
data(cornsoybean)
A data frame with 37 observations on the following 5 variables.
County
:numeric county code.
CornHec
:reported hectares of corn from the survey.
SoyBeansHec
:reported hectares of soy beans from the survey.
CornPix
:number of pixels of corn in sample segment within county, from satellite data.
SoyBeansPix
:number of pixels of soy beans in sample segment within county, from satellite data.
- Battesse, G.E., Harter, R.M. and Fuller, W.A. (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association 83, 28-36.
County means of number of pixels per segment of corn and soy beans, from satellite data, for 12 counties in Iowa. Population size, sample size and means of auxiliary variables in data set cornsoybean
.
data(cornsoybeanmeans)
data(cornsoybeanmeans)
A data frame with 12 observations on the following 6 variables.
CountyIndex
:numeric county code.
CountyName
:name of the county.
SampSegments
:number of sample segments in the county (sample size).
PopnSegments
:number of population segments in the county (population size).
MeanCornPixPerSeg
:mean number of corn pixels per segment in the county.
MeanSoyBeansPixPerSeg
:mean number of soy beans pixels per segment in the county.
- Battesse, G.E., Harter, R.M. and Fuller, W.A. (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical Association 83, 28-36.
Using a n*m
matrix A
, this function constructs a block-diagonal matrix with dimension (n*ntimes) *
(m*ntimes)
, with all blocks equal to matrix A
and the rest of entries equal to 0
.
diagonalizematrix(A, ntimes)
diagonalizematrix(A, ntimes)
A |
|
ntimes |
number of times. |
X <- matrix(data=c(1,2,3,4,5,6), nrow=3, ncol=2) diagonalizematrix(X,3)
X <- matrix(data=c(1,2,3,4,5,6), nrow=3, ncol=2) diagonalizematrix(X,3)
This function calculates direct estimators of domain means.
direct(y, dom, sweight, domsize, data, replace = FALSE)
direct(y, dom, sweight, domsize, data, replace = FALSE)
y |
vector specifying the individual values of the variable for which we want to estimate the domain means. |
dom |
vector or factor (same size as |
sweight |
optional vector (same size as |
domsize |
|
data |
optional data frame containing the variables named in |
replace |
logical variable with default value |
The function returns a data frame of size D*5
with the following columns:
Domain |
domain codes in ascending order. |
SampSize |
domain sample sizes. |
Direct |
direct estimators of domain means of variable |
SD |
estimated standard deviations of domain direct estimators. If sampling design is SRS or Poisson sampling, estimated variances are unbiased. Otherwise, estimated variances are obtained under the approximation that second order inclusion probabilities are the product of first order inclusion probabilities. |
CV |
absolute value of percent coefficients of variation of domain direct estimators. |
Cases with NA values in y
, dom
or sweight
are ignored.
- Cochran, W.G. (1977). Sampling techniques. Wiley, New York.
- Rao, J.N.K. (2003). Small Area Estimation. Wiley, London.
- Sarndal, C.E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag.
pssynt
for post-stratified synthetic estimator, ssd
for sample size dependent estimator.
In case that the sampling design is known, see packages survey
or sampling
for more exact variance estimation.
# Load data set with synthetic income data for provinces (domains) data(incomedata) # Load population sizes of provinces data(sizeprov) # Compute Horvitz-Thompson direct estimator of mean income for each # province under random sampling without replacement within each province. result1 <- direct(y=income, dom=prov, sweight=weight, domsize=sizeprov[,2:3], data=incomedata) result1 # The same but using province labels as domain codes result2 <- direct(y=incomedata$income, dom=incomedata$provlab, sweight=incomedata$weight, domsize=sizeprov[,c(1,3)]) result2 # The same, under SRS without replacement within each province. result3 <- direct(y=income ,dom=provlab, domsize=sizeprov[,c(1,3)], data=incomedata) result3 # Compute direct estimator of mean income for each province # under SRS with replacement within each province result4 <- direct(y=income, dom=provlab, data=incomedata, replace=TRUE) result4
# Load data set with synthetic income data for provinces (domains) data(incomedata) # Load population sizes of provinces data(sizeprov) # Compute Horvitz-Thompson direct estimator of mean income for each # province under random sampling without replacement within each province. result1 <- direct(y=income, dom=prov, sweight=weight, domsize=sizeprov[,2:3], data=incomedata) result1 # The same but using province labels as domain codes result2 <- direct(y=incomedata$income, dom=incomedata$provlab, sweight=incomedata$weight, domsize=sizeprov[,c(1,3)]) result2 # The same, under SRS without replacement within each province. result3 <- direct(y=income ,dom=provlab, domsize=sizeprov[,c(1,3)], data=incomedata) result3 # Compute direct estimator of mean income for each province # under SRS with replacement within each province result4 <- direct(y=income, dom=provlab, data=incomedata, replace=TRUE) result4
Fits by REML method the unit level model of Battese, Harter and Fuller (1988) to a transformation of the specified dependent variable by a Box-Cox family or power family and obtains Monte Carlo approximations of EB estimators of the specified small area indicators, when the values of auxiliary variables for out-of-sample units are available.
ebBHF(formula, dom, selectdom, Xnonsample, MC = 100, data, transform = "BoxCox", lambda = 0, constant = 0, indicator)
ebBHF(formula, dom, selectdom, Xnonsample, MC = 100, data, transform = "BoxCox", lambda = 0, constant = 0, indicator)
formula |
an object of class |
dom |
|
selectdom |
|
Xnonsample |
matrix or data frame containing in the first column the domain codes and in the rest of columns the values of each of |
MC |
number of Monte Carlo replicates for the empirical approximation of the EB estimator. Default value is |
data |
optional data frame containing the variables named in |
transform |
type of transformation for the dependent variable to be chosen between the |
lambda |
value for the parameter of the family of transformations specified in |
constant |
constant added to the dependent variable before doing the transformation, to achieve a distribution close to Normal. Default value is |
indicator |
function of the (untransformed) variable on the left hand side of |
This function uses random number generation. To fix the seed, use set.seed
.
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
eb |
data frame with number of rows equal to number of selected domains,
containing in its columns the domain codes ( |
fit |
a list containing the following objects:
|
Cases with NA values in formula
or dom
are ignored.
- Molina, I. and Rao, J.N.K. (2010). Small Area Estimation of Poverty Indicators. The Canadian Journal of Statistics 38, 369-385.
data(incomedata) # Load data set attach(incomedata) # Construct design matrix for sample elements Xs <- cbind(age2, age3, age4, age5, nat1, educ1, educ3, labor1, labor2) # Select the domains to compute EB estimators. data(Xoutsamp) domains <- unique(Xoutsamp[,"domain"]) # Poverty gap indicator povertyline <- 0.6*median(income) povertyline # 6477.484 povgap <- function(y) { z <- 6477.484 result <- mean((y<z) * (z-y) / z) return (result) } # Compute EB predictors of poverty gap. The value constant=3600 is selected # to achieve approximately symmetric residuals. set.seed(123) result <- ebBHF(income ~ Xs, dom=prov, selectdom=domains, Xnonsample=Xoutsamp, MC=10, constant=3600, indicator=povgap) result$eb result$fit$summary result$fit$fixed result$fit$random[,1] result$fit$errorvar result$fit$refvar result$fit$loglike result$fit$residuals[1:10] detach(incomedata)
data(incomedata) # Load data set attach(incomedata) # Construct design matrix for sample elements Xs <- cbind(age2, age3, age4, age5, nat1, educ1, educ3, labor1, labor2) # Select the domains to compute EB estimators. data(Xoutsamp) domains <- unique(Xoutsamp[,"domain"]) # Poverty gap indicator povertyline <- 0.6*median(income) povertyline # 6477.484 povgap <- function(y) { z <- 6477.484 result <- mean((y<z) * (z-y) / z) return (result) } # Compute EB predictors of poverty gap. The value constant=3600 is selected # to achieve approximately symmetric residuals. set.seed(123) result <- ebBHF(income ~ Xs, dom=prov, selectdom=domains, Xnonsample=Xoutsamp, MC=10, constant=3600, indicator=povgap) result$eb result$fit$summary result$fit$fixed result$fit$random[,1] result$fit$errorvar result$fit$refvar result$fit$loglike result$fit$residuals[1:10] detach(incomedata)
This function calculates, for selected domains, EBLUPs of domain means based on the nested error linear regression model of Battese, Harter and Fuller (1988).
eblupBHF(formula, dom, selectdom, meanxpop, popnsize, method = "REML", data)
eblupBHF(formula, dom, selectdom, meanxpop, popnsize, method = "REML", data)
formula |
an object of class |
dom |
|
selectdom |
|
meanxpop |
|
popnsize |
|
method |
a character string. If |
data |
optional data frame containing the variables named in |
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
eblup |
data frame with number of rows equal to number of selected domains ( |
fit |
a list containing the following objects:
|
Cases with NA values in formula
or dom
are ignored.
- Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data, Journal of the American Statistical Association 83, 28-36.
- Rao, J.N.K. (2003). Small Area Estimation. New York: John Wiley and Sons.
# Load data set for segments (units within domains) data(cornsoybean) # Load data set for counties data(cornsoybeanmeans) attach(cornsoybeanmeans) # Construct data frame with county means of auxiliary variables for # domains. First column must include the county code Xmean <- data.frame(CountyIndex, MeanCornPixPerSeg, MeanSoyBeansPixPerSeg) Popn <- data.frame(CountyIndex, PopnSegments) # Compute EBLUPs of county means of corn crop areas for all counties resultCorn <- eblupBHF(CornHec ~ CornPix + SoyBeansPix, dom=County, meanxpop=Xmean, popnsize=Popn, data=cornsoybean) resultCorn$eblup # Compute EBLUPs of county means of soy beans crop areas for # a subset of counties using ML method domains <- c(10,1,5) resultBean <- eblupBHF(SoyBeansHec ~ CornPix + SoyBeansPix, dom=County, selectdom=domains, meanxpop=Xmean, popnsize=Popn, method="ML", data=cornsoybean) resultBean$eblup resultBean$fit detach(cornsoybeanmeans)
# Load data set for segments (units within domains) data(cornsoybean) # Load data set for counties data(cornsoybeanmeans) attach(cornsoybeanmeans) # Construct data frame with county means of auxiliary variables for # domains. First column must include the county code Xmean <- data.frame(CountyIndex, MeanCornPixPerSeg, MeanSoyBeansPixPerSeg) Popn <- data.frame(CountyIndex, PopnSegments) # Compute EBLUPs of county means of corn crop areas for all counties resultCorn <- eblupBHF(CornHec ~ CornPix + SoyBeansPix, dom=County, meanxpop=Xmean, popnsize=Popn, data=cornsoybean) resultCorn$eblup # Compute EBLUPs of county means of soy beans crop areas for # a subset of counties using ML method domains <- c(10,1,5) resultBean <- eblupBHF(SoyBeansHec ~ CornPix + SoyBeansPix, dom=County, selectdom=domains, meanxpop=Xmean, popnsize=Popn, method="ML", data=cornsoybean) resultBean$eblup resultBean$fit detach(cornsoybeanmeans)
This function gives the EBLUP (or EB predictor under normality) based on a Fay-Herriot model. Fitting method can be chosen between ML, REML and FH methods.
eblupFH(formula, vardir, method = "REML", MAXITER = 100, PRECISION = 0.0001, B = 0, data)
eblupFH(formula, vardir, method = "REML", MAXITER = 100, PRECISION = 0.0001, B = 0, data)
formula |
an object of class |
vardir |
vector containing the |
method |
type of fitting method, to be chosen between |
MAXITER |
maximum number of iterations allowed in the Fisher-scoring algorithm. Default is 100 iterations. |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
B |
number of bootstrap replicates to calculate the goodness-of-fit measures proposed by Marhuenda et al. (2014). Default value is |
data |
optional data frame containing the variables named in |
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
eblup |
vector with the values of the estimators for the domains. |
fit |
a list containing the following objects:
|
In case that formula
or vardir
contain NA values a message is printed and no action is done.
- Fay, R.E. and Herriot, R.A. (1979). Estimation of income from small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association 74, 269-277.
- Marhuenda, Y., Morales, D. and Pardo, M.C. (2014). Information criteria for Fay-Herriot model selection. Computational Statistics and Data Analysis 70, 268-280.
- Rao, J.N.K. (2003). Small Area Estimation. Wiley, London.
# Load data set data(milk) attach(milk) # Fit FH model using REML method with indicators of 4 Major Areas as # explanatory variables. resultREML <- eblupFH(yi ~ as.factor(MajorArea), SD^2) resultREML #Fit FH model using FH method resultFH <- eblupFH(yi ~ as.factor(MajorArea), SD^2, method="FH") resultFH detach(milk)
# Load data set data(milk) attach(milk) # Fit FH model using REML method with indicators of 4 Major Areas as # explanatory variables. resultREML <- eblupFH(yi ~ as.factor(MajorArea), SD^2) resultREML #Fit FH model using FH method resultFH <- eblupFH(yi ~ as.factor(MajorArea), SD^2, method="FH") resultFH detach(milk)
This function gives small area estimators based on a spatial Fay-Herriot model, where area effects follow a SAR(1) process. Fitting method can be chosen between REML and ML.
eblupSFH(formula, vardir, proxmat, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
eblupSFH(formula, vardir, proxmat, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
formula |
an object of class |
vardir |
vector containing the |
proxmat |
|
method |
type of fitting method, to be chosen between |
MAXITER |
maximum number of iterations allowed for the Fisher-scoring algorithm. Default value is |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
data |
optional data frame containing the variables named in |
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
eblup |
vector with the values of the estimators for the domains. |
fit |
a list containing the following objects:
|
In case that formula
, vardir
or proxmat
contain NA values a message is printed and no action is done.
Isabel Molina, Monica Pratesi and Nicola Salvati.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Molina, I., Salvati, N. and Pratesi, M. (2009). Bootstrap for estimating the MSE of the Spatial EBLUP. Computational Statistics 24, 441-458.
- Petrucci, A. and Salvati, N. (2006). Small area estimation for spatial correlation in watershed erosion assessment. Journal of Agricultural, Biological and Environmental Statistics 11, 169-182.
- Pratesi, M. and Salvati, N. (2008). Small area estimation: the EBLUP estimator based on spatially correlated random area effects. Statistical Methods & Applications 17, 113-141.
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Fit Spatial Fay-Herriot model using ML method resultML <- eblupSFH(grapehect ~ area + workdays - 1, var, grapesprox, method="ML", data=grapes) resultML # Fit Spatial Fay-Herriot model using REML method resultREML <- eblupSFH(grapehect ~ area + workdays - 1, var, grapesprox, data=grapes) resultREML
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Fit Spatial Fay-Herriot model using ML method resultML <- eblupSFH(grapehect ~ area + workdays - 1, var, grapesprox, method="ML", data=grapes) resultML # Fit Spatial Fay-Herriot model using REML method resultREML <- eblupSFH(grapehect ~ area + workdays - 1, var, grapesprox, data=grapes) resultREML
Fits a spatio-temporal Fay-Herriot model with area effects following a SAR(1) process and with either uncorrelated or AR(1) time effects.
eblupSTFH(formula, D, T, vardir, proxmat, model = "ST", MAXITER = 100, PRECISION = 0.0001, theta_iter = FALSE, sigma21_start = 0.5 * median(vardir), rho1_start = 0.5, sigma22_start = 0.5 * median(vardir), rho2_start = 0.5, data)
eblupSTFH(formula, D, T, vardir, proxmat, model = "ST", MAXITER = 100, PRECISION = 0.0001, theta_iter = FALSE, sigma21_start = 0.5 * median(vardir), rho1_start = 0.5, sigma22_start = 0.5 * median(vardir), rho2_start = 0.5, data)
formula |
an object of class |
D |
total number of domains. |
T |
total number of time instants (constant for all domains). |
vardir |
vector containing the |
proxmat |
|
model |
type of model to be chosen between |
MAXITER |
maximum number of iterations allowed for the Fisher-scoring algorithm. Default value is |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
theta_iter |
If |
sigma21_start |
Starting value of the area effects variance in the fitting algorithm. Default value is |
rho1_start |
Starting value of the area effects spatial autocorrelation parameter in the fitting algorithm. Default value is |
sigma22_start |
Starting value of the area-time effects variance in the fitting algorithm. Default value is |
rho2_start |
Starting value of the time autocorrelation parameter of the area-time effects in the fitting algorithm. Default value is |
data |
optional data frame containing the variables named in |
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
eblup |
a column vector with length |
fit |
a list containing the following objects:
|
In case that formula
, vardir
or proxmat
contain NA values a message is printed and no action is done.
Yolanda Marhuenda, Isabel Molina and Domingo Morales.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Marhuenda, Y., Molina, I. and Morales, D. (2013). Small area estimation with spatio-temporal Fay-Herriot models. Computational Statistics and Data Analysis 58, 308-325.
data(spacetime) # Load data set data(spacetimeprox) # Load proximity matrix D <- nrow(spacetimeprox) # number of domains T <- length(unique(spacetime$Time)) # number of time instant # Fit model S with uncorrelated time effects for each domain resultS <- eblupSTFH(Y ~ X1 + X2, D, T, Var, spacetimeprox, "S", theta_iter=TRUE, data=spacetime) rowsT <- seq(T, T*D, by=T) data.frame(Domain=spacetime$Area[rowsT], EBLUP_S=resultS$eblup[rowsT]) resultS$fit # Fit model ST with AR(1) time effects for each domain resultST <- eblupSTFH(Y ~ X1 + X2, D, T, Var, spacetimeprox, theta_iter=TRUE, data=spacetime) data.frame(Domain=spacetime$Area[rowsT], EBLUP_ST=resultS$eblup[rowsT]) resultST$fit
data(spacetime) # Load data set data(spacetimeprox) # Load proximity matrix D <- nrow(spacetimeprox) # number of domains T <- length(unique(spacetime$Time)) # number of time instant # Fit model S with uncorrelated time effects for each domain resultS <- eblupSTFH(Y ~ X1 + X2, D, T, Var, spacetimeprox, "S", theta_iter=TRUE, data=spacetime) rowsT <- seq(T, T*D, by=T) data.frame(Domain=spacetime$Area[rowsT], EBLUP_S=resultS$eblup[rowsT]) resultS$fit # Fit model ST with AR(1) time effects for each domain resultST <- eblupSTFH(Y ~ X1 + X2, D, T, Var, spacetimeprox, theta_iter=TRUE, data=spacetime) data.frame(Domain=spacetime$Area[rowsT], EBLUP_ST=resultS$eblup[rowsT]) resultST$fit
Synthetic data on grape production with spatial correlation for 274 municipalities in the region of Tuscany.
data(grapes)
data(grapes)
A data frame with 274 observations on the following 4 variables.
grapehect
:direct estimators of the mean agrarian surface area used for production of grape (in hectares) for each Tuscany municipality.
area
:agrarian surface area used for production (in hectares).
workdays
:average number of working days in the reference year (2000).
var
:sampling variance of the direct estimators for each Tuscany municipality.
A data frame containing the proximity values for the 274 municipalities in the region of Tuscany included in data set grapes
.
data(grapesprox)
data(grapesprox)
The values are numbers in the interval [0,1]
containing the proximity of the row and column domains. The sum of the values of each row is equal to 1.
Synthetic data on income and other related variables for Spanish provinces.
data(incomedata)
data(incomedata)
A data frame with 17199 observations on the following 21 variables.
provlab
:province name.
prov
:province code.
ac
:region of the province.
gen
:gender: 1:male, 2:female.
age
:age group: 0:<=13, 1:14-15, 2:16-24, 3:25-49, 4:50-64, 5: >=65.
nat
:nationality: 1:Spanish, 2:other.
educ
:education level: 0:age<16, 1:primary education (compulsory educ.), 2:secondary education, 3:post-secondary education.
labor
:labor force status: 0:age<16, 1:employed, 2:unemployed, 3:inactive.
age2
:indicator of age group 16-24.
age3
:indicator of age group 25-49.
age4
:indicator of age group 50-64.
age5
:indicator of age group >=65.
educ1
:indicator of education level 1 (primary education).
educ2
:indicator of education level 2 (secondary education.
educ3
:indicator of education level 3 (post-secondary education).
nat1
:indicator of Spanish nationality.
labor1
:indicator of being employed.
labor2
:indicator of being unemployed.
labor3
:indicator of being inactive.
income
:normalized income.
weight
:sampling weight.
Data on fresh milk expenditure, used by Arora and Lahiri (1997) and by You and Chapman (2006).
data(milk)
data(milk)
A data frame with 43 observations on the following 6 variables.
SmallArea
:areas of inferential interest.
ni
:sample sizes of small areas.
yi
:average expenditure on fresh milk for the year 1989 (direct estimates for the small areas).
SD
:estimated standard deviations of yi
.
CV
:estimated coefficients of variation of yi
.
MajorArea
:major areas created by You and Chapman (2006). These areas have similar direct estimates and produce a large CV reduction when using a FH model.
- Arora, V. and Lahiri, P. (1997). On the superiority of the Bayesian method over the BLUP in small area estimation problems. Statistica Sinica 7, 1053-1063.
- You, Y. and Chapman, B. (2006). Small area estimation using area level models and estimated sampling variances. Survey Methodology 32, 97-103.
Calculates the mean squared error estimator of the EBLUP under a Fay-Herriot model. The EBLUP might have been obtained by either ML, REML or by FH fitting methods.
mseFH(formula, vardir, method = "REML", MAXITER = 100, PRECISION = 0.0001, B = 0, data)
mseFH(formula, vardir, method = "REML", MAXITER = 100, PRECISION = 0.0001, B = 0, data)
formula |
an object of class |
vardir |
vector containing the |
method |
method used to fit the Fay-Herriot model, which can be either |
MAXITER |
maximum number of iterations allowed in the Fisher-scoring algorithm. Default is 100 iterations. |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
B |
number of bootstrap replicates to calculate the goodness-of-fit measures proposed by Marhuenda et al. (2014). Default value is |
data |
optional data frame containing the variables named in |
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
a vector with the estimated mean squared errors of the EBLUPs for the small domains. |
In case that formula
or vardir
contain NA values a message is printed and no action is done.
- Datta, G.S. and Lahiri, P. (2000). A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica 10, 613-627.
- Datta, G.S., Rao, J.N.K. and Smith D.D. (2005). On measuring the variability of small area estimators under a basic area level model. Biometrika 92, 183-196.
- Fay, R.E. and Herriot, R.A. (1979). Estimation of income from small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association 74, 269-277.
- Jiang, J. (1996). REML estimation: asymptotic behavior and related topics. Annals of Statistics 24, 255-286.
- Marhuenda, Y., Morales, D. and Pardo, M.C. (2014). Information criteria for Fay-Herriot model selection. Computational Statistics and Data Analysis 70, 268-280.
- Prasad, N. and Rao, J. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association 85, 163-171.
# Load data set data(milk) attach(milk) # Fit Fay-Herriot model using ML method with indicators # of 4 Major Areas as explanatory variables and compute # estimated MSEs of EB estimators resultML <- mseFH(yi ~ as.factor(MajorArea), SD^2, method="ML") resultML # Fit Fay-Herriot model using REML method and compute # estimated MSEs of EB estimators resultREML <- mseFH(yi ~ as.factor(MajorArea), SD^2) resultREML # Fit Fay-Herriot model using FH method and compute # estimated MSEs of EB estimators resultFH <- mseFH(yi ~ as.factor(MajorArea), SD^2, method="FH") resultFH detach(milk)
# Load data set data(milk) attach(milk) # Fit Fay-Herriot model using ML method with indicators # of 4 Major Areas as explanatory variables and compute # estimated MSEs of EB estimators resultML <- mseFH(yi ~ as.factor(MajorArea), SD^2, method="ML") resultML # Fit Fay-Herriot model using REML method and compute # estimated MSEs of EB estimators resultREML <- mseFH(yi ~ as.factor(MajorArea), SD^2) resultREML # Fit Fay-Herriot model using FH method and compute # estimated MSEs of EB estimators resultFH <- mseFH(yi ~ as.factor(MajorArea), SD^2, method="FH") resultFH detach(milk)
Calculates analytical mean squared error estimates of the spatial EBLUPs obtained from the fit of a spatial Fay-Herriot model, in which area effects follow a Simultaneously Autorregressive (SAR) process.
mseSFH(formula, vardir, proxmat, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
mseSFH(formula, vardir, proxmat, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
formula |
an object of class |
vardir |
vector containing the |
proxmat |
|
method |
type of fitting method, to be chosen between |
MAXITER |
maximum number of iterations allowed for the Fisher-scoring algorithm. Default value is |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
data |
optional data frame containing the variables named in |
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
a vector with the analytical mean squared error estimates of the spatial EBLUPs. |
In case that formula
, vardir
or proxmat
contain NA values a message is printed and no action is done.
Isabel Molina, Monica Pratesi and Nicola Salvati.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Molina, I., Salvati, N. and Pratesi, M. (2009). Bootstrap for estimating the MSE of the Spatial EBLUP. Computational Statistics 24, 441-458.
- Singh, B., Shukla, G. and Kundu, D. (2005). Spatio-temporal models in small area estimation. Survey Methodology 31, 183-195.
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Calculate analytical MSE estimates using REML method result <- mseSFH(grapehect ~ area + workdays - 1, var, grapesprox, data=grapes) result
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Calculate analytical MSE estimates using REML method result <- mseSFH(grapehect ~ area + workdays - 1, var, grapesprox, data=grapes) result
Calculates nonparametric bootstrap mean squared error estimates of the spatial EBLUPs obtained by fitting a spatial Fay-Herriot model, in which area effects follow a Simultaneously Autorregressive (SAR) process.
npbmseSFH(formula, vardir, proxmat, B = 100, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
npbmseSFH(formula, vardir, proxmat, B = 100, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
formula |
an object of class |
vardir |
vector containing the |
proxmat |
|
B |
number of bootstrap replicates. Default value is |
method |
type of fitting method. Currently only |
MAXITER |
maximum number of iterations allowed for the Fisher-scoring algorithm. Default value is |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
data |
optional data frame containing the variables named in |
This function uses random number generation. To fix the seed, use set.seed
.
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
data frame containing the naive nonparametric bootstrap mean squared error estimates of the spatial EBLUPs ( |
In case that formula
, vardir
or proxmat
contain NA values a message is printed and no action is done.
Isabel Molina, Monica Pratesi and Nicola Salvati.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Molina, I., Salvati, N. and Pratesi, M. (2009). Bootstrap for estimating the MSE of the Spatial EBLUP. Computational Statistics 24, 441-458.
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Obtain the naive and bias-corrected non parametric bootstrap MSE # estimates using REML set.seed(123) result <- npbmseSFH(grapehect ~ area + workdays - 1, var, grapesprox, B=2, data=grapes) result
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Obtain the naive and bias-corrected non parametric bootstrap MSE # estimates using REML set.seed(123) result <- npbmseSFH(grapehect ~ area + workdays - 1, var, grapesprox, B=2, data=grapes) result
Calculates, for selected domains, parametric bootstrap mean squared error estimators of the EBLUPs of means, when EBLUPs are obtained from a nested error linear regression model.
pbmseBHF(formula, dom, selectdom, meanxpop, popnsize, B = 200, method = "REML", data)
pbmseBHF(formula, dom, selectdom, meanxpop, popnsize, B = 200, method = "REML", data)
formula |
an object of class |
dom |
|
selectdom |
|
meanxpop |
|
popnsize |
|
B |
number of bootstrap replicates. Default is |
method |
a character string. If |
data |
optional data frame containing the variables named in |
This function uses random number generation. To fix the seed, use set.seed
.
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
data frame with number of rows equal to number of selected domains, containing in its columns the domain codes ( |
Cases with NA values in formula
or dom
are ignored.
- Gonzalez-Manteiga, W., Lombardia, M., Molina, I., Morales, D. and Santamaria, L. (2008). Analytic and bootstrap approximations of prediction errors under a multivariate Fay-Herriot model. Computational Statistics and Data Analysis 52, 5242-5252.
- Molina, I. and Rao, J.N.K. (2010). Small Area Estimation of Poverty Indicators. The Canadian Journal of Statistics 38, 369-385.
# Load data set for segments (units within domains) data(cornsoybean) # Load data set for counties data(cornsoybeanmeans) attach(cornsoybeanmeans) # Construct data frame with county means of auxiliary variables for # domains. First column must include the county code Xmean <- data.frame(CountyIndex, MeanCornPixPerSeg, MeanSoyBeansPixPerSeg) Popn <- data.frame(CountyIndex, PopnSegments) # Compute parametric bootstrap MSEs of the EBLUPs of means of crop areas # for each county. set.seed(123) result <- pbmseBHF(CornHec ~ CornPix + SoyBeansPix, dom=County, selectdom=c(10,1,5), meanxpop=Xmean, popnsize=Popn, B=50, data=cornsoybean) result detach(cornsoybeanmeans)
# Load data set for segments (units within domains) data(cornsoybean) # Load data set for counties data(cornsoybeanmeans) attach(cornsoybeanmeans) # Construct data frame with county means of auxiliary variables for # domains. First column must include the county code Xmean <- data.frame(CountyIndex, MeanCornPixPerSeg, MeanSoyBeansPixPerSeg) Popn <- data.frame(CountyIndex, PopnSegments) # Compute parametric bootstrap MSEs of the EBLUPs of means of crop areas # for each county. set.seed(123) result <- pbmseBHF(CornHec ~ CornPix + SoyBeansPix, dom=County, selectdom=c(10,1,5), meanxpop=Xmean, popnsize=Popn, B=50, data=cornsoybean) result detach(cornsoybeanmeans)
This function obtains estimators of the mean squared errors of the EB estimators of domain parameters by a parametric bootstrap method. Population values of auxiliary variables are required.
pbmseebBHF(formula, dom, selectdom, Xnonsample, B = 100, MC = 100, data, transform = "BoxCox", lambda = 0, constant = 0, indicator)
pbmseebBHF(formula, dom, selectdom, Xnonsample, B = 100, MC = 100, data, transform = "BoxCox", lambda = 0, constant = 0, indicator)
formula |
an object of class |
dom |
|
selectdom |
|
Xnonsample |
matrix or data frame containing in the first column the domain codes and in the rest of columns the values of each of
|
B |
number of bootstrap replicates. Default value is |
MC |
number of Monte Carlo replicates for the empirical approximation of the EB estimator. Default value is |
data |
optional data frame containing the variables named in |
transform |
type of transformation for the dependent variable to be chosen between the |
lambda |
value for the parameter of the family of transformations specified in |
constant |
constant added to the dependent variable before doing the transformation, to achieve a distribution close to Normal. Default value is |
indicator |
function of the (untransformed) variable on the left hand side of |
This function uses random number generation. To fix the seed, use set.seed
.
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
data frame with number of rows equal to number of selected domains, containing in its columns the domain codes ( |
Cases with NA values in formula
or dom
are ignored.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Molina, I. and Rao, J.N.K. (2010). Small Area Estimation of Poverty Indicators. The Canadian Journal of Statistics 38, 369-385.
data(incomedata) # Load data set attach(incomedata) # Construct design matrix for sample elements Xs<-cbind(age2,age3,age4,age5,nat1,educ1,educ3,labor1,labor2) # Select the domains to compute EB estimators data(Xoutsamp) domains <- c(5) # Poverty incidence indicator povertyline <- 0.6*median(incomedata$income) povertyline # 6477.484 povinc <- function(y) { z <- 6477.484 result <- mean(y<z) return (result) } # Compute parametric bootstrap MSE estimators of the EB # predictors of poverty incidence. Take constant=3600 to achieve # approximately symmetric residuals. set.seed(123) result <- pbmseebBHF(income~Xs, dom=prov, selectdom=domains, Xnonsample=Xoutsamp, B=2, MC=2, constant=3600, indicator=povinc) result$est$eb result$mse result$est$fit$refvar detach(incomedata)
data(incomedata) # Load data set attach(incomedata) # Construct design matrix for sample elements Xs<-cbind(age2,age3,age4,age5,nat1,educ1,educ3,labor1,labor2) # Select the domains to compute EB estimators data(Xoutsamp) domains <- c(5) # Poverty incidence indicator povertyline <- 0.6*median(incomedata$income) povertyline # 6477.484 povinc <- function(y) { z <- 6477.484 result <- mean(y<z) return (result) } # Compute parametric bootstrap MSE estimators of the EB # predictors of poverty incidence. Take constant=3600 to achieve # approximately symmetric residuals. set.seed(123) result <- pbmseebBHF(income~Xs, dom=prov, selectdom=domains, Xnonsample=Xoutsamp, B=2, MC=2, constant=3600, indicator=povinc) result$est$eb result$mse result$est$fit$refvar detach(incomedata)
Calculates the parametric bootstrap mean squared error estimates of the spatial EBLUPs obtained by fitting the spatial Fay-Herriot model, in which area effects follow a Simultaneously Autorregressive (SAR) process.
pbmseSFH(formula, vardir, proxmat, B = 100, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
pbmseSFH(formula, vardir, proxmat, B = 100, method = "REML", MAXITER = 100, PRECISION = 0.0001, data)
formula |
an object of class |
vardir |
vector containing the |
proxmat |
|
B |
number of bootstrap replicates. Default value is |
method |
type of fitting method, to be chosen between |
MAXITER |
maximum number of iterations allowed for the Fisher-scoring algorithm. Default value is |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
data |
optional data frame containing the variables named in |
This function uses random number generation. To fix the seed, use set.seed
.
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
data frame containing the naive parametric bootstrap mean squared error estimates ( |
In case that formula
, vardir
or proxmat
contain NA values a message is printed and no action is done.
Isabel Molina, Monica Pratesi and Nicola Salvati.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Molina, I., Salvati, N. and Pratesi, M. (2009). Bootstrap for estimating the MSE of the Spatial EBLUP. Computational Statistics 24, 441-458.
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Obtain the fitting values, naive and bias-corrected parametric bootstrap MSE estimates # using REML method set.seed(123) result <- pbmseSFH(grapehect ~ area + workdays - 1, var, grapesprox, B=2, data=grapes) result
data(grapes) # Load data set data(grapesprox) # Load proximity matrix # Obtain the fitting values, naive and bias-corrected parametric bootstrap MSE estimates # using REML method set.seed(123) result <- pbmseSFH(grapehect ~ area + workdays - 1, var, grapesprox, B=2, data=grapes) result
Calculates parametric bootstrap mean squared error estimates of the EBLUPs based on a spatio-temporal Fay-Herriot model with area effects following a SAR(1) process and with either uncorrelated or correlated time effects within each domain following an AR(1) process.
pbmseSTFH(formula, D, T, vardir, proxmat, B = 100, model = "ST", MAXITER = 100, PRECISION = 0.0001, theta_iter = FALSE, sigma21_start = 0.5 * median(vardir), rho1_start = 0.5, sigma22_start = 0.5 * median(vardir), rho2_start = 0.5, data)
pbmseSTFH(formula, D, T, vardir, proxmat, B = 100, model = "ST", MAXITER = 100, PRECISION = 0.0001, theta_iter = FALSE, sigma21_start = 0.5 * median(vardir), rho1_start = 0.5, sigma22_start = 0.5 * median(vardir), rho2_start = 0.5, data)
formula |
an object of class |
D |
total number of domains. |
T |
total number of time instants (constant for each domain). |
vardir |
vector containing the |
proxmat |
|
B |
number of bootstrap replicates. Default value is |
model |
type of model to be chosen between |
MAXITER |
maximum number of iterations allowed for the Fisher-scoring algorithm. Default value is |
PRECISION |
convergence tolerance limit for the Fisher-scoring algorithm. Default value is |
theta_iter |
If |
sigma21_start |
Starting value of the area effects variance in the fitting algorithm. Default value is |
rho1_start |
Starting value of the area effects spatial autocorrelation parameter in the fitting algorithm. Default value is |
sigma22_start |
Starting value of the area-time effects variance in the fitting algorithm. Default value is |
rho2_start |
Starting value of the time autocorrelation parameter of the area-time effects in the fitting algorithm. Default value is |
data |
optional data frame containing the variables named in |
This function uses random number generation. To fix the seed, use set.seed
.
A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed.
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula
for more details of allowed formulae.
The function returns a list with the following objects:
est |
a list with the results of the estimation process: |
mse |
a vector of length |
In case that formula
, vardir
or proxmat
contain NA values a message is printed and no action is done.
Yolanda Marhuenda, Isabel Molina and Domingo Morales.
- Small Area Methods for Poverty and Living Conditions Estimates (SAMPLE), funded by European Commission, Collaborative Project 217565, Call identifier FP7-SSH-2007-1.
- Marhuenda, Y., Molina, I. and Morales, D. (2013). Small area estimation with spatio-temporal Fay-Herriot models. Computational Statistics and Data Analysis 58, 308-325.
data(spacetime) # Load data set data(spacetimeprox) # Load proximity matrix D <- nrow(spacetimeprox) # number of domains T <- length(unique(spacetime$Time)) # number of time instant # Calculate MSEs of EBLUPs under the spatio-temporal Fay-Herriot model # with uncorrelated time effects nested within domains (model S) set.seed(123) resultS <- pbmseSTFH(Y ~ X1 + X2, D, T, Var, spacetimeprox, B=10, model="S", data=spacetime) # Print direct estimates, variance, "S" model estimates, mse and # residuals of the last time instant. output <- data.frame(Domain=spacetime$Area, Period=spacetime$Time, Direct=spacetime$Y, EBLUP_S=resultS$est$eblup, VarDirect=spacetime$Var, MSE_S=resultS$mse, Residuals=spacetime$Y-resultS$est$eblup) periods <- unique(spacetime$Time) lastperiod <- periods[length(periods)] print(output[output[,"Period"]==lastperiod,], row.names=FALSE) # Calculate MSEs of the EBLUPs based on the spatio-temporal Fay-Herriot model # with AR(1) time effects nested within each area attach(spacetime) set.seed(123) resultST <- pbmseSTFH(Y ~ X1 + X2, D, T, vardir=Var, spacetimeprox, B=10) # Print direct estimates, variance, "ST" model estimates, mse and # residuals of the last time instant. output <- data.frame(Domain=Area, Period=Time, Direct=Y, EBLUP_ST=resultST$est$eblup, VarDirect=Var, MSE_ST=resultST$mse, Residuals=Y-resultST$est$eblup) periods <- unique(Time) lastperiod <- periods[length(periods)] print(output[output[,"Period"]==lastperiod,], row.names=FALSE) detach(spacetime)
data(spacetime) # Load data set data(spacetimeprox) # Load proximity matrix D <- nrow(spacetimeprox) # number of domains T <- length(unique(spacetime$Time)) # number of time instant # Calculate MSEs of EBLUPs under the spatio-temporal Fay-Herriot model # with uncorrelated time effects nested within domains (model S) set.seed(123) resultS <- pbmseSTFH(Y ~ X1 + X2, D, T, Var, spacetimeprox, B=10, model="S", data=spacetime) # Print direct estimates, variance, "S" model estimates, mse and # residuals of the last time instant. output <- data.frame(Domain=spacetime$Area, Period=spacetime$Time, Direct=spacetime$Y, EBLUP_S=resultS$est$eblup, VarDirect=spacetime$Var, MSE_S=resultS$mse, Residuals=spacetime$Y-resultS$est$eblup) periods <- unique(spacetime$Time) lastperiod <- periods[length(periods)] print(output[output[,"Period"]==lastperiod,], row.names=FALSE) # Calculate MSEs of the EBLUPs based on the spatio-temporal Fay-Herriot model # with AR(1) time effects nested within each area attach(spacetime) set.seed(123) resultST <- pbmseSTFH(Y ~ X1 + X2, D, T, vardir=Var, spacetimeprox, B=10) # Print direct estimates, variance, "ST" model estimates, mse and # residuals of the last time instant. output <- data.frame(Domain=Area, Period=Time, Direct=Y, EBLUP_ST=resultST$est$eblup, VarDirect=Var, MSE_ST=resultST$mse, Residuals=Y-resultST$est$eblup) periods <- unique(Time) lastperiod <- periods[length(periods)] print(output[output[,"Period"]==lastperiod,], row.names=FALSE) detach(spacetime)
Calculates post-stratified synthetic estimators of domain means using the categories of a cualitative variable as post-strata.
pssynt(y, sweight, ps, domsizebyps, data)
pssynt(y, sweight, ps, domsizebyps, data)
y |
vector specifying the individual values of the variable for which we want to estimate the domain means. |
sweight |
vector (same size as |
ps |
vector (same size as |
domsizebyps |
data frame with domain codes in the first column. Each remaining column contains the domain population sizes for each post-strata. Names of these columns must be the post-strata identifiers specified in |
data |
optional data frame containing the variables named in |
The function returns a data frame of size D*2
with the following columns:
Domain |
domain codes in ascending order. |
PsSynthetic |
post-stratified synthetic estimators of domain means of variable |
Cases with NA values in y
, sweight
or ps
are ignored.
- Rao, J.N.K. (2003). "Small Area Estimation". Wiley, London.
# Compute post-stratified synthetic estimators of mean income # for provinces considering the education levels codes # (variable educ) as post-strata. # Load data set data(incomedata) # Load province sizes by education levels data(sizeprovedu) # Compute post-stratified synthetic estimators with province labels # as domain codes colnames(sizeprovedu) <- c("provlab", "prov", "0", "1", "2", "3") result1 <- pssynt(y=income, sweight=weight, ps=educ, domsizebyps=sizeprovedu[,-2], data=incomedata) result1 # Now with province codes as domain codes colnames(sizeprovedu) <- c("provlab", "prov", "0", "1", "2", "3") result2 <- pssynt(y=income, sweight=weight, ps=educ, domsizebyps=sizeprovedu[,-1], data=incomedata) result2
# Compute post-stratified synthetic estimators of mean income # for provinces considering the education levels codes # (variable educ) as post-strata. # Load data set data(incomedata) # Load province sizes by education levels data(sizeprovedu) # Compute post-stratified synthetic estimators with province labels # as domain codes colnames(sizeprovedu) <- c("provlab", "prov", "0", "1", "2", "3") result1 <- pssynt(y=income, sweight=weight, ps=educ, domsizebyps=sizeprovedu[,-2], data=incomedata) result1 # Now with province codes as domain codes colnames(sizeprovedu) <- c("provlab", "prov", "0", "1", "2", "3") result2 <- pssynt(y=income, sweight=weight, ps=educ, domsizebyps=sizeprovedu[,-1], data=incomedata) result2
Identifiers and population sizes for domains in data set incomedata
.
data(sizeprov)
data(sizeprov)
A data frame with 52 observations on the following 3 variables.
provlab
:province name.
prov
:province code.
Nd
:province population count.
Names, codes and population sizes by age for domains in data set incomedata
.
data(sizeprovage)
data(sizeprovage)
A data frame with 52 observations on the following 7 variables.
provlab
:province name.
prov
:province code.
age1
:province count for age group <16.
age2
:province count for age group 16-24.
age3
:province count for age group 25-49.
age4
:province count for age group 50-64.
age5
:province count for age group >=65.
Identifiers and population sizes by level of education for domains in data set incomedata
.
data(sizeprovedu)
data(sizeprovedu)
A data frame with 52 observations on the following 6 variables.
provlab
:province name.
prov
:province code.
educ0
:province count for education level 0 (age<16).
educ1
:province count for education level 1 (primary education).
educ2
:province count for education level 2 (secondary education).
educ3
:province count for education level 3 (post-secondary education).
Names, codes and population sizes by labor force status for domains in data set incomedata
.
data(sizeprovlab)
data(sizeprovlab)
A data frame with 52 observations on the following 6 variables.
provlab
:province name.
prov
:province code.
labor0
province count for labor force status 0 (age<16).
labor1
province count for labor force status 1 (employed).
labor2
province count for labor force status 2 (unemployed).
labor3
province count for labor force status 3 (inactive).
Names, codes and population sizes for Spanish or non Spanish nationality for domains in data set incomedata
.
data(sizeprovnat)
data(sizeprovnat)
A data frame with 52 observations on the following 4 variables.
provlab
:province name.
prov
:province code.
nat1
:province count for Spanish nationality.
nat2
:province count for non Spanish nationality.
Synthetic area level data with spatial and temporal correlation.
data(spacetime)
data(spacetime)
A data frame with 33 observations on the following 6 variables.
Area
:numeric domain indicator.
Time
:numeric time instant indicator.
X1
:first auxiliary variable at domain level.
X2
:second auxiliary variable at domain level.
Y
:direct estimators of the target variable in the domains.
Var
:sampling variances of direct estimators for each domain.
Example of proximity matrix for the domains included in data set spacetime
.
data(spacetimeprox)
data(spacetimeprox)
The values are numbers in the interval [0,1]
containing the
proximity of the row and column domains.
The sum of the values of each row is equal to 1.
Calculates sample size dependent estimators of domain means, as composition of direct and synthetic estimators. The estimators involved in the composition must be given as function arguments.
ssd(dom, sweight, domsize, direct, synthetic, delta = 1, data)
ssd(dom, sweight, domsize, direct, synthetic, delta = 1, data)
dom |
vector or factor (same size as |
sweight |
vector (same size as |
domsize |
matrix or data frame with domain codes in the first column and the corresponding domain population sizes in the second column. |
direct |
matrix or data frame with domain codes in the first column and the corresponding direct estimators of domain means in the second column. |
synthetic |
matrix or data frame with domain codes in the first column and the corresponding synthetic estimators of domain means in the second column. |
delta |
constant involved in sample size dependent estimator, controlling how much strength to borrow. Default value is 1. |
data |
optional data frame containing the variables named in |
The function returns a data frame of size D*2
with the following columns:
Domain |
domain codes in ascending order. |
ssd |
sample size dependent estimators of domain means. |
CompWeight |
weights attached to direct estimators in the composition. |
Cases with NA values in dom
or sweight
are ignored.
- Drew, D., Singh, M.P. and Choudhry, G.H. (1982). Evaluation of small area estimation techniques for the Canadian Labour Force Survey. Survey Methodology 8, 17-47.
- Rao, J. N. K. (2003). Small Area Estimation. Wiley, London.
# We compute sample size dependent estimators of mean income by # composition of the Horvitz-Thompson direct estimator and the # post-stratified synthetic estimator with age groups as post-strata. # Load data set data(incomedata) # Load population sizes of provinces (domains) data(sizeprov) # First we compute Horvitz-Thompson direct estimators dir <- direct(y=income, dom=provlab, sweight=weight, domsize=sizeprov[,c(1,3)], data=incomedata) # Now we compute post-stratified synthetic estimators with education # levels as post-strata # Load province sizes by education levels data(sizeprovedu) # Compute post-stratified synthetic estimators colnames(sizeprovedu) <- c("provlab", "prov", "0", "1", "2", "3") synth <- pssynt(y=income, sweight=weight, ps=educ, domsizebyps=sizeprovedu[,-2], data=incomedata) # Compute sample size dependent estimators of province mean income # by composition of Horvitz-Thompson direct estimators and # post-stratified estimators for delta=1 comp <- ssd(dom=provlab, sweight=weight, domsize=sizeprov[,c(1,3)], direct=dir[,c("Domain","Direct")], synthetic=synth, data=incomedata) comp
# We compute sample size dependent estimators of mean income by # composition of the Horvitz-Thompson direct estimator and the # post-stratified synthetic estimator with age groups as post-strata. # Load data set data(incomedata) # Load population sizes of provinces (domains) data(sizeprov) # First we compute Horvitz-Thompson direct estimators dir <- direct(y=income, dom=provlab, sweight=weight, domsize=sizeprov[,c(1,3)], data=incomedata) # Now we compute post-stratified synthetic estimators with education # levels as post-strata # Load province sizes by education levels data(sizeprovedu) # Compute post-stratified synthetic estimators colnames(sizeprovedu) <- c("provlab", "prov", "0", "1", "2", "3") synth <- pssynt(y=income, sweight=weight, ps=educ, domsizebyps=sizeprovedu[,-2], data=incomedata) # Compute sample size dependent estimators of province mean income # by composition of Horvitz-Thompson direct estimators and # post-stratified estimators for delta=1 comp <- ssd(dom=provlab, sweight=weight, domsize=sizeprov[,c(1,3)], direct=dir[,c("Domain","Direct")], synthetic=synth, data=incomedata) comp
Values of p auxiliary variables for out-of-sample units within 5 domains of data set incomedata
.
data(Xoutsamp)
data(Xoutsamp)
A data frame with 713301 observations on the following 10 variables.
domain
:a numeric vector with the domain codes.
age2
:indicator of age group 16-24.
age3
:indicator of age group 25-49.
age4
:indicator of age group 50-64.
age5
:indicator of age group >=65.
nat1
:indicator of Spanish nationality.
educ1
:indicator of education level 1 (primary education).
educ3
:indicator of education level 3 (post-secondary education).
labor1
:indicator of being employed.
labor2
:indicator of being unemployed.