Title: | Small Area Estimation with Cluster Information for Estimation of Non-Sampled Areas |
---|---|
Description: | Implementation of small area estimation (Fay-Herriot model) using the EBLUP (Empirical Best Linear Unbiased Prediction) approach. Non-sampled areas are estimated by adding cluster information, assuming that particular areas are similar to one another. See also Rao & Molina (2015, ISBN:978-1-118-73578-7) and Anisa et al. (2013) <doi:10.9790/5728-10121519>. |
Authors: | Ridson Al Farizal P [aut, cre, cph] , Azka Ubaidillah [aut] |
Maintainer: | Ridson Al Farizal P <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2024-10-25 05:32:01 UTC |
Source: | CRAN |
Generic function calculating Akaike's "An Information Criterion" (AIC) for an EBLUP model.
## S3 method for class 'eblupres'
AIC(object, ...)

## S3 method for class 'eblupres'
BIC(object, ...)
object | EBLUP model. |
... | further arguments passed to or from other methods. |
AIC value.
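For orientation, the conventional definitions of the two criteria are given below. This is a sketch under stated assumptions: the symbols are not from the package documentation. Here \ell is the maximized log-likelihood, k is the number of estimated parameters and m is the number of areas used in the fit. How the package counts k is not stated in this manual.

```latex
\mathrm{AIC} = -2\,\ell + 2k,
\qquad
\mathrm{BIC} = -2\,\ell + k \log m
```

Under both criteria a smaller value indicates a better trade-off between fit and model size.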
library(saens)
m1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = "var", cluster = "clust")
AIC(m1)
autoplot() uses ggplot2 to draw a particular plot for an object of a particular class in a single command. This defines the S3 generic that other classes and packages can extend.
autoplot(object, ...)
object | an object, whose class will determine the behaviour of autoplot |
... | other arguments passed to specific methods |
a ggplot object
See also autolayer(), ggplot() and fortify().
Autoplot EBLUP results.
## S3 method for class 'eblupres'
autoplot(object, variable = "RSE", ...)
object | EBLUP model. |
variable | variable to plot. |
... | further arguments passed to or from other methods. |
plot.
library(saens)
m1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = "var", cluster = "clust")
autoplot(m1)
Extract Model Coefficients.
## S3 method for class 'eblupres'
coef(object, ...)
object | EBLUP model. |
... | further arguments passed to or from other methods. |
model coefficients
library(saens)
m1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = "var", cluster = "clust")
coef(m1)
This function gives the Empirical Best Linear Unbiased Prediction (EBLUP) or Empirical Best (EB) predictor under normality based on a Fay-Herriot model.
eblupfh(
  formula,
  data,
  vardir,
  method = "REML",
  maxiter = 100,
  precision = 1e-04,
  scale = FALSE,
  print_result = TRUE
)
formula | an object of class formula that describes the model to be fitted. The variables in the formula must be contained in data. |
data | a data frame or a data frame extension (e.g. a tibble). |
vardir | vector, or column name in data, containing the sampling variance of the direct estimator for each area. |
method | fitting method, either 'ML' or 'REML'. |
maxiter | maximum number of iterations allowed in the Fisher-scoring algorithm. Default is 100. |
precision | convergence tolerance for the Fisher-scoring algorithm. Default is 0.0001. |
scale | whether to scale the auxiliary variables. Default is FALSE. |
print_result | whether to print the coefficients. Default is TRUE. |
The model has the form response ~ auxiliary variables, where the numeric response variable may contain NA. When the response variable contains NA, it will be estimated with cluster information.
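For readers unfamiliar with the model behind the formula interface, the standard area-level Fay-Herriot model and its EBLUP can be written as follows. The notation is an assumption borrowed from the usual textbook presentation (for example Rao & Molina, 2015), not from this package: y_i is the direct estimate for area i, x_i the auxiliary variables, \psi_i the known sampling variance (vardir), and \sigma_u^2 the random-effect variance.

```latex
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + e_i,
\qquad u_i \sim N(0, \sigma_u^2), \quad e_i \sim N(0, \psi_i)

\hat{\theta}_i^{\mathrm{EBLUP}}
  = \hat{\gamma}_i\, y_i + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}},
\qquad
\hat{\gamma}_i = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \psi_i}
```

Areas with a large sampling variance get a small weight \hat{\gamma}_i, so their prediction leans more on the regression part.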
The function returns a list with the following objects (df_res and fit):
df_res: a data frame that contains the following columns:
* y: response variable
* eblup: EBLUP estimate for each area
* random_effect: random effect for each area
* vardir: sampling variance of the direct estimator for each area
* mse: Mean Square Error
* rse: Relative Standard Error (%)
fit: a list containing the following objects:
* estcoef: a data frame with the estimated model coefficients in the first column (beta), their asymptotic standard errors in the second column (std.error), the t-statistics in the third column (tvalue) and the p-values of the significance of each coefficient in the last column (pvalue)
* model_formula: model formula applied
* method: fitting method applied (ML or REML)
* random_effect_var: estimated random effect variance
* convergence: logical value indicating whether the Fisher-scoring algorithm has converged
* n_iter: number of iterations performed by the Fisher-scoring algorithm
* goodness: vector containing several goodness-of-fit measures: log-likelihood, AIC, and BIC
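To make the eblup and random_effect columns concrete, the following base-R sketch computes the shrinkage form of the predictor on made-up data. It is an illustration under assumptions, not the package's code: all numbers are invented, and sigma2_u is a hypothetical fixed value, whereas the package estimates the random-effect variance by ML or REML.

```r
# Toy data: 5 areas, one auxiliary variable (all values are made up).
y      <- c(10.2, 11.5, 9.8, 12.1, 10.9)   # direct estimates
x      <- c(1.0, 1.4, 0.9, 1.6, 1.2)       # auxiliary variable
vardir <- c(0.40, 0.25, 0.60, 0.30, 0.50)  # sampling variances
sigma2_u <- 0.2  # assumed random-effect variance (hypothetical, not estimated here)

X     <- cbind(1, x)                        # design matrix with intercept
V_inv <- diag(1 / (sigma2_u + vardir))      # inverse of the total variances

# Weighted (GLS) estimate of the regression coefficients
beta      <- solve(t(X) %*% V_inv %*% X, t(X) %*% V_inv %*% y)
synthetic <- as.vector(X %*% beta)          # regression-synthetic prediction

gamma         <- sigma2_u / (sigma2_u + vardir)  # shrinkage weights in (0, 1)
random_effect <- gamma * (y - synthetic)         # predicted area effects
eblup         <- synthetic + random_effect       # gamma * y + (1 - gamma) * synthetic
```

Each prediction lies between the direct estimate and the synthetic prediction; the noisier the direct estimate, the closer the result is to the synthetic part.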
Rao, J. N. K., & Molina, I. (2015). Small area estimation. John Wiley & Sons.
library(saens)
# vardir given as a column name
m1 <- eblupfh(y ~ x1 + x2 + x3, data = na.omit(mys), vardir = "var")
# vardir given as a one-sided formula
m1 <- eblupfh(y ~ x1 + x2 + x3, data = na.omit(mys), vardir = ~var)
This function gives the Empirical Best Linear Unbiased Prediction (EBLUP) or Empirical Best (EB) predictor based on a Fay-Herriot model with cluster information for non-sampled areas.
eblupfh_cluster(
  formula,
  data,
  vardir,
  cluster,
  method = "REML",
  maxiter = 100,
  precision = 1e-04,
  scale = FALSE,
  print_result = TRUE
)
formula | an object of class formula that describes the model to be fitted. The variables in the formula must be contained in data. |
data | a data frame or a data frame extension (e.g. a tibble). |
vardir | vector, or column name in data, containing the sampling variance of the direct estimator for each area. |
cluster | vector, or column name in data, containing the cluster information. |
method | fitting method, either 'ML' or 'REML'. |
maxiter | maximum number of iterations allowed in the Fisher-scoring algorithm. Default is 100. |
precision | convergence tolerance for the Fisher-scoring algorithm. Default is 0.0001. |
scale | whether to scale the auxiliary variables. Default is FALSE. |
print_result | whether to print the coefficients. Default is TRUE. |
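The roles of maxiter and precision can be illustrated with a minimal Fisher-scoring loop for the ML estimate of the random-effect variance. This is a sketch under assumptions, not the package's implementation: the data are made up, the starting value and the absolute-difference stopping rule are arbitrary choices, and only the ML case is shown.

```r
# Toy inputs (made-up numbers): direct estimates, one covariate, sampling variances.
y      <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 9.5, 12.4)
x      <- c(1.0, 1.4, 0.9, 1.6, 1.2, 1.3, 0.8, 1.7)
vardir <- c(0.40, 0.25, 0.60, 0.30, 0.50, 0.35, 0.45, 0.20)
X <- cbind(1, x)

maxiter   <- 100    # plays the role of the `maxiter` argument
precision <- 1e-04  # plays the role of the `precision` argument

sigma2_u  <- median(vardir)  # starting value (an arbitrary choice)
converged <- FALSE
n_iter    <- 0

while (!converged && n_iter < maxiter) {
  n_iter <- n_iter + 1
  w    <- 1 / (sigma2_u + vardir)                    # inverse total variances
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # GLS coefficients
  res  <- as.vector(y - X %*% beta)                  # residuals

  score <- -0.5 * sum(w) + 0.5 * sum(res^2 * w^2)    # ML score for sigma2_u
  info  <- 0.5 * sum(w^2)                            # Fisher information
  sigma2_new <- max(0, sigma2_u + score / info)      # update, truncated at zero

  converged <- abs(sigma2_new - sigma2_u) < precision
  sigma2_u  <- sigma2_new
}
```

After the loop, converged and n_iter correspond in spirit to the convergence and n_iter entries of the returned fit object.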
The model has the form response ~ auxiliary variables, where the numeric response variable may contain NA. When the response variable contains NA, it will be estimated with cluster information.
The function returns a list with the following objects (df_res and fit):
df_res: a data frame that contains the following columns:
* y: response variable
* eblup: EBLUP estimate for each area
* random_effect: random effect for each area
* vardir: sampling variance of the direct estimator for each area
* mse: Mean Square Error
* cluster: cluster information for each area
* rse: Relative Standard Error (%)
fit: a list containing the following objects:
* estcoef: a data frame with the estimated model coefficients in the first column (beta), their asymptotic standard errors in the second column (std.error), the t-statistics in the third column (tvalue) and the p-values of the significance of each coefficient in the last column (pvalue)
* model_formula: model formula applied
* method: fitting method applied (ML or REML)
* random_effect_var: estimated random effect variance
* convergence: logical value indicating whether the Fisher-scoring algorithm has converged
* n_iter: number of iterations performed by the Fisher-scoring algorithm
* goodness: vector containing several goodness-of-fit measures: log-likelihood, AIC, and BIC
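One plausible way cluster information can be used for non-sampled areas is sketched below in base R: the missing random effect of a non-sampled area is replaced by the mean random effect of the sampled areas in the same cluster. This is an assumption about the general idea, in the spirit of Anisa et al. (2013), and may differ from the package's actual computation. All values and variable names are hypothetical.

```r
# Hypothetical per-area quantities; NA marks non-sampled areas.
synthetic     <- c(7.0, 7.4, 6.8, 7.9, 7.2, 7.6)      # x'beta for every area
random_effect <- c(0.30, -0.10, NA, 0.20, NA, -0.40)  # known only for sampled areas
cluster       <- c(1, 1, 1, 2, 2, 2)                  # cluster label of each area

# Mean random effect of the sampled areas in each cluster
cluster_mean <- tapply(random_effect, cluster, mean, na.rm = TRUE)

# Fill the non-sampled areas with their cluster's mean effect
filled <- ifelse(is.na(random_effect),
                 cluster_mean[as.character(cluster)],
                 random_effect)

# Prediction for every area, sampled or not
prediction <- synthetic + filled
```

Sampled areas keep their own random effect; only the NA entries borrow from their cluster.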
Rao, J. N. K., & Molina, I. (2015). Small area estimation. John Wiley & Sons.
Anisa, R., Kurnia, A., & Indahwati, I. (2013). Cluster information of non-sampled area in small area estimation. E-Prosiding Internasional| Departemen Statistika FMIPA Universitas Padjadjaran, 1(1), 69-76.
library(saens)
# vardir and cluster given as column names
m1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = "var", cluster = "clust")
# or as one-sided formulas
m1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = ~var, cluster = ~clust)
Extract Log-Likelihood.
## S3 method for class 'eblupres'
logLik(object, ...)
object | EBLUP model. |
... | further arguments passed to or from other methods. |
Log-likelihood value.
library(saens)
model1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = "var", cluster = "clust")
logLik(model1)
Data on fresh milk expenditure, used by Arora and Lahiri (1997) and by You and Chapman (2006).
milk
A data frame with 43 observations on the following 6 variables.
areas of inferential interest.
sample sizes of small areas.
average expenditure on fresh milk for the year 1989 (direct estimates for the small areas).
estimated standard deviations of yi.
sampling variance of the direct estimator (yi) for each area.
estimated coefficients of variation of yi.
major areas created by You and Chapman (2006). These areas have similar direct estimates and produce a large CV reduction when using a FH model.
Arora, V. and Lahiri, P. (1997). On the superiority of the Bayesian method over the BLUP in small area estimation problems. Statistica Sinica 7, 1053-1063.
You, Y. and Chapman, B. (2006). Small area estimation using area level models and estimated sampling variances. Survey Methodology 32, 97-103.
A dataset containing the mean years of schooling of people with disabilities in Papua Island, Indonesia, in 2021.
mys
A data frame with 42 rows and 7 variables; 10 of the domains are non-sampled areas.
regency municipality
mean years of schooling of people with disabilities
sampling variance of the direct estimator for each area
relative standard error (%)
Number of Elementary Schools
Number of Junior High Schools
Number of Senior High Schools
Cluster
'summary' method for class "eblupres".
## S3 method for class 'eblupres'
summary(object, ...)
object | EBLUP model. |
... | further arguments passed to or from other methods. |
The function returns a data frame that contains the following columns:
* y: response variable
* eblup: EBLUP estimate for each area
* random_effect: random effect for each area
* vardir: sampling variance of the direct estimator for each area
* mse: Mean Square Error
* cluster: cluster information for each area
* rse: Relative Standard Error (%)
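The relation between the mse and rse columns can be illustrated with the conventional definition of a relative standard error: the square root of the MSE divided by the estimate, expressed as a percentage. The sketch below uses made-up numbers and assumes that definition; the package's exact computation is not shown in this manual.

```r
# Hypothetical estimates and mean square errors for three areas
eblup <- c(8.4, 6.9, 7.5)
mse   <- c(0.16, 0.25, 0.09)

# Relative standard error in percent: root MSE relative to the estimate
rse <- sqrt(mse) / eblup * 100
round(rse, 2)
```

A smaller rse means the estimate is more precise relative to its size.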
library(saens)
model1 <- eblupfh_cluster(y ~ x1 + x2 + x3, data = mys, vardir = "var", cluster = "clust")
summary(model1)