Package 'GB2'

Title: Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation
Description: Package GB2 explores the Generalized Beta distribution of the second kind. Density, cumulative distribution function, quantiles and moments of the distributions are given. Functions for the full log-likelihood, the profile log-likelihood and the scores are provided. Formulas for various indicators of inequality and poverty under the GB2 are implemented. The GB2 is fitted by the methods of maximum pseudo-likelihood estimation using the full and profile log-likelihood, and non-linear least squares estimation of the model parameters. Various plots for the visualization and analysis of the results are provided. Variance estimation of the parameters is provided for the method of maximum pseudo-likelihood estimation. A mixture distribution based on the compounding property of the GB2 is presented (denoted as "compound" in the documentation). This mixture distribution is based on the discretization of the distribution of the underlying random scale parameter. The discretization can be left or right tail. Density, cumulative distribution function, moments and quantiles for the mixture distribution are provided. The compound mixture distribution is fitted using the method of maximum pseudo-likelihood estimation. The fit can also incorporate the use of auxiliary information. In this new version of the package, the mixture case is complemented with new functions for variance estimation by linearization and comparative density plots.
Authors: Monique Graf <[email protected]>, Desislava Nedyalkova <[email protected]>.
Maintainer: Desislava Nedyalkova <[email protected]>
License: GPL (>= 2)
Version: 2.1.1
Built: 2024-11-29 09:05:49 UTC
Source: CRAN

Help Index


Compound Distribution based on the Generalized Beta Distribution of the Second Kind

Description

Mixture distribution based on the compounding property of the GB2, in short "compound GB2". Decomposition of the GB2 distribution with respect to the left and right tail of the distribution. Calculation of the component densities and cumulative distribution functions. Calculation of the compound density function and the compound cumulative distribution function.

Usage

fg.cgb2(x, shape1, scale, shape2, shape3, pl0, decomp="r")
dl.cgb2(x, shape1, scale, shape2, shape3, pl0, decomp="r") 
pl.cgb2(y, shape1, scale, shape2, shape3, pl0, decomp="r", tol=1e-05)
dcgb2(x, shape1, scale, shape2, shape3, pl0, pl, decomp="r")
pcgb2(y, shape1, scale, shape2, shape3, pl0, pl, decomp="r")
prcgb2(y1, y2, shape1, scale, shape2, shape3, pl0, pl, decomp="r", tol=1e-08, 
debug=FALSE)

Arguments

x

numeric; can be a vector. The value(s) at which the compound density and the component densities are calculated, x is positive.

y

numeric; can be a vector. The value(s) at which the compound distribution function and the component distribution functions are calculated.

y1, y2

numeric values.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; a vector of fitted proportions. Sums to one. If pl is equal to pl0, we obtain the GB2 distribution.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution. By default, decomp = "r" - right tail decomposition.

debug

logical; By default, debug = FALSE.

tol

numeric; tolerance with default 0, meaning to iterate until additional terms do not change the partial sum.

Details

The number of components LL is given by the length of the vector pl0. In our examples L=3L=3. Let NN denote the length of the vector x. Function fg.cgb2 calculates the LL gamma factors which multiply the GB2 density in order to obtain the component density ff_\ell. These component densities are calculated using the function dl.cgb2. Function pl.cgb2 calculates the corresponding LL cumulative component distribution functions. Function dcgb2 calculates the resulting compound density function. Function pcgb2 calculates the compound cumulative distribution function for a vector of values y and function prcgb2, given 2 arguments y1 and y2, calculates the probability P(min(y1,y2)<Y<max(y1,y2))P(min(y1,y2) < Y < max(y1,y2)), where the random variable YY follows a compound GB2 distribution.

Value

fg.cgb2 returns a matrix of size N×LN \times L of the Gamma factors, dl.cgb2 returns a matrix of size N×LN \times L of component densities, pl.cgb2 returns a matrix containing the LL component cdfs, dcgb2 returns a matrix of size N×1N \times 1 of the GB2 compound density function, pcgb2 returns a matrix of size N×1N \times 1 of the GB2 compound distribution function and prcgb2 returns a probability between 0 and 1.

Author(s)

Monique Graf and Desislava Nedyalkova

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

Examples

#\dontrun{

#\library{cubature}

# GB2 parameters
af <- 5
bf <- 20000
pf <- 0.45 
qf <- 0.75

p0 <- rep(1/3,3)
p1 <- c(0.37,0.43,0.2)

# a vector of values
x <- rep(20000*seq(1,2,length.out=9))

#Gamma components
fg.cgb2(20000,af,bf,pf,qf,p0)
fg.cgb2(Inf,af,bf,pf,qf,p0,"l")

#Component densities
dl.cgb2(x,af,bf,pf,qf,p0)
dl.cgb2(20000,af,bf,pf,qf,p0,"l")

#Component cdf
pl.cgb2(25000,af,bf,pf,qf,p0)

#Compound cdf
pcgb2(x,af,bf,pf,qf,p0,p1)
prcgb2(37000,38000,af,bf,pf,qf,p0,p1,"l")
#}

Comparison of the compound GB2 and kernel densities by group

Description

Function dplot.cavgb2 produces a plot in which the compound and kernel (Epanechnikov) densities are plotted by group.

Usage

dplot.cavgb2(group, x, shape1, scale, shape2, shape3, pl0, pl, w=rep(1,length(x)), 
xmax = max(x)*(2/3), ymax=2e-05, decomp="r", choicecol=1:length(levels(group)), 
xlab="")

Arguments

group

numeric; a factor variable giving the group membership of each sampled unit.

x

numeric; can be a vector. The value(s) at which the density is calculated, used for the kernel estimate only. x is positive.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution. On the plot they are denotes as a, b, p, q and pl0 respectively.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; a vector of fitted proportions (output of pkl.cavgb2). Sums to one. If pl is equal to pl0, we obtain the GB2 distribution.

w

numeric; weights.

xmax

numeric; scale on the horizontal axis. By default is equal to max(x)(2/3)max(x)*(2/3).

ymax

numeric; scale on the vertical axis. By default is equal to 2e-05.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution. By default, decomp = "r" - right tail decomposition.

choicecol

numeric vector of length the number of groups; defines the color with which the density curves will be plotted for each group.

xlab

string; label for xx. The default is " ".

Details

The legend is placed interactively.

Value

dplot.cavgb2 plots a graph with two curves - the GB2 density, the compound GB2 per group and the corresponding kernel estimate.

Author(s)

Monique Graf and Desislava Nedyalkova


Fitting the Compound Distribution based on the GB2 by the Method of Pseudo Maximum Likelihood Estimation using Auxiliary Information

Description

Calculates the log-likelihood, the score functions of the log-likelihood and fits the compound distribution based on the GB2 and using auxiliary information.

Usage

pkl.cavgb2(z, lambda)
lambda0.cavgb2(pl0, z, w=rep(1, dim(z)[1]))
logl.cavgb2(fac, z, lambda, w=rep(1, dim(fac)[1]))
scores.cavgb2(fac, z, lambda, w=rep(1, dim(fac)[1]))
ml.cavgb2(fac, z, lambda0, w = rep(1, dim(fac)[1]), maxiter = 100, fnscale=length(w))

Arguments

z

numeric; a matrix of auxiliary variables.

lambda

numeric; a matrix of parameters.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

w

numeric; vector of weights of length the number of rows of the matrix fac. By default w is a vector of 1.

fac

numeric; a matrix of Gamma factors.

lambda0

numeric; a matrix of initial parameters.

maxiter

numeric; maximum number of iterations to perform. By default maxiter = 100.

fnscale

numeric; parameter of the optim function. By default fnscale is equal to the lenth of the vector of weights (value of fnscale in the preceding version of the package). Permits to solve some convergence problems (see optim).

Details

We model the probabilities pp_\ell with auxiliary variables. Let zkz_k denote the vector of auxiliary information for unit kk. This auxiliary information modifies the probabilities pp_\ell at the unit level. Denote by pk,p_{k,\ell} the weight of the density ff_\ell for unit kk. For =1,...,L1\ell=1,...,L-1, we pose a linear model for the log-ratio vk,v_{k,\ell}:

log(pk,/pk,L)=vk,=i=1Iλizki=zkTλ.\log(p_{k,\ell}/p_{k,L}) = v_{k,\ell} =\sum_{i=1}^I \lambda_{\ell i} z_{k i}= \mathbf{z}_k^{T} \boldsymbol{\lambda_{\ell}}.

Function pkl.cavgb2 calculates the pk,p_{k,\ell}. Function lambda0.cavgb2 calculates the initial values λi\lambda_{\ell i}, i=1,...,Ii= 1, ..., I, =1,...,L1\ell = 1, ..., L-1 . Let

zˉi=kwkzki/kwk\bar{z}_{i}=\sum_k w_k z_{ki}/\sum_k w_k

be the mean value of the ii-th explanatory variable. Writing

log(p^(0)/p^L(0))=v(0)=i=1Iλi(0)zˉi,\log(\hat{p}^{(0)}_\ell / \hat{p}^{(0)}_L)=v^{(0)}_\ell = \sum_{i=1}^I \lambda^{(0)}_{\ell i} \bar{z}_{i},

we can choose λi(0)=v(0)/(Izˉi).\lambda^{(0)}_{\ell i}= v^{(0)}_\ell / (I \bar{z}_{i}). Analogically to the ordinary fit of the compound distribution based on the GB2 CompoundFit, we express the log-likelihood as a weighted mean of log(f)=log((pk,f(xk))log(f) = log(\sum(p_{k,\ell} f_\ell(x_k)), evaluated at the data points, where ff is the GB2 compound density. The scores are obtained as the weighted sums of the first derivatives of the log-likelihood, with respect to the parameters λ, =1,...,L1\lambda_\ell, \ \ell=1, ..., L-1, evaluated at the data points. Function ml.cavgb2 performs maximum likelihood estimation through the general-purpose optimization function optim from package stats. The considered method of optimization is "BFGS" (optim). Once we have the fitted parameters λ^\hat{\lambda} we can deduce the fitted parameters vk^\hat{v{k\ell}} and pk^\hat{p_{k\ell}} in function of zˉ\bar{z} and λ^\hat{\lambda}_{\ell}.

Value

pkl.cavgb2 returns a matrix of probabilities. lambda0.cavgb2 returns a matrix of size I×L1I \times L-1. logl.cavgb2 returns the value of the pseudo log-likelihood. scores.cavgb2 returns the weighted sum of the scores of the log-likelihood. ml.cavgb2 returns a list containing two objects - the vector of fitted coefficients λ^\hat{\lambda_\ell} and the output of the "BFGS" fit.

Author(s)

Monique Graf and Desislava Nedyalkova

See Also

optim

Examples

## Not run: 

library(simFrame)
data(eusilcP)

# Stratified cluster sampling
set.seed(12345)
srss <- SampleControl(design = "region", grouping = "hid", size = c(200*3, 1095*3, 1390*3,
 425*3, 820*3, 990*3, 400*3, 450*3, 230*3), k = 1)

# Draw a sample
s1 <- draw(eusilcP,srss)
#names(s1)

# Creation of auxiliary variables
ind <- order(s1[["hid"]])
ss1 <- data.frame(hid=s1[["hid"]], region=s1[["region"]], hsize=s1[["hsize"]], 
peqInc=s1[["eqIncome"]], age=s1[["age"]], pw=s1[[".weight"]])[ind,]
ss1[["child"]] <- as.numeric((ss1[["age"]]<=14))
ss1[["adult"]] <- as.numeric((ss1[["age"]]>=20))
sa <- aggregate(ss1[,c("child","adult")],list(ss1[["hid"]]),sum)
names(sa)[1] <- "hid"
sa[["children"]] <- as.numeric((sa[["child"]]>0))
sa[["single_a"]] <- as.numeric((sa[["adult"]]==1))
sa[["sa.ch"]] <- sa[["single_a"]]*sa[["children"]]
sa[["ma.ch"]] <- (1-sa[["single_a"]])*sa[["children"]]
sa[["nochild"]] <- 1-sa[["children"]]

# New data set
ns <- merge(ss1[,c("hid","region","hsize","peqInc","pw")], 
sa[,c("hid","nochild","sa.ch","ma.ch")], by="hid")

# Ordering the data set
ns <- ns[!is.na(ns$peqInc),]
index <- order(ns$peqInc)
ns <- ns[index,]

# 	Truncate at 0
ns <- ns[ns$peqInc>0,]
# income
peqInc <- ns$peqInc
# weights
pw <- ns$pw             

# Adding the weight adjustment
c1 <- 0.1                                
pwa <- robwts(peqInc,pw,c1,0.001)[[1]]        
corr <- mean(pw)/mean(pwa)              
pwa <- pwa*corr  

ns <- data.frame(ns, aw=pwa) 

# Empirical indicators with original weights
emp.ind <- c(main.emp(peqInc, pw),
              main.emp(peqInc[ns[["nochild"]]==1], pw[ns[["nochild"]]==1]),
              main.emp(peqInc[ns[["sa.ch"]]==1], pw[ns[["sa.ch"]]==1]),
              main.emp(peqInc[ns[["ma.ch"]]==1], pw[ns[["ma.ch"]]==1]))

# Matrix of auxiliary variables
z <- ns[,c("nochild","sa.ch","ma.ch")]
#unique(z)
z <- as.matrix(z)

# global GB2 fit, ML profile log-likelihood
gl.fit <- profml.gb2(peqInc,pwa)$opt1
agl.fit <- gl.fit$par[1]
bgl.fit <- gl.fit$par[2]
pgl.fit <- prof.gb2(peqInc,agl.fit,bgl.fit,pwa)[3]
qgl.fit <- prof.gb2(peqInc,agl.fit,bgl.fit,pwa)[4]

# Likelihood and convergence
proflikgl <- -gl.fit$value
convgl <- gl.fit$convergence

# Fitted GB2 parameters and indicators
profgb2.par <- c(agl.fit, bgl.fit, pgl.fit, qgl.fit)
profgb2.ind <- main.gb2(0.6, agl.fit, bgl.fit, pgl.fit, qgl.fit)

# Initial lambda and pl
pl0 <- c(0.2,0.6,0.2)
lambda0 <- lambda0.cavgb2(pl0, z, pwa)

# left decomposition
decomp <- "l"
facgl <- fg.cgb2(peqInc, agl.fit, bgl.fit, pgl.fit, qgl.fit, pl0 ,decomp)
fitcml <- ml.cavgb2(facgl, z, lambda0, pwa, maxiter=500) 
fitcml
convcl <- fitcml[[2]]$convergence
convcl
lambdafitl <- fitcml[[1]]
pglfitl <-  pkl.cavgb2(diag(rep(1,3),lambdafitl)
row.names(pglfitl) <- colnames(z)

## End(Not run)

Variance Estimation under the Compound GB2 Distribution Using Auxiliary Information

Description

Calculation of variance estimates of the parameters of the compound GB2 distribution and of the estimated compound GB2 indicators under a complex survey design (see package survey).

Usage

scoreU.cavgb2(fac, z, lambda) 
scorez.cavgb2(U,z)
varscore.cavgb2(SC, w=rep(1,dim(SC)[1]))
desvar.cavgb2(data=data, SC=SC, ids=NULL, probs=NULL, strata = NULL, variables = NULL,
fpc=NULL, nest = FALSE, check.strata = !nest, weights=NULL, pps=FALSE, 
variance=c("HT","YG"))
hess.cavgb2(U, P, z, w=rep(1, dim(z)[1]))
vepar.cavgb2(ml, Vsc, hess)
veind.cavgb2(group, vepar, shape1, scale, shape2, shape3, pl0, P, decomp="r")

Arguments

fac

numeric; a matrix of Gamma factors.

z

numeric; a matrix of auxiliary variables.

lambda

numeric; a matrix of parameters.

U

numeric; a matrix of scores Uk,U_{k,\ell} (output of the scoreU.cavgb2 function).

SC

numeric; scores, output of scorez.cavgb2.

w

numeric; vector of extrapolation weights. By default w is a vector of 1.

data

dataset containing the design information per unit

ids, probs, strata, variables, fpc, nest, check.strata, weights, pps, variance

parameters of svydesign.

P

numeric; matrix of mixture probabilities (output of pkl.cavgb2).

ml

numeric; estimated values of the vector of v's. Output of the ml.cavgb2 function (the second element in the list).

Vsc

numeric; 4 by 4 matrix. Variance of the scores SC, computed in varscore.cavgb2 or with the design information in desvar.cavgb2.

hess

numeric; Hessian (bread) for the sandwich variance estimate (output of hess.cavgb2).

group

numeric; a factor variable of the same length as the sample size giving the group membership in the special case when the auxiliary information defines group membership.

vepar

numeric; output of vepar.cavgb2.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution.

Details

The N×LN \times L matrix of fitted mixture probabilities P=(pk,)=(p_{k,\ell}) depends on the N×IN \times I matrix z of auxiliary variables. P has as many distinct rows as there are distinct rows in z. The N×LN \times L matrix of gamma factors fac=F=F, output of fg.cgb2 depends on the vector of initial probabilities p0,p_{0,\ell} only. The N×(L1)N \times (L-1) matrix of scores U is defined as

U(k,)=pk,(F(k,)j=1Lpk,jF(k,j)1).U(k,\ell)=p_{k,\ell} \left( \frac{F(k,\ell)}{\sum_{j=1}^L p_{k,j}\, F(k,j)} - 1\right).

The linearized scores are the columns of a N×I(L1)N \times I(L-1) matrix

SC(k,I(1)+i)=U(k,)z(k,i).SC(k,\,I(\ell-1)+i)= U(k,\ell) \, z(k,i).

Function varscore.cavgb2 calculates the middle term of the sandwich variance estimator, that is the (I(L1)×I(L1))(I(L-1) \times I(L-1)) estimated variance-covariance matrix of the I(L1)I(L-1) weighted sums of the columns of SCSC, without design information. desvar.cavgb2 calculates the design-based variance-covariance matrix of the I(L1)I(L-1) weighted sums of the columns of SCSC, invoking svydesign and svytotal of package survey. hess.cavgb2 calculates the Hessian (I(L1)×I(L1)I(L-1) \times I(L-1) matrix of second derivatives of the pseudo-log-likelihood with respect to the parameters). It should be negative definite. If not, the maximum likelihood estimates are spurious. vepar.cavgb2 calculates the sandwich variance estimate of the vectorized matrix of parameters lambda. veind.cavgb2 calculates estimates, std error, covariance and correlation matrices of the indicators under the compound GB2 with auxiliary variables in the particular case where the unique combinations of the auxiliary variables define a small number of groups. Group membership is specified by the vector group of length NN.

Value

scoreU.cavgb2 returns a N×(L1)N \times (L-1) matrix of scores U. scorez.cavgb2 returns a N×I(L1)N \times I(L-1) matrix whose columns are the linearized scores SC. varscore.cavgb2 returns the variance-covariance estimate of the weighted sums of scores SC, given by weighted cross products. desvar.cavgb2 returns a list of two elements. The first is the output of svytotal and the second is the design-based variance-covariance matrix of the weighted sums of the scores SC. hess.cavgb2 returns the matrix of second derivatives of the likelihood with respect to the parameters (bread for the sandwich variance estimate). vepar.cavgb2 returns a list of five elements - [["type"]] with value "parameter", [["estimate"]] estimated parameters, [["stderr"]] corresponding standard errors, [["Vcov"]] variance -covariance matrix and [["Vcor"]] - correlation matrix. veind.cavgb2 returns a list of five elements: [["type"]] with value "indicator", followed by a list with as many arguments as length(levels(group)). Each argument is itself a list with 5 arguments: [["group"]] group name, [["estimate"]] estimated indicators under the compound GB2, [["stderr"]] corresponding standard errors, [["Vcov"]] variance -covariance matrix and [["Vcor"]] - correlation matrix.

Author(s)

Monique Graf and Desislava Nedyalkova

References

Davison, A. (2003), Statistical Models. Cambridge University Press.

Freedman, D. A. (2006), On The So-Called "Huber Sandwich Estimator" and "Robust Standard Errors". The American Statistician, 60, 299–302.

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

Pfeffermann, D. and Sverchkov, M. Yu. (2003), Fitting Generalized Linear Models under Informative Sampling. In, Skinner, C.J. and Chambers, R.L. (eds.). Analysis of Survey Data, chapter 12, 175–195. Wiley, New York.

Examples

## Not run: 
# Example (following of example in CompoundAuxFit)

# Scores U 
U <- scoreU.cavgb2(facgl, z, lambdafitl)

# Scores multiplied by z
SC <- scorez.cavgb2(U,z)

# Naive variance estimate of sum of scores
(Vsc <- varscore.cavgb2(SC,w=pwa))

# Design based variance of sum of scores
(desv <- desvar.cavgb2(data=ns,SC=SC,id=~hid,strata=~region,weights=~pwa))

# Hessian
hess <- hess.cavgb2(U,pglfitl,z,w=pwa) 

# 1. Sandwich variance-covariance matrix estimate of parameters using Vsc:
Param1 <- vepar.cavgb2(fitcml,Vsc, hess)
Param1

# 2. Sandwich variance-covariance matrix estimate of parameters using 
# the design variance:
Param2 <- vepar.cavgb2(fitcml,desv$Vtheta, hess)
Param2 

# 3. Indicators and conditional variances : takes a long time!
(Indic <- veind.cavgb2(group,Param2 ,agl.fit,bgl.fit,pgl.fit,qgl.fit, 
                       pl0, pglfitl, decomp="l") )

## End(Not run)

Comparison of the GB2, compound GB2 and kernel densities

Description

Function dplot.cgb2 produces a plot in which the three densities are plotted.

Usage

dplot.cgb2(x,shape1, scale, shape2, shape3, pl0, pl, w=rep(1,length(x)), decomp="r", 
xmax = max(x)*(2/3), choicecol=1:3, kernel="epanechnikov", adjust=1, title=NULL,
ylim=NULL)

Arguments

x

numeric; can be a vector. The value(s) at which the density is calculated, used for the kernel estimate only. x is positive.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution. On the plot they are denotes as a, codeb, p, q and pl0 respectively.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; a vector of mixture probabilities (output of ml.cgb2). Sums to one. If pl is equal to pl0, we obtain the GB2 distribution.

w

numeric; weights.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution. By default, decomp = "r" - right tail decomposition.

xmax

numeric; maximum x value to be plotted.

choicecol

numeric vector of length 3; defines the color with which the density curves will be plotted.

adjust

numeric; graphical parameter of the generic function density.

title

string; title of the plot. By default is equall to NULL (no title).

ylim

string; scaling of parameters. By default is equall to NULL (automatic scaling).

kernel

string; the kernel used for the kernel density estimate. The default value is "Epanechnikov" (see plot.density).

Details

The legend is placed interactively.

Value

dplot.cgb2 plots a graph with three curves - the GB2 density, the compound GB2 density and the corresponding kernel estimate

Author(s)

Monique Graf and Desislava Nedyalkova


Fitting the Compound Distribution based on the GB2 by the Method of Maximum Likelihood Estimation

Description

Calculates the log-likelihood, the score functions of the log-likelihood, the weighted mean of scores, and fits the parameters of the Compound Distribution based on the GB2.

Usage

vofp.cgb2(pl)
pofv.cgb2(vl)
logl.cgb2(fac, pl, w=rep(1, dim(fac)[1]))
scores.cgb2(fac, pl, w=rep(1, dim(fac)[1]))
ml.cgb2(fac, pl0, w=rep(1, dim(fac)[1]), maxiter=100, fnscale=length(w))

Arguments

pl0

numeric; vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; vector of fitted proportions. Sums to one. If pl is equal to pl0, we obtain the GB2 distribution.

fac

numeric; matrix of Gamma factors (output of fac.cgb2.

vl

numeric; vector of parameters. Its length is equal to the length of pl - 1.

w

numeric; vector of weights of length the number of rows of the matrix fac. By default w is a vector of 1.

maxiter

numeric; maximum number of iterations to perform. By default maxiter = 100.

fnscale

numeric; an overall scaling parameter used in the function optim. By default it is equal to the length of the vector of weights w.

Details

There are only L1L-1 parameters to estimate, because the probabilities pp_\ell sum to 1 (L is the dimension of the vector of probailities pp_\ell). Knowing this, we change the parameters pp_\ell to v=log(p/pL), =1,...,L1v_\ell=log(p_\ell/p_L), \ \ell= 1, ..., L-1. This calculation is done through the function vofp.cgb2. pofv.cgb2 calculates the pp_\ell in function of the given vv_\ell. We express the log-likelihood as a weighted mean of log(f)=log((pf)log(f) = log(\sum(p_\ell f_\ell), evaluated at the data points, where ff is the GB2 compound density. If the weights are not available, then we suppose that w =1= 1. Analogically, the scores are obtained as weighted sums of the first derivatives of the log-likelihood, with respect to the parameters v, =1,...,L1v_\ell, \ \ell=1, ..., L-1, evaluated at the data points. Function ml.cgb2 performs maximum likelihood estimation through the general-purpose optimization function optim from package stats. The considered method of optimization is BFGS.

Value

vofp.cgb2 returns a vector of length L1L-1, where LL is the length of the vector pp_\ell. pofv.cgb2 returns a vector of length \ell. logl.cgb2 returns the value of the pseudo log-likelihood. scores.cgb2 returns a vector of the weighted mean of the scores of length L1L-1. ml.cgb2 returns a list containing two objects - the vector of fitted proportions p^\hat{p_\ell} and the output of the BFGS fit.

Author(s)

Monique Graf and Desislava Nedyalkova

See Also

optim

Examples

## Not run: 
# GB2 parameters:
a <- 4
b <- 1950
p <- 0.8
q <- 0.6

# Proportions defining the component densities:
pl0 <- rep(1/3,3)

# Mixture probabilities
pl <- c(0.1,0.8,0.1)

# Random generation:
n <- 10000
set.seed(12345)
x <- rcgb2(n,a,b,p,q,pl0,pl,decomp="l")

# Factors in component densities
fac <- fg.cgb2(x,a,b,p,q, pl0,decomp="l")

# Estimate the mixture probabilities:
estim <- ml.cgb2(fac,pl0)

# estimated mixture probabilities:
estim[[1]]
#[1] 0.09724319 0.78415797 0.11859883

## End(Not run)

Indicators of Poverty and Social Exclusion under the Compound Distribution based on the GB2

Description

Functions to calculate four primary social welfare indicators under the compound GB2 distribution, i.e. the at-risk-of-poverty threshold, the at-risk-of-poverty rate, the relative median at-risk-of-poverty gap, and the income quintile share ratio.

Usage

arpt.cgb2(prop, shape1, scale, shape2, shape3, pl0, pl, decomp="r")
arpr.cgb2(prop, shape1, shape2, shape3, pl0, pl, decomp="r")
rmpg.cgb2(arpr, shape1, shape2, shape3, pl0, pl, decomp="r")
qsr.cgb2(shape1, shape2, shape3, pl0, pl, decomp="r")
main.cgb2(prop, shape1, scale, shape2, shape3, pl0, pl, decomp="r")

Arguments

prop

numeric; proportion (in general is set to 0.6).

arpr

numeric; the value of the at-risk-of-poverty rate.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; a vector of mixture probabilities. Sums to one. If pl=pl0pl=pl0 we obtain the GB2 distribution.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution. By default, decomp = "r" - right tail decomposition.

Details

The four indicators are described in details in the case of the GB2. The difference here is that we need to give an initial vector of proportions, fitted proportions and define for which decomposition (left or right) the indicators should be calculated.

Value

arpt.cgb2 gives the ARPT, arpr.cgb2 the ARPR, rmpg.cgb2 the RMPG, qsr.cgb2 gives the QSR and main.cgb2 calculates the median, the mean, the ARPR, the RMPG and the QSR under the compound GB2.

Author(s)

Monique Graf

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

See Also

arpr.gb2 for details on the welfare indicators under the GB2.

Examples

# GB2 parameters
a <- 3.9
b <- 18873
p <- 0.97
q <- 1.03

# Proportions defining the component densities
p0 <- rep(1/3,3)

# Mixture probabilities
pl <- c(0.39,0.26,0.35)

# for the right discretization
arpt <- arpt.cgb2(0.6, a, b, p, q, p0, pl)
arpr <- arpr.cgb2(0.6, a, p, q, p0, pl)
rmpg <- rmpg.cgb2(arpr, a, p, q, p0, pl)
qsr <- qsr.cgb2(a, p, q, p0, pl)

# for the left discretization
arptleft <- arpt.cgb2(0.6, a, b, p, q, p0, pl, "l")

Moments of the Compound Distribution based on the GB2

Description

These functions calculate the moment of order k and incomplete moment of order k of a GB2 compound random variable XX as well as the moment of order k for each component density.

Usage

mkl.cgb2(k, shape1, scale, shape2, shape3, pl0, decomp="r")
moment.cgb2(k, shape1, scale, shape2 ,shape3, pl0, pl, decomp="r")
incompl.cgb2(x, k, shape1, scale, shape2, shape3, pl0, pl, decomp="r")

Arguments

x

numeric; vector of quantiles.

k

numeric; order of the moment.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; a vector of mixture probabilities. Sums to one. If pl=pl0pl=pl0 we obtain the GB2 distribution.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution. By default, decomp = "r" - right tail decomposition.

Value

mkl.cgb2 returns a vector of the moments of the component densities, moment.cgb2 returns the moment of order k and incompl.cgb2 - the incomplete moment of order k.

Author(s)

Monique Graf

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

Examples

#\dontrun{

#\library{cubature}

# GB2 parameters
af <- 5
bf <- 20000
pf <- 0.45 
qf <- 0.75

p0 <- rep(1/3,3)
p1 <- c(0.37,0.43,0.2)

# moments for the component densities
mkl.cgb2(1,af,bf,pf,qf,p0)
mkl.cgb2(-1,af,bf,pf,qf,p0,"l")

#Moment of order k
moment.cgb2(0.5,af,bf,pf,qf,p0,p1)
moment.cgb2(0.5,af,bf,pf,qf,p0,p1,"l")

#Incomplete moment of order k
incompl.cgb2(20000,1,af,bf,pf,qf,p0,p1)
incompl.cgb2(20000,1,af,bf,pf,qf,p0,p1,"l")
#}

Quantiles and random generation of the Compound Distribution based on the GB2

Description

Calculation of the quantiles of a compound GB2 random variable. Random generation of compound GB2 variables.

Usage

qcgb2(prob, shape1, scale, shape2, shape3, pl0, pl, decomp="r", tol=1e-08, ff=1.5, 
debug=FALSE, maxiter=50)
rcgb2(n, shape1, scale, shape2, shape3, pl0, pl, decomp="r", tol=1e-02, maxiter=100,
debug = FALSE)

Arguments

prob

numeric; vector of probabilities between 0 and 1.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution.

n

numeric; number of observations. If length(n) > 1, the length is taken to be the number required.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

pl

numeric; a vector of mixture probabilities. Sums to one. If pl=pl0pl=pl0 we obtain the GB2 distribution.

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution. By default, decomp = "r" - right tail decomposition.

ff

numeric; a tuning parameter.

debug

logical; By default, debug = FALSE.

maxiter

numeric; maximum number of iterations to perform.

tol

numeric; tolerance with default 0, meaning to iterate until additional terms do not change the partial sum.

Value

qcgb2 returns a vector of quantiles and rcgb2 return a vector of size n of GB2 compound random deviates.

Author(s)

Monique Graf and Desislava Nedyalkova

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

Examples

#\dontrun{

#\library{cubature}

# GB2 parameters
af <- 5
bf <- 20000
pf <- 0.45 
qf <- 0.75

p0 <- rep(1/3,3)
p1 <- c(0.37,0.43,0.2)

#Quantiles

qcgb2(0.5,af,bf,pf,qf,p0,p1)
qcgb2(1,af,bf,pf,qf,p0,p1)
qcgb2(c(0.5,0.8),af,bf,pf,qf,p0,p1)

#Random generation
rcgb2(10,af,bf,pf,qf,p0,p1)

#}

Variance Estimation of the Compound GB2 Distribution

Description

Calculation of variance estimates of the parameters of the compound GB2 distribution and of the estimated compound GB2 indicators under cluster sampling.

Usage

scoreU.cgb2(fac, pl)
varscore.cgb2(U, w=rep(1,dim(U)[1]))
desvar.cgb2(data=data, U=U, ids=NULL, probs=NULL, strata = NULL, variables = NULL, 
fpc=NULL, nest = FALSE, check.strata = !nest, weights=NULL, pps=FALSE, 
variance=c("HT","YG"))
hess.cgb2(U, pl, w=rep(1,dim(U)[1]))
vepar.cgb2(ml, Vsc, hess)
derivind.cgb2(shape1, scale, shape2, shape3, pl0, pl, prop=0.6, decomp="r")
veind.cgb2(Vpar, shape1, scale, shape2, shape3, pl0, pl, decomp="r")

Arguments

fac

numeric; matrix of Gamma factors (output of fac.cgb2.

pl

numeric; a vector of fitted mixture probabilities. Sums to one. If pl is equal to pl0, we obtain the GB2 distribution.

U

numeric; vector of scores. Output of the scoreU.cgb2 function.

w

numeric; vector of some extrapolation weights. By default w is a vector of 1.

data

dataset containing the design information per unit.

ids, probs, strata, variables, fpc, nest, check.strata, weights, pps, variance

parameters of svydesign.

ml

numeric; output of the ml.cgb2 function. A list with two components. First component: estimated mixture probabilities. Second component: list containing the output of optim.

Vsc

numeric; 4 by 4 matrix.

hess

numeric; Hessian (bread) for the sandwich variance estimate.

shape1, scale, shape2, shape3

numeric; positive parameters of the GB2 distribution.

pl0

numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.

prop

numeric; proportion (in general is set to 0.6).

decomp

string; specifying if the decomposition of the GB2 is done with respect to the right tail ("r") or the left tail ("l") of the distribution.

Vpar

numeric; 4 by 4 matrix. Output of the function vepar.cgb2.

Details

Function scoreU.cgb2 calculates the N×(L1)N \times (L-1) matrix of scores U is defined as

U(k,)=p(F(k,)j=1LpjF(k,j)1),U(k,\ell)=p_{\ell} \left( \frac{F(k,\ell)}{\sum_{j=1}^L p_{j}\, F(k,j)} - 1\right),

where p,=1,..,Lp_\ell, \ell=1,..,L is the vector of fitted mixture probabilities and FF is the N×LN \times L matrix of gamma factors, output of fg.cgb2. The linearized scores are the columns of U. They serve to compute the linearization approximation of the covariance matrix of the parameters v=log(p/pL),=1,...,L1v_\ell=\log(p_\ell/p_L), \ell=1,...,L-1. Function varscore.cgb2 calculates the middle term of the sandwich variance estimator, that is the ((L1)×(L1))((L-1) \times (L-1)) estimated variance-covariance matrix of the (L1)(L-1) weighted sums of the columns of U, without design information. desvar.cgb2 calculates the design-based variance-covariance matrix of the (L1)(L-1) weighted sums of the columns of U, invoking svydesign and svytotal of package survey. hess.cgb2 calculates the Hessian ((L1)×(L1))(L-1) \times (L-1)) matrix of second derivatives of the pseudo-log-likelihood with respect to the parameters vv_\ell). It should be negative definite. If not, the maximum likelihood estimates are spurious. vepar.cgb2 calculates the sandwich covariance matrix estimate of the vector of parameters v. veind.cgb2 calculates estimates, standard error, covariance and correlation matrices of the indicators under the compound GB2.

Value

scoreU.cgb2 returns a N×(L1)N \times (L-1) matrix of scores <codeU. varscore.cgb2 returns the variance-covariance estimate of the weighted sums of scores U, given by weighted cross products. desvar.cgb2 returns a list of two elements. The first is the output of svytotal and the second is the design-based variance-covariance matrix of the weighted sums of the scores. hess.cgb2 returns the matrix of second derivatives of the likelihood with respect to the parameters (bread for the sandwich variance estimate). vepar.cgb2 returns a list of five elements - [["type"]] with value "parameter", [["estimate"]] estimated parameters, [["stderr"]] corresponding standard errors, [["Vcov"]] variance -covariance matrix and [["Vcor"]] - correlation matrix. veind.cgb2 returns a list of five elements: [["type"]] with value "indicator", [["estimate"]] estimated indicators under the compound GB2, [["stderr"]] corresponding standard errors, [["Vcov"]] variance -covariance matrix and [["Vcor"]] - correlation matrix.

Author(s)

Monique Graf and Desislava Nedyalkova

References

Davison, A. (2003), Statistical Models. Cambridge University Press.

Freedman, D. A. (2006), On The So-Called "Huber Sandwich Estimator" and "Robust Standard Errors". The American Statistician, 60, 299–302.

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

Pfeffermann, D. and Sverchkov, M. Yu. (2003), Fitting Generalized Linear Models under Informative Sampling. In, Skinner, C.J. and Chambers, R.L. (eds.). Analysis of Survey Data, chapter 12, 175–195. Wiley, New York.

Examples

## Not run: 
# Example (following of example in CompoundFit)

# Estimated mixture probabilities:
(pl.hat <- estim[[1]])

# scores per unit
U <- scoreU.cgb2(fac, pl.hat)  

# Conditional variances given a,b,p,q:

# 1. Variance of sum of scores:
(Vsc <- t(U) 
(Vsc <- varscore.cgb2(U))        

# 2. sandwich variance-covariance matrix estimate of (v_1,v_2):
(hess <-  hess.cgb2(U,pl.hat))
(Parameters <- vepar.cgb2(estim, Vsc, hess))

# 3. Theoretical indicators  (with mixture prob pl)
decomp <- "r"
(theoretical <- main.cgb2( 0.6,a,b,p,q,pl0, pl,decomp=decomp))

# Estimated indicators and conditional variances : takes a long time!
(Indic <- veind.cgb2(Parameters,a,b,p,q, pl0, pl.hat, decomp="r") )


## End(Not run)

Sensitivity Analysis of Laeken Indicators on GB2 Parameters

Description

Produces a contour plot of an indicator for a given shape1.

Usage

contindic.gb2(resol, shape1, shape21, shape22, shape31, shape32, fn, title, table=FALSE)

Arguments

resol

numeric; number of grid points horizontally and vertically.

shape1

numeric; positive parameter, first shape parameter of the GB2 distribution.

shape21, shape22, shape31, shape32

numeric; limits on the positive parameters of the Beta distribution.

fn

string; the name of the function to be used for the calculation of the values to be plotted.

title

string; title of the plot.

table

boolean; if TRUE, a table containing the values of the function fn at the different grid points is printed.

Details

An indicator is defined as a function of three parameters. The shape parameter, shape1, is held fixed. The shape parameters shape2 and shape3 vary between shape21 and shape22, and shape31 and shape32, respectively.

Value

A contour plot of a given indicator for a fixed value of the shape parameter shape1.

Author(s)

Monique Graf

See Also

contour (package graphics) for more details on contour plots.

Examples

par(mfrow=c(2,2))
shape21 <- 0.3
shape31 <- 0.36
shape22 <- 1.5
shape32 <- 1.5
shape11 <- 2.7
shape12 <- 9.2
resol <- 11
rangea <- round(seq(shape11, shape12 ,length.out=4),digits=1)
arpr <- function(shape1, shape2, shape3) 100*arpr.gb2(0.6, shape1, shape2, shape3)
fonc <- "arpr"
for (shape1 in rangea){
contindic.gb2(resol, shape1, shape21, shape22, shape31, shape32, arpr, "At-risk-of-poverty rate", 
table=TRUE)
}

Contour Plot of the Profile Log-likelihood of the GB2 Distribution

Description

Produces a contour plot of the profile log-likelihood, which is a function of two parameters only.

Usage

contprof.gb2(z, w=rep(1,length(z)), resol, low=0.1, high=20)

Arguments

z

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as z. By default w is a vector of 1.

resol

numeric; number of grid points horizontally and vertically. For better graph quality, we recommend a value of 100.

low, high

numeric; lower and upper factors for scale.

Details

The matrix containing the values to be plotted (NAs are allowed) is of size resol ×\times resol. The locations of the grid lines at which the values of the profile log-likelihood are measured are equally-spaced values between low and high multiplied by the initial parameters.

Value

A contour plot of the profile log-likelihood. The initial Fisk estimate is added as point "F".

Author(s)

Monique Graf

See Also

fisk for the Fisk estimate, ProfLogLikelihood for the profile log-likelihood and contour (package graphics) for more details on contour plots.


Parameters of the Fisk Distribution

Description

Calculation of the parameters aa and bb of the Fisk distribution, which is a GB2 distribution with p=q=1p = q = 1. If mm and vv denote, respectively, the mean and variance of log(z)log(z), then a^=π/3v\hat{a} = \pi/\sqrt{3*v} and b^=exp(m)\hat{b} = \exp(m).

Usage

fisk(z, w=rep(1, length(z)))
fiskh(z, w=rep(1, length(z)), hs=rep(1, length(z)))

Arguments

z

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as z. By default w is a vector of 1.

hs

numeric; vector of household sizes. Must have the same length as z. By default hs is a vector of 1.

Details

Function fisk first calculates the mean and variance of log(z)log(z) and next the values of aa and bb under the Fisk distribution. Function fiskh first calculates the mean and variance of log(z)log(z), assuming a sample of households, and next the values of aa and bb under the Fisk distribution.

Value

fisk and fiskh return vectors of length 4 containing the estimated parameters aa, eqnb, as well as p=1p=1 and q=1q=1.

Author(s)

Monique Graf

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

See Also

optim for the general-purpose optimization

Examples

library(laeken)
data(eusilc)

# Income
inc <- as.vector(eusilc$eqIncome)

# Weights
w <- eusilc$rb050

#Fisk parameters
fpar <- fisk(inc, w)

The Generalized Beta Distribution of the Second Kind

Description

Density, distribution function, quantile function and random generation for the Generalized beta distribution of the second kind with parameters a, b, p and q.

Usage

dgb2(x, shape1, scale, shape2, shape3)
pgb2(x, shape1, scale, shape2, shape3)
qgb2(prob, shape1, scale, shape2, shape3)
rgb2(n, shape1, scale, shape2, shape3)

Arguments

x

numeric; vector of quantiles.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

prob

numeric; vector of probabilities.

n

numeric; number of observations. If length(n) > 1, the length is taken to be the number required.

Details

The Generalized Beta distribution of the second kind with parameters shape1 =a= a, scale =b= b, shape2 =p= p and shape3 =q= q has density

f(x)=a(x/b)ap1bB(p,q)(1+(x/b)a)p+qf(x)=\frac{a(x/b)^{ap-1}}{bB(p,q)(1+(x/b)^{a})^{p+q}}

for a>0a > 0, b>0b > 0, p>0p > 0 and q>0q > 0, where B(p,q)B(p,q) is the Beta function (beta). If Z follows a Beta distribution with parameters pp and qq and

y=z1z,y = \frac{z}{1-z},

then

x=by1/ax = b * y^{1/a}

follows the GB2 distribution.

Value

dgb2 gives the density, pgb2 the distribution function, qgb2 the quantile function, and rgb2 generates random deviates.

Author(s)

Monique Graf

References

Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences, chapter 6. Wiley, Ney York.

McDonald, J. B. (1984) Some generalized functions for the size distribution of income. Econometrica, 52, 647–663.

See Also

beta for the Beta function and dbeta for the Beta distribution.

Examples

a <- 3.9
b <- 18873
p <- 0.97
q <- 1.03
x <- qgb2(0.6, a, b, p, q)
y <- dgb2(x, a, b, p, q)

Computation of the Gini Coefficient for the GB2 Distribution and its Particular Cases.

Description

Computes the Gini coefficient for the GB2 distribution using the function gb2.gini. Computes the Gini coefficient for the Beta Distribution of the Second Kind, Dagum and Singh-Maddala distributions.

Usage

gini.gb2(shape1, shape2, shape3)
gini.b2(shape2, shape3)
gini.dag(shape1, shape2)
gini.sm(shape1, shape3)

Arguments

shape1

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

Value

The Gini coefficient.

Author(s)

Monique Graf

References

Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences, chapter 6. Wiley, Ney York.

McDonald, J. B. (1984) Some generalized functions for the size distribution of income. Econometrica, 52, 647–663.

See Also

gb2.gini


Monetary Laeken Indicators under the GB2

Description

Functions to calculate four primary social welfare indicators under the GB2, i.e. the at-risk-of-poverty threshold, the at-risk-of-poverty rate, the relative median at-risk-of-poverty gap, and the income quintile share ratio.

Usage

arpt.gb2(prop, shape1, scale, shape2, shape3)
arpr.gb2(prop, shape1, shape2, shape3)
rmpg.gb2(arpr, shape1, shape2, shape3)
qsr.gb2(shape1, shape2, shape3)
main.gb2(prop, shape1, scale, shape2, shape3)
main2.gb2(prop, shape1, scale, shape12, shape13)

Arguments

prop

numeric; proportion (in general is set to 0.6).

arpr

numeric; the value of the at-risk-of-poverty rate.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

shape12

numeric; the product of the two parameters shape1 and shape2.

shape13

numeric; the product of the two parameters shape1 and shape3.

Details

In June 2006, the Social Protection Committee, which is a group of officials of the European Commisiion, adopts a set of common indicators for the social protection and social inclusion process. It consists of a portfolio of 14 overarching indicators (+11 context indicators) meant to reflect the overarching objectives (a) "social cohesion" and (b) "interaction with the Lisbon strategy for growth and jobs (launched in 2000) objectives"; and of three strand portfolios for social inclusion, pensions, and health and long-term care.

The at-risk-of-poverty threshold (or ARPT) is defined as 60% of the median national equivalized income.

The at-risk-of-poverty rate (or ARPR) is defined as the share of persons with an equivalised disposable income below the ARPT.

The relative median at-risk-of-poverty gap (or RMPG) is defined as the difference between the median equivalised income of persons below the ARPT and the ARPT itself, expressed as a percentage of the ARPT.

The income quintile share ratio (or QSR) is defined as the ratio of total income received by the 20% of the country's population with the highest income (top quintile) to that received by the 20% of the country's population with the lowest income (lowest quintile).

Let x0.5x_{0.5} be the median of a GB2 with parameters shape1 =a= a, scale =b= b, shape2 =p= p and shape3 =q= q. Then,

ARPT(a,b,p,q)=0.6x0.5ARPT(a,b,p,q)=0.6 x_{0.5}

The ARPR being scale-free, bb can be chosen arbitrarily and can be fixed to 1.

The QSR is calculated with the help of the incomplete moments of order 1.

main.gb2 and main2.gb2 return a vector containing the following set of GB2 indicators: the median, the mean, the ARPR, the RMPG, the QSR and the Gini coefficient. The only difference is in the input parameters.

Value

arpt.gb2 gives the ARPT, arpr.gb2 the ARPR, rmpg.gb2 the RMPG, and qsr.gb2 calculates the QSR. main.gb2 returns a vector containing the median of the distribution, the mean of the distribution, the ARPR, the RMPG, the QSR and the Gini coefficient. main2.gb2 produces the same output as main.gb2.

Author(s)

Monique Graf

References

https://ec.europa.eu/social/main.jsp?langId=en&catId=750

See Also

qgb2, incompl.gb2

Examples

a <- 3.9
b <- 18873
p <- 0.97
q <- 1.03
ap <- a*p
aq <- a*q

arpt <- arpt.gb2(0.6, a, b, p, q)
arpr <- arpr.gb2(0.6, a, p, q)
rmpg <- rmpg.gb2(arpr, a, p, q)
qsr <- qsr.gb2(a, p, q)

ind1 <- main.gb2(0.6, a, b, p, q)
ind2 <- main2.gb2(0.6, a, b, ap, aq)

Log Density of the GB2 Distribution

Description

Calculates the log density of the GB2 distribution for a single value or a vector of values. Calculates the first- and second-order partial derivatives of the log density evaluated at a single value.

Usage

logf.gb2(x, shape1, scale, shape2, shape3)
dlogf.gb2(xi, shape1, scale, shape2, shape3)
d2logf.gb2(xi, shape1, scale, shape2, shape3)

Arguments

xi

numeric; a data value.

x

numeric; a vector of data values.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

Details

We calculate log(f(x,θ))log(f(x, \theta)), where ff is the GB2 density with parameters shape1 =a= a, scale =b= b, shape2 =p= p and shape3 =q= q, θ\theta is the parameter vector. We calculate the first- and second-order partial derivatives of log(f(x,θ))log(f(x, \theta)) with respect to the parameter vector θ\theta.

Value

Depending on the input logf.gb2 gives the log density for a single value or a vector of values. dlogf.gb2 gives the vector of the four first-order partial derivatives of the log density and d2logf.gb2 gives the 4×44 \times 4 matrix of second-order partial derivatives of the log density.

Author(s)

Desislava Nedyalkova

References

Brazauskas, V. (2002) Fisher information matrix for the Feller-Pareto distribution. Statistics & Probability Letters, 59, 159–167.


Full Log-likelihood of the GB2 Distribution

Description

Calculates the log-likelihood, the score functions of the log-likelihood and the Fisher information matrix based on all four parameters of the GB2 distribution.

Usage

loglp.gb2(x, shape1, scale, shape2, shape3, w=rep(1, length(x)))
loglh.gb2(x, shape1, scale, shape2, shape3, w=rep(1, length(x)), hs=rep(1, length(x)))
scoresp.gb2(x, shape1, scale, shape2, shape3, w=rep(1, length(x)))
scoresh.gb2(x, shape1, scale, shape2, shape3, w=rep(1, length(x)), hs=rep(1, length(x)))
info.gb2(shape1, scale, shape2, shape3)

Arguments

x

numeric; vector of data values.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

w

numeric; vector of weights. Must have the same length as x. By default w is a vector of 1.

hs

numeric; vector of household sizes. Must have the same length as x. By default hs is a vector of 1.

Details

We express the log-likelihood as a weighted mean of log(f)log(f), evaluated at the data points, where ff is the GB2 density with parameters shape1 =a= a, scale =b= b, shape2 =p= p and shape3 =q= q. If the weights are not available, then we suppose that w =1= 1. loglp.gb2 calculates the log-likelihood in the case where the data is a sample of persons and loglh.gb2 is adapted for a sample of households. Idem for the scores, which are obtained as weighted sums of the first derivatives of log(f)log(f) with respect to the GB2 parameters, evaluated at the data points. The Fisher information matrix of the GB2 was obtained by Brazauskas (2002) and is expressed in terms of the second derivatives of the log-likelihood with respect to the GB2 parameters.

Author(s)

Monique Graf

References

Brazauskas, V. (2002) Fisher information matrix for the Feller-Pareto distribution. Statistics & Probability Letters, 59, 159–167.

Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences, chapter 6. Wiley, Ney York.


Fitting the GB2 by the Method of Maximum Likelihood Estimation and Comparison of the Fitted Indicators with the Empirical Indicators

Description

The function mlfit.gb2 makes a call to ml.gb2 and profml.gb2. Estimates of the GB2 parameters are calculated using maximum likelihood estimation based on the full and profile log-likelihoods. Empirical estimates of the set of primary indicators of poverty and social inclusion are calculated using the function main.emp (see package laeken) and these estimates are compared with the indicators calculated with the GB2 fitted parameters using the function main.gb2.

Usage

main.emp(z, w=rep(1, length(z)))
mlfit.gb2(z, w=rep(1, length(z)))

Arguments

z

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as z. By default w is a vector of 1.

Value

A list containing three different objects. The first is a data frame with the values of the fitted parameters for the full log-likelihood and the profile log-likelihood, the values of the two likelihoods, the values of the GB2 estimates of the indicators and the values of the empirical estimates of the indicators. The second and third objects are, respectively, the fit using the full log-likelihood and the fit using the profile log-likelihood.

Author(s)

Monique Graf and Desislava Nedyalkova

See Also

optim , ml.gb2, profml.gb2

Examples

# An example of using the function mlfit.gb2
# See also the examples of ml.gb2 and mlprof.gb2

## Not run: 
library(laeken)
data(eusilc)

# Income
inc <- as.vector(eusilc$eqIncome)

# Weights
w <- eusilc$rb050

# Data set
d <- data.frame(inc, w)
d <- d[!is.na(d$inc),]

# Truncate at 0
inc <- d$inc[d$inc > 0]
w   <- d$w[d$inc > 0]

# ML fit
m1 <- mlfit.gb2(inc,w)

# GB2 fitted parameters and indicators through maximum likelihood estimation
m1[[1]]
# The fit using the full log-likelihood
m1[[2]]
# The fit using the profile log-likelihood
m1[[3]]

# ML fit, when no weights are avalable 
m2 <- mlfit.gb2(inc)
# Results
m2[[1]]

## End(Not run)

Maximum Likelihood Estimation of the GB2 Based on the Full Log-likelihood

Description

Performs maximum pseudo-likelihood estimation through the general-purpose optimisation function optim from package stats. Two methods of optimization are considered: BFGS and L-BFGS-B (see optim documentation for more details). Initial values of the parameters to be optimized over (aa, bb, pp and qq) are given from the Fisk distribution and p=q=1p=q=1. The function to be maximized by optim is the negative of the full log-likelihood and the gradient is equal to the negative of the scores, respectively for the case of a sample of persons and a sample of households.

Usage

ml.gb2(z, w=rep(1, length(z)), method=1, hess=FALSE) 
mlh.gb2(z, w=rep(1, length(z)), hs=rep(1, length(z)), method=1, hess = FALSE)

Arguments

z

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as z. By default w is a vector of 1.

hs

numeric; vector of household sizes. Must have the same length as z. By default hs is a vector of 1.

method

numeric; the method to be used by optim. By default, codemethod = 1 and the used method is BFGS. If method = 2, method L-BFGS-B is used.

hess

logical; By default, hess = FALSE, the hessian matrix is not calculated.

Details

Function ml.gb2 performs maximum likelihood estimation through the general-purpose optimization function optim from package stats, based on the full log-likelihood calculated in a sample of persons. Function mlh.gb2 performs maximum likelihood estimation through the general-purpose optimization function optim from package stats, based on the full log-likelihood calculated in a sample of households.

Value

ml.gb2 and mlh.gb2 return a list with 1 argument: opt1 for the output of the BFGS fit or opt2 for the output of the L-BFGS fit. Further values are given by the values of optim.

Author(s)

Monique Graf

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

See Also

optim for the general-purpose optimization and fisk for the Fisk distribution.

Examples

## Not run: 
library(laeken)
data(eusilc)

# Income
inc <- as.vector(eusilc$eqIncome)

# Weights
w <- eusilc$rb050

# Data set
d <- data.frame(inc, w)
d <- d[!is.na(d$inc),]
   
# Truncate at 0
inc <- d$inc[d$inc > 0]
w   <- d$w[d$inc > 0]

# Fit using the full log-likelihood
fitf <- ml.gb2(inc, w)

# Fitted GB2 parameters
af <- fitf$par[1]
bf <- fitf$par[2]
pf <- fitf$par[3]
qf <- fitf$par[4]

# Likelihood
flik <- fitf$value

# If we want to compare the indicators

# GB2 indicators
indicf <- round(main.gb2(0.6,af,bf,pf,qf), digits=3)
# Empirical indicators
indice <- round(main.emp(inc,w), digits=3)

# Plots
plotsML.gb2(inc,af,bf,pf,qf,w)

## End(Not run)

Maximum Likelihood Estimation of the GB2 Based on the Profile Log-likelihood

Description

profml.gb2 performs maximum likelihood estimation based on the profile log-likelihood through the general-purpose optimisation function optim from package stats.

Usage

profml.gb2(z, w=rep(1, length(z)), method=1, hess = FALSE)

Arguments

z

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as z. By default w is a vector of 1.

method

numeric; the method to be used by optim. By default, codemethod = 1 and the used method is BFGS. If method = 2, method L-BFGS-B is used.

hess

logical; By default, hess = FALSE, the hessian matrix is not calculated.

Details

Two methods are considered: BFGS and L-BFGS-B (see optim documentation for more details). Initial values of the parameters to be optimized over (aa and bb) are given from the Fisk distribution. The function to be maximized by optim is the negative of the profile log-likelihood (proflogl.gb2) and the gradient is equal to the negative of the scores (profscores.gb2).

Value

A list with 1 argument: opt1 for the output of the BFGS fit or opt2 for the output of the L-BFGS fit. Further values are given by the values of optim.

Author(s)

Monique Graf

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

See Also

optim for the general-purpose optimization, link{ml.gb2} for the fit using the full log-likelihood and fisk for the Fisk distribution.

Examples

library(laeken)
data(eusilc)

# Income
inc <- as.vector(eusilc$eqIncome)

# Weights
w <- eusilc$rb050

# Data set
d <- data.frame(inc,w)
d <- d[!is.na(d$inc),]
   
# Truncate at 0
inc <- d$inc[d$inc > 0]
w   <- d$w[d$inc > 0]

# Fit using the profile log-likelihood
fitp <- profml.gb2(inc, w)$opt1

# Fitted GB2 parameters
ap <- fitp$par[1]
bp <- fitp$par[2]
pp <- prof.gb2(inc, ap, bp, w)[3]
qp <- prof.gb2(inc, ap, bp, w)[4]

# Profile log-likelihood
proflik <- fitp$value

# If we want to compare the indicators
## Not run: 
# GB2 indicators
indicp <- round(main.gb2(0.6,ap,bp,pp,qp), digits=3)
# Empirical indicators
indice <- round(main.emp(inc,w), digits=3)

## End(Not run)

# Plots
## Not run: plotsML.gb2(inc,ap,bp,pp,qp,w)

Moments and Other Properties of a GB2 Random Variable

Description

These functions calculate the moments of order k and incomplete moments of order k of a GB2 random variable XX as well as the expectation, the variance, the kurtosis and the skewness of log(X)log(X).

Usage

moment.gb2(k, shape1, scale, shape2, shape3)
incompl.gb2(x, k, shape1, scale, shape2, shape3)
el.gb2(shape1, scale, shape2, shape3)
vl.gb2(shape1, shape2, shape3)
sl.gb2(shape2, shape3)
kl.gb2(shape2, shape3)

Arguments

x

numeric; vector of quantiles.

k

numeric; order of the moment.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

Details

Let XX be a random variable following a GB2 distribution with parameters shape1 =a= a, scale =b= b, shape2 =p= p and shape3 =q= q. Moments and incomplete moments of XX exist only for apkaq-ap \le k \le aq. Moments are given by

E(Xk)=bkΓ(p+k/a)Γ(qk/a)Γ(p)Γ(q)E(X^k) = {b}^{k} \frac{\Gamma (p+k/a) \Gamma (q-k/a)}{\Gamma (p) \Gamma (q)}

This expression, when considered a function of k, can be viewed as the moment-generating function of Y=log(X)Y=log(X). Thus, it is useful to compute the moments of log(X)log(X), which are needed for deriving, for instance, the Fisher information matrix of the GB2 distribution. Moments of log(X)log(X) exist for all k.

Value

moment.gb2 gives the moment of order k, incompl.gb2 gives the incomplete moment of order k, El.gb2 gives the expectation of log(X)log(X), vl.gb2 gives the variance of log(X)log(X), sl.gb2 gives the skewness of log(X)log(X), kl.gb2 gives the kurtosis of log(X)log(X).

Author(s)

Monique Graf

References

Kleiber, C. and Kotz, S. (2003) Statistical Size Distributions in Economics and Actuarial Sciences, chapter 6. Wiley, Ney York.

See Also

gamma for the Gamma function and related functions (digamma, trigamma and psigamma).

Examples

a <- 3.9
b <- 18873
p <- 0.97
q <- 1.03
k <- 2
x <- qgb2(0.6, a, b, p, q)
moment.gb2(k, a, b, p, q)
incompl.gb2(x, k, a, b, p, q)
vl.gb2(a, p, q)
kl.gb2(p, q)

Fitting the GB2 by Minimizing the Distance Between a Set of Empirical Indicators and Their GB2 Expressions

Description

Fitting the parameters of the GB2 distribution by optimizing the squared weighted distance between a set of empirical indicators, i.e. the median, the ARPR, the RMPG, the QSR and the Gini coefficient, and the GB2 indicators using nonlinear least squares (function nls from package stats).

Usage

nlsfit.gb2(med, ei4, par0=c(1/ei4[4],med,1,1), cva=1, bound1=par0[1]*max(0.2,1-2*cva), 
bound2=par0[1]*min(2,1+2*cva), ei4w=1/ei4)

Arguments

med

numeric; the empirical median.

ei4

numeric; the values of the empirical indicators.

par0

numeric; vector of initial values for the GB2 parameters a,b,pa, b, p and qq. The default is to take aa equal to the inverse of the empirical Gini coefficient, bb equal to the empirical median and p=q=1p = q = 1.

cva

numeric; the coefficient of variation of the ML estimate of the parameter aa. The default value is 1.

bound1, bound2

numeric; the lower and upper bounds for the parameter aa in the algorithm. The default values are 0.2a00.2*a_{0} and 2a02*a_{0}, where a0a_{0} is the initial value of the parameter aa.

ei4w

numeric; vector of weights of to be passed to the nls function. The default values are the inverse of the empirical indicators.

Details

We consider the following set of indicators A=(median,ARPR,RMPG,QSR,Gini)A = (median, ARPR, RMPG, QSR, Gini) and their corresponding GB2 expressions AGB2A_{GB2}. We fit the parameters of the GB2 in two consecutive steps. In the first step, we use the set of indicators, excluding the median, and their corresponding expressions in function of aa, apap and aqaq. The bounds for aa are defined in function of the coefficient of variation of the fitted parameter (^a)\hat(a). The nonlinear model that is passed to nls is given by:

i=14ci(Aempir,iAGB2,i(a,ap,aq))2,\sum_{i=1}^4 c_i (A_{empir,i}-A_{GB2,i}(a,ap,aq))^2,

where the weights cic_i take the differing scales into account and are given by the vector ei4w. apap and aqaq are bounded so that the constraints for the existence of the moments of the GB2 distribution and the excess for calculating the Gini coefficient are fulfilled, i.e. ap1ap \ge 1 and aq2aq \ge 2. In the second step, only the the parameter bb is estimated, optimizing the weighted square difference between the empirical median and the GB2 median in function of the already obtained NLS parameters a,pa, p and qq.

Value

nlsfit.gb2 returns a list of three values: the fitted GB2 parameters, the first fitted object and the second fitted object.

Author(s)

Monique Graf and Desislava Nedyalkova

See Also

nls, Thomae, moment.gb2

Examples

# Takes long time to run, as it makes a call to the function ml.gb2

## Not run: 
library(laeken)
data(eusilc)

# Personal income
inc <- as.vector(eusilc$eqIncome)

# Sampling weights
w <- eusilc$rb050

# Data set
d <- data.frame(inc, w)
d <- d[!is.na(d$inc),]
   
# Truncate at 0
d <- d[d$inc > 0,]
inc <- d$inc
w   <- d$w

# ML fit, full log-likelihood
fitf <- ml.gb2(inc, w)$opt1

# Estimated parameters
af <- fitf$par[1]
bf <- fitf$par[2]
pf <- fitf$par[3]
qf <- fitf$par[4]

gb2.par <- c(af, bf, pf, qf)

# Empirical indicators
indicEMP  <- main.emp(inc, w)
indicEMP <- c(indicEMP[1],indicEMP[3:6])
indicE <- round(indicEMP, digits=3)

# Nonlinear fit
nn <- nlsfit.gb2(indicEMP[1,3:6],indicEMP[3:6])
an <- nn[[1]][1]
bn <- nn[[1]][2]
pn <- nn[[1]][3]
qn <- nn[[1]][4]

# GB2 indicators
indicNLS <- c(main.gb2(0.6, an, bn, pn, qn)[1], main.gb2(0.6, an, bn, pn, qn)[3:6])
indicML <- c(main.gb2(0.6, af, bf, pf, qf)[1], main.gb2(0.6, af, bf, pf, qf)[3:6])
indicN <- round(indicNLS, digits=3)
indicM <- round(indicML, digits=3)

# Likelihoods
nlik <- loglp.gb2(inc, an, bn, pn, qn, w)
mlik <- loglp.gb2(inc, af, bf, pf, qf, w)

# Results
type=c("Emp. est", "NLS", "ML full")
results <- data.frame(type=type,
        median=c(indicE[1], indicN[1], indicM[1]),
        ARPR=c(indicE[2], indicN[2], indicM[2]),
        RMPG=c(indicE[3], indicN[3], indicM[3]),
        QSR =c(indicE[4], indicN[4], indicM[4]),
        GINI=c(indicE[5], indicN[5], indicM[5]),
        likelihood=c(NA, nlik, mlik),
        a=c(NA, an, af), b=c(NA, bn, bf) ,p=c(NA, pn, pf), q=c(NA, qn, qf))


## End(Not run)

Cumulative Distribution Plot and Kernel Density Plot for the Fitted GB2

Description

Function plotsML.gb2 produces two plots. The first is a plot of the empirical cumulative distribution function versus the fitted cumulative distibution function. The second is a plot of the kernel density versus the fitted GB2 density. Function saveplot saves locally the produced plot.

Usage

plotsML.gb2(z, shape1, scale, shape2, shape3, w=rep(1,length(z))) 
saveplot(name, pathout)

Arguments

z

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as z. By default w is a vector of 1.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

name

string; the name of the plot.

pathout

string; the path of the folder or device where the plot will be saved.

Details

The used kernel is "Epanechnikov" (see plot).

Author(s)

Monique Graf and Desislava Nedyalkova


Profile Log-likelihood of the GB2 Distribution

Description

Expression of the parameters shape2 =p=p and shape3 =q=q of the GB2 distribution as functions of shape1 =a=a and scale =b=b, profile log-likelihood of the GB2 distribution, scores of the profile log-likelihood.

Usage

prof.gb2(x, shape1, scale, w=rep(1, length(x)))
proflogl.gb2(x, shape1, scale, w=rep(1, length(x)))
profscores.gb2(x, shape1, scale, w=rep(1, length(x)))

Arguments

x

numeric; vector of data values.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

w

numeric; vector of weights. Must have the same length as x. By default w is a vector of 1.

Details

Using the full log-likelihood equations for the GB2 distribution, the parameters pp and qq can be estimated as functions of aa and bb. These functions are plugged into the log-likelihood expression, which becomes a function of aa and bb only. This is obtained by reparametrizing the GB2, i.e. we set r=pp+qr=\frac{p}{p+q} and s=p+qs=p+q. More details can be found in Graf (2009).

Value

prof returns a vector containing the values of rr, ss, pp, qq as well as two other parameters used in the calculation of the profile log-likelihood and its first derivatives. proflogl.gb2 returns the value of the profile log-likelihood and profscores.gb2 returns the vector of the first derivatives of the profile log-likelihhod with respect to aa and bb.

Author(s)

Monique Graf and Desislava Nedyalkova

References

Graf, M. (2009) The Log-Likelihood of the Generalized Beta Distribution of the Second Kind. working paper, SFSO.


Robustification of the sampling weights

Description

Calculation of a Huber-type correction factor by which the vector of weights is multiplied.

Usage

robwts(x, w=rep(1,length(x)), c=0.01, alpha=0.001)

Arguments

x

numeric; vector of data values.

w

numeric; vector of weights. Must have the same length as x. By default w is a vector of 1.

c

numeric; a constant which can take different values, e.g. 0.01,0.02. By default c=0.1c=0.1.

alpha

numeric; a probability in the interval (0,1)(0,1). By default alpha=0.001alpha=0.001.

Details

If xx denotes the observed value and xαx_{\alpha} the α\alpha-th qiantile of the Fisk distribution, then we define our scale as:

d=x1αbxαbd = \displaystyle \frac{x_{1-\alpha}}{b} - \frac{x_{\alpha}}{b}

. Next, the correction factor is calculated as follows:

corr=max{c,min(1,db/x1,dx/b1)}corr = \max\left\{c, \min\left(1,\displaystyle \frac{d}{|b/x-1|},\frac{d}{|x/b-1|}\right)\right\}

Value

robwts returns a list of two elements: the vector of correction factors by which the weights are multiplied and the vector of corrected (robustified) weights.

Author(s)

Monique Graf

References

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.


Maximum Excess Representation of a Generalized Hypergeometric Function Using Thomae's Theorem

Description

Defines Thomae's arguments from the upper (U) and lower (L) parameters of a 3F2(U,L;1)_{3}F_{2}(U,L;1). Computes the optimal combination leading to the maximum excess. Computes the optimal combination of Thomae's arguments and calculates the optimal representation of the 3F2(U,L;1)_{3}F_{2}(U,L;1) using the genhypergeo_series function from package hypergeo. Computes the Gini coefficient for the GB2 distribution, using Thomae's theorem.

Usage

ULg(U, L)
combiopt(g)
Thomae(U, L, lB, tol, maxiter, debug)
gb2.gini(shape1, shape2, shape3, tol=1e-08, maxiter=10000, debug=FALSE)

Arguments

U

numeric; vector of length 3 giving the upper arguments of the generalized hypergeometric function 3F2_{3}F_{2}.

L

numeric; vector of length 2 giving the lower arguments of the generalized hypergeometric function 3F2_{3}F_{2}.

g

numeric; vector of Thomae's permuting arguments.

lB

numeric; ratio of beta functions (a common factor in the expression of the Gini coefficient under the GB2).

shape1

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

tol

numeric; tolerance with default 0, meaning to iterate until additional terms do not change the partial sum.

maxiter

numeric; maximum number of iterations to perform.

debug

boolean; if TRUE, returns the list of changes to the partial sum.

Details

Internal use only. More details can be found in Graf (2009).

Value

ULg returns a list containing Thomae's arguments and the excess, combiopt gives the optimal combination of Thomae's arguments, Thomae returns the optimal representation of the 3F2(U,L;1)_{3}F_{2}(U,L;1), gb2.gini returns the value of the Gini coefficient under the GB2.

Author(s)

Monique Graf

References

Graf, M. (2009) An Efficient Algorithm for the Computation of the Gini coefficient of the Generalized Beta Distribution of the Second Kind. ASA Proceedings of the Joint Statistical Meetings, 4835–4843. American Statistical Association (Alexandria, VA).

McDonald, J. B. (1984) Some generalized functions for the size distribution of income. Econometrica, 52, 647–663.

See Also

genhypergeo_series, gini.gb2


Variance Estimation of the Parameters of the GB2 Distribution

Description

Calculation of variance estimates of the estimated GB2 parameters and the estimated GB2 indicators under cluster sampling.

Usage

varscore.gb2(x, shape1, scale, shape2, shape3, w=rep(1,length(x)), hs=rep(1,length(x)))
vepar.gb2(x, Vsc, shape1, scale, shape2, shape3, w=rep(1,length(x)), hs=rep(1,length(x)))
derivind.gb2(shape1, scale, shape2, shape3)
veind.gb2(Vpar, shape1, scale, shape2, shape3)

Arguments

x

numeric; vector of data values.

Vsc

numeric; 4 by 4 matrix.

shape1

numeric; positive parameter.

scale

numeric; positive parameter.

shape2, shape3

numeric; positive parameters of the Beta distribution.

w

numeric; vector of weights. Must have the same length as x. By default w is a vector of 1.

hs

numeric; vector of household sizes. Must have the same length as x. By default w is a vector of 1.

Vpar

numeric; 4 by 4 matrix.

Details

Knowing the first and second derivatives of log(f)log(f), and using the sandwich variance estimator (see Freedman (2006)), the calculation of the variance estimates of the GB2 parameters is straightforward. Vsc is a square matrix of size the number of parameters, e.g. the estimated design variance-covariance matrix of the estimated parameters. We know that the GB2 estimates of the Laeken indicators are functions of the GB2 parameters. In this case, the variance estimates of the fitted indicators are obtained using the delta method. The function veind.gb2 uses Vpar, the sandwich variance estimator of the vector of parameters, in order to obtain the sandwich variance estimator of the indicators. More details can be found in Graf and Nedyalkova (2011).

Value

varscore.gb2 calculates the middle term of the sandwich variance estimator under simple random cluster sampling. vepar.gb2 returns a list of two elements: the estimated variance-covariance matrix of the estimated GB2 parameters and the second-order partial derivative of the pseudo log-likelihood function. The function veind.gb2 returns the estimated variance-covariance matrix of the estimated GB2 indicators. derivind.gb2 calculates the numerical derivatives of the GB2 indicators and is for internal use only.

Author(s)

Monique Graf and Desislava Nedyalkova

References

Davison, A. (2003), Statistical Models. Cambridge University Press.

Freedman, D. A. (2006), On The So-Called "Huber Sandwich Estimator" and "Robust Standard Errors". The American Statistician, 60, 299–302.

Graf, M., Nedyalkova, D., Muennich, R., Seger, J. and Zins, S. (2011) AMELI Deliverable 2.1: Parametric Estimation of Income Distributions and Indicators of Poverty and Social Exclusion. Technical report, AMELI-Project.

Pfeffermann, D. and Sverchkov, M. Yu. (2003), Fitting Generalized Linear Models under Informative Sampling. In, Skinner, C.J. and Chambers, R.L. (eds.). Analysis of Survey Data, chapter 12, 175–195. Wiley, New York.

Examples

# An example of variance estimation of the GB2 parameters,
# using the dataset "eusilcP" from the R package simFrame.
# Takes long time to run

## Not run: 
library(survey)
library(simFrame)
data(eusilcP)

# Draw a sample from eusilcP
# 1-stage simple random cluster sampling of size 6000 (cluster = household)
# directly,
#s <- draw(eusilcP[, c("hid", "hsize", "eqIncome")], grouping = "hid", size = 6000)

# or setting up 250 samples, and drawing the first one.
# This sample setup can be used for running a simulation.
set.seed(12345)
scs <- setup(eusilcP, grouping = "hid", size = 6000, k = 250)
s <- draw(eusilcP[, c("region", "hid", "hsize", "eqIncome")], scs, i=1)

# The number of observations (persons) in eusilcP (58654 persons)
\dontrun{N <- dim(eusilcP)[1]} 
# The number of households in eusilcP (25000 households)
Nh <- length(unique(eusilcP$hid)) 

# Survey design corresponding to the drawn sample
sdo = svydesign(id=~hid, fpc=rep(Nh,nrow(s)), data=s)
\dontrun{summary(sdo)}

# Truncated sample (truncate at 0)
s <- s[!is.na(s$eqIncome),] 
str <- s[s$eqIncome > 0, ]
eqInc <- str$eqIncome
w <- str$.weight            

# Designs for the truncated sample
sdotr <- subset(sdo, eqIncome >0)
sddtr = svydesign(id=~hid, strata=~region, fpc=NULL, weights=~.weight, data=str)
\dontrun{summary(sdotr)}
\dontrun{summary(sddtr)}

# Fit by maximum likelihood
fit <- ml.gb2(eqInc,w)$opt1
af <- fit$par[1]
bf <- fit$par[2]
pf <- fit$par[3]
qf <- fit$par[4]
mlik <- -fit$value

# Estimated parameters and indicators, empirical indicators 
gb2.par <- round(c(af, bf, pf, qf), digits=3)
emp.ind <- main.emp(eqInc, w)
gb2.ind <- main.gb2(0.6, af, bf, pf, qf)

# Scores
scores <- matrix(nrow=length(eqInc), ncol=4)
for (i in 1:length(eqInc)){
scores[i,] <- dlogf.gb2(eqInc[i], af, bf, pf, qf)
}

# Data on households only
sh <- unique(str)
heqInc <- sh$eqIncome
hw <- sh$.weight
hhs <- sh$hsize 
hs <- as.numeric(as.vector(hhs))  

# Variance of the scores
VSC <- varscore.gb2(heqInc, af, bf, pf, qf, hw, hs)

# Variance of the scores using the explicit designs, and package survey
DV1 <- vcov(svytotal(~scores[,1]+scores[,2]+scores[,3]+scores[,4], design=sdotr))
DV2 <- vcov(svytotal(~scores[,1]+scores[,2]+scores[,3]+scores[,4], design=sddtr))

# Estimated variance-covariance matrix of the parameters af, bf, pf and qf 
VCMP <- vepar.gb2(heqInc, VSC, af, bf, pf, qf, hw, hs)[[1]]
DVCMP1 <- vepar.gb2(heqInc, DV1, af, bf, pf, qf, hw, hs)[[1]]
DVCMP2 <- vepar.gb2(heqInc, DV2, af, bf, pf, qf, hw, hs)[[1]]

\dontrun{diag(DVCMP1)/diag(VCMP)}
\dontrun{diag(DVCMP2)/diag(VCMP)}
\dontrun{diag(DV1)/diag(VSC)}
\dontrun{diag(DV2)/diag(VSC)}

# Standard errors of af, bf, pf and qf
se.par <- sqrt(diag(VCMP))
sed1.par <- sqrt(diag(DVCMP1))
sed2.par <- sqrt(diag(DVCMP2))

# Estimated variance-covariance matrix of the indicators (VCMI)
VCMI <- veind.gb2(VCMP, af, bf, pf, qf) 
DVCMI1 <- veind.gb2(DVCMP1, af, bf, pf, qf)
DVCMI2 <- veind.gb2(DVCMP2, af, bf, pf, qf)

# Standard errors and confidence intervals
varest.ind <- diag(VCMI)
se.ind <- sqrt(varest.ind)
lci.ind <- gb2.ind - 1.96*se.ind
uci.ind <- gb2.ind + 1.96*se.ind
inCI <- as.numeric(lci.ind <= emp.ind & emp.ind <= uci.ind)

# under the sampling design sdotr

varestd1.ind <- diag(DVCMI1)
sed1.ind <- sqrt(varestd1.ind)
lcid1.ind <- gb2.ind - 1.96*sed1.ind
ucid1.ind <- gb2.ind + 1.96*sed1.ind
inCId1 <- as.numeric(lcid1.ind <= emp.ind & emp.ind <= ucid1.ind)

#under the sampling design sddtr

varestd2.ind <- diag(DVCMI2)
sed2.ind <- sqrt(varestd2.ind)
lcid2.ind <- gb2.ind - 1.96*sed2.ind
ucid2.ind <- gb2.ind + 1.96*sed2.ind
inCId2 <- as.numeric(lcid2.ind <= emp.ind & emp.ind <= ucid2.ind)

#coefficients of variation .par (parameters), .ind (indicators)
cv.par <- se.par/gb2.par
names(cv.par) <- c("am","bm","pm","qm")
cvd1.par <- sed1.par/gb2.par
names(cvd1.par) <- c("am","bm","pm","qm")
cvd2.par <- sed2.par/gb2.par
names(cvd2.par) <- c("am","bm","pm","qm")

cv.ind <- se.ind/gb2.ind
cvd1.ind <- sed1.ind/gb2.ind
cvd2.ind <- sed2.ind/gb2.ind


#results 
res <- data.frame(am = af, bm = bf, pm = pf, qm = qf, lik = mlik,
  median = gb2.ind[[1]], mean = gb2.ind[[2]], ARPR = gb2.ind[[3]], 
    RMPG = gb2.ind[[4]], QSR = gb2.ind[[5]], Gini = gb2.ind[[6]],
  emedian = emp.ind[[1]], emean = emp.ind[[2]], eARPR = emp.ind[[3]], 
    eRMPG = emp.ind[[4]], eQSR = emp.ind[[5]], eGini = emp.ind[[6]],
  cva = cv.par[1], cvb = cv.par[2], cvp= cv.par[3], cvq = cv.par[4],
  cvd1a = cvd1.par[1], cvd1b = cvd1.par[2], cvd1p= cvd1.par[3], cvd1q = cvd1.par[4],
  cvd2a = cvd2.par[1], cvd2b = cvd2.par[2], cvd2p= cvd2.par[3], cvd2q = cvd2.par[4],
  cvmed = cv.ind[[1]], cvmean = cv.ind[[2]], cvARPR = cv.ind[[3]], 
  cvRMPG = cv.ind[[4]], cvQSR = cv.ind[[5]], cvGini = cv.ind[[6]],
  cvd1med = cvd1.ind[[1]], cvd1mean = cvd1.ind[[2]], cvd1ARPR = cvd1.ind[[3]], 
  cvd1RMPG = cvd1.ind[[4]], cvd1QSR = cvd1.ind[[5]], cvd1Gini = cvd1.ind[[6]],
  cvd2med = cvd2.ind[[1]], cvd2mean = cvd2.ind[[2]], cvd2ARPR = cvd2.ind[[3]], 
  cvd2RMPG = cvd2.ind[[4]], cvd2QSR = cvd2.ind[[5]], cvd2Gini = cvd2.ind[[6]])

  res <- list(parameters = data.frame(am = af, bm = bf, pm = pf, qm = qf, lik = mlik),
              cv.parameters.naive = cv.par,
              cv.parameters.design1 = cvd1.par,
              cv.parameters.design2 = cvd2.par,
		  GB2.indicators = gb2.ind,
              emp.indicators = emp.ind,
              cv.indicators.naive = cv.ind,
              cv.indicators.design1 = cvd1.ind,
              cv.indicators.design2 = cvd2.ind)
res
\dontrun{inCI}

## End(Not run)