Title: | Inequality Measures for Weighted Data |
---|---|
Description: | Computes inequality measures of a given variable taking weights into account. Suitable for ratio, interval and ordinal scales. Includes Gini, Theil, Leti index, Palma ratio, 20:20 ratio, Allison and Foster index, Jenkins index, Cowell and Flachaire index, Abul Naga and Yalcin index, Apouey index, Blair and Lacy index. The bootstrap provides the distribution of inequality measures, enabling significance tests. |
Authors: | Sebastian Wójcik [aut, cre] , Agnieszka Giemza [aut], Katarzyna Machowska [aut], Jarosław Napora [aut] |
Maintainer: | Sebastian Wójcik <[email protected]> |
License: | GPL-3 |
Version: | 1.2.1 |
Built: | 2025-01-08 06:45:48 UTC |
Source: | CRAN |
Computes Allison and Foster inequality measure of a given variable taking into account weights.
AF(X, W = rep(1, length(X)), norm = TRUE)
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
norm | (logical). If TRUE (default), the index is divided by its maximum possible value, which is the difference between the maximum and minimum of X |
Let $c_1 < c_2 < \dots < c_k$ be the vector of categories in increasing order, $m$ be the median category and $p_i$ be the share of the $i$-th category. The following index was proposed by Allison and Foster (2004):
Note that the above formula is valid only for numerical values. Thus, to compute AF for an ordered factor, X is converted to a numeric variable.
The value of Allison and Foster coefficient.
Allison R. A., Foster J. E.: (2004) Measuring health inequality using qualitative data, Journal of Health Economics
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
AF(X)
AF(X, W)
data(Well_being)
# Allison and Foster index for health assessment with sample weights
X <- Well_being$V11
W <- Well_being$Weight
AF(X, W)
Computes Abul Naga and Yalcin inequality measure of a given variable taking into account weights.
AN_Y(X, W = rep(1, length(X)), a = 1, b = 1)
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
a | is a positive parameter. See Details. |
b | is a positive parameter. See Details. |
Let $m$ be the median category, $k$ be the number of categories and $F_i$ be the cumulative distribution of the $i$-th category.
The following index with respect to the parameters a and b was proposed by Abul Naga and Yalcin (2008):
The value of Abul Naga and Yalcin coefficient.
Abul Naga R. H., Yalcin T.: (2008) Inequality measurement for ordered response health data, Journal of Health Economics 27(6)
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
AN_Y(X)
AN_Y(X, W)
data(Well_being)
# Abul Naga and Yalcin index for health assessment with sample weights
X <- Well_being$V1
W <- Well_being$Weight
AN_Y(X, W)
Computes Apouey inequality measure of a given variable taking into account weights.
Apouey( X, W = rep(1, length(X)), a = 2/(1 - length(W[!is.na(W) & !is.na(X)])), b = length(W[!is.na(W) & !is.na(X)])/(length(W[!is.na(W) & !is.na(X)]) - 1) )
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
a | is a positive parameter. See Details. |
b | is a real parameter. See Details. |
Let $m$ be the median category, $k$ be the number of categories and $F_i$ be the cumulative distribution of the $i$-th category. The following index was proposed by Apouey (2007):
where $a$ and $b$ are given parameters whose default values are those shown in the usage above.
The value of Apouey coefficient.
Apouey B.: (2007) Measuring health polarization with self-assessed health data, Health Economics 16; 875-894.
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Apouey(X, a = 2, b = 2)
Apouey(X, W, a = 2, b = 2)
data(Well_being)
# Apouey index for health assessment with sample weights
X <- Well_being$V1
W <- Well_being$Weight
Apouey(X, W, a = 2, b = 2)
Computes Atkinson inequality measure of a given variable taking into account weights.
Atkinson(X, W = rep(1, length(X)), e = 1)
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
e | is the coefficient of aversion to inequality, 1 by default |
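The Atkinson coefficient with respect to the parameter $e$ is given by
$$A_e = 1 - \frac{1}{\bar{x}}\left(\frac{1}{N}\sum_i w_i x_i^{1-e}\right)^{\frac{1}{1-e}}$$
for $e \neq 1$, and
$$A_1 = 1 - \frac{1}{\bar{x}}\prod_i x_i^{w_i/N}$$
for $e = 1$, where $N = \sum_i w_i$ and $\bar{x}$ is the weighted arithmetic mean.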
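The following is a minimal base-R sketch of the weighted Atkinson formula. It is written from the textbook definition rather than taken from the package source. The helper name `atkinson_w_sketch` is hypothetical, and the sketch assumes strictly positive X.

```r
# Hypothetical helper: weighted Atkinson index from the textbook definition.
# Assumes strictly positive x; this is not the package's Atkinson() code.
atkinson_w_sketch <- function(x, w = rep(1, length(x)), e = 1) {
  stopifnot(length(x) == length(w), all(x > 0), e >= 0)
  p <- w / sum(w)                  # weights normalised to sum to 1
  mu <- sum(p * x)                 # weighted arithmetic mean
  if (e == 1) {
    ede <- exp(sum(p * log(x)))    # weighted geometric mean
  } else {
    ede <- sum(p * x^(1 - e))^(1 / (1 - e))  # weighted power mean of order 1 - e
  }
  1 - ede / mu                     # 0 under perfect equality
}

atkinson_w_sketch(c(1, 4))               # geometric mean 2, mean 2.5: 1 - 2/2.5
atkinson_w_sketch(c(1, 4), w = c(1, 2))  # same as the unweighted vector c(1, 4, 4)
```

Integer weights behave like repeated observations, which is a quick way to sanity-check any weighted implementation.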
The value of Atkinson coefficient.
Atkinson A. B.: (1970) On the measurement of inequality, Journal of Economic Theory
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Atkinson(X)
Atkinson(X, W)
data(Tourism)
# Atkinson index for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Atkinson(X, W)
Computes Blair and Lacy inequality measure of a given variable taking into account weights.
BL(X, W = rep(1, length(X)), withsqrt = FALSE)
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
withsqrt | if TRUE, the function returns the index given by BL2, otherwise by BL (default). See Details. |
Let $m$ be the median category, $k$ be the number of categories and $F_i$ be the cumulative distribution of the $i$-th category.
The indices of Blair and Lacy (2000) are the following:
The value of Blair and Lacy coefficient.
Blair J., Lacy M. G.: (2000) Statistics of ordinal variation, Sociological Methods and Research 28(3); 251-280.
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
BL(X)
BL(X, W)
data(Well_being)
# Blair and Lacy index for health assessment with sample weights
X <- Well_being$V1
W <- Well_being$Weight
BL(X, W)
Computes the coefficient of variation of a given variable taking into account weights.
CoefVar(X, W = rep(1, length(X)), square = FALSE)
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
square | logical, argument of the function CoefVar, for details see below |
The coefficient of variation is given by:
$$CV = \frac{\sigma}{\bar{x}},$$
where $\sigma$ is the standard deviation and $\bar{x}$ is the arithmetic mean.
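Here is a small sketch of the weighted coefficient of variation, $\sigma/\bar{x}$. It assumes a population-style weighted variance, in which squared deviations are averaged with weights normalised to sum to 1. The package may use a different denominator, and the `square` option is not modelled. The name `coefvar_w_sketch` is hypothetical.

```r
# Hypothetical helper: weighted coefficient of variation sigma / mean.
# Assumption: population-style variance (weights divided by their sum).
coefvar_w_sketch <- function(x, w = rep(1, length(x))) {
  p <- w / sum(w)
  mu <- sum(p * x)                      # weighted mean
  sigma <- sqrt(sum(p * (x - mu)^2))    # weighted standard deviation
  sigma / mu
}

coefvar_w_sketch(c(1, 3))   # mean 2, standard deviation 1
```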
The value of CoefVar coefficient.
Sheret M.: (1984) Social Indicators Research, An International and Interdisciplinary Journal for Quality-of-Life Measurement, Vol. 15, No. 3, Oct. ISSN 03038300
Coulter P. B.: (1989) Measuring Inequality ISBN 0-8133-7726-9
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
CoefVar(X)
CoefVar(X, W)
data(Tourism)
# Coefficient of variation for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
CoefVar(X, W)
Computes generalized entropy index of a given variable taking into account weights.
Entropy(X, W = rep(1, length(X)), power = 0.5, zeroes = "include")
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
power | is the entropy parameter |
zeroes | defines what to do with zeroes in the data vector. Possible options are "remove" and "include". See Details. |
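The entropy coefficient with respect to the parameter $\alpha$ (power) is equal to Theil_L(X,W) whenever $\alpha = 0$, is equal to Theil_T(X,W) whenever $\alpha = 1$, and whenever $\alpha \notin \{0, 1\}$ we have
$$GE(\alpha) = \frac{1}{N\,\alpha(\alpha - 1)}\sum_i w_i\left[\left(\frac{x_i}{\bar{x}}\right)^{\alpha} - 1\right],$$
where $N$ is the sum of weights and $\bar{x}$ is the weighted arithmetic mean of $X$.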
The general formula is not defined when the parameter is zero or one. In these cases, the entropy index coincides with the Theil L index and the Theil T index, respectively, and is calculated with the corresponding Theil function.
Theil L always removes zeroes. Theil T offers two ways to deal with zeroes, selected by the parameter zeroes.
Option "remove" discards zero X's and the corresponding weights. It works for power > 0.
Option "include" sets $0 \cdot \ln 0 = 0$, using the limit of $x \ln x$ as $x \to 0^+$, which preserves zero values in the dataset. It is valid only for the Theil T index, that is, power = 1.
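Below is a minimal sketch of the general entropy formula for $\alpha \notin \{0, 1\}$. It is written from the formula above, not from the package source. The name `entropy_w_sketch` is hypothetical. A useful check is that $GE(2)$ equals half the squared coefficient of variation computed with a population-style variance.

```r
# Hypothetical helper: generalized entropy GE(alpha) for alpha not in {0, 1}.
# For alpha = 0 or 1 the Theil L / Theil T indices apply instead.
entropy_w_sketch <- function(x, w = rep(1, length(x)), power = 0.5) {
  stopifnot(!power %in% c(0, 1))
  N <- sum(w)                      # sum of weights
  mu <- sum(w * x) / N             # weighted arithmetic mean
  sum(w * ((x / mu)^power - 1)) / (N * power * (power - 1))
}

entropy_w_sketch(c(1, 3), power = 2)   # CV = 0.5, so 0.5^2 / 2
entropy_w_sketch(c(1, 3))              # default power = 0.5, positive
```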
The value of generalized entropy index
Shorrocks A. F.: (1980) The Class of Additively Decomposable Inequality Measures. Econometrica
Pielou E.C.: (1966) The measurement of diversity in different types of biological collections. Journal of Theoretical Biology
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Entropy(X)
Entropy(X, W)
data(Tourism)
# Generalized entropy index for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Entropy(X, W)
Computes Gini coefficient of a given variable taking into account weights.
Gini(X, W = rep(1, length(X)), fast = TRUE, rounded.weights = FALSE)
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
fast | logical. If TRUE (default), Gini is calculated via matrix operations: fast, but it may cause memory allocation problems. If FALSE, Gini is calculated via vector operations: slower, but with better memory allocation |
rounded.weights | logical, used only when fast = FALSE. If TRUE, Gini is calculated through an alternative formula based on ordered X and integer weights; choose it when dealing with memory allocation problems. The default is FALSE. |
Gini coefficient is given by:
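For illustration only, here is a base-R sketch of one common pairwise form of the weighted Gini coefficient:

$$G = \frac{\sum_i\sum_j w_i w_j |x_i - x_j|}{2N^2\bar{x}}, \qquad N = \sum_i w_i.$$

Whether the package uses exactly this normalisation, or a small-sample correction, is not confirmed here. The name `gini_w_sketch` is hypothetical. Building full $n \times n$ matrices mirrors the memory trade-off described for `fast = TRUE`.

```r
# Hypothetical helper: weighted Gini via all pairwise absolute differences.
# Builds n x n matrices, so memory grows quadratically with length(x).
gini_w_sketch <- function(x, w = rep(1, length(x))) {
  N <- sum(w)
  mu <- sum(w * x) / N                 # weighted mean
  d <- abs(outer(x, x, "-"))           # |x_i - x_j| for every pair
  sum(outer(w, w) * d) / (2 * N^2 * mu)
}

gini_w_sketch(1:4)                     # 20 / (2 * 16 * 2.5)
gini_w_sketch(c(1, 2), w = c(1, 3))    # same as the unweighted vector c(1, 2, 2, 2)
```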
The value of Gini coefficient.
Dixon P. M., Weiner, J., Mitchell-Olds, T., and Woodley, R.: (1987) Bootstrapping the Gini Coefficient of Inequality. Ecology , Volume 68 (5)
Firebaugh G.: (1999) Empirics of World Income Inequality, American Journal of Sociology
Deininger K.; Squire L.: (1996) A New Data Set Measuring Income Inequality, The World Bank Economic Review, Vol. 10, No. 3
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Gini(X)
Gini(X, W)
data(Tourism)
# Gini coefficient for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Gini(X, W)
Computes Hoover inequality measure of a given variable taking into account weights.
Hoover(X, W = rep(1, length(X)))
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
Let $x_i$ be the income of the $i$-th person and $\bar{x}$ be the mean income. Then the Hoover index $H$ is:
$$H = \frac{1}{2}\,\frac{\sum_i |x_i - \bar{x}|}{\sum_i x_i}$$
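Here is a minimal sketch of a weighted version of the Hoover formula. Each term is multiplied by its weight, and the mean is the weighted mean. This weighting scheme is an assumption about how the package extends the formula. The name `hoover_w_sketch` is hypothetical. The index can be read as the share of total income that would have to be redistributed to reach perfect equality.

```r
# Hypothetical helper: weighted Hoover index.
hoover_w_sketch <- function(x, w = rep(1, length(x))) {
  mu <- sum(w * x) / sum(w)                  # weighted mean income
  0.5 * sum(w * abs(x - mu)) / sum(w * x)
}

hoover_w_sketch(c(0, 10))   # half of the total income must move
hoover_w_sketch(c(1, 3))
```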
The value of Hoover coefficient.
Hoover E. M. Jr.: (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics, 18
Hoover E. M. Jr.: (1984) An Introduction to Regional Economics, ISBN 0-07-554440-7
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Hoover(X)
Hoover(X, W)
data(Tourism)
# Hoover index for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Hoover(X, W)
Calculates weighted mean and sum of X (or median of X), and a set of relevant inequality measures.
ineq.weighted( X, W = rep(1, length(X)), AF.norm = TRUE, Atkinson.e = 1, Jenkins.alfa = 0.8, Entropy.power = 0.5, zeroes = "include", Kolm.p = 1, Kolm.scale = "Standardization", Leti.norm = T, AN_Y.a = 1, AN_Y.b = 1, Apouey.a = 2/(1 - length(W[!is.na(W) & !is.na(X)])), Apouey.b = length(W[!is.na(W) & !is.na(X)])/(length(W[!is.na(W) & !is.na(X)]) - 1), BL.withsqrt = FALSE )
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
AF.norm | (logical). If TRUE (default), the index is divided by its maximum possible value |
Atkinson.e | is a parameter for the Atkinson coefficient |
Jenkins.alfa | is a parameter for the Jenkins coefficient |
Entropy.power | is a generalized entropy index parameter |
zeroes | defines what to do with zeroes in the data vector. Possible options are "remove" and "include". See the Entropy function for details. |
Kolm.p | is a parameter for the Kolm index |
Kolm.scale | method of data standardization applied before computing the Kolm index |
Leti.norm | (logical). If TRUE (default), the Leti index is divided by its maximum possible value |
AN_Y.a | is a positive parameter for the Abul Naga and Yalcin inequality measure |
AN_Y.b | is a parameter for the Abul Naga and Yalcin inequality measure |
Apouey.a | is a parameter for the Apouey inequality measure |
Apouey.b | is a parameter for the Apouey inequality measure |
BL.withsqrt | if TRUE, the function returns the index given by BL2, otherwise by BL (default). See the details of the BL function. |
Function checks if X is a numeric or an ordered factor. Then it calculates all appropriate inequality measures.
A data frame with the weighted mean and sum of X, and all inequality measures relevant for numeric data. For an ordered factor, a data frame with the median of X and all relevant inequality measures.
# Compare weighted and unweighted result.
X <- 1:10
W <- 1:10
ineq.weighted(X)
ineq.weighted(X, W)
data(Tourism)
# Results for Total expenditure with sample weights:
X <- Tourism$`Total expenditure`
W <- Tourism$`Sample weight`
ineq.weighted(X)
ineq.weighted(X, W)
For weighted mean and weighted total of X (or median of X) as well as for each relevant inequality measure, returns outputs from ineq.weighted and bootstrap outcomes: expected value, bias (in %), standard deviation, coefficient of variation, lower and upper bound of confidence interval.
ineq.weighted.boot( X, W = rep(1, length(X)), B = 100, AF.norm = TRUE, Atkinson.e = 1, Jenkins.alfa = 0.8, Entropy.power = 0.5, zeroes = "include", Kolm.p = 1, Kolm.scale = "Standardization", Leti.norm = T, AN_Y.a = 1, AN_Y.b = 1, Apouey.a = 2/(1 - length(W[!is.na(W) & !is.na(X)])), Apouey.b = length(W[!is.na(W) & !is.na(X)])/(length(W[!is.na(W) & !is.na(X)]) - 1), BL.withsqrt = FALSE, keepSamples = FALSE, keepMeasures = FALSE, conf.alpha = 0.05, calib.boot = FALSE, Xs = rep(1, length(X)), total = sum(W), calib.method = "truncated", bounds = c(low = 0, upp = 10) )
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
B | is the number of bootstrap samples |
AF.norm | (logical). If TRUE (default), the index is divided by its maximum possible value |
Atkinson.e | is a parameter for the Atkinson coefficient |
Jenkins.alfa | is a parameter for the Jenkins coefficient |
Entropy.power | is a generalized entropy index parameter |
zeroes | defines what to do with zeroes in the data vector. Possible options are "remove" and "include". See the Entropy function for details. |
Kolm.p | is a parameter for the Kolm index |
Kolm.scale | method of data standardization applied before computing the Kolm index |
Leti.norm | (logical). If TRUE (default), the Leti index is divided by its maximum possible value |
AN_Y.a | is a positive parameter for the Abul Naga and Yalcin inequality measure |
AN_Y.b | is a parameter for the Abul Naga and Yalcin inequality measure |
Apouey.a | is a parameter for the Apouey inequality measure |
Apouey.b | is a parameter for the Apouey inequality measure |
BL.withsqrt | if TRUE, the function returns the index given by BL2, otherwise by BL (default). See the details of the BL function. |
keepSamples | if TRUE, returns the bootstrap samples of data (Xb) and weights (Wb) |
keepMeasures | if TRUE, returns the values of all inequality measures for each bootstrap sample |
conf.alpha | significance level for the confidence interval |
calib.boot | if FALSE, a naive bootstrap is performed; otherwise, a calibrated bootstrap |
Xs | matrix of calibration variables. By default it is a vector of 1's; used only if calib.boot is TRUE |
total | vector of population totals. By default it is the sum of weights; used only if calib.boot is TRUE |
calib.method | weight calibration method passed to the function calib (sampling package) |
bounds | vector of bounds for the g-weights used in the truncated and logit methods; 'low' is the smallest value and 'upp' is the largest value |
By default, a naive bootstrap is performed; that is, no weight calibration is conducted.
You can choose a calibrated bootstrap to calibrate the weights with respect to the provided variables (Xs) and totals (total).
The confidence interval is derived from the bootstrap quantiles of order $\alpha/2$ and $1 - \alpha/2$, where $\alpha$ (conf.alpha) is the significance level for the confidence interval.
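Below is a minimal sketch of a naive bootstrap with a percentile interval for a weighted statistic. It assumes that "naive" means resampling units with replacement while each unit keeps its own weight. That reading is an assumption, and calibration is not shown. The names `boot_percentile_sketch` and `weighted_mean` are hypothetical.

```r
# Hypothetical helper: naive bootstrap of a weighted statistic with a
# percentile interval at quantiles conf.alpha/2 and 1 - conf.alpha/2.
# Assumption: units are resampled with replacement and keep their weights.
boot_percentile_sketch <- function(x, w, stat, B = 100, conf.alpha = 0.05) {
  n <- length(x)
  reps <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # one bootstrap sample of units
    stat(x[idx], w[idx])
  })
  ci <- quantile(reps, c(conf.alpha / 2, 1 - conf.alpha / 2), names = FALSE)
  list(estimate = stat(x, w), expected = mean(reps), sd = sd(reps),
       lower = ci[1], upper = ci[2])
}

weighted_mean <- function(x, w) sum(w * x) / sum(w)
set.seed(1)
res <- boot_percentile_sketch(1:10, 1:10, weighted_mean, B = 200)
```

The point estimate here is $\sum i^2 / \sum i = 385/55 = 7$. Every bootstrap replicate of a weighted mean of 1:10 lies between 1 and 10.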
This function returns a data frame from ineq.weighted extended with bootstrap results: expected value, bias (in %), standard deviation, coefficient of variation, and lower and upper bounds of the confidence interval. If keepSamples = TRUE or keepMeasures = TRUE, the output becomes a list. If keepSamples = TRUE, the function also returns Xb and Wb, which are the bootstrap samples of the data vector and of the weights, respectively. If keepMeasures = TRUE, the function also returns Mb, which is the set of inequality measures from bootstrapping.
# Inequality measures with additional statistics for numeric variable
X <- 1:10
W <- 1:10
ineq.weighted.boot(X, W, B = 10)
# Inequality measures with additional statistics for ordered factor variable
X <- factor(c('H', 'H', 'M', 'M', 'L', 'L'), levels = c('L', 'M', 'H'), ordered = TRUE)
W <- c(2, 2, 3, 3, 8, 8)
ineq.weighted.boot(X, W, B = 10)
Computes the Jenkins as well as the Cowell and Flachaire inequality measures of a given variable taking into account weights.
Jenkins(X, W = rep(1, length(X)), alfa = 0.8)
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
alfa | is the Jenkins coefficient parameter |
Jenkins coefficient is given by:
where GL is Generalized Lorenz curve.
Cowell and Flachaire coefficient with alpha parameter is given by:
for , and
for .
The value of Jenkins, Cowell and Flachaire coefficient.
Jenkins S. P. and P. J. Lambert: (1997) Three ‘I’s of Poverty Curves, with an Analysis of U.K. Poverty Trends
Cowell F. A.: (2000) Measurement of Inequality, Handbook of Income Distribution
Cowell F. A., Flachaire E.: (2017) Inequality with Ordinal Data
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Jenkins(X)
Jenkins(X, W)
data(Tourism)
# Jenkins, Cowell and Flachaire coefficients for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Jenkins(X, W)
Computes Kolm inequality measure of a given variable taking into account weights.
Kolm(X, W = rep(1, length(X)), parameter = 1, scale = "None")
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
parameter | is the Kolm parameter |
scale | method of data scaling (None, Normalization, Unitarization, Standardization) |
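The Kolm index with parameter $\kappa > 0$ is defined as:
$$K_\kappa = \frac{1}{\kappa}\ln\left(\frac{1}{N}\sum_i w_i\, e^{\kappa(\bar{x} - x_i)}\right),$$
where $N = \sum_i w_i$ and $\bar{x}$ is the weighted arithmetic mean.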
Kolm index is scale-dependent. Basic normalization methods can be applied before final computation.
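Here is a minimal sketch of the Kolm formula that illustrates why scaling matters. The index is unchanged when a constant is added to every value, but it changes when all values are multiplied by a constant. The sketch follows the textbook definition. The name `kolm_w_sketch` is hypothetical, and none of the package's `scale` options are reproduced.

```r
# Hypothetical helper: weighted Kolm index with parameter kappa > 0.
kolm_w_sketch <- function(x, w = rep(1, length(x)), parameter = 1) {
  stopifnot(parameter > 0)
  p <- w / sum(w)                     # normalised weights
  mu <- sum(p * x)                    # weighted mean
  log(sum(p * exp(parameter * (mu - x)))) / parameter
}

x <- 1:10
kolm_w_sketch(x) - kolm_w_sketch(x + 100)   # adding a constant leaves the index unchanged
kolm_w_sketch(10 * x) > kolm_w_sketch(x)    # changing units changes the index
```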
The value of Kolm coefficient.
Kolm S. C.: (1976) Unequal inequalities I and II
Kolm S. C.: (1996) Intermediate measures of inequality
Chakravarty S. R.: (2009) Inequality, Polarization and Poverty e-ISBN 978-0-387-79253-8
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Kolm(X)
Kolm(X, W)
# Compare raw and standardized data.
Kolm(X, W)
Kolm(X, W, scale = "Standardization")
# Changing units has an impact on the final result
Kolm(X)
Kolm(10 * X)
# Changing units has no impact on the final result with standardized data
Kolm(X, scale = "Standardization")
Kolm(10 * X, scale = "Standardization")
Computes Leti inequality measure of a given variable taking into account weights.
Leti(X, W = rep(1, length(X)), norm = T)
Argument | Description |
---|---|
X | is a data vector (ordered factor or numeric) |
W | is a vector of weights |
norm | (logical). If TRUE (default), the Leti index is divided by its maximum possible value |
Let $n_i$ be the number of individuals in category $i$ and let $n = \sum_i n_i$ be the total sample size.
The cumulative distribution is given by $F_i = \frac{1}{n}\sum_{j \le i} n_j$. The Leti index is defined as:
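The following sketch computes the Leti index for an ordered factor. It assumes the common form $L = 2\sum_{i=1}^{k-1} F_i(1 - F_i)$ and normalises by $(k-1)/2$. That maximum is reached when the weight is split evenly between the two extreme categories, because then every $F_i = 1/2$. Whether the package uses exactly this form and normaliser is an assumption. The name `leti_sketch` is hypothetical.

```r
# Hypothetical helper: Leti index for an ordered factor, assuming
# L = 2 * sum_{i=1}^{k-1} F_i * (1 - F_i) and the normaliser (k - 1) / 2.
leti_sketch <- function(x, w = rep(1, length(x)), norm = TRUE) {
  stopifnot(is.ordered(x))
  tab <- tapply(w, x, sum)              # total weight per category, in level order
  tab[is.na(tab)] <- 0                  # empty categories carry no weight
  k <- length(tab)
  Fc <- cumsum(tab)[-k] / sum(tab)      # cumulative shares F_1, ..., F_{k-1}
  L <- 2 * sum(Fc * (1 - Fc))
  if (norm) L / ((k - 1) / 2) else L
}

lv <- c("L", "M", "H")
leti_sketch(factor(c("L", "H"), levels = lv, ordered = TRUE))  # fully polarized
leti_sketch(factor(c("M", "M"), levels = lv, ordered = TRUE))  # no dispersion
```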
The value of Leti coefficient.
Leti G.: (1983). Statistica descrittiva, il Mulino, Bologna. ISBN: 8-8150-0278-2
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Leti(X)
Leti(X, W)
data(Tourism)
# Leti index for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Leti(X, W)
Computes the weighted sum of values not greater than the quantile derived for the given probability.
LowerSum(X, W = rep(1, length(X)), p = 0.5)
Argument | Description |
---|---|
X | is a numeric data vector |
W | is a vector of weights |
p | is the probability used to derive the corresponding quantile |
Calculates the weighted sum of values not greater than the quantile derived for the given probability, based on the cumulative distribution. Linear interpolation is applied to deal with a frequency distribution.
The weighted sum of values not greater than the quantile.
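Below is a minimal sketch of one way to compute the weighted total of the poorest share $p$ of the population. Values are sorted, and the unit that straddles the cut-off contributes only the fraction of its weight lying below $p$. This linear split is an assumption and may differ from the package's interpolation. The name `lower_sum_sketch` is hypothetical, and positive weights are assumed.

```r
# Hypothetical helper: weighted total of the poorest share p of the population.
# The unit straddling the cut-off contributes only the part of its weight below p.
lower_sum_sketch <- function(x, w = rep(1, length(x)), p = 0.5) {
  stopifnot(all(w > 0), p >= 0, p <= 1)
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)             # population share up to and including each unit
  prev <- c(0, cw[-length(cw)])        # population share before each unit
  frac <- pmin(pmax((p - prev) / (cw - prev), 0), 1)  # part of each unit below p
  sum(frac * w * x)
}

lower_sum_sketch(1:10, p = 0.5)        # poorer half: 1 + 2 + 3 + 4 + 5
lower_sum_sketch(c(2, 4), p = 0.25)    # half of the first unit's weight: 0.5 * 2
```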
# Suppose X represents incomes. Compare total incomes with incomes of the poorer half of the population.
X <- 1:10
W <- 10:1
sum(W * X)
LowerSum(X, W, 0.5)
Computes median of ordered factor or numeric variable taking into account weights.
medianf(X, W = rep(1, length(X)))
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
Calculates median based on cumulative distribution. Tailored for ordered factors.
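Here is a minimal sketch of a weighted median category based on the cumulative distribution. It takes the first level whose cumulative weight share reaches 0.5. The tie rule at exactly 0.5 is an assumption, and the name `medianf_sketch` is hypothetical. The sketch reuses the data from the example below.

```r
# Hypothetical helper: weighted median category of an ordered factor.
# Assumption: the first level whose cumulative weight share reaches 0.5.
medianf_sketch <- function(x, w = rep(1, length(x))) {
  stopifnot(is.ordered(x), length(x) == length(w))
  tab <- tapply(w, x, sum)            # total weight per level, in level order
  tab[is.na(tab)] <- 0                # levels without observations
  cum <- cumsum(tab) / sum(tab)       # cumulative distribution over levels
  names(tab)[which(cum >= 0.5)[1]]
}

X <- factor(c("H", "H", "M", "M", "L", "L"), levels = c("L", "M", "H"), ordered = TRUE)
W <- c(2, 2, 3, 3, 8, 8)
medianf_sketch(X)      # each level holds one third of the cases
medianf_sketch(X, W)   # level "L" carries 16 of the 26 weight units
```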
The median category (number or label) of ordered factor.
# Compare weighted and unweighted result
X <- factor(c('H', 'H', 'M', 'M', 'L', 'L'), levels = c('L', 'M', 'H'), ordered = TRUE)
W <- c(2, 2, 3, 3, 8, 8)
medianf(X)
medianf(X, W)
Palma ratio: originally, the ratio of the total income of the richest 10% of people to that of the poorest 40%.
Palma(X, W = rep(1, length(X)))
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
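The Palma index is calculated by the following formula:
$$P = \frac{S_{10}^{\text{top}}}{S_{40}^{\text{bottom}}},$$
where $S_{10}^{\text{top}}$ is the share of the 10% highest values and $S_{40}^{\text{bottom}}$ is the share of the 40% lowest values.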
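The following sketch computes the Palma ratio from two weighted "lower sums". The top 10% is obtained as the total minus the poorest 90%. The ratio of the two totals equals the ratio of the two shares because both shares have the same denominator. The boundary unit is split linearly, which is an assumption. The names `lower_sum` and `palma_sketch` are hypothetical, and positive weights are assumed.

```r
# Hypothetical helper: weighted income of the poorest share p (linear split at the cut-off).
lower_sum <- function(x, w, p) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)                # population share up to each unit
  prev <- c(0, cw[-length(cw)])           # population share before each unit
  frac <- pmin(pmax((p - prev) / (cw - prev), 0), 1)
  sum(frac * w * x)
}

# Hypothetical sketch of the Palma ratio.
palma_sketch <- function(x, w = rep(1, length(x))) {
  total <- sum(w * x)
  top10 <- total - lower_sum(x, w, 0.9)   # income of the richest 10%
  bottom40 <- lower_sum(x, w, 0.4)        # income of the poorest 40%
  top10 / bottom40                        # ratio of totals equals ratio of shares
}

palma_sketch(1:10)   # top 10% holds 10, bottom 40% holds 1 + 2 + 3 + 4 = 10
```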
The value of Palma coefficient.
Cobham A., Sumner A.: (2013) Putting the Gini Back in the Bottle? 'The Palma' as a Policy-Relevant Measure of Inequality
Palma J. G.: (2011) Homogeneous middles vs. heterogeneous tails, and the end of the ‘Inverted-U’: the share of the rich is what it’s all about
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Palma(X)
Palma(X, W)
data(Tourism)
# Palma index for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Palma(X, W)
20:20 ratio: originally, the ratio of the total income of the richest 20% of people to that of the poorest 20%.
Prop20_20(X, W = rep(1, length(X)))
Argument | Description |
---|---|
X | is a data vector (numeric or ordered factor) |
W | is a vector of weights |
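The 20:20 ratio is calculated as follows:
$$R_{20:20} = \frac{S_{20}^{\text{top}}}{S_{20}^{\text{bottom}}},$$
where $S_{20}^{\text{top}}$ is the share of the 20% highest values and $S_{20}^{\text{bottom}}$ is the share of the 20% lowest values.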
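Below is the same lower-sum approach applied to the 20:20 ratio. The top 20% is the total minus the poorest 80%, and the boundary unit is split linearly, which is an assumption. The names `lower_sum` and `ratio_20_20_sketch` are hypothetical, and positive weights are assumed.

```r
# Hypothetical helper: weighted income of the poorest share p (linear split at the cut-off).
lower_sum <- function(x, w, p) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)                # population share up to each unit
  prev <- c(0, cw[-length(cw)])           # population share before each unit
  frac <- pmin(pmax((p - prev) / (cw - prev), 0), 1)
  sum(frac * w * x)
}

# Hypothetical sketch of the 20:20 ratio.
ratio_20_20_sketch <- function(x, w = rep(1, length(x))) {
  total <- sum(w * x)
  top20 <- total - lower_sum(x, w, 0.8)   # income of the richest 20%
  bottom20 <- lower_sum(x, w, 0.2)        # income of the poorest 20%
  top20 / bottom20
}

ratio_20_20_sketch(1:10)   # (9 + 10) / (1 + 2) = 19 / 3
```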
The value of 20:20 ratio coefficient.
Panel Data Econometrics: Theoretical Contributions and Empirical Applications, edited by Badi Hani Baltagi
Notes on Statistical Sources and Methods - The Equality Trust.
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Prop20_20(X)
Prop20_20(X, W)
data(Tourism)
# Prop20_20 proportion for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Prop20_20(X, W)
Computes quantile derived for the given probability taking into account weights.
Quantile(X, W = rep(1, length(X)), p = 0.5)
Argument | Description |
---|---|
X | is a numeric data vector |
W | is a vector of weights |
p | is the probability used to derive the corresponding quantile |
Linear interpolation is applied to deal with a frequency distribution.
The quantile for weighted data.
# Compare weighted and unweighted result
X <- 1:10
W <- 10:1
Quantile(X, p = 0.5)
Quantile(X, W, p = 0.5)
Computes Ricci and Schutz inequality measure of a given variable taking into account weights.
RicciSchutz(X, W = rep(1, length(X)))
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
In the case of an empirical distribution with $n$ elements, where $x_i$ denotes the wealth of household $i$ and $\bar{x}$ the sample average, the Ricci and Schutz coefficient can be expressed as:
$$RS = \frac{1}{2n\bar{x}}\sum_{i=1}^{n} |x_i - \bar{x}|$$
The value of Ricci and Schutz coefficient.
Coulter P. B.: (1989) Measuring Inequality ISBN 0-8133-7726-9
Eliazar I. I., Sokolov I. M.: (2010) Measuring statistical heterogeneity: The Pietra index
Costa R. N., Pérez-Duarte S.: (2019) Not all inequality measures were created equal, Statistics Paper Series, No 31
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
RicciSchutz(X)
RicciSchutz(X, W)
data(Tourism)
# Ricci and Schutz index for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
RicciSchutz(X, W)
Computes Theil_L inequality measure of a given variable taking into account weights.
Theil_L(X, W = rep(1, length(X)))
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
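The Theil L index is defined as:
$$T_L = \frac{1}{N}\sum_i w_i \ln\frac{\bar{x}}{x_i},$$
where $N = \sum_i w_i$ and $\bar{x}$ is the weighted arithmetic mean.
The Theil L index can be computed only for positive values. By default, this function discards zero X's and the corresponding weights.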
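Here is a minimal sketch of the weighted Theil L index (mean log deviation). Zeros are dropped together with their weights before the mean is recomputed, which follows the description above. The name `theil_l_sketch` is hypothetical.

```r
# Hypothetical helper: weighted Theil L (mean log deviation).
# Zero values are dropped together with their weights before computing.
theil_l_sketch <- function(x, w = rep(1, length(x))) {
  keep <- x > 0
  x <- x[keep]
  w <- w[keep]
  mu <- sum(w * x) / sum(w)             # weighted mean of the positive values
  sum(w * log(mu / x)) / sum(w)
}

theil_l_sketch(c(1, 4))      # mean 2.5: (log(2.5) + log(0.625)) / 2 = log(1.25)
theil_l_sketch(c(0, 1, 4))   # the zero is discarded, so the result is unchanged
```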
The value of Theil_L coefficient.
Serebrenik A., van den Brand M.: Theil index for aggregation of software metrics values. 26th IEEE International Conference on Software Maintenance. IEEE Computer Society.
Conceição P., Ferreira P.: (2000) The Young Person’s Guide to the Theil Index: Suggesting Intuitive Interpretations and Exploring Analytical Applications
OECD: (2020) Regions and Cities at a Glance 2020, Chapter: Indexes and estimation techniques
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Theil_L(X)
Theil_L(X, W)
data(Tourism)
# Theil L coefficient for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Theil_L(X, W)
Computes Theil_T inequality measure of a given variable taking into account weights.
Theil_T(X, W = rep(1, length(X)), zeroes = "include")
Argument | Description |
---|---|
X | is a data vector |
W | is a vector of weights |
zeroes | defines what to do with zeroes in the data vector. Possible options are "remove" and "include". See Details. |
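The Theil T index is defined as:
$$T_T = \frac{1}{N}\sum_i w_i \frac{x_i}{\bar{x}} \ln\frac{x_i}{\bar{x}},$$
where $N = \sum_i w_i$ and $\bar{x}$ is the weighted arithmetic mean.
Formally, the Theil index is defined only for positive values because of the logarithm.
Nevertheless, zero values may occur in data analysis.
There are two ways to deal with them.
Option "remove" discards these X's and the corresponding weights.
Option "include" sets $0 \cdot \ln 0 = 0$, using the limit of $x \ln x$ as $x \to 0^+$, which preserves zero values in the dataset.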
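Below is a minimal sketch of the weighted Theil T index with both zero-handling options. With "include", zeros stay in the data and lower the mean, and each zero term is set to the limit value 0. With "remove", zeros and their weights are dropped first. The name `theil_t_sketch` is hypothetical.

```r
# Hypothetical helper: weighted Theil T with the two zero-handling options.
theil_t_sketch <- function(x, w = rep(1, length(x)), zeroes = c("include", "remove")) {
  zeroes <- match.arg(zeroes)
  if (zeroes == "remove") {              # drop zeros and their weights
    keep <- x > 0
    x <- x[keep]
    w <- w[keep]
  }
  mu <- sum(w * x) / sum(w)              # zeros, if kept, lower the mean
  r <- x / mu
  term <- ifelse(r > 0, r * log(r), 0)   # 0 * log(0) replaced by its limit, 0
  sum(w * term) / sum(w)
}

theil_t_sketch(c(0, 2))                     # mean 1: (0 + 2 * log(2)) / 2 = log(2)
theil_t_sketch(c(0, 2), zeroes = "remove")  # only the value 2 remains
```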
The value of Theil_T coefficient.
Serebrenik A., van den Brand M.: Theil index for aggregation of software metrics values. 26th IEEE International Conference on Software Maintenance. IEEE Computer Society.
Conceição P., Ferreira P.: (2000) The Young Person’s Guide to the Theil Index: Suggesting Intuitive Interpretations and Exploring Analytical Applications
OECD: (2020) Regions and Cities at a Glance 2020, Chapter: Indexes and estimation techniques
# Compare weighted and unweighted result
X <- 1:10
W <- 1:10
Theil_T(X)
Theil_T(X, W)
data(Tourism)
# Theil T coefficient for Total expenditure with sample weights
X <- Tourism$Total_expenditure
W <- Tourism$Sample_weight
Theil_T(X, W)
Data from sample survey on trips conducted in Polish households.
data(Tourism)
A data frame with 5319 observations of 17 variables
Year
Country
Country code
World region
Purpose of trip
Accommodation type
Number of trip's participants
Nights spent
Travel agency (organiser)
Sample weight
Total expenditure
Expenditure for organiser
Private expenditure
Expenditure on accommodation
Expenditure on restaurants & café
Expenditure on transport
Expenditure on commodities
Answers were modified due to disclosure control. The data present only part of the full database.
Data from sample survey on quality of life conducted on Polish-Ukrainian border in 2015 and 2019.
data(Well_being)
A data frame with 1197 observations of 27 variables
Area. Rural and urban
Gender. Male and female
Year. Year of survey (2015 and 2019)
V1. I have good opportunities to use my talents and skills at work
V2. I am treated with respect by others at work
V3. I have adequate opportunities for vacations or leisure activities
V4. The quality of local services where (I) live is good
V5. There is very little pollution from cars or other sources where I spend most of my time
V6. There are parks and green areas near my residence
V7. I have the freedom to plan my life the way I want to
V8. I feel safe walking around my neighborhood during the day
V9. Overall, to what extent are you currently satisfied with your life
V10. Overall, to what extent do you feel that the things you do in life are worthwhile
V11. How do you rate your health
V12. How do you rate your work
V13. How do you rate your sleep
V14. How do you rate your leisure time
V15. How do you rate your family life
V16. How do you rate your community and public affairs life
V17. How do you rate your personal plans
V18. How do you rate your housing conditions
V19. How do you rate your personal income
V20. How do you rate your personal prospects
V21. Does being part of the local community make you feel good about yourself
V22. Do you have a say in what the local community is like
V23. Is your neighborhood a good place for you to live
Weight. Sample weight for each household
Questions are on Likert scale: 1 - the worst assessment, 5 - the best assessment. Only 23 questions were selected out of over 100 questions. Answers were modified due to disclosure control.