library(ineq.2d)
This is the introduction to the ineq.2d package.
The package contains functions performing two-dimensional decomposition of the Theil index (see Giammatteo, 2007) and the squared coefficient of variation (see Garcia-Penalosa & Orgiazzi, 2013). Both measures can be decomposed by some feature that members of the studied population possess (e.g., sex, education, age) and their income source at the same time.
Researchers and students interested in studying income or wealth inequality can benefit from fast and simple inequality decomposition offered by this package.
First, let us load the test dataset into the environment and examine its contents.
data(us16)
str(us16)
#> 'data.frame': 1000 obs. of 8 variables:
#> $ hitotal : num 15000 18332 15709 16601 5000 ...
#> $ hitransfer: num 0 18332 15709 16475 5000 ...
#> $ hilabour : int 15000 0 0 0 0 0 14000 0 0 0 ...
#> $ hicapital : int 0 0 0 126 0 0 169 0 414 52 ...
#> $ hpopwgt : num 257 1404 2214 3510 1101 ...
#> $ age : int 42 48 45 67 50 69 45 38 60 56 ...
#> $ sex : chr "male" "female" "female" "male" ...
#> $ educ : chr "low" "medium" "medium" "medium" ...
This dataset contains several income variables: hitotal, hilabour, hicapital, and hitransfer. The data are at the household level, which is why every variable name begins with “h”. hitotal is the total income of a household; the other three income variables are its components (i.e., their sum equals hitotal).
Additionally, this dataset contains three variables representing some feature of the household head: sex, educ, and age.
Finally, the dataset contains population weights for every household: hpopwgt.
Let us now try decomposing both indexes only by sex. This is an example of one-dimensional decomposition.
We decompose the Theil index first:
theil.2d(us16, "hitotal", "sex", "hitotal", "hpopwgt")
#> source male.W female.W male.B female.B
#> 1 hitotal 0.1932357 0.1629971 0.055266 -0.04967665
Remember that the formula of the Theil index contains a natural logarithm, which is undefined for zero and negative values. This is why non-positive values are automatically removed during calculation.
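To illustrate why non-positive incomes have to be dropped, here is a minimal base R sketch of a weighted Theil index. The helper theil_sketch is hypothetical and is not the package's implementation; it assumes the textbook Theil T formula, and theil.2d may differ in details.

```r
# Hypothetical helper (not part of ineq.2d): weighted Theil T index,
# assuming the textbook formula sum(w * (y/mu) * log(y/mu)) / sum(w).
theil_sketch <- function(y, w = rep(1, length(y))) {
  keep <- !is.na(y) & y > 0        # log() is undefined for y <= 0
  y <- y[keep]
  w <- w[keep]
  mu <- sum(w * y) / sum(w)        # weighted mean income
  r <- y / mu                      # income relative to the mean
  sum(w * r * log(r)) / sum(w)
}

equal_incomes  <- theil_sketch(c(10, 10, 10))      # no inequality
with_nonpos    <- theil_sketch(c(0, -5, 10, 10))   # 0 and -5 are dropped
unequal_income <- theil_sketch(c(1, 3))            # some inequality
```

With equal incomes the sketch returns zero, and the zero and negative entries in the second call do not affect the result because they are filtered out before the logarithm is taken.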
Decomposition of the squared coefficient of variation (SCV) is done similarly:
scv.2d(us16, "hitotal", "sex", "hitotal", "hpopwgt")
#> source male.W female.W male.B female.B
#> 1 hitotal 0.3244869 0.2139148 0.05541207 -0.04983837
Every column of the output data frame corresponds to a value of the feature used for decomposition (here, sex). Because inequality can arise both within and between the groups formed by this feature, there are twice as many columns as feature values. The suffixes “.W” and “.B” indicate within-group and between-group inequality, respectively.
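The suffix convention makes it easy to split a result into its within-group and between-group parts. The sketch below re-enters the Theil output printed above by hand, so it runs without the package; the object names are illustrative only.

```r
# Values copied from the theil.2d() output by sex printed above.
theil_sex <- data.frame(
  source   = "hitotal",
  male.W   = 0.1932357,
  female.W = 0.1629971,
  male.B   = 0.055266,
  female.B = -0.04967665
)

# Select columns by suffix and add them up.
is_within  <- endsWith(names(theil_sex), ".W")
is_between <- endsWith(names(theil_sex), ".B")
within_total  <- sum(unlist(theil_sex[is_within]))
between_total <- sum(unlist(theil_sex[is_between]))

c(within = within_total, between = between_total,
  overall = within_total + between_total)
```

Most of the inequality in this example comes from the within-group columns; the two between-group terms have opposite signs and largely cancel.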
Now, we can try two-dimensional decomposition. That is, we decompose both inequality measures by sex and by income source at the same time.
First, we decompose the Theil index:
theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt")
#> source male.W female.W male.B female.B
#> 1 hilabour 0.19954833 0.17473828 0.043080908 -0.037620658
#> 2 hicapital 0.01718640 0.01398155 0.003286276 -0.002651754
#> 3 hitransfer -0.02349904 -0.02572277 0.008898821 -0.009404242
Then, we decompose SCV:
scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt")
#> source male.W female.W male.B female.B
#> 1 hilabour 3.076806e-01 0.202138672 3.557383e-02 -3.154309e-02
#> 2 hicapital 1.688225e-02 0.013089150 3.993496e-04 -3.406632e-04
#> 3 hitransfer 5.377197e-05 0.000041555 1.091775e-06 -1.064178e-06
The output now has several rows as well as several columns, and every row represents an income source. Thus, in a two-dimensional decomposition, each value is the contribution to overall income inequality of inequality in income from the i-th source received by members of the j-th population group.
Remember that the overall Theil index, which is the sum of all values in the data frame, is never negative. However, individual components of the index can make a negative contribution to inequality.
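As a quick check of this additivity, the sketch below re-enters the two-dimensional Theil output printed above by hand (so it runs without the package) and sums it by income source. The object names are illustrative only.

```r
# Values copied from the two-dimensional theil.2d() output printed above.
theil_tab <- data.frame(
  source   = c("hilabour", "hicapital", "hitransfer"),
  male.W   = c(0.19954833, 0.01718640, -0.02349904),
  female.W = c(0.17473828, 0.01398155, -0.02572277),
  male.B   = c(0.043080908, 0.003286276, 0.008898821),
  female.B = c(-0.037620658, -0.002651754, -0.009404242)
)

# Total contribution of each income source (sum across all columns).
by_source <- setNames(rowSums(theil_tab[, -1]), theil_tab$source)
by_source

# The grand total is the overall Theil index.
overall <- sum(by_source)
```

Here the total for hitransfer is negative, while the grand total remains positive.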
If you want the functions to return each component's percentage share of overall inequality rather than index values, set the “perc” argument to TRUE.
theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt", perc = TRUE)
#> source male.W female.W male.B female.B
#> 1 hilabour 55.150952 48.293976 11.9066546 -10.397557
#> 2 hicapital 4.749959 3.864206 0.9082575 -0.732889
#> 3 hitransfer -6.494640 -7.109233 2.4594465 -2.599134
scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt", perc = TRUE)
#> source male.W female.W male.B female.B
#> 1 hilabour 56.561486913 37.159519153 6.5396016534 -5.7986229469
#> 2 hicapital 3.103495468 2.406202130 0.0734131702 -0.0626247425
#> 3 hitransfer 0.009884998 0.007639131 0.0002007029 -0.0001956297
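The percentage shares can be reproduced by hand. The sketch below assumes that perc = TRUE simply divides every cell by the grand total and multiplies by 100, which is consistent with the printed numbers but is not taken from the package source. It uses the SCV values printed earlier, entered manually.

```r
# Values copied from the two-dimensional scv.2d() output (without perc).
scv_tab <- data.frame(
  source   = c("hilabour", "hicapital", "hitransfer"),
  male.W   = c(3.076806e-01, 1.688225e-02, 5.377197e-05),
  female.W = c(0.202138672, 0.013089150, 0.000041555),
  male.B   = c(3.557383e-02, 3.993496e-04, 1.091775e-06),
  female.B = c(-3.154309e-02, -3.406632e-04, -1.064178e-06)
)

# Assumed rule: share = 100 * cell / sum of all cells.
vals <- as.matrix(scv_tab[, -1])
shares <- 100 * vals / sum(vals)
rownames(shares) <- scv_tab$source
round(shares, 3)
```

Because the inputs are rounded copies of the printed output, the shares agree with the perc = TRUE result only up to rounding.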
Overall inequality measures can be obtained in two ways. The first one is to sum the values in the output data frame:
theil1 <- theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt")
sum(theil1[,-1])
#> [1] 0.3618221
scv1 <- scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt")
sum(scv1[,-1])
#> [1] 0.5439755
The second way is to omit the feature and the income sources from the call:
theil.2d(us16, "hitotal", weights = "hpopwgt")
#> source all.W
#> 1 hitotal 0.3618221
scv.2d(us16, "hitotal", weights = "hpopwgt")
#> source all.W
#> 1 hitotal 0.5439755
Decomposition by education level is done the same way as demonstrated above. You only need to specify “educ” instead of “sex” in function inputs.
Decomposition by age is a more complicated example. Unlike sex and educ, which take two and three values respectively, age takes many values because it is measured in years. To decompose the indexes by age, one needs to add a column that assigns each household to an age cohort. This can be done as follows:
us16$cohort <- 0
us16[us16$age < 25, "cohort"] <- "t24"
us16[us16$age >= 25 & us16$age < 50, "cohort"] <- "f25t49"
us16[us16$age >= 50 & us16$age < 75, "cohort"] <- "f50t74"
us16[us16$age >= 75, "cohort"] <- "f75"
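The same cohorts can also be built in one step with base R's cut(). This is an alternative sketch, not the approach used in the package documentation; it is shown on a small made-up age vector so that it runs without the dataset.

```r
# Made-up ages covering every cohort boundary.
age <- c(18, 24, 25, 49, 50, 74, 75, 90)

# right = FALSE makes the intervals closed on the left: [25, 50), [50, 75), ...
cohort <- as.character(cut(
  age,
  breaks = c(-Inf, 25, 50, 75, Inf),
  labels = c("t24", "f25t49", "f50t74", "f75"),
  right  = FALSE
))

data.frame(age, cohort)
```

Applied to the dataset, the equivalent call would assign the result to us16$cohort with us16$age as input.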
After this variable has been created, we can decompose the indexes by the age cohorts and income sources:
theil.2d(us16, "hitotal", "cohort", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt")
#> source f25t49.W f50t74.W f75.W t24.W f25t49.B
#> 1 hilabour 0.140814426 0.16408920 0.002682527 2.947955e-02 0.040099285
#> 2 hicapital 0.006838849 0.02247595 0.001883919 -8.898296e-05 0.001180499
#> 3 hitransfer -0.012024130 -0.01896720 0.005636069 -1.386040e-03 0.002583661
#> f50t74.B f75.B t24.B
#> 1 0.012722769 -0.005164298 -4.976593e-03
#> 2 0.001738889 -0.002204361 -2.229027e-05
#> 3 0.004222416 -0.029104874 -6.871409e-04
scv.2d(us16, "hitotal", "cohort", c("hilabour", "hicapital",
"hitransfer"), "hpopwgt")
#> source f25t49.W f50t74.W f75.W t24.W f25t49.B
#> 1 hilabour 1.931008e-01 0.2078082352 1.390179e-03 7.426800e-02 8.449500e-02
#> 2 hicapital 3.473434e-03 0.0247662186 8.886131e-04 4.635124e-06 -6.920235e-04
#> 3 hitransfer 1.277082e-05 0.0000546742 1.018621e-05 2.604266e-06 -1.471064e-05
#> f50t74.B f75.B t24.B
#> 1 -1.866661e-02 -2.951913e-02 9.735254e-04
#> 2 1.914497e-03 -1.880487e-04 -1.372351e-04
#> 3 1.529066e-05 1.602344e-05 -1.484399e-06
References:
Garcia-Penalosa, C., & Orgiazzi, E. (2013). Factor Components of Inequality: A Cross-Country Study. Review of Income and Wealth, 59(4), 689-727.
Giammatteo, M. (2007). The Bidimensional Decomposition of Inequality: A nested Theil Approach. LIS Working papers, Article 466, 1-30.