Introduction to the ineq.2d package

The package provides functions that perform two-dimensional decomposition of the Theil index (see Giammatteo, 2007) and of the squared coefficient of variation (see Garcia-Penalosa & Orgiazzi, 2013). Both measures can be decomposed at the same time by a characteristic of the members of the studied population (e.g., sex, education, age) and by income source.
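For reference, the two measures themselves fit in a few lines of base R. The sketch below uses the textbook weighted definitions, T = (1/W) Σ w_i (y_i/μ) ln(y_i/μ) and SCV = σ²/μ², where μ is the weighted mean income, σ² the weighted population variance and W the sum of weights. The function names theil_w() and scv_w() are hypothetical and not part of ineq.2d, and whether the package uses exactly these normalizations is an assumption.

```r
# Hypothetical helpers (not part of ineq.2d): weighted Theil index and
# squared coefficient of variation, using textbook definitions.
theil_w <- function(y, w = rep(1, length(y))) {
  keep <- y > 0                      # log() requires strictly positive incomes
  y <- y[keep]; w <- w[keep]
  mu <- sum(w * y) / sum(w)          # weighted mean income
  r <- y / mu
  sum(w * r * log(r)) / sum(w)
}

scv_w <- function(y, w = rep(1, length(y))) {
  mu <- sum(w * y) / sum(w)          # weighted mean income
  s2 <- sum(w * (y - mu)^2) / sum(w) # weighted population variance
  s2 / mu^2
}

theil_w(c(1, 3))  # about 0.1308
scv_w(c(1, 3))    # 0.25
```

Both functions return 0 when all incomes are equal and grow as incomes become more dispersed.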

Researchers and students studying income or wealth inequality can benefit from the fast and simple inequality decomposition that this package offers.

library(ineq.2d)
data(us16)
str(us16)
#> 'data.frame':    1000 obs. of  8 variables:
#>  $ hitotal   : num  15000 18332 15709 16601 5000 ...
#>  $ hitransfer: num  0 18332 15709 16475 5000 ...
#>  $ hilabour  : int  15000 0 0 0 0 0 14000 0 0 0 ...
#>  $ hicapital : int  0 0 0 126 0 0 169 0 414 52 ...
#>  $ hpopwgt   : num  257 1404 2214 3510 1101 ...
#>  $ age       : int  42 48 45 67 50 69 45 38 60 56 ...
#>  $ sex       : chr  "male" "female" "female" "male" ...
#>  $ educ      : chr  "low" "medium" "medium" "medium" ...

This dataset contains four income variables: hitotal, hilabour, hicapital, and hitransfer. The data are at the household level, which is why the names of the income variables (and of the weight variable, hpopwgt) begin with “h”. hitotal is the total income of a household, and the other three income variables are its components (i.e., their sum equals hitotal). hpopwgt is used as the weighting variable in all examples below.
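If you prepare your own data, it is worth checking that the source variables really add up to the total before decomposing by source, since the source contributions can only sum to the overall index if that identity holds. A minimal sketch with a hypothetical helper, sources_add_up(), and invented toy values:

```r
# Hypothetical helper (not part of ineq.2d): TRUE if the source columns add up
# to the total-income column in every row, up to a relative tolerance.
sources_add_up <- function(df, total, sources, tol = 1e-6) {
  gap <- df[[total]] - rowSums(df[, sources, drop = FALSE])
  all(abs(gap) <= tol * pmax(1, abs(df[[total]])))
}

# Toy data, invented for illustration only
toy <- data.frame(hitotal    = c(15000, 18332, 16601),
                  hilabour   = c(15000, 0, 0),
                  hicapital  = c(0, 0, 126),
                  hitransfer = c(0, 18332, 16475))
sources_add_up(toy, "hitotal", c("hilabour", "hicapital", "hitransfer"))
#> [1] TRUE
```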

Additionally, the dataset contains three variables describing the household head: sex, educ, and age.

Let us first decompose both indexes by sex only, passing hitotal itself as the single income source. This is an example of one-dimensional decomposition.

theil.2d(us16, "hitotal", "sex", "hitotal", "hpopwgt")
#>    source    male.W  female.W   male.B    female.B
#> 1 hitotal 0.1932357 0.1629971 0.055266 -0.04967665

Remember that the Theil index contains the natural logarithm of income in its formula. This is why observations with non-positive income are automatically removed during calculation.
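A quick base-R illustration of why this removal is necessary (the toy vector is invented, and the package's own filtering code may differ): a zero income produces an undefined term, and a negative income would make log() return NaN as well. Note that dropping observations also changes the mean income used by the index.

```r
# Toy incomes, invented for illustration; the first one is zero.
y <- c(0, 10, 20)
r <- y / mean(y)
terms <- r * log(r)   # log(0) is -Inf and 0 * -Inf is NaN in R
terms[1]              # NaN, so any sum over all terms would be NaN too

# Dropping non-positive incomes first gives a finite index
y_pos <- y[y > 0]
r_pos <- y_pos / mean(y_pos)
mean(r_pos * log(r_pos))   # Theil index of c(10, 20), about 0.0566
```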

scv.2d(us16, "hitotal", "sex", "hitotal", "hpopwgt")
#>    source    male.W  female.W     male.B    female.B
#> 1 hitotal 0.3244869 0.2139148 0.05541207 -0.04983837

Every column of the output data frame (apart from source) corresponds to a value of the feature used for decomposition (here, sex). Inequality can arise both within the groups formed by this feature and between them, so there are twice as many columns as there are values of the feature. The “.W” and “.B” suffixes indicate whether a column contains within-group or between-group inequality, respectively.
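For the Theil index, a standard way to split the total by groups is T = Σ_g s_g T_g + Σ_g s_g ln(μ_g/μ), where s_g is group g's share of total income, T_g the Theil index within group g and μ_g its mean income. Reading the .W column as s_g T_g and the .B column as s_g ln(μ_g/μ) fits the output above (under this reading a group's .B entry is negative whenever its mean income is below the overall mean), but treat it as an assumption about the package rather than a description of its code. The function name theil_by_group() is hypothetical.

```r
# Hypothetical sketch (not the ineq.2d source): weighted Theil index split
# into within-group (s_g * T_g) and between-group (s_g * log(mu_g / mu)) parts.
theil_by_group <- function(y, g, w = rep(1, length(y))) {
  keep <- y > 0                          # drop non-positive incomes (log)
  y <- y[keep]; g <- g[keep]; w <- w[keep]
  mu <- sum(w * y) / sum(w)              # overall weighted mean
  groups <- unique(g)                    # groups in order of first appearance
  out <- t(vapply(groups, function(k) {
    in_k <- g == k
    mu_k <- sum(w[in_k] * y[in_k]) / sum(w[in_k])         # group mean
    s_k  <- sum(w[in_k] * y[in_k]) / sum(w * y)           # group income share
    r_k  <- y[in_k] / mu_k
    T_k  <- sum(w[in_k] * r_k * log(r_k)) / sum(w[in_k])  # within-group Theil
    c(W = s_k * T_k, B = s_k * log(mu_k / mu))
  }, numeric(2)))
  rownames(out) <- groups
  out
}

# Toy example: the four parts add up to the overall Theil index of y
y <- c(10, 20, 30, 40)
g <- c("a", "a", "b", "b")
parts <- theil_by_group(y, g)
sum(parts)
```

The identity holds by construction, because ln(y_i/μ) = ln(y_i/μ_g) + ln(μ_g/μ) for every household i in group g.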

Now, we can try two-dimensional decomposition. That is, we decompose both inequality measures by sex and by income source at the same time.

theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                   "hitransfer"), "hpopwgt")
#>       source      male.W    female.W      male.B     female.B
#> 1   hilabour  0.19954833  0.17473828 0.043080908 -0.037620658
#> 2  hicapital  0.01718640  0.01398155 0.003286276 -0.002651754
#> 3 hitransfer -0.02349904 -0.02572277 0.008898821 -0.009404242

scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                 "hitransfer"), "hpopwgt")
#>       source       male.W    female.W       male.B      female.B
#> 1   hilabour 3.076806e-01 0.202138672 3.557383e-02 -3.154309e-02
#> 2  hicapital 1.688225e-02 0.013089150 3.993496e-04 -3.406632e-04
#> 3 hitransfer 5.377197e-05 0.000041555 1.091775e-06 -1.064178e-06

The output now has several rows as well as several columns: every row represents an income source. Thus, in a two-dimensional decomposition, each value in the data frame is the contribution of inequality in income from the i-th source received by members of the j-th population group (within or between groups, as the column suffix indicates) to overall income inequality.
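One way to obtain a table with this structure for the Theil index is to split each household's within-group term (y_i/μ) ln(y_i/μ_g) across sources in proportion to the income y_ik it gets from source k, and each group's between-group term in proportion to the group's income from that source. The sketch below does this; the function name theil_2d_sketch() is hypothetical, and it is an assumption that ineq.2d follows exactly this formula.

```r
# Hypothetical sketch (not the ineq.2d source): nested Theil decomposition by
# income source (rows) and group (columns, ".W" = within, ".B" = between).
theil_2d_sketch <- function(df, total, group, sources, weights) {
  df <- df[df[[total]] > 0, , drop = FALSE]   # log() needs positive totals
  y <- df[[total]]; g <- df[[group]]; w <- df[[weights]]
  mu <- sum(w * y) / sum(w)                    # overall weighted mean
  groups <- unique(g)                          # order of first appearance
  W <- B <- matrix(0, length(sources), length(groups),
                   dimnames = list(sources, groups))
  for (j in seq_along(groups)) {
    in_j <- g == groups[j]
    mu_j <- sum(w[in_j] * y[in_j]) / sum(w[in_j])  # group mean income
    for (i in seq_along(sources)) {
      y_ij <- df[[sources[i]]][in_j]                # source-i income in group j
      W[i, j] <- sum(w[in_j] * (y_ij / mu) * log(y[in_j] / mu_j)) / sum(w)
      B[i, j] <- sum(w[in_j] * y_ij) / sum(w * y) * log(mu_j / mu)
    }
  }
  colnames(W) <- paste0(groups, ".W")
  colnames(B) <- paste0(groups, ".B")
  data.frame(source = sources, W, B, row.names = NULL, check.names = FALSE)
}
```

As long as the sources add up to the total, summing over sources gives back the one-dimensional .W and .B values, and summing all cells gives the overall Theil index. Individual cells can be negative: a source received mainly by households with below-average income within their group gets a negative .W entry. If the package uses the same formula, theil_2d_sketch(us16, "hitotal", "sex", c("hilabour", "hicapital", "hitransfer"), "hpopwgt") should reproduce the Theil table above.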

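Remember that the overall index, which is the sum of all values in the data frame, is never negative (it is zero only under perfect equality). Individual components, however, can make a negative contribution to inequality: in the Theil decomposition above, for example, transfers reduce within-group inequality for both men and women.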

If you want the functions to return each component's percentage share of overall inequality rather than index values, set the argument perc to TRUE.
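Converting an index table to percentage shares by hand amounts to dividing every cell by the overall index (the sum of all cells) and multiplying by 100. Comparing the perc = TRUE output with the index table for the same call (e.g., 0.19954833 / 0.3618221 ≈ 0.5515) suggests this is what the option does. A sketch with a hypothetical helper, to_perc():

```r
# Hypothetical helper (not part of ineq.2d): rescale every numeric cell of a
# decomposition table to a percentage of the overall index.
to_perc <- function(tab) {
  total <- sum(tab[, -1])             # first column holds source names
  tab[, -1] <- 100 * tab[, -1] / total
  tab
}
```

After conversion the numeric cells always sum to 100, even when some of them are negative.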

theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                   "hitransfer"), "hpopwgt", perc = TRUE)
#>       source    male.W  female.W     male.B   female.B
#> 1   hilabour 55.150952 48.293976 11.9066546 -10.397557
#> 2  hicapital  4.749959  3.864206  0.9082575  -0.732889
#> 3 hitransfer -6.494640 -7.109233  2.4594465  -2.599134

scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                 "hitransfer"), "hpopwgt", perc = TRUE)
#>       source       male.W     female.W       male.B      female.B
#> 1   hilabour 56.561486913 37.159519153 6.5396016534 -5.7986229469
#> 2  hicapital  3.103495468  2.406202130 0.0734131702 -0.0626247425
#> 3 hitransfer  0.009884998  0.007639131 0.0002007029 -0.0001956297

Overall inequality measures can be obtained in two ways. The first is to sum the values in the output data frame, excluding the source column:

theil1 <- theil.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                             "hitransfer"), "hpopwgt")
sum(theil1[,-1])
#> [1] 0.3618221

scv1 <- scv.2d(us16, "hitotal", "sex", c("hilabour", "hicapital", 
                                         "hitransfer"), "hpopwgt")
sum(scv1[,-1])
#> [1] 0.5439755
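
The second is to call the function with the total income variable and the weights only, without a decomposition feature or income sources:
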
theil.2d(us16, "hitotal", weights = "hpopwgt")
#>    source     all.W
#> 1 hitotal 0.3618221

scv.2d(us16, "hitotal", weights = "hpopwgt")
#>    source     all.W
#> 1 hitotal 0.5439755
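
Both approaches yield the same overall values. In the second case, the output has a single column, all.W, covering the whole population.
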
Decomposition by education level works the same way as demonstrated above. You only need to specify “educ” instead of “sex” in the function calls.

Decomposition by age is a more complicated case. Unlike sex and educ, which take two and three values respectively, age takes many values because it is measured in years. To decompose the indexes by age, you first need to add a column assigning each household to an age cohort. This can be done as follows:

us16$cohort <- NA_character_   # start empty; each line below fills one age band
us16[us16$age < 25, "cohort"] <- "t24"
us16[us16$age >= 25 & us16$age < 50, "cohort"] <- "f25t49"
us16[us16$age >= 50 & us16$age < 75, "cohort"] <- "f50t74"
us16[us16$age >= 75, "cohort"] <- "f75"
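The same cohorts can also be built in one step with base R's cut(). In the sketch below, age_cohort() is a hypothetical helper; with right = FALSE each interval is closed on the left, so age 25 falls into "f25t49" and age 75 into "f75", matching the comparisons above.

```r
# Hypothetical helper: map ages to the cohort labels used above.
# Intervals are [-Inf, 25), [25, 50), [50, 75) and [75, Inf).
age_cohort <- function(age) {
  as.character(cut(age, breaks = c(-Inf, 25, 50, 75, Inf), right = FALSE,
                   labels = c("t24", "f25t49", "f50t74", "f75")))
}

age_cohort(c(20, 25, 49, 50, 74, 75, 90))
# returns "t24", "f25t49", "f25t49", "f50t74", "f50t74", "f75", "f75"
```

With this helper, us16$cohort <- age_cohort(us16$age) gives the same column as the five assignments above; a missing age would simply yield NA.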

After this variable has been created, we can decompose the indexes by the age cohorts and income sources:

theil.2d(us16, "hitotal", "cohort", c("hilabour", "hicapital", 
                                      "hitransfer"), "hpopwgt")
#>       source     f25t49.W    f50t74.W       f75.W         t24.W    f25t49.B
#> 1   hilabour  0.140814426  0.16408920 0.002682527  2.947955e-02 0.040099285
#> 2  hicapital  0.006838849  0.02247595 0.001883919 -8.898296e-05 0.001180499
#> 3 hitransfer -0.012024130 -0.01896720 0.005636069 -1.386040e-03 0.002583661
#>      f50t74.B        f75.B         t24.B
#> 1 0.012722769 -0.005164298 -4.976593e-03
#> 2 0.001738889 -0.002204361 -2.229027e-05
#> 3 0.004222416 -0.029104874 -6.871409e-04

scv.2d(us16, "hitotal", "cohort", c("hilabour", "hicapital", 
                                    "hitransfer"), "hpopwgt")
#>       source     f25t49.W     f50t74.W        f75.W        t24.W      f25t49.B
#> 1   hilabour 1.931008e-01 0.2078082352 1.390179e-03 7.426800e-02  8.449500e-02
#> 2  hicapital 3.473434e-03 0.0247662186 8.886131e-04 4.635124e-06 -6.920235e-04
#> 3 hitransfer 1.277082e-05 0.0000546742 1.018621e-05 2.604266e-06 -1.471064e-05
#>        f50t74.B         f75.B         t24.B
#> 1 -1.866661e-02 -2.951913e-02  9.735254e-04
#> 2  1.914497e-03 -1.880487e-04 -1.372351e-04
#> 3  1.529066e-05  1.602344e-05 -1.484399e-06

References

Garcia-Penalosa, C., & Orgiazzi, E. (2013). Factor Components of Inequality: A Cross-Country Study. Review of Income and Wealth, 59(4), 689-727.

Giammatteo, M. (2007). The Bidimensional Decomposition of Inequality: A Nested Theil Approach. LIS Working Papers, Article 466, 1-30.