Making price indexes

Most price indexes are made with a two-step procedure, where period-over-period elemental indexes are calculated for a collection of elemental aggregates at each point in time, and then aggregated according to a price index aggregation structure. These indexes can then be chained together to form a time series that gives the evolution of prices with respect to a fixed base period. The piar package contains a collection of functions that revolve around this work flow, making it easy to build standard price indexes in R.

The purpose of this vignette is to give several extended examples of how to use the functions in this package to make different types of price indexes. This should serve both as a introduction to the functionality in piar, and a reference for solving specific index-number problems.

Matched-sample index

The first example covers calculating a matched-sample index, where a fixed set of businesses each provide prices for a collection of products over time. The products reported by a businesses can change over time, but the set of businesses is fixed for the duration of the sample. Each businesses has a weight that is established when the sample is drawn, and represents a particular segment of the economy.

The usual approach for calculating a matched-sample index starts by computing the elemental index for each business as an equally-weighted geometric mean of price relatives (i.e., a Jevons index). From there, index values for different segments of the economy are calculated as an arithmetic mean of the elemental indexes, using the businesses-level weights (either a Young or Lowe index, depending how the weights are constructed).

The ms_prices dataset has price data for five businesses over four quarters, and the ms_weights dataset has the weight data. Note that these data have fairly realistic patterns of missing data.

library(piar)

head(ms_prices)
#>   period business product price
#> 1 202001       B1       1  1.14
#> 2 202001       B1       2    NA
#> 3 202001       B1       3  6.09
#> 4 202001       B2       4  6.23
#> 5 202001       B2       5  8.61
#> 6 202001       B2       6  6.40

ms_weights
#>   business classification weight
#> 1       B1             11    553
#> 2       B2             11    646
#> 3       B3             11    312
#> 4       B4             12    622
#> 5       B5             12    330

The elemental_index() function makes, well, elemental indexes, using information on price relatives, elemental aggregates (businesses), and time periods (quarters). By default it makes a Jevons index, but any bilateral generalized-mean index is possible. The only wrinkle is that price data here are in levels, and not relatives, but the price_relative() function can make the necessary conversion.

relatives <- with(
  ms_prices,
  price_relative(price, period = period, product = product)
)

ms_elemental <- with(
  ms_prices,
  elemental_index(relatives, period = period, ea = business, na.rm = TRUE)
)

ms_elemental
#> Period-over-period price index for 4 levels over 4 time periods 
#>    202001    202002    202003   202004
#> B1      1 0.8949097 0.3342939      NaN
#> B2      1       NaN       NaN 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4    NaN       NaN       NaN 4.576286

(Homogeneous elemental aggregates often leads to unit-value elemental indexes that are not based on price relatives. These cases can be dealt with by first aggregating prices for each elemental aggregate, aggregate(price ~ period + product, ms_prices, mean), at each point in time with an arithmetic mean, then forming price relatives to feed into elemental_index().)

As with most functions in R, missing values are contagious by default in piar. Setting na.rm = TRUE in elemental_index() means that missing price relatives are ignored, which is equivalent to imputing these missing relatives with the value of the elemental index for the respective businesses (i.e., parental or overall mean imputation). Other types of imputation are possible, and are the topic of a subsequent example.

The elemental_index() function returns a special index object, and there are a number of methods for working with these objects. Probably the most useful of these methods allows the resulting elemental indexes to be extracted like a matrix, even though it’s not a matrix. (Note that there are only indexes for four businesses, not five, because the fifth business never reports any prices; as will be seen in another example, an elemental index can be made for this business with a small change to the call to elemental_index().)

ms_elemental[, "202004"]
#> Period-over-period price index for 4 levels over 1 time periods 
#>      202004
#> B1      NaN
#> B2 2.770456
#> B3 0.537996
#> B4 4.576286

ms_elemental["B1", ]
#> Period-over-period price index for 1 levels over 4 time periods 
#>    202001    202002    202003 202004
#> B1      1 0.8949097 0.3342939    NaN

With the elemental indexes out of the way, it’s time to make a price-index aggregation structure that maps each business to its position in the aggregation hierarchy. The only hiccup is unpacking the digit-wise classification for each businesses that defines the hierarchy. That’s the job of the expand_classification() function.

hierarchy <- with(
  ms_weights, 
  c(expand_classification(classification), list(business))
)

pias <- aggregation_structure(hierarchy, weights = ms_weights$weight)

It is now simple to aggregate the elemental indexes according to this aggregation structure with the aggregate() function. As with the elemental indexes, missing values are ignored by setting na.rm = TRUE, which is equivalent to parentally imputing missing values. Note that, unlike the elemental indexes, missing values are filled in to ensure the index can be chained over time.

ms_index <- aggregate(ms_elemental, pias, na.rm = TRUE)

ms_index
#> Period-over-period price index for 8 levels over 4 time periods 
#>    202001    202002    202003   202004
#> 1       1 1.3007239 1.0630743 2.734761
#> 11      1 1.3007239 1.0630743 1.574515
#> 12      1 1.3007239 1.0630743 4.576286
#> B1      1 0.8949097 0.3342939 1.574515
#> B2      1 1.3007239 1.0630743 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4      1 1.3007239 1.0630743 4.576286
#> B5      1 1.3007239 1.0630743 4.576286

Although simple, this example covers the core functionality of piar. The remaining examples in the vignette build on this one by adding complexities that often arise in practice.

Chaining

The elemental_index() function makes period-over-period elemental indexes by default, which can then be aggregated to make a period-over-period index. Chaining an index is the process of taking the cumulative product of each of these period-over-period indexes to make a time series that compares prices to a fixed base period.

The chain() function can be used to chain the values in an index object.

ms_index_chained <- chain(ms_index)

ms_index_chained
#> Fixed-base price index for 8 levels over 4 time periods 
#>    202001    202002    202003    202004
#> 1       1 1.3007239 1.3827662 3.7815355
#> 11      1 1.3007239 1.3827662 2.1771866
#> 12      1 1.3007239 1.3827662 6.3279338
#> B1      1 0.8949097 0.2991629 0.4710366
#> B2      1 1.3007239 1.3827662 3.8308934
#> B3      1 2.0200036 3.3033836 1.7772072
#> B4      1 1.3007239 1.3827662 6.3279338
#> B5      1 1.3007239 1.3827662 6.3279338

This gives almost the same result as directly manipulating the index as a matrix, except that the former returns an index object (not a matrix).

t(apply(as.matrix(ms_index), 1, cumprod))
#>    202001    202002    202003    202004
#> 1       1 1.3007239 1.3827662 3.7815355
#> 11      1 1.3007239 1.3827662 2.1771866
#> 12      1 1.3007239 1.3827662 6.3279338
#> B1      1 0.8949097 0.2991629 0.4710366
#> B2      1 1.3007239 1.3827662 3.8308934
#> B3      1 2.0200036 3.3033836 1.7772072
#> B4      1 1.3007239 1.3827662 6.3279338
#> B5      1 1.3007239 1.3827662 6.3279338

Chained indexes often need be to rebased, and this can be done with the rebase() function. For example, rebasing the index so that 202004 is the base period just requires dividing the chained index by the slice for 202004.

rebase(ms_index_chained, ms_index_chained[, "202004"])
#> Fixed-base price index for 8 levels over 4 time periods 
#>       202001    202002    202003 202004
#> 1  0.2644428 0.3439671 0.3656626      1
#> 11 0.4593084 0.5974334 0.6351161      1
#> 12 0.1580295 0.2055527 0.2185178      1
#> B1 2.1229774 1.8998731 0.6351161      1
#> B2 0.2610357 0.3395354 0.3609514      1
#> B3 0.5626806 1.1366169 1.8587499      1
#> B4 0.1580295 0.2055527 0.2185178      1
#> B5 0.1580295 0.2055527 0.2185178      1

In some cases the base period is the average of several periods; setting the base period to the second half of 2020 just requires averaging the index over subperiods before rebasing.

rebase(
  ms_index_chained,
  mean(window(ms_index_chained, "202003"))
)
#> Fixed-base price index for 8 levels over 4 time periods 
#>       202001    202002    202003    202004
#> 1  0.3872740 0.5037366 0.5355095 1.4644905
#> 11 0.5618052 0.7307535 0.7768452 1.2231548
#> 12 0.2593798 0.3373815 0.3586616 1.6413384
#> B1 2.5967299 2.3238388 0.7768452 1.2231548
#> B2 0.3836077 0.4989677 0.5304398 1.4695602
#> B3 0.3936550 0.7951845 1.3003935 0.6996065
#> B4 0.2593798 0.3373815 0.3586616 1.6413384
#> B5 0.2593798 0.3373815 0.3586616 1.6413384

Multi-dimensional aggregation structures

Price indexes are often aggregated over multiple dimensions. Matched sample indexes that use sequential Poisson sampling are a good example, as there are usually take-all and take-some strata in addition to, say, an industry classification.

ms_weights$stratum <- c("TS", "TA", "TS", "TS", "TS")

ms_weights
#>   business classification weight stratum
#> 1       B1             11    553      TS
#> 2       B2             11    646      TA
#> 3       B3             11    312      TS
#> 4       B4             12    622      TS
#> 5       B5             12    330      TS

The easiest way to deal with multiple digit-wise classifications is to turn them into one classification. In this example the “stratum” dimension comes before the “classification” dimension for the purposes of parental imputation.

classification_sps <- with(ms_weights, paste0(classification, stratum))

classification_sps
#> [1] "11TS" "11TA" "11TS" "12TS" "12TS"

This classification can be expanded with the expand_classification() function as before, just with an extra instruction to say that the last “digit” in the classification is two characters wide, not one.

classification_sps <- expand_classification(
  classification_sps,
  width = c(1, 1, 2)
)

pias_sps <- with(
  ms_weights, 
  aggregation_structure(c(classification_sps, list(business)), weight)
)

The elemental indexes can now be aggregated according to this new aggregation structure.

index_sps <- aggregate(ms_elemental, pias_sps, na.rm = TRUE)
index_sps
#> Period-over-period price index for 11 levels over 4 time periods 
#>      202001    202002    202003   202004
#> 1         1 1.3007239 1.0630743 2.684412
#> 11        1 1.3007239 1.0630743 1.492443
#> 12        1 1.3007239 1.0630743 4.576286
#> 11TS      1 1.3007239 1.0630743 0.537996
#> 11TA      1 1.3007239 1.0630743 2.770456
#> 12TS      1 1.3007239 1.0630743 4.576286
#> B1        1 0.8949097 0.3342939 0.537996
#> B2        1 1.3007239 1.0630743 2.770456
#> B3        1 2.0200036 1.6353355 0.537996
#> B4        1 1.3007239 1.0630743 4.576286
#> B5        1 1.3007239 1.0630743 4.576286

When a price index has many dimensions (e.g., industry, sampling stratum, region), it can be useful to interact the classifications for these different dimensions to get all possible aggregation structures. The aggregated index can then be re-aggregated to get index values for all dimensions.

Continuing with the example, the industry and strata structures can be interacted to get two aggregation structures that can be used to re-aggregate index_sps.

interacted_hierarchy <- with(
  ms_weights,
  interact_classifications(
    expand_classification(classification),
    expand_classification(stratum)
  )
)

pias_sps2 <- lapply(
  interacted_hierarchy,
  \(x) aggregation_structure(c(x, ms_weights["business"]), ms_weights$weight)
)

index_sps2 <- lapply(pias_sps2, \(x) aggregate(index_sps, x, include_ea = FALSE))

The resulting indexes can be merged together to give an index that includes all combinations of industry and sampling stratum.

Reduce(merge, index_sps2)
#> Period-over-period price index for 8 levels over 4 time periods 
#>       202001   202002   202003   202004
#> 1:T        1 1.300724 1.063074 2.684412
#> 1:TS       1 1.300724 1.063074 2.653820
#> 1:TA       1 1.300724 1.063074 2.770456
#> 11:T       1 1.300724 1.063074 1.492443
#> 12:T       1 1.300724 1.063074 4.576286
#> 11:TS      1 1.300724 1.063074 0.537996
#> 11:TA      1 1.300724 1.063074 2.770456
#> 12:TS      1 1.300724 1.063074 4.576286

Matrix aggregation

Aggregating a price index can be done as a matrix operation. Although this approach is less flexible than the aggregate() method, it can be considerably faster for larger indexes. The key is to turn the aggregation structure into an aggregation matrix.

pias_matrix <- as.matrix(pias)
pias_matrix
#>           B1        B2        B3        B4        B5
#> 1  0.2245229 0.2622818 0.1266748 0.2525376 0.1339829
#> 11 0.3659828 0.4275314 0.2064858 0.0000000 0.0000000
#> 12 0.0000000 0.0000000 0.0000000 0.6533613 0.3466387

Multiplying this matrix by a matrix of fixed-base elemental indexes now computes the aggregate index in each time period.

pias_matrix %*% as.matrix(chain(ms_index[ms_weights$business]))
#>    202001   202002   202003   202004
#> 1       1 1.300724 1.382766 3.781536
#> 11      1 1.300724 1.382766 2.177187
#> 12      1 1.300724 1.382766 6.327934

Computing the shadow of an index

It’s often useful to determine which higher-level index values are missing, and subsequently get imputed during aggregation (i.e., compute the shadow of an index). This is simple to do if there’s an elemental index for each elemental aggregate in the aggregation structure.

ms_elemental2 <- elemental_index(
  ms_prices,
  relatives ~ period + factor(business, ms_weights$business),
  na.rm = TRUE
)

The idea is to simply aggregate an indicator for missingness to get a matrix that gives the share of missing elemental indexes for each higher-level index.

pias_matrix <- as.matrix(pias) > 0 
pias_matrix %*% is.na(ms_elemental2) / rowSums(pias_matrix)
#>    202001    202002    202003    202004
#> 1     0.4 0.6000000 0.6000000 0.4000000
#> 11    0.0 0.3333333 0.3333333 0.3333333
#> 12    1.0 1.0000000 1.0000000 0.5000000

A value of 1 means that there are no non-missing elemental indexes, and that the value for this level of the index is imputed from its parent in the aggregation structure. A value below 1 but above zero means that some but not all elemental indexes are missing, and the index value for this level is based on the non-missing elemental indexes. A value of zero means there’s no imputation for this level of the index.

Non-parental imputation during aggregation

Parental imputation is the usual way to impute missing index values during aggregation, and it is simple to do with aggregate(). In some cases, however, a business-level index may get imputed with the value for, say, another business, rather than for an entire group of businesses. The simplest way to do this sort of imputation is to alter the elemental indexes prior to aggregation. It is also possible to augment the aggregation structure with an imputation layer, but this is more complex.

Suppose that missing index values for business B2 should be imputed as 1, rather than the value for group 11. This replacement can be done as if the index was a matrix.

ms_elemental2 <- ms_elemental
ms_elemental2["B2", 2:3] <- 1
ms_elemental2
#> Period-over-period price index for 4 levels over 4 time periods 
#>    202001    202002    202003   202004
#> B1      1 0.8949097 0.3342939      NaN
#> B2      1 1.0000000 1.0000000 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4    NaN       NaN       NaN 4.576286

The index can now be aggregated as usual.

aggregate(ms_elemental2, pias, na.rm = TRUE)
#> Period-over-period price index for 8 levels over 4 time periods 
#>    202001    202002    202003   202004
#> 1       1 1.1721550 1.0400686 2.626560
#> 11      1 1.1721550 1.0400686 1.398142
#> 12      1 1.1721550 1.0400686 4.576286
#> B1      1 0.8949097 0.3342939 1.398142
#> B2      1 1.0000000 1.0000000 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4      1 1.1721550 1.0400686 4.576286
#> B5      1 1.1721550 1.0400686 4.576286

Alternate index-number formulas

By default, the elemental_index() function calculates a Jevons index. Although this is the standard index-number formula for making elemental indexes, many other types of index-numbers are possible. The Carli index (equally-weighted arithmetic mean of price relatives) is the main competitor to the Jevons, and requires specifying the order of the index r when calling elemental_index(). An order of 1 corresponds to an arithmetic mean.

elemental_index(ms_prices, relatives ~ period + business, na.rm = TRUE, r = 1)
#> Period-over-period price index for 4 levels over 4 time periods 
#>    202001     202002    202003   202004
#> B1      1  0.8949097 0.3342939      NaN
#> B2      1        NaN       NaN 5.155942
#> B3      1 23.7480455 2.4900997 0.607197
#> B4    NaN        NaN       NaN 9.368610

The Coggeshall index (equally-weighted harmonic mean of price relatives) is another competitor to the Jevons, but is seldom used in practice. Despite it being more exotic, it is just as easy to make by specifying an order r of -1.

elemental_index(ms_prices, relatives ~ period + business, na.rm = TRUE, r = -1)
#> Period-over-period price index for 4 levels over 4 time periods 
#>    202001    202002    202003    202004
#> B1      1 0.8949097 0.3342939       NaN
#> B2      1       NaN       NaN 1.7205750
#> B3      1 0.6591433 0.8185743 0.4746769
#> B4    NaN       NaN       NaN 2.2353790

The type of mean used to aggregate elemental indexes can be controlled in the same way in the call to aggregate(). The default makes an arithmetic index, but any type of generalized-mean index is possible.

Many superlative indexes can be made by supplying unequal and (usually) time-varying weights when making the elemental indexes. These weights often come from information on quantities.

ms_prices2 <- transform(ms_prices, quantity = 10 - price)

The Tornqvist index is a popular superlative index-number formula, using average period-over-period value shares as the weights in a geometric mean. The only tricky part is making the weights from data on prices and quantities.

library(gpindex)

tw <- grouped(index_weights("Tornqvist"))

ms_prices2[c("back_price", "back_quantity")] <- 
  ms_prices2[back_period(ms_prices2$period, ms_prices2$product),
             c("price", "quantity")]

ms_prices2 <- na.omit(ms_prices2) # can't have NAs for Tornqvist weights

ms_prices2$weight <- with(
  ms_prices2,
  tw(
    price, back_price, quantity, back_quantity,
    group = interaction(period, business)
  )
)

As elemental_index() makes a geometric index by default, all that is needed to make a Tornqvist index is to provide the weights.

elemental_index(
  ms_prices2,
  price / back_price ~ period + business,
  weights = weight
)
#> Period-over-period price index for 4 levels over 4 time periods 
#>    202001    202002    202003   202004
#> B1      1 0.8949097 0.3342939      NaN
#> B2      1       NaN       NaN 2.165152
#> B3      1 0.9520982 1.5913929 0.542372
#> B4    NaN       NaN       NaN 5.904237

Percent-change contributions

It’s often convenient to decompose an index into the (additive) contribution of each price relative, also known as the percent-change contributions. This can be done with the same work flow as in the previous examples, specifying contrib = TRUE when calling elemental_index().

ms_elemental <- elemental_index(
  ms_prices,
  relatives ~ period + business,
  contrib = TRUE, na.rm = TRUE
)

As with index values, percent-change contributions for a given level of the index can be extracted as a matrix.

contrib(ms_elemental)
#>   202001     202002     202003 202004
#> 1      0  0.0000000  0.0000000      0
#> 2     NA         NA -0.6657061      0
#> 3      0 -0.1050903         NA     NA

Aggregating the elemental indexes automatically aggregates percent-change contributions, so no extra steps are needed after the elemental indexes are made.

contrib(aggregate(ms_elemental, pias, na.rm = TRUE))
#>    202001      202002     202003       202004
#> 1       0  0.00000000  0.0000000  0.000000000
#> 10      0 -0.08782076  0.2731949 -0.078173579
#> 11      0  0.00000000         NA  0.059392635
#> 12      0  0.00000000         NA  1.322915301
#> 2      NA          NA -0.2928098  0.000000000
#> 3       0 -0.06718490         NA           NA
#> 4       0          NA         NA -0.018209690
#> 5       0          NA         NA  0.094562963
#> 6       0          NA         NA  0.427935081
#> 7       0  0.51646606 -0.2054665 -0.011177530
#> 8       0  0.01906845  0.1755868 -0.003784845
#> 9       0 -0.07980493  0.1125689 -0.058699008

Updating a basket

The functions in piar are all designed to work within a “basket”, which is a fancy way of saying within a given aggregation structure. Over time, however, aggregation structures change as the weights used to aggregate an index get updated, and new samples of businesses are drawn. The general approach to keep the time series going is to “chain” the index across baskets.

It is easier to see how to chain an index over time with a simple example that just splits the ms_prices data in two.

ms_prices1 <- subset(ms_prices, period <= "202003")
ms_prices2 <- subset(ms_prices, period >= "202003")

The index for the first basket can be calculate as usual.

ms_elemental1 <- elemental_index(
  ms_prices1, 
  price_relative(price, period = period, product = product) ~ period + business,
  na.rm = TRUE
)

ms_index1 <- aggregate(ms_elemental1, pias, na.rm = TRUE)

ms_index1
#> Period-over-period price index for 8 levels over 3 time periods 
#>    202001    202002    202003
#> 1       1 1.3007239 1.0630743
#> 11      1 1.3007239 1.0630743
#> 12      1 1.3007239 1.0630743
#> B1      1 0.8949097 0.3342939
#> B2      1 1.3007239 1.0630743
#> B3      1 2.0200036 1.6353355
#> B4      1 1.3007239 1.0630743
#> B5      1 1.3007239 1.0630743

Nothing special needs to be done to make the elemental indexes for the new basket, but it’s easier to remove the index values of 1 for quarter 3 2020.

ms_elemental2 <- ms_prices2 |>
  transform(rel = price_relative(price, period = period, product = product)) |>
  subset(period > "202003") |>
  elemental_index(rel ~ period + business, na.rm = TRUE)

Aggregating these elemental indexes, however, requires an aggregation structure. The results of the first example can be reproduced by simply “price updating” the original weights, then building the aggregation structure as usual.

ms_index2 <- aggregate(ms_elemental2, update(pias, ms_index1), na.rm = TRUE)

ms_index2
#> Period-over-period price index for 8 levels over 1 time periods 
#>      202004
#> 1  2.734761
#> 11 1.574515
#> 12 4.576286
#> B1 1.574515
#> B2 2.770456
#> B3 0.537996
#> B4 4.576286
#> B5 4.576286

This produces two sets of period-over-period indexes that can be stacked together and then chained.

chain(stack(ms_index1, ms_index2))
#> Fixed-base price index for 8 levels over 4 time periods 
#>    202001    202002    202003    202004
#> 1       1 1.3007239 1.3827662 3.7815355
#> 11      1 1.3007239 1.3827662 2.1771866
#> 12      1 1.3007239 1.3827662 6.3279338
#> B1      1 0.8949097 0.2991629 0.4710366
#> B2      1 1.3007239 1.3827662 3.8308934
#> B3      1 2.0200036 3.3033836 1.7772072
#> B4      1 1.3007239 1.3827662 6.3279338
#> B5      1 1.3007239 1.3827662 6.3279338

Carry-forward imputation

The previous examples used parental imputation to both impute missing price relatives when calculating the elemental indexes, and to impute missing elemental indexes during aggregation. Another common imputation strategy when making elemental indexes is to carry forward the previous price to impute for missing prices, and parentally impute missing elemental indexes during aggregation. As the elemental_index() function accepts price relatives as its input, other types of imputations can be done prior to passing price relatives to this function.

ms_elemental2 <- ms_prices |>
  transform(imputed_price = carry_forward(price, period = period, product = product)) |>
  elemental_index(
    price_relative(imputed_price, period = period, product = product) ~
      period + business,
    na.rm = TRUE
  )

ms_elemental2
#> Period-over-period price index for 4 levels over 4 time periods 
#>    202001    202002    202003   202004
#> B1      1 0.8949097 0.5781816 1.000000
#> B2      1 1.0000000 0.1777227 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4    NaN       NaN       NaN 4.576286

Aggregation follows the same steps as in the previous examples, with missing values set to be ignored in order to parentally impute missing elemental indexes.

ms_index <- aggregate(ms_elemental2, pias, na.rm = TRUE)
 
ms_index
#> Period-over-period price index for 8 levels over 4 time periods 
#>    202001    202002    202003    202004
#> 1       1 1.1721550 0.8082981 2.2653614
#> 11      1 1.1721550 0.8082981 0.8093718
#> 12      1 1.1721550 0.8082981 4.5762862
#> B1      1 0.8949097 0.5781816 1.0000000
#> B2      1 1.0000000 0.1777227 2.7704563
#> B3      1 2.0200036 1.6353355 0.5379960
#> B4      1 1.1721550 0.8082981 4.5762862
#> B5      1 1.1721550 0.8082981 4.5762862

Calculating an index from multiple sources

All of the examples so far have built an index from a single source of price data. In many cases the elemental indexes are built from multiple sources of data, either because no single source of data has the necessary coverage, or different index-number formulas are employed for different elemental aggregates.

It is straightforward to merge index objects together, provided they’re for the same time periods. To keep the example simple, suppose that ms_prices is split in two.

ms_prices1 <- subset(ms_prices, business %in% c("B1", "B2", "B3"))
ms_prices2 <- subset(ms_prices, business == "B4")

Elemental indexes can be made for both groups separately with the usual recipe. Note that there is no data for business B4 in the first two periods, so the time periods need to be made explicit.

ms_elemental1 <- elemental_index(
  ms_prices1,
  price_relative(price, period = period, product = product) ~ period + business,
  na.rm = TRUE
)

ms_elemental1
#> Period-over-period price index for 3 levels over 4 time periods 
#>    202001    202002    202003   202004
#> B1      1 0.8949097 0.3342939      NaN
#> B2      1       NaN       NaN 2.770456
#> B3      1 2.0200036 1.6353355 0.537996

ms_elemental2 <- ms_prices2 |>
  transform(period = factor(period, levels = time(ms_elemental1))) |>
  elemental_index(
    price_relative(price, period = period, product = product) ~ period + business,
    na.rm = TRUE
  )

ms_elemental2
#> Period-over-period price index for 1 levels over 4 time periods 
#>    202001 202002 202003   202004
#> B4    NaN    NaN    NaN 4.576286

Once the elemental indexes are made, they can be merged together and then aggregated.

aggregate(merge(ms_elemental1, ms_elemental2), pias, na.rm = TRUE)
#> Period-over-period price index for 8 levels over 4 time periods 
#>    202001    202002    202003   202004
#> 1       1 1.3007239 1.0630743 2.734761
#> 11      1 1.3007239 1.0630743 1.574515
#> 12      1 1.3007239 1.0630743 4.576286
#> B1      1 0.8949097 0.3342939 1.574515
#> B2      1 1.3007239 1.0630743 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4      1 1.3007239 1.0630743 4.576286
#> B5      1 1.3007239 1.0630743 4.576286

A slightly more complex case is when some of the input data are already a price index. For example, suppose the index values for businesses B4 and B5 come from some outside process, and are taken as inputs.

ms_prices2 <- subset(
  as.data.frame(aggregate(ms_elemental, pias, na.rm = TRUE)),
  level %in% c("B4", "B5")
)

ms_prices2
#>    period level    value
#> 7  202001    B4 1.000000
#> 8  202001    B5 1.000000
#> 15 202002    B4 1.300724
#> 16 202002    B5 1.300724
#> 23 202003    B4 1.063074
#> 24 202003    B5 1.063074
#> 31 202004    B4 4.576286
#> 32 202004    B5 4.576286

All that is required is to pass the pre-existing indexes to as_index() to cast them into the correct form. This won’t affect their values, but will allow them to be merged with the other elemental indexes, and aggregated.

ms_elemental2 <- as_index(ms_prices2)

aggregate(merge(ms_elemental1, ms_elemental2), pias, na.rm = TRUE)
#> Period-over-period price index for 8 levels over 4 time periods 
#>    202001    202002    202003   202004
#> 1       1 1.3007239 1.0630743 2.734761
#> 11      1 1.3007239 1.0630743 1.574515
#> 12      1 1.3007239 1.0630743 4.576286
#> B1      1 0.8949097 0.3342939 1.574515
#> B2      1 1.3007239 1.0630743 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4      1 1.3007239 1.0630743 4.576286
#> B5      1 1.3007239 1.0630743 4.576286

Aggregating a Paasche index

All of the examples so far have used a single set of weights to aggregate an index. Although this is by far the most common case, there are situations where the aggregation weights change every period. The Paasche index is the notable example, as the weights for aggregation are the current-period revenue shares in each period.

weights <- data.frame(
  period = rep(c("202001", "202002", "202003", "202004"), each = 5),
  classification = ms_weights$classification,
  weight = 1:20
)

head(weights)
#>   period classification weight
#> 1 202001             11      1
#> 2 202001             11      2
#> 3 202001             11      3
#> 4 202001             12      4
#> 5 202001             12      5
#> 6 202002             11      6

The only new tools needed to deal with time-varying weights are the stack() and unstack() functions. stack() appends a later index series onto an earlier one for the same levels, whereas unstack() pulls apart an index series for many periods into a collection of one-period indexes.

The first step to making a Paasche index is to unstack the elemental indexes into a list of elemental indexes for each period. (Trying to make the elemental indexes period-by-period can be dangerous when there are missing values.)

ms_elemental <- unstack(ms_elemental)

ms_elemental
#> $`202001`
#> Period-over-period price index for 4 levels over 1 time periods 
#>    202001
#> B1      1
#> B2      1
#> B3      1
#> B4    NaN
#> 
#> $`202002`
#> Period-over-period price index for 4 levels over 1 time periods 
#>       202002
#> B1 0.8949097
#> B2       NaN
#> B3 2.0200036
#> B4       NaN
#> 
#> $`202003`
#> Period-over-period price index for 4 levels over 1 time periods 
#>       202003
#> B1 0.3342939
#> B2       NaN
#> B3 1.6353355
#> B4       NaN
#> 
#> $`202004`
#> Period-over-period price index for 4 levels over 1 time periods 
#>      202004
#> B1      NaN
#> B2 2.770456
#> B3 0.537996
#> B4 4.576286

The second step is to make a sequence of aggregation structures for each set of weights.

pias <- with(
  weights, 
  Map(aggregation_structure, list(hierarchy), split(weight, period))
)

Making the Paasche index for each period is now just a case of mapping the aggregate() function to each elemental index and aggregation structure, and then reducing the result with the stack() function.

paasche <- Reduce(
  stack,
  Map(aggregate, ms_elemental, pias, na.rm = TRUE, r = -1)
)

paasche
#> Period-over-period price index for 8 levels over 4 time periods 
#>    202001    202002    202003    202004
#> 1       1 1.3127080 0.5874490 1.3591916
#> 11      1 1.3127080 0.5874490 0.8839797
#> 12      1 1.3127080 0.5874490 4.5762862
#> B1      1 0.8949097 0.3342939 0.8839797
#> B2      1 1.3127080 0.5874490 2.7704563
#> B3      1 2.0200036 1.6353355 0.5379960
#> B4      1 1.3127080 0.5874490 4.5762862
#> B5      1 1.3127080 0.5874490 4.5762862

Making a Fisher index

With a Paasche index in hand, it is now trivial to make a Fisher index by first making the period-over-period Laspeyres index, and then doing a simple matrix operation.

laspeyres <- Reduce(
  stack,
  Map(aggregate, ms_elemental, pias[c(1, 1, 2, 3)], na.rm = TRUE)
)

fisher <- sqrt(as.matrix(laspeyres) * as.matrix(paasche))

fisher
#>    202001    202002    202003   202004
#> 1       1 1.5107763 0.7956890 1.996688
#> 11      1 1.5107763 0.7956890 1.192826
#> 12      1 1.5107763 0.7956890 4.576286
#> B1      1 0.8949097 0.3342939 1.192826
#> B2      1 1.5107763 0.7956890 2.770456
#> B3      1 2.0200036 1.6353355 0.537996
#> B4      1 1.5107763 0.7956890 4.576286
#> B5      1 1.5107763 0.7956890 4.576286

Percent-change contributions can similarly be computed with a matrix operation.

geometric_weights <- transmute_weights(0, 1)

w <- mapply(
  \(x, y) scale_weights(geometric_weights(c(x, y))),
  as.numeric(laspeyres[1]),
  as.numeric(paasche[1])
)

laspeyres_contrib <- contrib(laspeyres)
paasche_contrib <- contrib(paasche)

fisher_contrib <- w[1, col(laspeyres_contrib)] * laspeyres_contrib +
  w[2, col(paasche_contrib)] * paasche_contrib

fisher_contrib
#>    202001      202002      202003       202004
#> 1       0  0.00000000  0.00000000  0.000000000
#> 10      0 -0.13327742  0.17296129 -0.131946938
#> 11      0  0.00000000          NA  0.039533034
#> 12      0  0.00000000          NA  0.880561300
#> 2      NA          NA -0.42962331  0.000000000
#> 3       0 -0.04756479          NA           NA
#> 4       0          NA          NA -0.012019870
#> 5       0          NA          NA  0.062419211
#> 6       0          NA          NA  0.282471798
#> 7       0  0.78379264 -0.13008203 -0.018866231
#> 8       0  0.02893842  0.11116498 -0.006388331
#> 9       0 -0.12111255  0.07126804 -0.099076370

Despite being a matrix, the resulting Fisher index can be chained just like any other index.

chain(fisher)
#> Fixed-base price index for 8 levels over 4 time periods 
#>    202001    202002    202003    202004
#> 1       1 1.5107763 1.2021080 2.4002342
#> 11      1 1.5107763 1.2021080 1.4339054
#> 12      1 1.5107763 1.2021080 5.5011904
#> B1      1 0.8949097 0.2991629 0.3568492
#> B2      1 1.5107763 1.2021080 3.3303878
#> B3      1 2.0200036 3.3033836 1.7772072
#> B4      1 1.5107763 1.2021080 5.5011904
#> B5      1 1.5107763 1.2021080 5.5011904

A chained Fisher index can also be made by first chaining the Laspeyres and Paasche indexes, then taking the geometric mean.

sqrt(as.matrix(chain(laspeyres)) * as.matrix(chain(paasche)))
#>    202001    202002    202003    202004
#> 1       1 1.5107763 1.2021080 2.4002342
#> 11      1 1.5107763 1.2021080 1.4339054
#> 12      1 1.5107763 1.2021080 5.5011904
#> B1      1 0.8949097 0.2991629 0.3568492
#> B2      1 1.5107763 1.2021080 3.3303878
#> B3      1 2.0200036 3.3033836 1.7772072
#> B4      1 1.5107763 1.2021080 5.5011904
#> B5      1 1.5107763 1.2021080 5.5011904