Manipulate Competition Results

Evgeni Chasnovski

2023-02-28

This vignette describes comperes functionality for manipulating (summarising and transforming) competition results (hereafter, results): item summaries, Head-to-Head values and conversion between their formats, and pairgames.

We will need the following packages:

library(comperes)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rlang)

Example results in long format:

cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8,
  season = c(rep("A", 5), rep("B", 3))
) %>%
  as_longcr()

Functions discussed in these topics leverage dplyr’s grammar of data manipulation, and basic knowledge of it is enough to use them. Familiarity with rlang’s quotation mechanism is also helpful.

Item summaries

An item summary is understood as a set of summary measurements (of arbitrary nature) of an item (one or more columns) present in the data. To compute them, comperes offers the summarise_*() family of functions, in which summary functions should be provided as in dplyr::summarise(). Basically, they are wrappers for a grouped summarise with forced ungrouping, conversion to a tibble, and optional prefixing of summary names. Note that if one of the columns in the item is a factor with implicit NAs (present in the vector but not in the levels), there will be a warning suggesting to add NA to the levels. This is due to group_by() behaviour in dplyr starting with version 0.8.0.
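If the factor warning mentioned above appears, one way to follow its suggestion is to make NA an explicit level before summarising. The sketch below uses only base R; `player_fac` and `player_fac_explicit` are illustrative names, not part of comperes.

```r
# A factor with an implicit NA: NA occurs among the values
# but is not one of the levels
player_fac <- factor(c(1, NA, NA, 1, 2))
levels(player_fac)
anyNA(levels(player_fac))

# addNA() appends NA as an explicit level
player_fac_explicit <- addNA(player_fac)
levels(player_fac_explicit)
anyNA(levels(player_fac_explicit))
```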

A couple of examples of item summaries:

cr_long %>% summarise_player(mean_score = mean(score))
#> # A tibble: 3 × 2
#>   player mean_score
#>    <dbl>      <dbl>
#> 1      1       4   
#> 2      2       6.33
#> 3     NA       2.5

cr_long %>% summarise_game(min_score = min(score), max_score = max(score))
#> # A tibble: 4 × 3
#>   game  min_score max_score
#>   <chr>     <int>     <int>
#> 1 a1            1         3
#> 2 a2            4         5
#> 3 b1            6         7
#> 4 b2            8         8

cr_long %>% summarise_item("season", sd_score = sd(score))
#> # A tibble: 2 × 2
#>   season sd_score
#>   <chr>     <dbl>
#> 1 A          1.58
#> 2 B          1
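To make the "grouped summarise with forced ungrouping" description concrete, here is a rough plain-dplyr sketch of the idea behind the player summary. It is an approximation under that description, not the package's actual implementation; `results` and `player_summary` are illustrative names.

```r
library(dplyr)

# Plain tibble with the same records as cr_long (no longcr class needed here)
results <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
)

# Group by the item, summarise within groups, then drop the grouping
player_summary <- results %>%
  group_by(player) %>%
  summarise(mean_score = mean(score)) %>%
  ungroup()

player_summary
```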

For convenient transformation of results there is the join_*_summary() family of functions, which compute the respective summaries and join them to the original data:

cr_long %>%
  join_item_summary("season", season_mean_score = mean(score)) %>%
  mutate(score = score - season_mean_score)
#> # A longcr object:
#> # A tibble: 8 × 5
#>   game  player score season season_mean_score
#>   <chr>  <dbl> <dbl> <chr>              <dbl>
#> 1 a1         1    -2 A                      3
#> 2 a1        NA    -1 A                      3
#> 3 a1        NA     0 A                      3
#> 4 a2         1     1 A                      3
#> 5 a2         2     2 A                      3
#> 6 b1         2    -1 B                      7
#> 7 b1         1     0 B                      7
#> 8 b2         2     1 B                      7
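Conceptually, this is a summary followed by a join. The following plain-dplyr sketch reproduces the same centring by season under that assumption; it is not the package's internal code, and `results`, `season_summary` and `centered` are illustrative names.

```r
library(dplyr)

results <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  score  = 1:8,
  season = c(rep("A", 5), rep("B", 3))
)

# Step 1: compute the summary per season
season_summary <- results %>%
  group_by(season) %>%
  summarise(season_mean_score = mean(score))

# Step 2: join it back to every record and use it in a transformation
centered <- results %>%
  left_join(season_summary, by = "season") %>%
  mutate(score = score - season_mean_score)

centered
```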

For common summary functions comperes has a list summary_funs with 8 quoted expressions to be used with rlang’s unquoting mechanism:

# Use .prefix to add prefix to summary columns
cr_long %>%
  join_player_summary(!!!summary_funs[1:2], .prefix = "player_") %>%
  join_item_summary("season", !!!summary_funs[1:2], .prefix = "season_")
#> # A longcr object:
#> # A tibble: 8 × 8
#>   game  player score season player_min_score player_max_score season_m…¹ seaso…²
#>   <chr>  <dbl> <int> <chr>             <int>            <int>      <int>   <int>
#> 1 a1         1     1 A                     1                7          1       5
#> 2 a1        NA     2 A                     2                3          1       5
#> 3 a1        NA     3 A                     2                3          1       5
#> 4 a2         1     4 A                     1                7          1       5
#> 5 a2         2     5 A                     5                8          1       5
#> 6 b1         2     6 B                     5                8          6       8
#> 7 b1         1     7 B                     1                7          6       8
#> 8 b2         2     8 B                     5                8          6       8
#> # … with abbreviated variable names ¹​season_min_score, ²​season_max_score
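The same unquoting mechanism works with a user-defined list of quoted expressions. In the sketch below, `my_funs` is a hypothetical list built with rlang::exprs() in the style of summary_funs; its element names become the summary column names.

```r
library(comperes)
library(dplyr)
library(rlang)

cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
) %>%
  as_longcr()

# Hypothetical user-defined summaries stored as quoted expressions
my_funs <- exprs(
  range_score = max(score) - min(score),
  num_records = length(score)
)

# Splice them into the call with !!!, just as with summary_funs
player_ranges <- cr_long %>%
  summarise_player(!!!my_funs, .prefix = "player_")

player_ranges
```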

Head-to-Head values

A Head-to-Head value is a summary statistic of direct confrontation between two players. It is assumed that this value can be computed from the players’ matchups alone, where a matchup is the data on actual participation of an ordered pair of players in one game.

To compute matchups, comperes has get_matchups(), which returns a widecr object with all matchups actually present in the results (including matchups of players with themselves). Note that missing values in the player column are treated as separate players. This allows working with games in which several players’ identifiers are unknown. However, when computing Head-to-Head values they are treated as a single player. Example:

get_matchups(cr_long)
#> # A widecr object:
#> # A tibble: 18 × 5
#>    game  player1 score1 player2 score2
#>    <chr>   <dbl>  <int>   <dbl>  <int>
#>  1 a1          1      1       1      1
#>  2 a1          1      1      NA      2
#>  3 a1          1      1      NA      3
#>  4 a1         NA      2       1      1
#>  5 a1         NA      2      NA      2
#>  6 a1         NA      2      NA      3
#>  7 a1         NA      3       1      1
#>  8 a1         NA      3      NA      2
#>  9 a1         NA      3      NA      3
#> 10 a2          1      4       1      4
#> 11 a2          1      4       2      5
#> 12 a2          2      5       1      4
#> 13 a2          2      5       2      5
#> 14 b1          2      6       2      6
#> 15 b1          2      6       1      7
#> 16 b1          1      7       2      6
#> 17 b1          1      7       1      7
#> 18 b2          2      8       2      8
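One way to picture matchups is as a self-join of the results on the game column: every record in a game is paired with every record in the same game, itself included. The base R sketch below illustrates that idea only; it is not how get_matchups() is implemented, and row order may differ from the output above.

```r
results <- data.frame(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
)

# Pair every record with every record of the same game;
# suffixes turn player/score into player1/score1 and player2/score2
matchups <- merge(results, results, by = "game", suffixes = c("1", "2"))

# 3 * 3 + 2 * 2 + 2 * 2 + 1 * 1 ordered pairs
nrow(matchups)
```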

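Head-to-Head values can be stored in two ways: in long format (computed with h2h_long(), a tibble with one row per ordered pair of players) and in matrix format (computed with h2h_mat(), a matrix with rows and columns corresponding to players):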

cr_long %>%
  h2h_long(
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2)
  )
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 4
#>   player1 player2 abs_diff num_wins
#>     <dbl>   <dbl>    <dbl>    <int>
#> 1       1       1      0          0
#> 2       1       2      1          1
#> 3       1      NA      1.5        0
#> 4       2       1      1          1
#> 5       2       2      0          0
#> 6       2      NA     NA         NA
#> 7      NA       1      1.5        2
#> 8      NA       2     NA         NA
#> 9      NA      NA      0.5        1

cr_long %>% h2h_mat(sum_score = sum(score1 + score2))
#> # A matrix format of Head-to-Head values:
#>       1  2 <NA>
#> 1    24 22    7
#> 2    22 38   NA
#> <NA>  7 NA   20
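Since Head-to-Head values are computed from matchups, the long format can be approximated by summarising the matchups over the pair of player columns. This is a sketch of the idea using functions shown earlier, not a description of the internals of h2h_long(). Pairs that never met (such as player 2 and NA) should not appear in this sketch, whereas h2h_long() lists them with missing values.

```r
library(comperes)
library(dplyr)

cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
) %>%
  as_longcr()

# Treat the ordered pair (player1, player2) as an item and summarise it
h2h_sketch <- cr_long %>%
  get_matchups() %>%
  summarise_item(
    c("player1", "player2"),
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2)
  )

h2h_sketch
```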

comperes also offers a list h2h_funs of 9 common Head-to-Head functions as quoted expressions to be used with rlang’s unquoting mechanism:

cr_long %>% h2h_long(!!!h2h_funs)
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 11
#>   player1 player2 mean_score_d…¹ mean_…² mean_…³ sum_s…⁴ sum_s…⁵ sum_s…⁶ num_w…⁷
#>     <dbl>   <dbl>          <dbl>   <dbl>   <dbl>   <int>   <dbl>   <int>   <dbl>
#> 1       1       1            0       0      4          0       0      12       0
#> 2       1       2            0       0      5.5        0       0      11       1
#> 3       1      NA           -1.5     0      1         -3       0       2       0
#> 4       2       1            0       0      5.5        0       0      11       1
#> 5       2       2            0       0      6.33       0       0      19       0
#> 6       2      NA           NA      NA     NA         NA      NA      NA      NA
#> 7      NA       1            1.5     1.5    2.5        3       3       5       2
#> 8      NA       2           NA      NA     NA         NA      NA      NA      NA
#> 9      NA      NA            0       0      2.5        0       0      10       1
#> # … with 2 more variables: num_wins2 <dbl>, num <int>, and abbreviated variable
#> #   names ¹​mean_score_diff, ²​mean_score_diff_pos, ³​mean_score, ⁴​sum_score_diff,
#> #   ⁵​sum_score_diff_pos, ⁶​sum_score, ⁷​num_wins
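Because h2h_funs is a named list, individual functions can be picked by name before unquoting. A short sketch, using element names that appear in the output above:

```r
library(comperes)
library(dplyr)
library(rlang)

cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
) %>%
  as_longcr()

# Inspect which Head-to-Head functions are available
names(h2h_funs)

# Long format with only two of them
h2h_subset <- cr_long %>% h2h_long(!!!h2h_funs[c("num_wins", "num")])
h2h_subset

# Matrix format holds a single value per pair, so pick one function
wins_mat <- cr_long %>% h2h_mat(!!!h2h_funs["num_wins"])
wins_mat
```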

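To compute Head-to-Head values for only a subset of players, or to include values for players that are not in the results, use a factor player column: its levels determine which players appear in the output. In the examples below, player 3 is absent from the results but present in the levels, the records with unknown (NA) players are left out because NA is not among the levels, and the fill argument replaces the missing Head-to-Head values.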

cr_long_fac <- cr_long %>%
  mutate(player = factor(player, levels = c(1, 2, 3)))

cr_long_fac %>%
  h2h_long(abs_diff = mean(abs(score1 - score2)),
           fill = list(abs_diff = -100))
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 3
#>   player1 player2 abs_diff
#>   <fct>   <fct>      <dbl>
#> 1 1       1              0
#> 2 1       2              1
#> 3 1       3           -100
#> 4 2       1              1
#> 5 2       2              0
#> 6 2       3           -100
#> 7 3       1           -100
#> 8 3       2           -100
#> 9 3       3           -100

cr_long_fac %>%
  h2h_mat(mean(abs(score1 - score2)),
          fill = -100)
#> # A matrix format of Head-to-Head values:
#>      1    2    3
#> 1    0    1 -100
#> 2    1    0 -100
#> 3 -100 -100 -100

Conversion

To convert between the long and matrix formats of Head-to-Head values, comperes has to_h2h_long() and to_h2h_mat(), which convert from matrix to long and from long to matrix respectively. Note that the output of to_h2h_long() has the player1 and player2 columns as characters. Examples:

cr_long %>% h2h_mat(mean(score1)) %>% to_h2h_long()
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 3
#>   player1 player2 h2h_value
#>   <chr>   <chr>       <dbl>
#> 1 1       1            4   
#> 2 1       2            5.5 
#> 3 1       <NA>         1   
#> 4 2       1            5.5 
#> 5 2       2            6.33
#> 6 2       <NA>        NA   
#> 7 <NA>    1            2.5 
#> 8 <NA>    2           NA   
#> 9 <NA>    <NA>         2.5

cr_long %>%
  h2h_long(mean_score1 = mean(score1), mean_score2 = mean(score2)) %>%
  to_h2h_mat()
#> Using mean_score1 as value.
#> # A matrix format of Head-to-Head values:
#>        1        2 <NA>
#> 1    4.0 5.500000  1.0
#> 2    5.5 6.333333   NA
#> <NA> 2.5       NA  2.5
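When the long format holds several value columns, the column to use can be named explicitly instead of relying on the first one, and the value column created by to_h2h_long() can be renamed. A brief sketch, assuming the `value` argument of both functions:

```r
library(comperes)
library(dplyr)

cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8
) %>%
  as_longcr()

h2h_tbl <- cr_long %>%
  h2h_long(mean_score1 = mean(score1), mean_score2 = mean(score2))

# Choose the second value column explicitly
mat_score2 <- h2h_tbl %>% to_h2h_mat(value = "mean_score2")
mat_score2

# Name the value column when going back to long format
long_named <- cr_long %>%
  h2h_mat(mean(score1)) %>%
  to_h2h_long(value = "mean_score1")
long_named
```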

All this functionality is powered by long_to_mat() and mat_to_long(), functions that are also useful outside of comperes. They convert general pair-value data between long and matrix formats:

pair_value_long <- tibble(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val = 1:3
)

pair_value_mat <- pair_value_long %>%
  long_to_mat(row_key = "key_1", col_key = "key_2", value = "val")
pair_value_mat
#>    2 3
#> 1  1 2
#> 2 NA 3

pair_value_mat %>%
  mat_to_long(
    row_key = "key_1", col_key = "key_2", value = "val",
    drop = TRUE
  )
#> # A tibble: 3 × 3
#>   key_1 key_2   val
#>   <chr> <chr> <int>
#> 1 1     2         1
#> 2 1     3         2
#> 3 2     3         3
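Two variations may be useful here: filling pairs that are absent from the long data with a chosen value, and keeping those pairs when converting back. The sketch assumes the `fill` argument of long_to_mat() and uses `drop = FALSE` in mat_to_long(); `mat_filled` and `long_all` are illustrative names.

```r
library(comperes)
library(dplyr)

pair_value_long <- tibble(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val = 1:3
)

# Fill the pair that is missing from the long data with 0 instead of NA
mat_filled <- pair_value_long %>%
  long_to_mat(row_key = "key_1", col_key = "key_2", value = "val", fill = 0)
mat_filled

# Keep every cell of the matrix, including the missing pair
long_all <- pair_value_long %>%
  long_to_mat(row_key = "key_1", col_key = "key_2", value = "val") %>%
  mat_to_long(
    row_key = "key_1", col_key = "key_2", value = "val",
    drop = FALSE
  )
long_all
```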

Pairgames

For some ranking algorithms it is crucial that every game is between exactly two players. comperes has the function to_pairgames() for this. It removes games with one player and splits games with three or more players into separate games, one for each unordered pair of different players. Note that game identifiers are changed to integers, but the order of the initial games is preserved. Example:

to_pairgames(cr_long)
#> # A widecr object:
#> # A tibble: 5 × 5
#>    game player1 score1 player2 score2
#>   <int>   <dbl>  <int>   <dbl>  <int>
#> 1     1       1      1      NA      2
#> 2     2       1      1      NA      3
#> 3     3      NA      2      NA      3
#> 4     4       1      4       2      5
#> 5     5       2      6       1      7
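The splitting of a multi-player game can be pictured with combn() from base R, which enumerates unordered pairs. The sketch below applies this idea to game "a1" alone; it only illustrates the pairing and is not the implementation of to_pairgames(). `game_a1`, `pair_idx` and `pairs` are illustrative names.

```r
# Records of game "a1": three players, two of them unknown
game_a1 <- data.frame(player = c(1, NA, NA), score = 1:3)

# Each column of pair_idx holds the row indices of one unordered pair
pair_idx <- combn(nrow(game_a1), 2)

# Build one two-player game per pair
pairs <- data.frame(
  player1 = game_a1$player[pair_idx[1, ]],
  score1  = game_a1$score[pair_idx[1, ]],
  player2 = game_a1$player[pair_idx[2, ]],
  score2  = game_a1$score[pair_idx[2, ]]
)

pairs
```

The three resulting rows correspond to games 1 to 3 in the to_pairgames() output above.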