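This vignette describes comperes functionality for manipulating (summarising and transforming) competition results (hereafter, results). It covers computing item summaries, computing and converting Head-to-Head values, and creating pairgames.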
We will need the following packages:
library(comperes)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rlang)
Example results in long format:
cr_long <- tibble(
  game = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score = 1:8,
  season = c(rep("A", 5), rep("B", 3))
) %>%
  as_longcr()
Functions discussed in these topics leverage dplyr’s grammar of data manipulation. Basic knowledge of it is enough to use them. Familiarity with rlang’s quotation mechanism is also helpful.
An item summary is understood as a set of summary measurements (of arbitrary nature) of an item (one or more columns) present in the data. To compute them, comperes offers the summarise_*() family of functions, in which summary functions should be provided as in dplyr::summarise(). Basically, they are wrappers for a grouped summarise with forced ungrouping, conversion to tibble, and optional adding of a prefix to the summary columns.
Note that if one of the columns in an item is a factor with implicit NAs (present in the vector but not in its levels), there will be a warning suggesting to add NA to the levels. This is due to group_by() functionality in dplyr after version 0.8.0.
A couple of examples:
cr_long %>% summarise_player(mean_score = mean(score))
#> # A tibble: 3 × 2
#> player mean_score
#> <dbl> <dbl>
#> 1 1 4
#> 2 2 6.33
#> 3 NA 2.5
cr_long %>% summarise_game(min_score = min(score), max_score = max(score))
#> # A tibble: 4 × 3
#> game min_score max_score
#> <chr> <int> <int>
#> 1 a1 1 3
#> 2 a2 4 5
#> 3 b1 6 7
#> 4 b2 8 8
cr_long %>% summarise_item("season", sd_score = sd(score))
#> # A tibble: 2 × 2
#> season sd_score
#> <chr> <dbl>
#> 1 A 1.58
#> 2 B 1
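To make the "grouped summarise with forced ungrouping" description concrete, here is a minimal sketch in plain dplyr. It is an assumed rough equivalent of summarise_player(mean_score = mean(score)), not the package's actual implementation, and the name results is used only for this illustration.

```r
library(dplyr)

# Plain tibble with the same columns as cr_long (no longcr class needed here)
results <- tibble(
  game = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score = 1:8,
  season = c(rep("A", 5), rep("B", 3))
)

# Group by the item, summarise, then make sure the result is ungrouped
player_summary <- results %>%
  group_by(player) %>%
  summarise(mean_score = mean(score)) %>%
  ungroup()

player_summary
```

Missing player identifiers form their own group here, which agrees with the NA row in the summarise_player() output above.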
For convenient transformation of results there is the join_*_summary() family of functions, which compute the respective summaries and join them to the original data:
cr_long %>%
  join_item_summary("season", season_mean_score = mean(score)) %>%
  mutate(score = score - season_mean_score)
#> # A longcr object:
#> # A tibble: 8 × 5
#> game player score season season_mean_score
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 a1 1 -2 A 3
#> 2 a1 NA -1 A 3
#> 3 a1 NA 0 A 3
#> 4 a2 1 1 A 3
#> 5 a2 2 2 A 3
#> 6 b1 2 -1 B 7
#> 7 b1 1 0 B 7
#> 8 b2 2 1 B 7
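The "compute a summary and join it back" idea can be sketched with plain dplyr verbs. This is an illustrative approximation under the assumption that the join is an ordinary left join on the item columns; it does not keep the longcr class, and the names results, season_summary and centered are hypothetical.

```r
library(dplyr)

results <- tibble(
  game = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score = 1:8,
  season = c(rep("A", 5), rep("B", 3))
)

# Step 1: one row per season with its mean score
season_summary <- results %>%
  group_by(season) %>%
  summarise(season_mean_score = mean(score))

# Step 2: attach the summary to every original row, then center the scores
centered <- results %>%
  left_join(season_summary, by = "season") %>%
  mutate(score = score - season_mean_score)

centered
```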
For common summary functions, comperes has a list summary_funs with 8 quoted expressions to be used with rlang’s unquoting mechanism:
# Use .prefix to add prefix to summary columns
cr_long %>%
  join_player_summary(!!!summary_funs[1:2], .prefix = "player_") %>%
  join_item_summary("season", !!!summary_funs[1:2], .prefix = "season_")
#> # A longcr object:
#> # A tibble: 8 × 8
#> game player score season player_min_score player_max_score season_m…¹ seaso…²
#> <chr> <dbl> <int> <chr> <int> <int> <int> <int>
#> 1 a1 1 1 A 1 7 1 5
#> 2 a1 NA 2 A 2 3 1 5
#> 3 a1 NA 3 A 2 3 1 5
#> 4 a2 1 4 A 1 7 1 5
#> 5 a2 2 5 A 5 8 1 5
#> 6 b1 2 6 B 5 8 6 8
#> 7 b1 1 7 B 1 7 6 8
#> 8 b2 2 8 B 5 8 6 8
#> # … with abbreviated variable names ¹season_min_score, ²season_max_score
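The same unquoting mechanism works with a custom list of quoted expressions. The sketch below builds a hypothetical list my_funs with rlang::exprs() and splices it into dplyr::summarise(); since the summarise_*() functions accept summary functions as in dplyr::summarise(), such a list can presumably be spliced into them the same way, but that is an assumption here.

```r
library(dplyr)
library(rlang)

# Hypothetical named list of quoted summary expressions
my_funs <- exprs(
  range_score = max(score) - min(score),
  n_scores = length(score)
)

# Small illustrative data set (not the vignette's cr_long)
results <- tibble(
  player = c(1, 1, 2, 2, 2),
  score = c(3, 7, 2, 5, 4)
)

# !!! splices every element of my_funs as a separate named argument
custom_summary <- results %>%
  group_by(player) %>%
  summarise(!!!my_funs)

custom_summary
```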
A Head-to-Head value is a summary statistic of direct confrontation between two players. It is assumed that this value can be computed based only on the players’ matchups, that is, the data of actual participation of an ordered pair of players in one game.
To compute matchups, comperes has get_matchups(), which returns a widecr object with all matchups actually present in results (including matchups of players with themselves). Note that missing values in the player column are treated as separate players. This allows operating with games where multiple players’ identifiers are not known. However, when computing Head-to-Head values they are treated as a single player. Example:
get_matchups(cr_long)
#> # A widecr object:
#> # A tibble: 18 × 5
#> game player1 score1 player2 score2
#> <chr> <dbl> <int> <dbl> <int>
#> 1 a1 1 1 1 1
#> 2 a1 1 1 NA 2
#> 3 a1 1 1 NA 3
#> 4 a1 NA 2 1 1
#> 5 a1 NA 2 NA 2
#> 6 a1 NA 2 NA 3
#> 7 a1 NA 3 1 1
#> 8 a1 NA 3 NA 2
#> 9 a1 NA 3 NA 3
#> 10 a2 1 4 1 4
#> 11 a2 1 4 2 5
#> 12 a2 2 5 1 4
#> 13 a2 2 5 2 5
#> 14 b1 2 6 2 6
#> 15 b1 2 6 1 7
#> 16 b1 1 7 2 6
#> 17 b1 1 7 1 7
#> 18 b2 2 8 2 8
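Conceptually, the matchups above are all ordered pairs of rows that share a game. A minimal base R sketch of that idea is a self-join on game; it is only an illustration, not how get_matchups() is implemented, it returns a plain data frame rather than a widecr object, and the row order may differ from the output above.

```r
# Plain data frame with the same game, player and score columns as cr_long
results <- data.frame(
  game = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score = 1:8
)

# Self-join on game: every row is paired with every row of the same game,
# including itself; suffixes give player1/score1 and player2/score2
matchups <- merge(results, results, by = "game", suffixes = c("1", "2"))

nrow(matchups)
```

A game with k rows contributes k * k ordered pairs, so the games here (with 3, 2, 2 and 1 rows) give 9 + 4 + 4 + 1 = 18 matchups, matching the number of rows returned by get_matchups(cr_long).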
Head-to-Head values can be stored in two ways:
In long format: a tibble with columns player1 and player2, which identify an ordered pair of players, and columns corresponding to Head-to-Head values. Computation is done with h2h_long(), which returns an object of class h2h_long. Head-to-Head functions are specified as in dplyr’s grammar, applied to the matchups of results:
cr_long %>%
  h2h_long(
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2)
  )
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 4
#> player1 player2 abs_diff num_wins
#> <dbl> <dbl> <dbl> <int>
#> 1 1 1 0 0
#> 2 1 2 1 1
#> 3 1 NA 1.5 0
#> 4 2 1 1 1
#> 5 2 2 0 0
#> 6 2 NA NA NA
#> 7 NA 1 1.5 2
#> 8 NA 2 NA NA
#> 9 NA NA 0.5 1
In matrix format: computation is done with h2h_mat(), which returns an object of class h2h_mat. Head-to-Head functions are specified as in h2h_long():
cr_long %>% h2h_mat(sum_score = sum(score1 + score2))
#> # A matrix format of Head-to-Head values:
#> 1 2 <NA>
#> 1 24 22 7
#> 2 22 38 NA
#> <NA> 7 NA 20
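One way to think about these computations is as a summarise over matchups grouped by the ordered pair of players. The sketch below illustrates that view with base R's merge() and dplyr; it is an assumption about the general idea rather than the package's code. Unlike h2h_long(), it returns only pairs that actually met, so pairs without matchups (such as player 2 against NA) are absent instead of being filled with NA.

```r
library(dplyr)

results <- data.frame(
  game = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score = 1:8
)

# All ordered pairs of rows within a game
matchups <- merge(results, results, by = "game", suffixes = c("1", "2"))

# One row per ordered pair of players; NA players form a single group
h2h_sketch <- matchups %>%
  group_by(player1, player2) %>%
  summarise(
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2),
    .groups = "drop"
  )

h2h_sketch
```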
comperes also offers a list h2h_funs of 9 common Head-to-Head functions as quoted expressions to be used with rlang’s unquoting mechanism:
cr_long %>% h2h_long(!!!h2h_funs)
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 11
#> player1 player2 mean_score_d…¹ mean_…² mean_…³ sum_s…⁴ sum_s…⁵ sum_s…⁶ num_w…⁷
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl>
#> 1 1 1 0 0 4 0 0 12 0
#> 2 1 2 0 0 5.5 0 0 11 1
#> 3 1 NA -1.5 0 1 -3 0 2 0
#> 4 2 1 0 0 5.5 0 0 11 1
#> 5 2 2 0 0 6.33 0 0 19 0
#> 6 2 NA NA NA NA NA NA NA NA
#> 7 NA 1 1.5 1.5 2.5 3 3 5 2
#> 8 NA 2 NA NA NA NA NA NA NA
#> 9 NA NA 0 0 2.5 0 0 10 1
#> # … with 2 more variables: num_wins2 <dbl>, num <int>, and abbreviated variable
#> # names ¹mean_score_diff, ²mean_score_diff_pos, ³mean_score, ⁴sum_score_diff,
#> # ⁵sum_score_diff_pos, ⁶sum_score, ⁷num_wins
To compute Head-to-Head values for only a subset of players, or to include values for players that are not in the results, use a factor player column. Notes:
Use the fill argument to replace NAs in certain columns after computing Head-to-Head values.
As with summarise_item(), there will be a warning in case of implicit NAs in factor columns.
cr_long_fac <- cr_long %>%
  mutate(player = factor(player, levels = c(1, 2, 3)))
cr_long_fac %>%
  h2h_long(abs_diff = mean(abs(score1 - score2)),
           fill = list(abs_diff = -100))
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 3
#> player1 player2 abs_diff
#> <fct> <fct> <dbl>
#> 1 1 1 0
#> 2 1 2 1
#> 3 1 3 -100
#> 4 2 1 1
#> 5 2 2 0
#> 6 2 3 -100
#> 7 3 1 -100
#> 8 3 2 -100
#> 9 3 3 -100
cr_long_fac %>%
  h2h_mat(mean(abs(score1 - score2)),
          fill = -100)
#> # A matrix format of Head-to-Head values:
#> 1 2 3
#> 1 0 1 -100
#> 2 1 0 -100
#> 3 -100 -100 -100
To convert between long and matrix formats of Head-to-Head values, comperes has to_h2h_long() and to_h2h_mat(), which convert from matrix to long and from long to matrix respectively. Note that the output of to_h2h_long() has player1 and player2 columns as characters. Examples:
cr_long %>% h2h_mat(mean(score1)) %>% to_h2h_long()
#> # A long format of Head-to-Head values:
#> # A tibble: 9 × 3
#> player1 player2 h2h_value
#> <chr> <chr> <dbl>
#> 1 1 1 4
#> 2 1 2 5.5
#> 3 1 <NA> 1
#> 4 2 1 5.5
#> 5 2 2 6.33
#> 6 2 <NA> NA
#> 7 <NA> 1 2.5
#> 8 <NA> 2 NA
#> 9 <NA> <NA> 2.5
cr_long %>%
  h2h_long(mean_score1 = mean(score1), mean_score2 = mean(score2)) %>%
  to_h2h_mat()
#> Using mean_score1 as value.
#> # A matrix format of Head-to-Head values:
#> 1 2 <NA>
#> 1 4.0 5.500000 1.0
#> 2 5.5 6.333333 NA
#> <NA> 2.5 NA 2.5
All this functionality is powered by the functions long_to_mat() and mat_to_long(), which are also useful outside of comperes. They convert general pair-value data between long and matrix formats:
pair_value_long <- tibble(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val = 1:3
)
pair_value_mat <- pair_value_long %>%
  long_to_mat(row_key = "key_1", col_key = "key_2", value = "val")
pair_value_mat
#> 2 3
#> 1 1 2
#> 2 NA 3
pair_value_mat %>%
  mat_to_long(
    row_key = "key_1", col_key = "key_2", value = "val",
    drop = TRUE
  )
#> # A tibble: 3 × 3
#> key_1 key_2 val
#> <chr> <chr> <int>
#> 1 1 2 1
#> 2 1 3 2
#> 3 2 3 3
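For intuition, the long-to-matrix direction can be sketched in base R by creating an NA-filled matrix named by the unique keys and filling it through character indexing. This is only an illustration of the conversion, assuming each key pair occurs at most once; it is not the implementation of long_to_mat().

```r
pair_value_long <- data.frame(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val = 1:3
)

# Row and column names come from the sorted unique keys
row_names <- as.character(sort(unique(pair_value_long$key_1)))
col_names <- as.character(sort(unique(pair_value_long$key_2)))

# Start with all-NA entries so that absent pairs stay NA
mat <- matrix(
  NA_integer_,
  nrow = length(row_names), ncol = length(col_names),
  dimnames = list(row_names, col_names)
)

# A two-column character matrix indexes entries by (row name, column name)
idx <- cbind(
  as.character(pair_value_long$key_1),
  as.character(pair_value_long$key_2)
)
mat[idx] <- pair_value_long$val

mat
```

The pair (2, 2) never occurs in the long data, so its entry stays NA, as in the pair_value_mat output above.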
For some ranking algorithms it is crucial that games are only between two players. comperes has the function to_pairgames() for this. It removes games with one player. Games with three or more players are split by to_pairgames() into separate games between unordered pairs of different players. Note that game identifiers are changed to integers, but the order of the initial games is preserved. Example: