This vignette demonstrates the win ratio (or net benefit) approach to multivariate ordinal data (Buyse 2010; Bebu and Lachin 2015; Mao 2024), under consensus or prioritized order between components.
Let \(Y = (y_1,\ldots, y_K)\) be a \(K\)-vector of ordinal responses, e.g., \(y_k =0,1,\ldots, m_k -1\) for some \(m_k\in\mathbb Z^+\). The components may be rating scores provided by different readers on the same subject’s medical image. Such vector-valued responses can be partially ordered by consensus among components, that is, we say \(Y_i\prec Y_j\) if \(Y_i\leq Y_j\) component-wise, with strict inequality for at least one component. Alternatively, if some reader, say the first one, is more experienced than the rest, their score can be prioritized. Then we say \(Y_i\prec Y_j\) if \(y_{1i} < y_{1j}\), or \(y_{1i} = y_{1j}\) and a consensus order holds on the other components. In either case, we assume that a higher score is better, so that \(Y_i\succ Y_j\) (or \(Y_j\prec Y_i\)) means a “win” for \(Y_i\) and a “loss” for \(Y_j\).
Let \(Z = 1, 0\) be a binary indicator for treatment and control groups, respectively. The probability of a treated winning against an untreated is \(P(Y_i\succ Y_j\mid Z_i = 1, Z_j = 0)\). Likewise, that of a treated losing to an untreated is \(P(Y_i\prec Y_j\mid Z_i = 1, Z_j = 0)\). To summarize the relative favorability of treatment against control, consider \[\begin{align}\tag{1} \mbox{Win ratio: } \hspace{1ex}& WR = \frac{P(Y_i\succ Y_j\mid Z_i = 1, Z_j = 0)}{P(Y_i\prec Y_j\mid Z_i = 1, Z_j = 0)}\\ \mbox{Net benefit: } \hspace{1ex}& NB = P(Y_i\succ Y_j\mid Z_i = 1, Z_j = 0) - P(Y_i\prec Y_j\mid Z_i = 1, Z_j = 0). \end{align}\]
Given \((Y_i, Z_i)\) \((i=1,\ldots, n)\), a random \(n\)-sample of \((Y, Z)\), standard two-sample \(U\)-statistics can be used to estimate the win and loss probabilities, which can then replace the target quantities in (1) for nonparametric estimates of WR and NB.
If \(Z\) is non-binary but rather a \(p\)-vector of possibly continuous components, the nonparametric approach no longer works. To reduce dimension, we posit a multiplicative win ratio model \[\begin{align}\tag{2} WR(Z_i, Z_j) := \frac{P(Y_i\succ Y_j\mid Z_i, Z_j)}{P(Y_i\prec Y_j\mid Z_i, Z_j)} =\exp\{\beta^{\rm T}(Z_i - Z_j)\}. \end{align}\] This means that unit increases in the covariates lead to win ratios \(\exp(\beta)\) (component-wise). Standard estimators of \(\exp(\beta)\) reduce to the two-sample win ratio when \(Z = 1, 0\).
Load the package:
For two-sample WR/NB:
where Y1
and Y0
are response matrices in
the treatment and control, respectively, each with \(K\) columns for the \(K\) components. fun
can be a
user-defined function for the partial order that takes two \(K\)-vectors and outputs \(1\), \(-1\), \(0\) if the first wins, loses, or ties with
the second, respectively. The default is the function wprod
for the consensus (product) order.
For win ratio regression (2):
where Y
is an \(n\times
K\) response matrix and \(Z\) is
an \(n\times p\) design (covariate)
matrix. Again, the win function can be customized in fun
.
Regression results are summarized by summary(obj)
.
A total of 186 patients with non-alcoholic fatty liver disease were recruited at the University of Wisconsin Hospitals in 2017. The patients underwent computed tomography scan of the liver for the presence of non-alcoholic steato-hepatitis, the most severe form of non-alcoholic fatty liver disease. The image was subsequently assessed by two radiologists using a scale of 1 to 5, with higher values indicating greater likelihood of disease. Predictors of rating scores include patient sex, the presence of advanced fibrosis (AF), and quantitative biomarkers such as percent of steatosis, i.e., liver fat content, liver mean gray level intensity (SSF2), and liver surface nodularity (LSN) score.
head(liver)
#> R1NASH R2NASH Sex AF Steatosis SSF2 LSN
#> 1 3 2 M FALSE 30 0.21 2.33
#> 2 1 1 F FALSE 5 0.38 2.86
#> 3 4 2 M FALSE 70 0.58 3.65
#> 4 4 4 F TRUE 30 -0.08 2.73
#> 5 4 3 M TRUE 70 -0.04 2.53
#> 6 3 3 M FALSE 10 0.02 2.88
First, compare the bivariate scores between AF and non-AF by win ratio/net benefit:
# lower score is better
Y1 <- 5 - liver[liver$AF, c("R1NASH", "R2NASH")] # advanced
Y0 <- 5 - liver[!liver$AF, c("R1NASH", "R2NASH")] # not advanced
obj <- wrtest(Y1, Y0)
obj
#> Call:
#> wrtest(Y1 = Y1, Y0 = Y0)
#>
#> Two-sample (Y1 vs Y0) win ratio/net benefit analysis
#>
#> Number of pairs: N1 x N0 = 69 x 116 = 8004
#> Win: 2392 (29.9%)
#> Loss: 4251 (53.1%)
#> Tie: 1361 (17%)
#>
#> Win ratio (95% CI): 0.56 (0.37, 0.86), p-value = 0.00856547
#> Net benefit (95% CI): -0.232 (-0.4, -0.065), p-value = 0.006577537
This shows, in particular, that AF is \(1-0.56=44\%\) less likely than non-AF to have favorable scores by consensus of the two readers.
To regress the scores against other covariates:
Y <- 5 - liver[, c("R1NASH", "R2NASH")] # lower score is better
Z <- cbind("Female" = liver$Sex == "F",
liver[, c("AF", "Steatosis", "SSF2", "LSN")]) # covariates
obj <- wreg(Y, Z) # fit model
obj
#> Call:
#> wreg(Y = Y, Z = Z)
#>
#> n = 154 subjects with complete data
#> Comparable (win/loss) pairs: 9548/11781 = 81%
#>
#> Female AF Steatosis SSF2 LSN
#> -0.18956 -0.9660827 -0.02779146 -0.007926333 -0.1029914
Some basic information of the model is printed, like the number and percentage of comparable pairs used in the regression, as well as the regression coefficients (log-WR).
For more detailed inference results:
summary(obj)
#> Call:
#> wreg(Y = Y, Z = Z)
#>
#> n = 154 subjects with complete data
#> Comparable (win/loss) pairs: 9548/11781 = 81%
#>
#> Newton-Raphson algoritm converged in 7 iterations
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> Female -0.189560 0.8273 0.259988 -0.729 0.465934
#> AF -0.966083 0.3806 0.280313 -3.446 0.000568 ***
#> Steatosis -0.027791 0.9726 0.005281 -5.262 1.42e-07 ***
#> SSF2 -0.007926 0.9921 0.003953 -2.005 0.044953 *
#> LSN -0.102991 0.9021 0.125718 -0.819 0.412657
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> Female 0.82732 1.20872 0.49702 1.3771
#> AF 0.38057 2.62763 0.21970 0.6592
#> Steatosis 0.97259 1.02818 0.96258 0.9827
#> SSF2 0.99210 1.00796 0.98445 0.9998
#> LSN 0.90213 1.10848 0.70512 1.1542
#>
#> Overall Wald test = 79.129 on 5 df, p = 1.221245e-15
Advanced fibrosis status and percent of steatosis are strongly and significantly associated with the likelihood of non-alcoholic steato-hepatitis. In particular, AF is \(38.1\%\) times as likely to have favorable reader assessments as non-AF. Furthermore, one percentage-point increase in steatosis results in \(1-0.97259\doteq 2.7\%\) reduction in the likelihood of favorable assessments.
wreg()
with AF
as the only
covariate in Z
produces the same results as
wrtest()
does.fun
and compare the results with those
under the consensus order.