BayesVolcano

Katja Danielzik (katja.danielzik@uni-due.de)

Background

Bayesian models are essential for studying complex biological systems of interest (SOIs), such as irradiation effects on metabolite concentrations or vaccine impacts on immune gene repertoires. In these models layered parameters describe key biological features of the SOIs. After model fitting, each parameter is characterized by a posterior distribution: a probability distribution representing all plausible effect values given the observed data. With thousands of such posteriors in high-dimensional analyses, identifying large, reliable effects becomes challenging.

Traditional volcano plots address this for frequentist analyses by plotting fold-changes against –log(p-values). We introduce Bayesian volcano plots that instead visualize the posterior mean effect size of parameter \(i\) (\(\theta_i\)) against the probability, \(\pi_i\), where the \(\pi_i\) quantifies the posterior probability that \(\theta_i\) is not equal the null effect. This directly highlights both magnitude and biological relevance. While Sousa et al. (2020) conceptualized Bayesian volcano plots, our R package provides the first practical implementation for automated \(\pi_i\) calculation and visualization of complex biological effects.

Installation

if (!require("BayesVolcano", quietly = TRUE)) {
  install.packages("BayesVolcano")
}

Input data

To create a Bayesian volcano plot, BayesVolcano needs two main inputs:

  1. Posterior samples: A data frame where each column represents a biological effect (e.g., gene expression change) and each row shows posterior values.
  2. Annotation metadata: A data frame mapping parameter names to biological entities.

If you used rstan or brms to fit your model, our extract_fit() function automatically converts your results into the right format. Just install the corresponding package (rstan or brms) first, then run extract_fit(your_model,parameter_name) to prepare your data.

Posterior Samples

data("posterior")
head(posterior[, 1:4])
#>   delta_mu.1.2.3.4 delta_mu.2.2.3.4 delta_mu.3.2.3.4 delta_mu.4.2.3.4
#> 1        1.1527980       0.24442670       -0.5283300       -2.8424959
#> 2        1.5428293       0.41885565       -0.8167788        0.2726311
#> 3        1.4201598      -0.92882009       -0.6627982       -1.8188774
#> 4        0.3277584      -0.40256872        0.3988543       -0.7784557
#> 5        0.8694004      -0.06791806       -0.9230497        0.3559686
#> 6        0.2562036      -2.11318184        1.0385491       -1.1102383

Annotation Metadata

This data frame maps parameter names to biological entities with at least a column named “parameter” and a column named “label”. Additional columns can be provided for later visualization (categorical like “group” or numerical like “value”).

data("annotation_df")
head(annotation_df)
#>          parameter                              label group     value
#> 1 delta_mu.1.2.3.4 1-Aminocyclopropanecarboxylic acid     A  9.533826
#> 2 delta_mu.2.2.3.4                    2-Aminomuconate     B 12.031100
#> 3 delta_mu.3.2.3.4             2-Phosphoglyceric acid     A  3.567905
#> 4 delta_mu.4.2.3.4              3-Hydroxybutyric acid     B  8.407870
#> 5 delta_mu.5.2.3.4                       Acetoacetate     A  9.891394
#> 6 delta_mu.6.2.3.4                  Acetyl coenzyme A     B  9.806375

Workflow

Step 1: Prepare input data

The prepare_volcano_input() function takes as input the posterior and the annotation data frame. Run this now:

d <- prepare_volcano_input(posterior = posterior, annotation = annotation_df)

The output contains

str(d)
#> 'data.frame':    98 obs. of  11 variables:
#>  $ parameter       : chr  "delta_mu.1.2.3.4" "delta_mu.2.2.3.4" "delta_mu.3.2.3.4" "delta_mu.4.2.3.4" ...
#>  $ pi.value        : num  0.58 0.138 0.292 0.184 0.594 0.916 0.88 0.27 0.23 0.518 ...
#>  $ null.effect     : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ parameter.median: num  0.643 -0.156 -0.271 -0.213 -0.9 ...
#>  $ parameter.low   : num  -1.23 -2.06 -1.98 -2.11 -3.31 ...
#>  $ parameter.high  : num  2.59 1.96 1.34 1.77 1.5 ...
#>  $ CrI.width       : num  3.82 4.02 3.32 3.88 4.8 ...
#>  $ CrI.level       : num  0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 ...
#>  $ label           : chr  "1-Aminocyclopropanecarboxylic acid" "2-Aminomuconate" "2-Phosphoglyceric acid" "3-Hydroxybutyric acid" ...
#>  $ group           : chr  "A" "B" "A" "B" ...
#>  $ value           : num  9.53 12.03 3.57 8.41 9.89 ...

Customizing Input Parameters

You can specify the null effect and credible interval level:

d <- prepare_volcano_input(
  posterior = posterior,
  annotation = annotation_df,
  null.effect = 0.5,
  CrI_level = 0.7
)

Step 2: Understanding \(\pi\)

The function prepare_volcano_input() uses the posterior of each parameter from the annotations (e.g., effect size \(\theta_i\) of parameter \(i\)) to calculate \(\pi_i\):

\(\pi_i = 2 \cdot \max\left(\int_{\theta_i = -\infty}^{\bar{\theta}} p(\theta_i)\mathrm{d}\theta_i, \int_{\theta_i = \bar{\theta}}^{\infty} p(\theta_i)\mathrm{d}\theta_i\right) - 1\)

Where \(\bar{\theta}\) is the null effect. This measures the probability that the effect is in the “direction” away from the null.

Important Note: A \(\pi\)-value near 0 doesn’t prove the absence of an effect - it indicates the posterior is widely distributed around the null.

Step 3: Creating the volcano plot

The plot_volcano() function creates a ggplot visual based on the formatted input:

plot_volcano(d)

This plot shows:

Where each point is a parameter with median effect size (x-axis) and \(\pi\) (y-axis). The null effect is shown by the vertical line.

Adding Credible Intervals

To visualize effect size uncertainty, add credible intervals as error bars:

plot_volcano(d,
  CrI = TRUE
)

This adds:

  • Grey error bars showing 95% credible intervals
  • Subtitle describing the interval level

Visualizing Uncertainty with Point Size

You can also represent credible interval width through point size:

plot_volcano(d,
  CrI = TRUE,
  CrI_width = TRUE
)

  • Wide intervals = Small points
  • Narrow intervals = Large points

Color-Coding Parameters

Use metadata columns for color-coding:

plot_volcano(d,
  color = "group"
)

plot_volcano(d,
  color = "value"
)

Combine with other visualizations:

plot_volcano(d,
  CrI = TRUE,
  CrI_width = TRUE,
  color = "group"
)

Step 4: Customizing Your Plot

Adding Titles and Labels

# Customization requires the ggplot2 package
if (!require("BayesVolcano", quietly = TRUE)) {
  install.packages("BayesVolcano")
}
library(ggplot2)

p <- plot_volcano(d)

p + xlab("my informative parameter") +
  ggtitle("My amazing plot")

Adding Labels

Use geom_text() for basic labeling or ggrepel to avoid overplotting:

p <- plot_volcano(d)
p + geom_text(aes(label = label))


# ggrepel version
# library(ggrepel)
# p +
# geom_text_repel(aes(label=label))

Conditional Labeling

Show labels only for reliable effects:

p <- plot_volcano(d)

p + geom_text(aes(label = ifelse(abs(parameter.median) > 1.6 & # only show for parameter value > 0.5
  pi > 0.95, # only show for pi > 0.95
label, # which variable contains the label
""
))) # do not display label if outside of set ranges

This highlights:

  • Parameters with large effect sizes (|median| > 1.6)
  • Parameters with strong evidence (\(\pi\) > 0.95)

References

Julie de Sousa, Ondřej Vencálek, Karel Hron, Jan Václavík, David Friedecký, Tomáš Adam, Bayesian multiple hypotheses testing in compositional analysis of untargeted metabolomic data, Analytica Chimica Acta, Volume 1097, 2020, Pages 49-61, ISSN 0003-2670, https://doi.org/10.1016/j.aca.2019.11.006

Corresponding GitHub Repository: https://github.com/sousaju/BayesVolcano

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_4.0.2      BayesVolcano_1.0.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.7.2        cli_3.6.5          knitr_1.51         rlang_1.1.7       
#>  [5] xfun_0.56          otel_0.2.0         purrr_1.2.1        generics_0.1.4    
#>  [9] S7_0.2.1           jsonlite_2.0.0     labeling_0.4.3     glue_1.8.0        
#> [13] htmltools_0.5.9    sass_0.4.10        scales_1.4.0       rmarkdown_2.30    
#> [17] grid_4.5.1         tibble_3.3.1       evaluate_1.0.5     jquerylib_0.1.4   
#> [21] fastmap_1.2.0      yaml_2.3.12        lifecycle_1.0.5    compiler_4.5.1    
#> [25] dplyr_1.2.0        HDInterval_0.2.4   RColorBrewer_1.1-3 pkgconfig_2.0.3   
#> [29] tidyr_1.3.2        rstudioapi_0.18.0  farver_2.1.2       digest_0.6.39     
#> [33] viridisLite_0.4.3  R6_2.6.1           tidyselect_1.2.1   pillar_1.11.1     
#> [37] magrittr_2.0.4     bslib_0.10.0       withr_3.0.2        tools_4.5.1       
#> [41] gtable_0.3.6       cachem_1.1.0