The {maldipickr} package helps microbiologists reduce
duplicate or clonal bacteria in their cultures and, optionally, exclude
previously selected bacteria. {maldipickr} achieves this
by grouping together data from the MALDI Biotyper and helping choose
representative bacteria from each group using user-relevant metadata, a
process known as cherry-picking.
{maldipickr} cherry-picks bacterial isolates with MALDI
Biotyper:
First make sure {maldipickr} is installed and loaded;
otherwise, follow
the instructions to install the package.
Cherry-picking four isolates based on their taxonomic identification
by the MALDI Biotyper is done in a few steps with
{maldipickr}.
We import an example Biotyper CSV report and take a look at the table.
report_tbl <- read_biotyper_report(
  system.file("biotyper_unknown.csv", package = "maldipickr")
)
report_tbl %>%
  dplyr::select(name, bruker_species, bruker_log) %>%
  knitr::kable()

| name | bruker_species | bruker_log | 
|---|---|---|
| unknown_isolate_1 | not reliable identification | 1.33 | 
| unknown_isolate_2 | not reliable identification | 1.40 | 
| unknown_isolate_3 | Faecalibacterium prausnitzii | 1.96 | 
| unknown_isolate_4 | Faecalibacterium prausnitzii | 2.07 | 
Delineate clusters from the identifications after keeping only the reliable ones, then cherry-pick one representative spectrum per cluster.
Unreliable identifications, here those with a log-score below 2, are replaced by “not reliable identification”. Note, however, that isolates sharing this label do not necessarily represent the same isolate!
report_tbl <- report_tbl %>%
  dplyr::mutate(
    bruker_species = dplyr::if_else(
      bruker_log >= 2, bruker_species, "not reliable identification"
    )
  )
knitr::kable(report_tbl)

| name | sample_name | hit_rank | bruker_quality | bruker_species | bruker_taxid | bruker_hash | bruker_log | 
|---|---|---|---|---|---|---|---|
| unknown_isolate_1 | NA | 1 | - | not reliable identification | NA | 3e920566-2734-43dd-85d0-66cf23a2d6ef | 1.33 | 
| unknown_isolate_2 | NA | 1 | - | not reliable identification | NA | 88a85875-eeb5-4858-966e-98a077325dc3 | 1.40 | 
| unknown_isolate_3 | NA | 1 | + | not reliable identification | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 1.96 | 
| unknown_isolate_4 | NA | 1 | +++ | Faecalibacterium prausnitzii | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 2.07 | 
The chosen ones are indicated by the to_pick column.
report_tbl %>%
  delineate_with_identification() %>%
  pick_spectra(report_tbl, criteria_column = "bruker_log") %>%
  dplyr::relocate(name, to_pick, bruker_species) %>% 
  knitr::kable()
#> Generating clusters from single report

| name | to_pick | bruker_species | membership | cluster_size | sample_name | hit_rank | bruker_quality | bruker_taxid | bruker_hash | bruker_log | 
|---|---|---|---|---|---|---|---|---|---|---|
| unknown_isolate_1 | TRUE | not reliable identification | 2 | 1 | NA | 1 | - | NA | 3e920566-2734-43dd-85d0-66cf23a2d6ef | 1.33 | 
| unknown_isolate_2 | TRUE | not reliable identification | 3 | 1 | NA | 1 | - | NA | 88a85875-eeb5-4858-966e-98a077325dc3 | 1.40 | 
| unknown_isolate_3 | TRUE | not reliable identification | 4 | 1 | NA | 1 | + | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 1.96 | 
| unknown_isolate_4 | TRUE | Faecalibacterium prausnitzii | 1 | 1 | NA | 1 | +++ | 137408536 | 2d266f20-5428-428d-96ec-ddd40200794b | 2.07 | 
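The result above can be read as follows: each unreliable identification ends up in its own cluster, while reliable identifications are grouped by species, and the isolate with the highest criteria value is picked within each cluster. The base R sketch below reproduces that logic on a hand-copied version of the table. It is only an illustration under these assumptions, not the implementation used by delineate_with_identification() or pick_spectra(); the helper names `unreliable` and `cluster_label` are hypothetical.

```r
# Toy copy of the filtered report (values taken from the table above).
report <- data.frame(
  name = paste0("unknown_isolate_", 1:4),
  bruker_species = c(
    rep("not reliable identification", 3),
    "Faecalibacterium prausnitzii"
  ),
  bruker_log = c(1.33, 1.40, 1.96, 2.07)
)

# Assumption: an unreliable identification says nothing about identity,
# so each such isolate receives a unique label (its own cluster).
unreliable <- report$bruker_species == "not reliable identification"
cluster_label <- ifelse(
  unreliable,
  paste("unreliable", report$name),
  report$bruker_species
)

# Turn the labels into integer cluster memberships and count cluster sizes.
report$membership <- as.integer(factor(cluster_label))
report$cluster_size <- as.integer(
  ave(report$bruker_log, report$membership, FUN = length)
)

# Within each cluster, flag the isolate(s) with the highest log-score.
# Ties would flag several isolates in this simplified version.
report$to_pick <- report$bruker_log ==
  ave(report$bruker_log, report$membership, FUN = max)

print(report[, c("name", "to_pick", "membership", "cluster_size")])
```

Because every cluster here contains a single isolate, all four isolates are flagged, which is why the three unreliable isolates are all kept rather than collapsed into one.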
In addition to taxonomic identification reports,
{maldipickr} processes spectra data. Make sure
{maldipickr} is installed and loaded; otherwise, follow
the instructions to install the package.
Cherry-picking six isolates from three species based on their spectra
data obtained from the MALDI Biotyper is done in a few steps with
{maldipickr}.
We set up the directory location of our example spectra data;
adjust it to your requirements. We then import and process the spectra, which
gives us a named list of three objects: spectra, peaks and metadata
(more details in the Value section of process_spectra()).
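The import and processing step described above might look like the following sketch. Two names are assumptions that do not appear in this text: the example directory "toy-species-spectra" inside the installed package, and the import function import_biotyper_spectra(). Only process_spectra() and the resulting `processed` object are named in this vignette.

```r
library(maldipickr)

# Assumed location of the example spectra shipped with the package;
# replace it with the path to your own Biotyper spectra directory.
spectra_dir <- system.file("toy-species-spectra", package = "maldipickr")

# Import the raw spectra, then process them into a named list
# holding the spectra, the peaks and the metadata.
processed <- spectra_dir %>%
  import_biotyper_spectra() %>%
  process_spectra()

names(processed)
```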
Delineate spectra clusters using cosine similarity and cherry-pick
one representative spectrum per cluster. The chosen ones are indicated by
the to_pick column.
processed %>%
  list() %>%
  merge_processed_spectra() %>%
  coop::tcosine() %>%
  delineate_with_similarity(threshold = 0.92) %>%
  set_reference_spectra(processed$metadata) %>%
  pick_spectra() %>%
  dplyr::relocate(name, to_pick) %>% 
  knitr::kable()

| name | to_pick | membership | cluster_size | SNR | peaks | is_reference | 
|---|---|---|---|---|---|---|
| species1_G2 | FALSE | 1 | 4 | 5.089590 | 21 | FALSE | 
| species2_E11 | FALSE | 2 | 2 | 5.543735 | 22 | FALSE | 
| species2_E12 | TRUE | 2 | 2 | 5.633540 | 23 | TRUE | 
| species3_F7 | FALSE | 1 | 4 | 4.889949 | 26 | FALSE | 
| species3_F8 | TRUE | 1 | 4 | 5.558884 | 25 | TRUE | 
| species3_F9 | FALSE | 1 | 4 | 5.398429 | 25 | FALSE | 
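To make the similarity step more concrete, the base R sketch below computes the cosine similarity between the rows of a small intensity matrix and groups rows whose similarity reaches the 0.92 threshold. The matrix values are invented for illustration, and the grouping uses single-linkage clustering as a stand-in; it is an assumption that this resembles what delineate_with_similarity() does internally, so treat it as a conceptual aid only.

```r
# Hypothetical peak intensities: one row per spectrum, one column per peak bin.
intensities <- rbind(
  a1 = c(10, 0, 5, 0),
  a2 = c(9, 1, 5, 0),
  b1 = c(0, 8, 0, 6),
  b2 = c(1, 8, 0, 7)
)

# Cosine similarity between all pairs of rows:
# dot product divided by the product of the row norms.
cosine_similarity <- function(m) {
  norms <- sqrt(rowSums(m^2))
  tcrossprod(m) / outer(norms, norms)
}

similarity <- cosine_similarity(intensities)

# Group spectra linked by a similarity of at least the threshold.
# Single linkage cut at height (1 - threshold) yields the connected
# components of the "similar enough" graph.
threshold <- 0.92
tree <- hclust(as.dist(1 - similarity), method = "single")
membership <- cutree(tree, h = 1 - threshold)

print(round(similarity, 3))
print(membership)
```

In this toy example the two "a" spectra fall into one cluster and the two "b" spectra into another, since only the within-pair similarities exceed 0.92.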
This provides only a brief overview of the features of
{maldipickr}. Browse the other vignettes to learn more
about additional features.
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.6 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] maldipickr_1.3.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.4              cli_3.6.1                knitr_1.48              
#>  [4] rlang_1.1.4              xfun_0.44                coop_0.6-3              
#>  [7] purrr_1.0.2              generics_0.1.3           jsonlite_1.8.7          
#> [10] glue_1.6.2               htmltools_0.5.6.1        sass_0.4.7              
#> [13] fansi_1.0.5              rmarkdown_2.28           tibble_3.2.1            
#> [16] evaluate_0.22            jquerylib_0.1.4          fastmap_1.1.1           
#> [19] yaml_2.3.7               lifecycle_1.0.4          compiler_4.3.1          
#> [22] dplyr_1.1.4              pkgconfig_2.0.3          tidyr_1.3.0             
#> [25] readBrukerFlexData_1.9.1 rstudioapi_0.15.0        digest_0.6.33           
#> [28] R6_2.5.1                 tidyselect_1.2.1         utf8_1.2.3              
#> [31] pillar_1.9.0             parallel_4.3.1           magrittr_2.0.3          
#> [34] bslib_0.5.1              withr_2.5.1              tools_4.3.1             
#> [37] MALDIquant_1.22.1        cachem_1.0.8