Note: This vignette presents performance tests comparing the
non-parallel and parallel versions of fundiversity functions. To avoid
depending on other packages, this vignette is pre-computed.
Within fundiversity, the computation of most indices can be
parallelized using the future package. The indices that currently
support parallelization are: FRic, FDis, FDiv, and FEve.
The goal of this vignette is to explain how to toggle and use
parallelization in fundiversity.
The future package provides a simple and general framework for
asynchronous computation that adapts to the resources available to the
user. The first vignette of future gives a general overview of all its
features. The main idea is that the user writes the code once and it
runs seamlessly either sequentially, in parallel on a single computer,
on a cluster, or distributed over several computers. fundiversity can
thus run on any of these backends, following the user’s choice.
library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")
By default, the fundiversity code runs sequentially on a single core.
To trigger parallelization, the user needs to set a plan with
future::plan(), choosing a parallel backend such as
future::multisession to split the execution across multiple R sessions.
# Sequential execution
fric1 <- fd_fric(traits_birds)
# Parallel execution
future::plan(future::multisession) # Plan definition
fric2 <- fd_fric(traits_birds) # The code resolves in a similar fashion
identical(fric1, fric2)
#> [1] TRUE
With the future::multisession backend, you can specify the number of
cores over which the function should be parallelized through the
workers argument of the future::plan() call:
future::plan(future::multisession, workers = 2) # Only 2 cores are used
fric3 <- fd_fric(traits_birds)
identical(fric3, fric2)
#> [1] TRUE
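As a small sketch of how one might choose the number of workers and then restore the previous plan: it relies on future::availableCores(), future::nbrOfWorkers(), and on future::plan() invisibly returning the plan that was active before. The variable names and the cap of 2 workers are illustrative assumptions, not recommendations from fundiversity.

```r
# Number of cores that future considers available on this machine
n_cores <- future::availableCores()

# Illustrative choice: at most 2 workers, and never more than available
n_workers <- min(2L, n_cores)

# plan() invisibly returns the previous plan, so it can be restored later
old_plan <- future::plan(future::multisession, workers = n_workers)

# Check how many workers the current plan uses
active_workers <- future::nbrOfWorkers()

# Restore the plan that was active before
future::plan(old_plan)
```

Restoring the previous plan this way avoids leaving background R sessions configured for code that runs later in the same session.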
To learn more about the different backends available and the related
arguments needed, please refer to the documentation of future::plan()
and the overview vignette of future.
We can now benchmark both versions to measure the performance gain from parallelization:
future::plan(future::sequential)
non_parallel_bench <- microbenchmark::microbenchmark(
  non_parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)
future::plan(future::multisession)
parallel_bench <- microbenchmark::microbenchmark(
  parallel = {
    fd_fric(traits_birds)
  },
  times = 20
)
rbind(non_parallel_bench, parallel_bench)
#> Unit: milliseconds
#>          expr       min         lq       mean     median         uq       max neval cld
#>  non_parallel  8.756378   8.892243   9.841818   9.072241   9.218554   23.9519    20  a
#>      parallel 56.374332 167.680385 218.073077 172.888927 185.670312 1247.8534    20   b
The non-parallelized code runs faster than the parallelized one!
Indeed, fundiversity parallelizes the computation across sites, so with
few sites the overhead of dispatching work to separate R sessions
outweighs any gain. Parallelization should therefore be used when you
have many sites on which you want to compute similar indices.
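To make the idea of splitting work across sites concrete, here is a minimal sketch. It is not fundiversity's internal code: the toy matrix toy_sites, the helper site_richness(), and the use of future.apply::future_lapply() are assumptions made for illustration. It only shows that each site (row) is an independent task, so the same call can run under any plan and return the same result.

```r
library("future.apply")  # assumed to be installed; provides future_lapply()

# Hypothetical toy site-by-species matrix: 6 sites (rows), 4 species (columns)
toy_sites <- matrix(
  c(1, 0, 1, 1,
    0, 1, 1, 0,
    1, 1, 1, 1,
    0, 0, 1, 0,
    1, 0, 0, 1,
    0, 1, 0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("s", 1:6), paste0("sp", 1:4))
)

# Hypothetical per-site statistic (number of species present);
# it stands in for a real functional diversity index
site_richness <- function(site_row) sum(site_row > 0)

# One independent task per site
site_rows <- lapply(seq_len(nrow(toy_sites)), function(i) toy_sites[i, ])

# Same code, sequential plan
future::plan(future::sequential)
res_seq <- unlist(future_lapply(site_rows, site_richness))

# Same code, parallel plan (illustrative cap of 2 workers)
n_workers <- min(2L, future::availableCores())
future::plan(future::multisession, workers = n_workers)
res_par <- unlist(future_lapply(site_rows, site_richness))

future::plan(future::sequential)  # back to the default plan
identical(res_seq, res_par)
```

With only six tiny tasks, the parallel run would not be expected to be faster; the sketch is about equivalence of results, not speed.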
# Function to make a bigger site-sp dataset
make_more_sites <- function(n) {
  site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE))
  rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp)))
  site_sp
}
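The stacking idiom used in this helper can be checked on a small example. The sketch below uses a hypothetical toy matrix (toy_site_sp, not part of fundiversity) to show that do.call(rbind, replicate(...)) multiplies the number of rows, and why the row names are then regenerated: stacking copies would otherwise repeat the same site names.

```r
# Hypothetical toy site-by-species matrix: 2 sites, 3 species
toy_site_sp <- matrix(
  c(1, 0, 1,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("a", "b"), c("sp1", "sp2", "sp3"))
)

# Stack 3 copies of the matrix on top of each other
stacked <- do.call(rbind, replicate(3, toy_site_sp, simplify = FALSE))

# Give every row a unique site name
rownames(stacked) <- paste0("s", seq_len(nrow(stacked)))

dim(stacked)
#> [1] 6 3
rownames(stacked)
#> [1] "s1" "s2" "s3" "s4" "s5" "s6"
```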
For example, with a dataset 5000 times bigger:
bigger_site <- make_more_sites(5000)
microbenchmark::microbenchmark(
  seq = {
    future::plan(future::sequential)
    fd_fric(traits_birds, bigger_site)
  },
  multisession = {
    future::plan(future::multisession, workers = 4)
    fd_fric(traits_birds, bigger_site)
  },
  multicore = {
    future::plan(future::multicore, workers = 4)
    fd_fric(traits_birds, bigger_site)
  },
  times = 20
)
#> Warning in supportsMulticoreAndRStudio(...): [ONE-TIME WARNING] Forked processing ('multicore') is not supported when running R from RStudio
#> because it is considered unstable. For more details, how to control forked processing or not, and how to silence this warning in future R
#> sessions, see ?parallelly::supportsMulticore
#> Unit: seconds
#>          expr      min       lq     mean   median       uq      max neval cld
#>           seq 15.58688 15.67587 15.97552 15.97047 16.24568 16.54392    20  a
#>  multisession 21.17851 21.75313 22.02965 21.88691 22.26971 23.50062    20   b
#>     multicore 15.53872 15.75567 16.06103 16.01595 16.35790 16.98102    20  a
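To read these timings more easily, the medians can be expressed relative to the sequential run. The short sketch below only re-uses the median values printed in the benchmark output above; the variable names are illustrative. The warning printed with the benchmark says that forked processing is not supported when running R from RStudio, so the multicore row may simply reflect a fallback to sequential execution; that reading is an assumption, not something the benchmark itself establishes.

```r
# Median timings (in seconds) copied from the benchmark output above
median_sec <- c(seq = 15.97047, multisession = 21.88691, multicore = 16.01595)

# Ratio to sequential: values above 1 mean slower than sequential execution
relative_to_seq <- median_sec / median_sec[["seq"]]
round(relative_to_seq, 2)
```

On this machine, the multisession run took roughly 1.37 times as long as the sequential one, while multicore was within about 1% of it.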
#> seconds needed to generate this document: 1095.27 sec elapsed
#> ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os Ubuntu 20.04.5 LTS
#> system x86_64, linux-gnu
#> ui RStudio
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2022-11-15
#> rstudio 2022.02.0+443 Prairie Trillium (server)
#> pandoc 2.17.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ! package * version date (UTC) lib source
#> P abind 1.4-5 2016-07-21 [3] CRAN (R 4.2.0)
#> assertthat 0.2.1 2019-03-21 [3] CRAN (R 4.1.3)
#> cachem 1.0.6 2021-08-19 [3] CRAN (R 4.1.3)
#> VP cli 3.4.0 2022-09-23 [?] CRAN (R 4.2.1) (on disk 3.4.1)
#> codetools 0.2-18 2020-11-04 [5] CRAN (R 4.0.3)
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
#> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
#> DBI 1.1.2 2021-12-20 [3] CRAN (R 4.1.3)
#> P digest 0.6.29 2021-12-01 [3] CRAN (R 4.2.0)
#> dplyr * 1.0.10 2022-09-01 [1] CRAN (R 4.2.1)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.18 2022-11-07 [1] CRAN (R 4.2.1)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
#> P fastmap 1.1.0 2021-01-25 [3] CRAN (R 4.2.1)
#> fundiversity * 0.2.1.9000 2022-04-12 [3] Github (bisaloo/fundiversity@87652ba)
#> VP future 1.26.1 2022-09-02 [3] CRAN (R 4.2.1) (on disk 1.28.0)
#> VP future.apply 1.9.0 2022-11-05 [3] CRAN (R 4.2.1) (on disk 1.10.0)
#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)
#> P geometry 0.4.6 2022-04-18 [3] CRAN (R 4.2.0)
#> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)
#> VP globals 0.15.0 2022-08-28 [3] CRAN (R 4.2.1) (on disk 0.16.1)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)
#> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.1)
#> knitr 1.40 2022-08-24 [1] CRAN (R 4.2.1)
#> lattice 0.20-45 2021-09-22 [3] CRAN (R 4.1.3)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
#> P listenv 0.8.0 2019-12-05 [3] CRAN (R 4.2.1)
#> P magic 1.6-0 2022-02-09 [3] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> MASS 7.3-58.1 2022-08-03 [3] CRAN (R 4.2.1)
#> Matrix 1.4-1 2022-03-23 [3] CRAN (R 4.1.3)
#> memoise 2.0.1 2021-11-26 [3] CRAN (R 4.1.3)
#> microbenchmark 1.4.9 2021-11-09 [3] CRAN (R 4.1.3)
#> multcomp 1.4-19 2022-04-26 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.2.0)
#> VP parallelly 1.31.1 2022-07-21 [3] CRAN (R 4.2.1) (on disk 1.32.1)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> P Rcpp 1.0.8.3 2022-03-17 [3] CRAN (R 4.2.0)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
#> rmarkdown 2.13 2022-03-10 [3] CRAN (R 4.1.3)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
#> sandwich 3.0-2 2022-06-15 [1] CRAN (R 4.2.0)
#> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [3] CRAN (R 4.1.3)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
#> survival 3.3-1 2022-03-03 [3] CRAN (R 4.1.3)
#> TH.data 1.1-1 2022-04-26 [1] CRAN (R 4.2.0)
#> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
#> tictoc 1.0.1 2021-04-19 [3] CRAN (R 4.1.3)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
#> vctrs 0.5.0 2022-10-22 [1] CRAN (R 4.2.1)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.34 2022-10-18 [1] CRAN (R 4.2.1)
#> yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.1)
#> zoo 1.8-10 2022-04-15 [1] CRAN (R 4.2.0)
#>
#> [1] /home/ke76dimu/R-library/4.2
#> [2] /usr/local/lib/R/site-library
#> [3] /data/library/4.1
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#>
#> V ── Loaded and on-disk version mismatch.
#> P ── Loaded and on-disk path mismatch.
#>
#> ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────