RAthena depends on data.table to read
data into R, because of the speed
data.table offers when reading files.
However, a newer package with equally impressive read speeds has come
onto the scene: vroom. Because
vroom is designed only to read data into
R, similarly to readr, data.table
is still used for all of the heavy lifting. If a user wishes to
use vroom as the file parser, the RAthena_options
function has been created to enable this:
library(DBI)
library(RAthena)
con = dbConnect(athena())
RAthena_options(file_parser = "vroom")

Setting file_parser to "vroom" changes
the backend so that vroom’s file parser is
used instead of data.table.
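
Because the parser setting persists until it is changed again, it can be convenient to scope the switch to a single query. The following is a minimal sketch, not part of RAthena: `query_with_parser` is a hypothetical helper name, and it relies only on `RAthena_options()` behaving as described in this document, including that calling it with no arguments returns to the default data.table parser.

```r
library(DBI)
library(RAthena)

# Hypothetical helper (not part of RAthena): run one query with the chosen
# file parser, then return RAthena to its default parser when the function exits.
query_with_parser <- function(con, statement, file_parser = "vroom") {
  RAthena_options(file_parser = file_parser)
  # Assumption taken from this document: RAthena_options() with no
  # arguments switches back to the default data.table parser.
  on.exit(RAthena_options(), add = TRUE)
  dbGetQuery(con, statement)
}
```

Given an open connection `con`, a call such as `query_with_parser(con, "select * from iris")` would read the result with vroom and leave later queries on data.table.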

Change back to data.table

To go back to using data.table as the file parser, it is
as simple as calling the RAthena_options function with no arguments:

RAthena_options()

This makes it easy to swap between file parsers, even between query executions:
library(DBI)
library(RAthena)
con = dbConnect(athena())
# upload data
dbWriteTable(con, "iris", iris)
# use default data.table file parser
df1 = dbGetQuery(con, "select * from iris")
# use vroom as file parser
RAthena_options("vroom")
df2 = dbGetQuery(con, "select * from iris")
# return back to data.table file parser
RAthena_options()
df3 = dbGetQuery(con, "select * from iris")

Why consider vroom?

If you aren’t sure whether to use vroom over
data.table, note that vroom’s own benchmarks
report a throughput of 1.40 GB/sec.
The statistics below are taken from vroom’s GitHub README:

| package | version | time (sec) | speed-up | throughput |
|---|---|---|---|---|
| vroom | 1.1.0 | 1.14 | 58.44 | 1.40 GB/sec |
| data.table | 1.12.8 | 11.88 | 5.62 | 134.13 MB/sec |
| readr | 1.3.1 | 29.02 | 2.30 | 54.92 MB/sec |
| read.delim | 3.6.2 | 66.74 | 1.00 | 23.88 MB/sec |
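
Published benchmarks depend on the file and the machine, so it can be worth timing both parsers on data that resembles your own. The sketch below is an illustration only: the file size, column layout and seed are arbitrary assumptions, and it reads a local temporary CSV rather than Athena query output. Note also that vroom reads lazily, so the time to the first result can understate the cost of later touching every value.

```r
library(data.table)
library(vroom)

set.seed(42)

# Build a throwaway CSV to read back (dimensions chosen arbitrarily)
n <- 1e5
tmp <- tempfile(fileext = ".csv")
fwrite(
  data.frame(
    id = seq_len(n),
    x  = runif(n),
    g  = sample(letters, n, replace = TRUE)
  ),
  tmp
)

# Time each parser on the same file
t_dt <- system.time(dt <- fread(tmp))
t_vr <- system.time(vr <- suppressMessages(vroom(tmp, delim = ",")))

# Elapsed seconds per parser; the values will vary between machines and runs
print(c(data.table = t_dt[["elapsed"]], vroom = t_vr[["elapsed"]]))
```

If the two timings are close on your data, the choice of parser is unlikely to matter much for that workload.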