dataquieR (development
version)
New features
- improved support for categorical variables, including:
- time trends w/ and w/o grouping variable
- observer/device effects
- distribution plots
dq_report2()
can store results on the disk instead of
the RAM with the new argument storr_factory
. This can be
useful in reducing issues of memory consumption, but we suggest to use
fast SSD
s or NVMe
s
- all indicator functions now create result objects with nice print
functions (visible in the Data Viewer instead of the Console window).
However, this also implies, that warnings, errors and messages are
returned as part of the result object and are printed with that object.
If you want to restore the original behavior, use the option
options(dataquieR.dontwrapresults = TRUE)
. With
options(dataquieR.testdebug = TRUE)
, you can switch off
this behavior.
dataquieR
can provision your function arguments from
the metadata. In order to enable lapply
and
Vectorize(SIMPLIFY = FALSE)
with indicator functions, the
first argument is now always resp_vars
for item level
functions. dataquieR
tries to guess if a function that
features both resp_vars
and study_data
as its
first arguments was called w/o resp_vars
but only with
study_data
as its first unnamed argument. If that is the
case, it sets resp_vars
to the default for
resp_vars
(typically all variables). With
options(dataquieR.testdebug = TRUE)
, you can switch off
this behavior, if you need.
- an improved version of
dq_report_by
, in which it is
possible to specify:
- how to split the data in parts (strata and/or variable groups)
- strata and/or variable groups to include/exclude
- how to filter observational units
- a selection of variables to analyze (
resp_vars
)
- a variable/s containing ID information for merging data frames
(
id_vars
)
- a new function
int_encoding_errors
checking invalid
characters present in the text with respect to the expected character
encoding / code page, e.g., a code place in the latin1
table is used but the encoding is utf8
resulting in damaged
text output
- a new dashboard in the General menu, in
Item-level data quality dashboard
, usable to customize data
summaries
- new selection buttons are now present in the report to select
visible columns in the displayed tables (it also applied to the export
buttons)
- support for a sheet
CODE_LIST_TABLE
in the metadata,
where it is possible to state both value label tables and missing list
tables all in one table.
- support for a sheet
item_computation_level
in the
metadata, where it is possible to state variables to be computed from
the provided study data.
Breaking changes
- moved example data from the package to our website. If you are
already using
prep_get_data_frame("ship")
or
prep_get_data_frame("study_data")
in your code to access
example data, no change is needed. If you are still accessing example
data using system.file()
(e.g. using
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
),
you need to switch to prep_get_data_frame()
, i.e.:
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
would become
study_data <- prep_get_data_frame("study_data")
- changes in the output names:
- renamed
SummaryData
in ResultData
(functions: acc_shape_or_scale
, acc_margins
,
com_segment_missingness
)
- removed column
GRADING
from SummaryData
outputs. SummaryTable
outputs still feature the column,
since these are meant to be a machine readable interface
con_contradictions_redcap
used to return a result named
SummaryTable
, while the documentation spoke about
SummaryData
. Alas, it should have been
VariableGroupTable
in both cases. If you relied on
SummaryTable
in the results of
con_contradictions_redcap
, you need to change your code to
use now the correct output name VariableGroupTable
. Also,
the table has been slightly modified.
VariableGroupData
as returned by
con_contradictions_redcap
is a version optimized for human
readers.
- in
VariableGroupTable
as returned by
con_contradictions_redcap
the column category
has been renamed to CONTRADICTION_TYPE
- in
con_contradictions_redcap
, if
summarize_categories
is selected the result will now be in
a sub-list named Other
- in
prep_add_computed_variables
, the column
resp_vars
is now named VAR_NAMES
, to be more
in line with other data frames.
Reporting
- improved button to export Excel, pdf, and print (colors
supported)
- improved rendering time introducing thumbnails as first visible
result in the report. Clicking on the image, the thumbnail is replaced
by
plotly
’s interactive figures
- implementation of
[.dataquieR_resultset2
and
[[.dataquieR_result
and related functions have changed
slightly. You can now for a report
(r <- dq_report2(...)
) call, e.g.,
r[, "com_item_missingness", "ReportSummaryTable"]
to get a
balloon plot or r[, "com_item_missingness", "SummaryData"]
to get a table, for all variables that were assessed with
com_item_missingness()
in the report r
- if you print a list of
dataquieR_result
objects, these
will be combined, but due to restrictions in R
, this only
works, if you call print()
explicitly on this list, not
with “auto-printing” (see https://stackoverflow.com/a/53983005), for
example:
a <- lapply(c("v00001", "v00004", "v00005", "v00006"), acc_loess, meta_data_v2 = "meta_data_v2", study_data = "study_data")
print(a)
works, but typing a
alone does not.
You have to call print()
or to put lapply()
in
brackets: (lapply())
acc_distributions()
was split in
acc_distributions()
and
acc_distributions_ecdf()
(prep_acc_distributions_with_ecdf()
creates the original
plot)
- there is a new function
acc_cat_distributions()
- all functions now feature:
- a
meta_data_v2
argument
- new argument
item_level
, as synonyms for
meta_data
, new argument segment_level
, as
synonyms for meta_data_segment
, new argument
dataframe_level
, as synonyms for
meta_data_dataframe
, new argument
cross-item_level
, as synonyms for
meta_data_cross_item
, new argument
item_computation_level
, as synonyms for
meta_data_item_computation
- if you call functions without
label_col
, the
label_col
will now default to LABEL
, except
you set the option options(dataquieR.testdebug = TRUE)
or
options(dataquieR.dontwrapresults = TRUE)
- the argument
resp_vars
in
prep_scalelevel_from_data_and_metadata()
was never working
correctly and not used neither, so it has been deprecated. It is already
not functional and it never was
- the function
des_summary
is still present, but you can
now get results for continuous or categorical variables only, using
des_summary_continuous
and
des_summary_categorical
respectively
con_contradictions_redcap
plot colors vary depending on
CONTRADICTION_TYPES
acc_loess()
uses lowess
instead of
loess
(both from the stats
package)
General
- test coverage increased, again
- fixed bug in
prep_check_for_dataquieR_updates()
, so,
maybe, you need to manually install the latest beta release using
devtools::install_gitlab("libreumg/dataquieR", auth_token = NULL)
- figure sizes have been overworked in the default report
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u")
is now the default assuming a one-fits-all-metadata-file (see
? dataquieR.ELEMENT_MISSMATCH_CHECKTYPE
)
- fewer custom implementations of stuff available from
rlang
or withr
, most prominently a faster
prep_prepare_dataframes()
and rlang
compatible
condition (error) handling.
- small changes in the behavior of the
dataquieR_result
class, which is now applied also to results outside a pipeline.
- many small fixes to figures
- small fixes to menu titles
- bug fixes
dataquieR 2.1.0
- renamed metadata column
SEGMENT_ID_TABLE
to
SEGMENT_ID_REF_TABLE
in segment level metadata
- scale level metadata support and heuristics
- significantly improved data quality summaries
- consolidated some of the indicator functions (limits, work in
progress)
- many minor optimizing changes
- figure sizing (work in progress), also resize handles
- improved report files structure
- improved
dq_report_by
files structure
- fixes, e.g., in rules, fixes in label shortening, computation speed
and cache pre-filling-control
- improved Excel export from the
HTML
reports
- missing codes: column
CODE_INTERPRET
changed to be in
line with the AAPOR
definitions, so the following
translation: PP -> P; P -> I; OH -> UO
- fixed tests
- updated concept excerpt
- excluded nominal and ordinal variables from marginal means
analysis
- improved data type handling
dataquieR 2.0.1
Reporting
- New functions
prep_save_report
and
prep_load_report
- Update and simplification of summary overview, empty columns/rows
omitted from the matrices. Also, better classification of errors
- Many small updates in the usability of the report
- Fixes in
HTML/JS
output for Firefox
- Bug fixes of report outputs that were not looking as expected (in
contradiction checks and limit violations)
- Fixed mixed distribution plots called several times
- Enable auto-resizing of
plot.ly
-plots
- Fixed rendering problems for the new, automatically size-reduced
plots causing the report rendering to fail if having
gginnards
installed; removed dependency from
gginnards
.
- Do not show superfluous axis labels (e.g., variables, if variable
names are on an axis because these usually overlap without improving the
output)
- Prevent a warning of
robustbase
about
doScale
- Less noisy display of conditions (e.g., warnings, errors, messages)
with the results in
dq_report2
reports
summarytools
are included in dq_report2
reports, if installed.
- New report rendering code polished, parallel execution of
HTML
generation prepared
- New parallel mode for
dq_report2
using a queue improves
speed
- Full support for
VARIABLE_ROLES
in
dq_report2
and suppressing helper variable outputs in
dq_report_by
- Do not show conditions (e.g., warnings, messages, errors) in reports
if they address the call of the function (e.g., “using default for
argument…”) by
dq_report2
and not directly by the user
- No unit-missingness in
dq_report2
because it is not so
useful in its current implementation
- More robust
dq_report_by
for large reports (can write
and optionally render results to disk rather than returning them)
- Bug fix in
dq_report_by
causing
DATA_PROCESS
not to work
- Fixed some errors and
TODO
’s in
dq_report_by
and add dependent variables on the fly but
with VARIABLE_ROLE
suppress:
- If no role is given, add “primary” by default for single reports as
well as for
dq_report_by
- Support meta_data_v2 in
dq_report_by
- FIXED: referred variables did not correctly resolve co_vars and
labels instead of variable names
- Several bug fixes:
- Addressed most parts of
https://gitlab.com/libreumg/dataquier/-/issues/242
- Addressed https://gitlab.com/libreumg/dataquier/-/issues/244 and
https://gitlab.com/libreumg/dataquier/-/issues/212
- Default for result-slot-filter was not set
(
filter_result_slots
in dq_report2
)
- Sometimes, long labels in the first columns of a
JS
-table prevented controlling the table
- Fixed missed check for missing cross-item level metadata and earlier
check for valid item-level metadata
- Control crude segment missingness output, so that we see it only if
there is more than one segment on the item-level after the removal of
VARIABLE_ROLES
filtered items
- Outliers should work with empty metadata in
UNIVARIATE_OUTLIER_CHECKTYPE
and
MULTIVARIATE_OUTLIER_CHECKTYPE
- Fixed successive dates to ignore empty dates
- New functions in
REDCap
syntax:
strictly_successive_dates
and
successive_dates
- Bug fixes for
REDCap
rules and NA
handling
and DATA_PROCESS
.
- Checked, that code is in line with
https://gitlab.com/libreumg/dataquier/-/issues/243#note_1419465360
- Default for contradictions with the new syntax is now that hard
limits and missing codes are not removed. The argument
use_value_labels
is not supported anymore. You can specify
the behavior on the rules level in the new cross-item-level metadata
column DATA_PREPARATION
- Compute end digit preferences only if explicitly requested by a new
item-level metadata column
END_DIGIT_CHECK
in
dq_report2
, (DATA_ENTRY_TYPE
is still
supported and auto-converted). If missing, END_DIGIT_CHECK
defaults to FALSE
- Bug fix: Contradiction rules failed in specific cases if
NA
were in the data
- Bug fix: cross-item_level normalization crashed, causing rules to
fail, e.g.,
JUMP_LIST
could be added to the item-level
metadata if missing, but causing this type of failing rules
- Bug fixes for
Windows
and uncommon variable names
General
- Workbooks can now be loaded from the internet (using
prep_load_workbook_like_file
and
meta_data_v2 =
formal in dq_report2
)
supporting http
and https
URLs (e.g.,
Excel
or OpenOffice
workbooks)
- Documentation updates
dataquieR 2.0.0
dq_report2
replaces dq_report
. Please use
dq_report2
from now on.
- Full new reporting engine (needs
htmtools
and supports
plotly
)
- Better report layout and improved functionality
- Support for reading and referring to data in files/URLs
- Support for the integrity dimension in data quality report
- Included distribution and multivariate outlier (provide cross-item
level metadata for the latter) plots in data quality report
- Metadata scheme update (segment,
data.frame
, and
cross-item levels). No required action by user, previous version still
supported
REDCap
rules for contradictions (cross-item level
metadata), previous contradictions function still supported
- Support metadata describing segment data and study data tables
(segment and
data.frame
-level metadata)
- New item-level metadata version (backwards compatible)
- Support for computation of qualified missingness based on labels
from the
AAPOR
concept
acc_univariate_outlier
and
acc_multivariate_outlier
now allow selecting the methods
used to flag outliers
- Included distributional checks in the accuracy dimension for
location and proportion
- Rotation of plots can now be controlled
- Improved many figures
- Better control over warnings
- If
whoami
is installed, reports now show a more
suitable user name
- Many minor improvements
- Updated citations
dataquieR 1.0.13
- fixed a left-over
~
from the ggplot2
updates causing acc_margins
to fail for categorical
variables
dataquieR 1.0.12
- Addressed a problem with the markdown template underlying the
dq_report
reports with wrong brackets
- Addressed deprecations from
ggplot2 3.4.0
- Added
ORCIDs
for two authors
- Updated the
CITATION
file
- Updated the
README.md
file adding the funding
sources.
dataquieR 1.0.11
- Addressed a problem with some test platforms
- Added funding agencies in the manual
dataquieR 1.0.10
- Fixed
NEWS.md
file
- Fixed documentation
dataquieR 1.0.9
- Fixed bug in
sigmagap
and made missing guessing more
robust.
- Fixed checks on missing code detection failing for
logical
.
- Fixed a damaged check for numeric threshold values in
acc_margins
.
- Fixed wrongly named
GRADING
columns.
- Improved parallel execution by automatic detection of cores.
- Tidy html dependency
dataquieR 1.0.8
- Removed formal arguments from
rbind.ReportSummaryTable
since these are not needed anyways and the inherited documentation for
those arguments rbind
from base
contains an
invalid URL triggering a NOTE
.
dataquieR 1.0.7
- Fixed bugs in example metadata.
- Figures now have size hints as attributes.
- Added simple type conversion check indicator function of dimension
integrity,
int_datatype_matrix
.
- Corrected some error classifications
prep_study2meta
can now also convert factors to
dataquieR
compatible
meta_data
/study_data
- Slightly improved documentation.
- Bug fix in
com_item_missingness
for textual response
variables.
- Added new output slot with heat-map like tables. Implemented some
generics for those.
dataquieR 1.0.6
- Robustness: Ensure
DT JS
is always loaded when a
dq_report
report is rendered
- Bug fix: More robust handling of DECIMALS variable attribute, if
this is delivered as a character.
- Bug Fix:
com_segment_missingness
with
strata_vars
/ group_vars
did not work
- Bug Fix: If
label_col
was set to something else than
LABEL
, strata_vars
did not work for
com_unit_missingness
- More precise documentation.
- Fixed a bug in a utility function for the univariate outliers
indicator function, which caused many data points flagged as outliers by
the sigma- gap criterion.
- Made outlier function aware of too many non-outlier points causing
too complex graphics (e.g. pdf rendering crashes the PDF reader).
- Fixes and small improvements in
dq_report
.
- Switched from
cowplot
to patchwork
in
acc_margins
yielding figures that can be easier
manipulated. Please note, that this change could break existing output
manipulations, since the structure of the margins plots has changed
internally. However, output manipulations were hardly possible for
margins plots before, so it is unlikely, that there are pipelines
affected.
- More control about the output of the
acc_loess
function.
- More robust
prep_create_meta
handling length-0
arguments by ignoring these variable attributes at all.
- Added a classification system for warnings and error messages to
distinguish errors based on mismatching variables for a function from
other error messages.
- JOSS
- Some tidy up and more tests.
dataquieR 1.0.5
- Fixed two bugs in
con_inadmissible_categorical
(one
resp_var
only and value-limits all the same for all
resp_vars
)
- Changed LICENSE to BSD-2
- Slightly updated documentation
- Updated
README
-File
dataquieR 1.0.4
- Fixed CITATION, a broken reference in Rd and a problem with the
vignette on
pandoc
-less systems
- Improved an inaccurate argument description for multivariate
outliers
- Fixed a problem with error messages, if a
dataquieR
function was called by a generated function f
that lives in
an environment directly inheriting from the empty environment, e.g.
environment(f) <- new.env(parent = emptyenv())
.
- Marked some examples as
dontrun
, because they sometimes
caused NOTE
s on rhub
.
dataquieR 1.0.3
- Addressed all comments by the CRAN reviewers, thank you.
dataquieR 1.0.2
- Bug Fix: If an empty data frame was delivered in the
SummaryTable
entry of a result within a
dq_report
output, the summary
and also
print
generic did not work on the report.
dataquieR 1.0.1
- Skipping some of the slower tests on CRAN now. On my local system, a
full
devtools::check(cran = TRUE, env_vars = c(NOT_CRAN = "false"))
takes 2:22 minutes now.
dataquieR 1.0.0
- Initial CRAN release candidate