Basic R-Usage Guide for rMZQC

This vignette serves as a quickstart guide for R users to create and save an mzQC document.

Target Audience: R users

Read an mzQC file and extract some data

library(rmzqc)
data = readMZQC(system.file("./testdata/test.mzQC", package = "rmzqc", mustWork = TRUE))
cat("This file has ", length(data$runQualities), " runqualities\n")
## This file has  1  runqualities
cat("  - file: ", data$runQualities[[1]]$metadata$inputFiles[[1]]$name, "\n")
##   - file:  special.raw
cat("  - # of metrics: ", length(data$runQualities[[1]]$qualityMetrics), "\n")
##   - # of metrics:  1
cat("    - metric #1 name: ", data$runQualities[[1]]$qualityMetrics[[1]]$name, "\n")
##     - metric #1 name:  number of MS1 spectra
cat("    - metric #1 value: ", data$runQualities[[1]]$qualityMetrics[[1]]$value, "\n")
##     - metric #1 value:  13405

Hint: if you receive an error such as

Error in parse_con(txt, bigint_as_char) : 
lexical error: invalid char in json text.
cursor_int": [ NaN,NaN,NaN,NaN,825282.0,308263
(right here) ------^

when callingreadMZQC this indicates that the mzQC is not valid JSON, since NaN values should be quoted ("NaN") or replaced by null (unquoted), depending on context. In short: null may become an NA in R if part of an array, see https://github.com/jeroen/jsonlite/issues/70#issuecomment-431433773.

Create a minimal mzQC document

library(rmzqc)
## we need a proper URI (i.e. no backslashes and a scheme, e.g. 'file:')
## otherwise writing will fail
raw_file = localFileToURI("c:\\data\\special.raw", FALSE)

file_format = getCVTemplate(accession = filenameToCV(raw_file))
## Downloading obo from 'https://github.com/HUPO-PSI/psi-ms-CV/releases/download/v4.1.194/psi-ms.obo' ...
ptxqc_software = toAnalysisSoftware(id = "MS:1003162", version = "1.0.13") ## you could use 'version = packageVersion("PTXQC")' to automate further
run1_qc = MzQCrunQuality$new(metadata = MzQCmetadata$new(label = raw_file,
                         inputFiles = list(MzQCinputFile$new(basename(raw_file),
                                                             raw_file,
                                                             file_format)),
                         analysisSoftware = list(ptxqc_software)),
                         qualityMetrics = list(toQCMetric(id = "MS:4000059", value = 13405), ## number of MS1 scans
                                               toQCMetric(id = "MS:4000063", value = list("MS:1000041" = 1:3, "UO:0000191" = c(0.1, 0.8, 0.1))) # MS2 known precursor charges fractions
                                               )
                         )

mzQC_document = MzQCmzQC$new(version = "1.0.0", 
                             creationDate = MzQCDateTime$new(), 
                             contactName = Sys.info()["user"], 
                             contactAddress = "test@user.info", 
                             description = "A minimal mzQC test document with bogus data",
                             runQualities = list(run1_qc),
                             setQualities = list(), 
                             controlledVocabularies = list(getCVInfo()))

## write it out
mzqc_filename = paste0(getwd(), "/test.mzQC")
writeMZQC(mzqc_filename, mzQC_document)
cat(mzqc_filename, "written to disk!\n")
## C:/Users/bielow/AppData/Local/Temp/RtmpCWj8M7/Rbuild890019086f21/rmzqc/vignettes/test.mzQC written to disk!
## read it again
mq = readMZQC(mzqc_filename)

## print some basic stats
gettextf("This mzQC was created on %s and has %d quality metric(s) in total.", dQuote(mq$creationDate$datetime), length(mq$runQualities) + length(mq$setQualities))
## [1] "This mzQC was created on \"2025-06-03T00:00:00Z\" and has 1 quality metric(s) in total."

Different Value Types (single value, n-tuple, table, matrix) for QualityMetric’s

mzQC allows for 4 different kinds of value types for a qualityMetric: - Single value - n-tuple - table - matrix

See toQCMetric() for more examples.

Below we will create an example for each of them, so you know what R datastructures to employ. Using a data.frame, instead of a list, will give you the wrong JSON formatting and leads to validation failures.

Single Value

A single value (MS:4000003) is the easiest. Just assign a string or a number to the value attribute:

qualityMetric_singleValue = toQCMetric(id = "MS:4000059", value = 13405) ## number of MS1 scans

## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_singleValue, pretty = TRUE, auto_unbox = TRUE)
## {
##   "accession": "MS:4000059",
##   "name": "number of MS1 spectra",
##   "description": "\"The number of MS1 events in the run.\" [PSI:MS]",
##   "value": 13405
## }

n-tuple

An n-tuple (MS:4000004) is just a vector in R:

qualityMetric_tuple = toQCMetric(id = "MS:4000061", value = c(0.45, 0.76, 0.23)) ## MS1 density quantiles

## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_tuple, pretty = TRUE, auto_unbox = TRUE)
## {
##   "accession": "MS:4000061",
##   "name": "MS1 density quantiles",
##   "description": "\"The first to n-th quantile of MS1 peak density (scan peak counts). A value triplet represents the original QuaMeter metrics, the quartiles of MS1 density. The number of values in the tuple implies the quantile mode.\" [PSI:MS]",
##   "value": [0.45, 0.76, 0.23]
## }

Table

An table (MS:4000005) is a list (not a data.frame!) of columns.

The example below shows a table with two columns and three rows:

qualityMetric_table =   # MS2 known precursor charges fractions
                  toQCMetric(id = "MS:4000063", value = list("MS:1000041" = 1:3,   ## charge state
                                                             "UO:0000191" = c(0.1, 0.8, 0.1)))   ## fraction of precursors with that charge
                     
## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_table, pretty = TRUE, auto_unbox = TRUE)
## {
##   "accession": "MS:4000063",
##   "name": "MS2 known precursor charges fractions",
##   "description": "\"The fraction of MS/MS precursors of the corresponding charge. The fractions [0,1] are given in the 'Fraction' column, corresponding charges in the 'Charge state' column. The highest charge state is to be interpreted as that charge state or higher.\" [PSI:MS]",
##   "value": {
##     "MS:1000041": [1, 2, 3],
##     "UO:0000191": [0.1, 0.8, 0.1]
##   }
## }

Matrix

A matrix (MS:4000006) is an R matrix:

At the point of writing, there is no matrix CV term yet. So we just make one up:

qualityMetric_matrix = toQCMetric(id = "MS:40000??", value = matrix(1:9, 3, 3), allow_unknown_id = TRUE)
## Warning in CV$byID(id): Could not find id 'MS:40000??' in CV list (length:
## 6807)
qualityMetric_matrix$name = "unknown metric"
## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_matrix, pretty = TRUE, auto_unbox = TRUE)
## {
##   "accession": "MS:40000??",
##   "name": "unknown metric",
##   "value": [
##     [1, 4, 7],
##     [2, 5, 8],
##     [3, 6, 9]
##   ]
## }