2 Introduction

In this vignette, we explore the OmopSketch functions that report how often each concept appears in the clinical tables. Two functions do this: summariseConceptIdCounts() and tableConceptIdCounts(). The former creates a summarised result with the counts for each concept in a clinical table, and the latter displays that result as a table.

2.1 Create a mock cdm

Let’s work through an example of these functions. First, we load the required packages and create a mock cdm using mockOmopSketch().

library(duckdb)
#> Loading required package: DBI
library(OmopSketch)
library(dplyr)


cdm <- mockOmopSketch()

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of mockOmopSketch ─────────────────────────────
#> • omop tables: person, observation_period, cdm_source, concept, vocabulary,
#> concept_relationship, concept_synonym, concept_ancestor, drug_strength,
#> condition_occurrence, death, drug_exposure, measurement, observation,
#> procedure_occurrence, visit_occurrence, device_exposure
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise concept id counts

We now use the summariseConceptIdCounts() function from the OmopSketch package to retrieve counts for each concept id and name, as well as for each source concept id and name, across the clinical tables.

summariseConceptIdCounts(cdm, omopTableName = "drug_exposure") |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |>
  glimpse()
#> Rows: 31
#> Columns: 7
#> $ group_level      <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name    <chr> "glucagon Nasal Powder [Baqsimi]", "Sisymbrium offici…
#> $ variable_level   <chr> "1361368", "1830282", "35604883", "35604884", "374980…
#> $ estimate_name    <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value   <chr> "100", "100", "100", "100", "100", "100", "100", "100…
#> $ additional_name  <chr> "source_concept_id", "source_concept_id", "source_con…
#> $ additional_level <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"…
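Note that estimate_value is stored as character in the output above, so it needs converting before any numeric sorting or arithmetic. The following is a minimal sketch on a hypothetical stand-in tibble (toy_result and its values are made up for illustration; only the column names follow the glimpse() output above).

```r
library(dplyr)

# Hypothetical stand-in for a summariseConceptIdCounts() result.
# Column names mirror the output above; the values are invented.
toy_result <- tibble(
  variable_name  = c("concept A", "concept B", "concept C"),
  variable_level = c("101", "102", "103"),
  estimate_name  = "count_records",
  estimate_value = c("9", "100", "25")
)

# Convert the counts to integer before sorting; sorted as text,
# "9" would rank above "100".
top_concepts <- toy_result |>
  filter(estimate_name == "count_records") |>
  mutate(estimate_value = as.integer(estimate_value)) |>
  arrange(desc(estimate_value))

top_concepts
```

The same pattern should apply to a real result, provided the estimate values are whole numbers.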

By default, the function returns the number of records (estimate_name == "count_records") for each concept_id. To count persons instead, set the countBy argument to "person"; to obtain both record and person counts, set it to c("record", "person"). Person counts are reported with estimate_name == "count_subjects".

summariseConceptIdCounts(cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 62 × 3
#>    variable_name                                    estimate_name estimate_value
#>    <chr>                                            <chr>         <chr>         
#>  1 glucagon Nasal Powder [Baqsimi]                  count_records 100           
#>  2 glucagon Nasal Powder [Baqsimi]                  count_subjec… 63            
#>  3 Sisymbrium officianale whole extract 10 MG Nasa… count_records 100           
#>  4 Sisymbrium officianale whole extract 10 MG Nasa… count_subjec… 63            
#>  5 sumatriptan Nasal Powder [Onzetra]               count_records 100           
#>  6 sumatriptan Nasal Powder [Onzetra]               count_subjec… 60            
#>  7 sumatriptan 11 MG Nasal Powder [Onzetra]         count_records 100           
#>  8 sumatriptan 11 MG Nasal Powder [Onzetra]         count_subjec… 59            
#>  9 Bos taurus catalase preparation                  count_records 100           
#> 10 Bos taurus catalase preparation                  count_subjec… 64            
#> # ℹ 52 more rows
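To make the difference between the two estimates concrete, here is a small sketch on a made-up table (toy_drug_exposure is hypothetical). It illustrates what record and person counts mean; it is not necessarily how OmopSketch computes them internally.

```r
library(dplyr)

# Hypothetical toy data: one row per drug exposure record.
toy_drug_exposure <- tibble(
  person_id       = c(1, 1, 2, 3, 3, 3),
  drug_concept_id = c(10, 10, 10, 20, 20, 10)
)

# count_records counts rows per concept;
# count_subjects counts distinct persons per concept.
toy_counts <- toy_drug_exposure |>
  group_by(drug_concept_id) |>
  summarise(
    count_records  = n(),
    count_subjects = n_distinct(person_id)
  )

toy_counts
```

A person with several records for the same concept contributes to count_records once per record but to count_subjects only once, which is why the person counts in the output above are lower than the record counts.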

Further stratification can be applied using the interval, sex, and ageGroup arguments. The interval argument supports "overall" (no time stratification), "years", "quarters", or "months".

summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 1,289
#> Columns: 5
#> $ group_level      <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Manic mood", "Manic symptoms co-occurrent and due to…
#> $ estimate_name    <chr> "count_subjects", "count_subjects", "count_subjects",…
#> $ additional_level <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"…
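As an illustration of what stratifying by age group and sex produces, the sketch below counts distinct persons per stratum on a made-up table. The toy_records data are hypothetical, and treating the age bounds as inclusive (so that age 50 falls in "<=50") is an assumption made here, not something confirmed by the output above.

```r
library(dplyr)

# Hypothetical toy data: one record per person, with age and sex attached.
toy_records <- tibble(
  person_id  = c(1, 2, 3, 4, 5),
  age        = c(34, 50, 51, 78, 45),
  sex        = c("Female", "Male", "Female", "Male", "Female"),
  concept_id = c(10, 10, 10, 10, 10)
)

# Assign each record to an age group (bounds assumed inclusive),
# then count distinct persons per concept, age group and sex.
toy_strata <- toy_records |>
  mutate(age_group = if_else(age <= 50, "<=50", ">50")) |>
  group_by(concept_id, age_group, sex) |>
  summarise(count_subjects = n_distinct(person_id), .groups = "drop")

toy_strata
```

Each combination of strata becomes its own row, which is why the stratified result above has many more rows than the unstratified ones.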

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summarisedResult <- summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  omopgenerics::settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_concept_id_counts"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "0.4.0"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> "source_concept_id"
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
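Conceptually, a date range restricts the records to a window before counting. The sketch below shows that idea on a made-up table; toy_conditions is hypothetical, and both the choice of condition_start_date as the filtering column and the inclusive bounds are assumptions for illustration rather than documented OmopSketch behaviour.

```r
library(dplyr)

# Hypothetical toy data with one start date per record.
toy_conditions <- tibble(
  condition_concept_id = c(10, 10, 20, 20),
  condition_start_date = as.Date(c("1985-06-01", "1995-03-15", "2005-11-30", "2012-01-01"))
)

date_range <- as.Date(c("1990-01-01", "2010-01-01"))

# Keep only records that start inside the window (bounds assumed inclusive).
in_window <- toy_conditions |>
  filter(
    condition_start_date >= date_range[1],
    condition_start_date <= date_range[2]
  )

in_window
```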

You can also summarise concept counts on a subset of records by specifying the sample argument.

summariseConceptIdCounts(cdm,
  omopTableName = "condition_occurrence",
  sample = 50
) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 6
#> Columns: 3
#> $ group_level   <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Elevated mood", "Victim of vehicular AND/OR traffic acc…
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…

3.1 Display the results

Finally, concept counts can be visualised using tableConceptIdCounts(). By default, it generates an interactive reactable table, but DT datatables are also supported.

result <- summariseConceptIdCounts(cdm,
  omopTableName = "measurement",
  countBy = "record"
) 
tableConceptIdCounts(result, type = "reactable")
Database name: mockOmopSketch (1); OMOP table: measurement (20)

| Standard concept name | Standard concept id | Source concept id | N records |
|---|---|---|---|
| Alkaline phosphatase - bone isoenzyme measurement | 4154344 | 0 | 100 |
| Alkaline phosphatase bone isoenzyme activity measurement | 44810792 | 0 | 100 |
| Alkaline phosphatase bone isoenzyme measurement, serum | 4197973 | 0 | 100 |
| Alkaline phosphatase isoenzymes measurement | 4220699 | 0 | 100 |
| Alkaline phosphatase.bone [enzymatic activity/volume] in serum or plasma | 3001467 | 0 | 100 |
| Alkaline phosphatase.bone [mass/volume] in serum or plasma | 3018910 | 0 | 100 |
| Alkaline phosphatase.bone [presence] in serum or plasma | 3042479 | 0 | 100 |
| Alkaline phosphatase.bone/alkaline phosphatase.total in serum or plasma | 3002069 | 0 | 100 |
| Cyclohexanone [mass/volume] in urine | 3026972 | 0 | 100 |
| Phenx - caffeine protocol 050301 | 40765028 | 0 | 100 |
| Plasma alkaline phosphatase bone isoenzyme measurement | 4195342 | 0 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour stool | 3046614 | 0 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour urine | 21492201 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/mass] in stool | 3046700 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/time] in 24 hour urine | 3041300 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in serum or plasma | 3011539 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in stool | 3026074 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in urine | 3012056 | 0 | 100 |
| Uroporphyrin iii measurement | 4004858 | 0 | 100 |
| Uroporphyrin measurement | 4150381 | 0 | 100 |

tableConceptIdCounts(result, type = "datatable")
[Interactive DT datatable with columns Standard concept name, Standard concept id, Source concept id and mockOmopSketch; showing 1 to 11 of 20 entries]

The display argument in tableConceptIdCounts() controls which concept counts are shown. The default, display = "overall", shows both standard and source concept counts.

tableConceptIdCounts(result, display = "overall")
Database name: mockOmopSketch (1); OMOP table: measurement (20)

| Standard concept name | Standard concept id | Source concept id | N records |
|---|---|---|---|
| Alkaline phosphatase - bone isoenzyme measurement | 4154344 | 0 | 100 |
| Alkaline phosphatase bone isoenzyme activity measurement | 44810792 | 0 | 100 |
| Alkaline phosphatase bone isoenzyme measurement, serum | 4197973 | 0 | 100 |
| Alkaline phosphatase isoenzymes measurement | 4220699 | 0 | 100 |
| Alkaline phosphatase.bone [enzymatic activity/volume] in serum or plasma | 3001467 | 0 | 100 |
| Alkaline phosphatase.bone [mass/volume] in serum or plasma | 3018910 | 0 | 100 |
| Alkaline phosphatase.bone [presence] in serum or plasma | 3042479 | 0 | 100 |
| Alkaline phosphatase.bone/alkaline phosphatase.total in serum or plasma | 3002069 | 0 | 100 |
| Cyclohexanone [mass/volume] in urine | 3026972 | 0 | 100 |
| Phenx - caffeine protocol 050301 | 40765028 | 0 | 100 |
| Plasma alkaline phosphatase bone isoenzyme measurement | 4195342 | 0 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour stool | 3046614 | 0 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour urine | 21492201 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/mass] in stool | 3046700 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/time] in 24 hour urine | 3041300 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in serum or plasma | 3011539 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in stool | 3026074 | 0 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in urine | 3012056 | 0 | 100 |
| Uroporphyrin iii measurement | 4004858 | 0 | 100 |
| Uroporphyrin measurement | 4150381 | 0 | 100 |

If display = "standard" the table shows only standard concept_id and concept_name counts.

tableConceptIdCounts(result, display = "standard")
Database name: mockOmopSketch (1); OMOP table: measurement (20)

| Standard concept name | Standard concept id | N records |
|---|---|---|
| Alkaline phosphatase - bone isoenzyme measurement | 4154344 | 100 |
| Alkaline phosphatase bone isoenzyme activity measurement | 44810792 | 100 |
| Alkaline phosphatase bone isoenzyme measurement, serum | 4197973 | 100 |
| Alkaline phosphatase isoenzymes measurement | 4220699 | 100 |
| Alkaline phosphatase.bone [enzymatic activity/volume] in serum or plasma | 3001467 | 100 |
| Alkaline phosphatase.bone [mass/volume] in serum or plasma | 3018910 | 100 |
| Alkaline phosphatase.bone [presence] in serum or plasma | 3042479 | 100 |
| Alkaline phosphatase.bone/alkaline phosphatase.total in serum or plasma | 3002069 | 100 |
| Cyclohexanone [mass/volume] in urine | 3026972 | 100 |
| Phenx - caffeine protocol 050301 | 40765028 | 100 |
| Plasma alkaline phosphatase bone isoenzyme measurement | 4195342 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour stool | 3046614 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour urine | 21492201 | 100 |
| Uroporphyrin 3 isomer [moles/mass] in stool | 3046700 | 100 |
| Uroporphyrin 3 isomer [moles/time] in 24 hour urine | 3041300 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in serum or plasma | 3011539 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in stool | 3026074 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in urine | 3012056 | 100 |
| Uroporphyrin iii measurement | 4004858 | 100 |
| Uroporphyrin measurement | 4150381 | 100 |

If display = "source" the table shows only source concept_id and concept_name counts.

tableConceptIdCounts(result, display = "source")
#> Warning: Values from `estimate_value` are not uniquely identified; output will contain
#> list-cols.
#> • Use `values_fn = list` to suppress this warning.
#> • Use `values_fn = {summary_fun}` to summarise duplicates.
#> • Use the following dplyr code to identify duplicates.
#>   {data} |>
#>   dplyr::summarise(n = dplyr::n(), .by = c(cdm_name, group_level,
#>   source_concept_id, result_id, group_name, estimate_type, estimate_name)) |>
#>   dplyr::filter(n > 1L)
Database name: mockOmopSketch (1); OMOP table: measurement (1)

| Source concept id | N records |
|---|---|
| 0 | 100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100 |
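The warning arises because every row in this mock result shares the same source concept id (0), so once the standard concept columns are dropped the counts are no longer uniquely identified. The sketch below reproduces the situation on a hypothetical tibble (toy_result is made up) and shows one possible way to collapse the duplicates by summing; this is an illustration, not what tableConceptIdCounts() does.

```r
library(dplyr)

# Hypothetical rows: three standard concepts that all map to source concept id 0.
toy_result <- tibble(
  standard_concept_id = c("4154344", "44810792", "4197973"),
  source_concept_id   = c("0", "0", "0"),
  estimate_value      = c("100", "100", "100")
)

# Count how many rows share each source concept id and sum their record counts.
per_source <- toy_result |>
  mutate(estimate_value = as.integer(estimate_value)) |>
  summarise(
    n_rows        = n(),
    count_records = sum(estimate_value),
    .by = source_concept_id
  )

per_source
```

Any source concept id with n_rows greater than 1 would trigger the same list-column behaviour when the result is pivoted by source concept alone.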

If display = "missing source" the table shows only counts for concept ids that are missing a corresponding source concept id.

tableConceptIdCounts(result, display = "missing source")
Database name: mockOmopSketch (1); OMOP table: measurement (20)

| Standard concept name | Standard concept id | N records |
|---|---|---|
| Alkaline phosphatase - bone isoenzyme measurement | 4154344 | 100 |
| Alkaline phosphatase bone isoenzyme activity measurement | 44810792 | 100 |
| Alkaline phosphatase bone isoenzyme measurement, serum | 4197973 | 100 |
| Alkaline phosphatase isoenzymes measurement | 4220699 | 100 |
| Alkaline phosphatase.bone [enzymatic activity/volume] in serum or plasma | 3001467 | 100 |
| Alkaline phosphatase.bone [mass/volume] in serum or plasma | 3018910 | 100 |
| Alkaline phosphatase.bone [presence] in serum or plasma | 3042479 | 100 |
| Alkaline phosphatase.bone/alkaline phosphatase.total in serum or plasma | 3002069 | 100 |
| Cyclohexanone [mass/volume] in urine | 3026972 | 100 |
| Phenx - caffeine protocol 050301 | 40765028 | 100 |
| Plasma alkaline phosphatase bone isoenzyme measurement | 4195342 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour stool | 3046614 | 100 |
| Uroporphyrin 3 isomer [mass/time] in 24 hour urine | 21492201 | 100 |
| Uroporphyrin 3 isomer [moles/mass] in stool | 3046700 | 100 |
| Uroporphyrin 3 isomer [moles/time] in 24 hour urine | 3041300 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in serum or plasma | 3011539 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in stool | 3026074 | 100 |
| Uroporphyrin 3 isomer [moles/volume] in urine | 3012056 | 100 |
| Uroporphyrin iii measurement | 4004858 | 100 |
| Uroporphyrin measurement | 4150381 | 100 |

If display = "missing standard" the table shows only counts for source concept ids that are missing a mapped standard concept id.

tableConceptIdCounts(result, display = "missing standard")
#> Warning: `result` does not contain any `summarise_concept_id_counts` data.
Table has no data
No rows found