In this vignette, we will explore the OmopSketch functions
designed to provide an overview of the observation_period
table. Specifically, six key functions facilitate this:

- summariseObservationPeriod(), plotObservationPeriod() and tableObservationPeriod(): use them to get overall statistics describing the observation_period table.
- summariseInObservation(), plotInObservation() and tableInObservation(): use them to summarise the trend in the number of records, individuals, person-days and females in observation during specific intervals of time, and how the median age varies.

Let’s see an example of their functionality. To start, we will load the essential packages and create a mock cdm using the mockOmopSketch() function.
library(dplyr)
library(OmopSketch)
# Connect to mock database
cdm <- mockOmopSketch()
Let’s now use the summariseObservationPeriod() function
from the OmopSketch package to get an overview of the
observation_period table, including statistics such as
the Number of subjects and Duration in days
for each observation period (e.g., 1st, 2nd).
summarisedResult <- summariseObservationPeriod(cdm$observation_period)
summarisedResult
#> # A tibble: 3,102 × 13
#>    result_id cdm_name       group_name      group_level strata_name strata_level
#>        <int> <chr>          <chr>           <chr>       <chr>       <chr>       
#>  1         1 mockOmopSketch observation_pe… all         overall     overall     
#>  2         1 mockOmopSketch observation_pe… all         overall     overall     
#>  3         1 mockOmopSketch observation_pe… all         overall     overall     
#>  4         1 mockOmopSketch observation_pe… all         overall     overall     
#>  5         1 mockOmopSketch observation_pe… all         overall     overall     
#>  6         1 mockOmopSketch observation_pe… all         overall     overall     
#>  7         1 mockOmopSketch observation_pe… all         overall     overall     
#>  8         1 mockOmopSketch observation_pe… all         overall     overall     
#>  9         1 mockOmopSketch observation_pe… all         overall     overall     
#> 10         1 mockOmopSketch observation_pe… all         overall     overall     
#> # ℹ 3,092 more rows
#> # ℹ 7 more variables: variable_name <chr>, variable_level <chr>,
#> #   estimate_name <chr>, estimate_type <chr>, estimate_value <chr>,
#> #   additional_name <chr>, additional_level <chr>
Notice that the output is in the summarised result format.
We can use the function's arguments to specify which statistics to
compute. For example, use the estimates argument to
indicate which estimates you are interested in for the
Duration in days of the observation period.
summarisedResult <- summariseObservationPeriod(cdm$observation_period,
  estimates = c("mean", "sd", "q05", "q95")
)
summarisedResult |>
  filter(variable_name == "Duration in days") |>
  select(group_level, variable_name, estimate_name, estimate_value)
#> # A tibble: 8 × 4
#>   group_level variable_name    estimate_name estimate_value  
#>   <chr>       <chr>            <chr>         <chr>           
#> 1 all         Duration in days mean          4337.41         
#> 2 all         Duration in days sd            4744.04291439658
#> 3 all         Duration in days q05           170             
#> 4 all         Duration in days q95           15181           
#> 5 1st         Duration in days mean          4337.41         
#> 6 1st         Duration in days sd            4744.04291439658
#> 7 1st         Duration in days q05           170             
#> 8 1st         Duration in days q95           15181
Additionally, you can stratify the results by sex and age groups, and specify a date range of interest:
summarisedResult <- summariseObservationPeriod(cdm$observation_period,
  estimates = c("mean", "sd", "q05", "q95"),
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)),
  dateRange = as.Date(c("1970-01-01", "2010-01-01"))
)
Notice that, by default, the "overall" group will also be included, as well as crossed strata (for example, sex == "Female" and ageGroup == ">=35").
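To make the stratified estimates more concrete, the base R sketch below computes the same kind of statistics by hand on a small made-up data frame. The data, the column names, the helper `estimates()` and the inclusive day count (end - start + 1) are assumptions for illustration only; this is not how OmopSketch computes them internally.

```r
# Toy observation periods (hypothetical data, not taken from the mock cdm)
obs <- data.frame(
  sex = c("Female", "Male", "Female", "Male", "Female", "Male"),
  age = c(20, 40, 36, 12, 70, 33),
  start = as.Date(c("1990-01-01", "1995-06-15", "2000-03-01",
                    "1980-01-01", "2005-01-01", "1999-12-31")),
  end = as.Date(c("1990-12-31", "2000-06-14", "2000-03-31",
                  "1989-12-31", "2005-01-10", "2000-01-01"))
)

# Duration in days, counting both the start and the end date (assumed convention)
obs$duration <- as.integer(obs$end - obs$start) + 1L

# Age groups mirroring list("<35" = c(0, 34), ">=35" = c(35, Inf))
obs$age_group <- ifelse(obs$age <= 34, "<35", ">=35")

# Hypothetical helper returning the four estimates requested above
estimates <- function(x) {
  c(mean = mean(x), sd = sd(x),
    q05 = unname(quantile(x, 0.05)), q95 = unname(quantile(x, 0.95)))
}

# Overall, single strata, and crossed strata
overall <- estimates(obs$duration)
by_sex <- lapply(split(obs$duration, obs$sex), estimates)
by_age <- lapply(split(obs$duration, obs$age_group), estimates)
crossed <- interaction(obs$sex, obs$age_group, sep = " &&& ")
by_cross <- lapply(split(obs$duration, crossed, drop = TRUE), estimates)

names(by_cross)
```

Each element of `by_cross` corresponds to one crossed stratum, labelled in the same "sex &&& age group" style that appears in the strata_level column of the summarised result.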
tableObservationPeriod() will help you to create a table
(see supported types with: visOmopResults::tableType()). By default it
creates a gt table.
summarisedResult <- summariseObservationPeriod(cdm$observation_period,
  estimates = c("mean", "sd", "q05", "q95"),
  sex = TRUE
)
summarisedResult |>
  tableObservationPeriod()
#> ℹ <median> [<q25> - <q75>] has not been formatted.

| Sex | Observation period ordinal | Variable name | Estimate name | CDM name: mockOmopSketch |
|---|---|---|---|---|
| overall | all | Number records | N | 100 |
| overall | all | Number subjects | N | 100 |
| overall | all | Records per person | mean (sd) | 1.00 (0.00) |
| overall | all | Duration in days | mean (sd) | 4,337.41 (4,744.04) |
| overall | 1st | Number subjects | N | 100 |
| overall | 1st | Duration in days | mean (sd) | 4,337.41 (4,744.04) |
| Female | all | Number records | N | 49 |
| Female | all | Number subjects | N | 49 |
| Female | all | Records per person | mean (sd) | 1.00 (0.00) |
| Female | all | Duration in days | mean (sd) | 4,296.12 (4,543.96) |
| Female | 1st | Number subjects | N | 49 |
| Female | 1st | Duration in days | mean (sd) | 4,296.12 (4,543.96) |
| Male | all | Number records | N | 51 |
| Male | all | Number subjects | N | 51 |
| Male | all | Records per person | mean (sd) | 1.00 (0.00) |
| Male | all | Duration in days | mean (sd) | 4,377.08 (4,973.61) |
| Male | 1st | Number subjects | N | 51 |
| Male | 1st | Duration in days | mean (sd) | 4,377.08 (4,973.61) |
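The table reports the mean and standard deviation as a single "mean (sd)" cell. As a rough illustration of that kind of formatting, the base R sketch below builds such a label from the two raw estimate values shown earlier. The helper `format_mean_sd()` is hypothetical and is not part of OmopSketch or visOmopResults.

```r
# Hypothetical helper: combine two numeric estimates into a "mean (sd)" label
format_mean_sd <- function(mean_value, sd_value, digits = 2) {
  fmt <- function(x) {
    # Fixed number of decimals and a comma as thousands separator
    formatC(x, format = "f", digits = digits, big.mark = ",")
  }
  paste0(fmt(mean_value), " (", fmt(sd_value), ")")
}

# Raw estimate values as they appear in the summarised result (stored as text)
mean_chr <- "4337.41"
sd_chr <- "4744.04291439658"

label <- format_mean_sd(as.numeric(mean_chr), as.numeric(sd_chr))
label
```

Note that estimate_value is stored as character in the summarised result, so it has to be converted with as.numeric() before any formatting or arithmetic.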
Finally, we can visualise the result using
plotObservationPeriod().
summarisedResult <- summariseObservationPeriod(cdm$observation_period)
plotObservationPeriod(summarisedResult,
  variableName = "Number subjects",
  plotType = "barplot"
)
Note that either Number subjects or
Duration in days can be plotted. For
Number subjects, the plot type can be
barplot, whereas for Duration in days, the
plot type can be barplot, boxplot, or
densityplot.
Additionally, if results were stratified by sex or age group, we can
further use the facet or colour arguments to
highlight the different results in the plot. To identify which
variables we can colour or facet by, we can use the visOmopResults
package.
summarisedResult <- summariseObservationPeriod(cdm$observation_period,
  sex = TRUE
)
plotObservationPeriod(summarisedResult,
  variableName = "Duration in days",
  plotType = "boxplot",
  facet = "sex"
)
summarisedResult <- summariseObservationPeriod(cdm$observation_period,
  sex = TRUE,
  ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))
)
plotObservationPeriod(summarisedResult,
  colour = "sex",
  facet = "age_group"
)
OmopSketch can also help you summarise the number of records in observation during specific intervals of time.
summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "years"
)
summarisedResult |>
  select(variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 128 × 5
#>    variable_name   estimate_name estimate_value additional_name additional_level
#>    <chr>           <chr>         <chr>          <chr>           <chr>           
#>  1 Number records… count         100            overall         overall         
#>  2 Number records… percentage    100.00         overall         overall         
#>  3 Number records… count         1              time_interval   1957-01-01 to 1…
#>  4 Number records… percentage    1.00           time_interval   1957-01-01 to 1…
#>  5 Number records… count         2              time_interval   1958-01-01 to 1…
#>  6 Number records… percentage    2.00           time_interval   1958-01-01 to 1…
#>  7 Number records… count         2              time_interval   1959-01-01 to 1…
#>  8 Number records… percentage    2.00           time_interval   1959-01-01 to 1…
#>  9 Number records… count         2              time_interval   1960-01-01 to 1…
#> 10 Number records… percentage    2.00           time_interval   1960-01-01 to 1…
#> # ℹ 118 more rows
Note that you can adjust the time interval using the
interval argument, which can be set to "years",
"quarters", "months" or "overall" (the default).
summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "months"
)
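As a rough illustration of what "in observation during an interval" means, the base R sketch below counts, for each calendar year, how many toy observation periods overlap that year. The data are made up, and the overlap rule (a record counts for a year if any part of its period falls inside it) is an assumption for illustration rather than a description of the package internals.

```r
# Toy observation periods (hypothetical data, not taken from the mock cdm)
obs <- data.frame(
  start = as.Date(c("2000-03-01", "2001-07-15", "2000-01-01")),
  end   = as.Date(c("2001-02-28", "2001-12-31", "2002-06-30"))
)

# Yearly intervals to evaluate
years <- 2000:2002
year_start <- as.Date(paste0(years, "-01-01"))
year_end <- as.Date(paste0(years, "-12-31"))

# A period overlaps a year if it starts on or before the year's last day
# and ends on or after the year's first day
records <- vapply(seq_along(years), function(i) {
  sum(obs$start <= year_end[i] & obs$end >= year_start[i])
}, integer(1))
names(records) <- years

# Share of all records that are in observation in each year
percentage <- 100 * records / nrow(obs)

records
```

Because one record can span several years, the yearly counts can add up to more than the total number of records.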
Along with the number of records in observation, you can also
calculate the number of person-days by setting the output
argument to c("record", "person-days").
summarisedResult <- summariseInObservation(cdm$observation_period, 
                                           output = c("record", "person-days"))                                        
summarisedResult |>
  select(variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 4 × 5
#>   variable_name    estimate_name estimate_value additional_name additional_level
#>   <chr>            <chr>         <chr>          <chr>           <chr>           
#> 1 Number person-d… count         433741         overall         overall         
#> 2 Number records … count         100            overall         overall         
#> 3 Number person-d… percentage    100.00         overall         overall         
#> 4 Number records … percentage    100.00         overall         overall
We can further stratify our counts by sex (setting the argument
sex = TRUE) or by age (providing an age group). Notice that
in both cases, the function will automatically create a group called
overall that includes all the sex groups and all the age groups. We can
also define a date range of interest to filter the
observation_period table accordingly.
summarisedResult <- summariseInObservation(cdm$observation_period, 
                                           output = c("record", "person-days"),
                                           interval = "quarters",
                                           sex = TRUE, 
                                           ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)), 
                                           dateRange = as.Date(c("1970-01-01", "2010-01-01")))                                        
summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
You can include additional output metrics by adding them to the output argument.
If output = "person", the trend in the number of
individuals in observation is returned.
summarisedResult <- summariseInObservation(cdm$observation_period,
                                           output = c("person"),
                                           interval = "years",
                                           sex = TRUE,
                                           ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)))
summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 942 × 6
#>    strata_level    variable_name    estimate_name estimate_value additional_name
#>    <chr>           <chr>            <chr>         <chr>          <chr>          
#>  1 overall         Number subjects… count         100            overall        
#>  2 Female          Number subjects… count         49             overall        
#>  3 Male            Number subjects… count         51             overall        
#>  4 >=35            Number subjects… count         19             overall        
#>  5 <35             Number subjects… count         81             overall        
#>  6 Male &&& >=35   Number subjects… count         14             overall        
#>  7 Female &&& <35  Number subjects… count         44             overall        
#>  8 Female &&& >=35 Number subjects… count         5              overall        
#>  9 Male &&& <35    Number subjects… count         37             overall        
#> 10 overall         Number subjects… percentage    100.00         overall        
#> # ℹ 932 more rows
#> # ℹ 1 more variable: additional_level <chr>
If output = "sex", the trend in the number of females in
observation is returned. If sex = TRUE is also specified, the sex
stratification is ignored for this output.
summarisedResult <- summariseInObservation(cdm$observation_period,
                                           output = c("sex"),
                                           interval = "years",
                                           sex = TRUE,
                                           ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)))
summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 294 × 6
#>    strata_level variable_name       estimate_name estimate_value additional_name
#>    <chr>        <chr>               <chr>         <chr>          <chr>          
#>  1 overall      Number females in … count         49             overall        
#>  2 >=35         Number females in … count         5              overall        
#>  3 <35          Number females in … count         44             overall        
#>  4 overall      Number females in … percentage    49.00          overall        
#>  5 >=35         Number females in … percentage    5.00           overall        
#>  6 <35          Number females in … percentage    44.00          overall        
#>  7 overall      Number females in … count         2              time_interval  
#>  8 <35          Number females in … count         2              time_interval  
#>  9 overall      Number females in … percentage    2.00           time_interval  
#> 10 <35          Number females in … percentage    2.00           time_interval  
#> # ℹ 284 more rows
#> # ℹ 1 more variable: additional_level <chr>
If output = "age", the trend in the median age of the
population in observation is calculated. If ageGroup and
interval are both specified, the age is computed at the
beginning of the interval or of the observation period, whichever is
later.
summarisedResult <- summariseInObservation(cdm$observation_period,
                                           output = c("age"),
                                           interval = "years",
                                           ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf)))
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-19 20:08:56.142477
#> 
#> ✔ Summary finished, at 2025-06-19 20:08:56.255667
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-19 20:08:56.78225
#> 
#> ✔ Summary finished, at 2025-06-19 20:08:56.90689
summarisedResult |>
  select(strata_level, variable_name, estimate_name, estimate_value, additional_name, additional_level)
#> # A tibble: 162 × 6
#>    strata_level variable_name       estimate_name estimate_value additional_name
#>    <chr>        <chr>               <chr>         <chr>          <chr>          
#>  1 overall      Median age in obse… median        16             overall        
#>  2 <35          Median age in obse… median        13             overall        
#>  3 >=35         Median age in obse… median        42             overall        
#>  4 overall      Median age in obse… median        2              time_interval  
#>  5 <35          Median age in obse… median        2              time_interval  
#>  6 overall      Median age in obse… median        3              time_interval  
#>  7 <35          Median age in obse… median        3              time_interval  
#>  8 overall      Median age in obse… median        4              time_interval  
#>  9 <35          Median age in obse… median        4              time_interval  
#> 10 overall      Median age in obse… median        5              time_interval  
#> # ℹ 152 more rows
#> # ℹ 1 more variable: additional_level <chr>
tableInObservation() will help you to create a table of
type gt, reactable or datatable. By default it
creates a gt table.
summarisedResult <- summariseInObservation(cdm$observation_period, 
                                           output = c("person", "person-days", "sex"),
                                           sex = TRUE)
summarisedResult |>
  tableInObservation(type = "gt")

| Variable name | Estimate name | Sex | Database name: mockOmopSketch |
|---|---|---|---|
| Number person-days | N (%) | Female | 210510 (48.53%) |
| Number subjects in observation | N (%) | Female | 49 (49.00%) |
| Number person-days | N (%) | Male | 223231 (51.47%) |
| Number subjects in observation | N (%) | Male | 51 (51.00%) |
| Number females in observation | N (%) | overall | 49 (49.00%) |
| Number person-days | N (%) | overall | 433741 (100.00%) |
| Number subjects in observation | N (%) | overall | 100 (100.00%) |
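The person-days and median-age outputs can also be sketched by hand. The base R example below works on two made-up persons and a single yearly interval. It assumes that both end points of a period are counted as days in observation and that age is measured in completed years; both conventions, as well as the column names, are assumptions for illustration and may differ from what OmopSketch does internally.

```r
# Two hypothetical persons (not taken from the mock cdm)
obs <- data.frame(
  birth = as.Date(c("1980-05-10", "1995-01-01")),
  start = as.Date(c("2000-03-01", "1999-07-01")),
  end   = as.Date(c("2001-02-28", "2000-12-31"))
)

# One yearly interval
interval_start <- as.Date("2000-01-01")
interval_end <- as.Date("2000-12-31")

# Keep only the part of each observation period that falls inside the interval
from <- pmax(obs$start, interval_start)
to <- pmin(obs$end, interval_end)
in_obs <- from <= to

# Person-days contributed to the interval (start and end day both counted)
person_days <- ifelse(in_obs, as.integer(to - from) + 1L, 0L)

# Age in completed years at the later of interval start and observation start,
# which is exactly `from`
age <- as.integer(format(from, "%Y")) - as.integer(format(obs$birth, "%Y")) -
  (format(from, "%m%d") < format(obs$birth, "%m%d"))

total_person_days <- sum(person_days)
median_age <- median(age[in_obs])
```

Here the first person enters observation partway through the year, so age is taken at the observation start date, while the second person was already in observation and age is taken at the start of the interval.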
Finally, we can visualise the trend using
plotInObservation().
summarisedResult <- summariseInObservation(cdm$observation_period,
  interval = "years"
)
plotInObservation(summarisedResult)
#> `result_id` is not present in result.
#> `result_id` is not present in result.
Notice that only one output can be plotted at a time. If the summarised result includes more than one output, filter it to a single variable before plotting.
Additionally, if results were stratified by sex or age group, we can
further use the facet or colour arguments to
highlight the different results in the plot. To identify which
variables we can colour or facet by, we can use the visOmopResults
package.
summarisedResult <- summariseInObservation(cdm$observation_period, 
                       interval = "years",
                       output = c("record", "age"),
                       sex = TRUE,
                       ageGroup = list("<35" = c(0, 34), ">=35" = c(35, Inf))) 
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-19 20:08:59.026081
#> 
#> ✔ Summary finished, at 2025-06-19 20:08:59.244446
#> ℹ The following estimates will be computed:
#> • age: median
#> → Start summary of data, at 2025-06-19 20:08:59.947143
#> 
#> ✔ Summary finished, at 2025-06-19 20:09:00.159181
plotInObservation(summarisedResult |> 
  filter(variable_name == "Median age in observation"),
  colour = "sex", 
  facet = "age_group")
#> `result_id` is not present in result.
#> `result_id` is not present in result.
Finally, disconnect from the cdm:
PatientProfiles::mockDisconnect(cdm = cdm)