Programming Workflow

Read in Data
Derive/Impute End and Start Analysis Date/time and Relative Day
Derive Durations
Derive ATC variables
Derive Planned and Actual Treatment
Derive Date/Date-time of Last Dose
Derive Treatment Dose and Unit
Derive Severity, Causality, and Toxicity Grade
Derive Treatment Emergent Flag
Derive Occurrence Flags
Derive Query Variables
Add ADSL variables
Derive Analysis Sequence Number
Add Labels and Attributes

Read in Data

To start, all data frames needed for the creation of ADAE should be read into the environment. This will be a company-specific process. Some of the data frames needed may be AE and ADSL.

For example purposes, the CDISC Pilot SDTM and ADaM datasets, which are included in {pharmaversesdtm}, are used.

library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(lubridate)

ae <- pharmaversesdtm::ae
adsl <- admiral::admiral_adsl
ex_single <- admiral::ex_single

ae <- convert_blanks_to_na(ae)

At this step, it may be useful to join ADSL to your AE domain as well. Only the ADSL variables used for derivations are selected at this step. The rest of the relevant ADSL variables would be added later.

adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01A, TRT01P, DTHDT, EOSDT)

adae <- derive_vars_merged(
  ae,
  dataset_add = adsl,
  new_vars = adsl_vars,
  by_vars = exprs(STUDYID, USUBJID)
)

USUBJID	AESEQ	AETERM	AESTDTC	TRTSDT	TRTEDT	TRT01A	TRT01P	DTHDT	EOSDT
01-701-1015	1	APPLICATION SITE ERYTHEMA	2014-01-03	2014-01-02	2014-07-02	Placebo	Placebo	NA	2014-07-02
01-701-1015	2	APPLICATION SITE PRURITUS	2014-01-03	2014-01-02	2014-07-02	Placebo	Placebo	NA	2014-07-02
01-701-1015	3	DIARRHOEA	2014-01-09	2014-01-02	2014-07-02	Placebo	Placebo	NA	2014-07-02
01-701-1023	3	ATRIOVENTRICULAR BLOCK SECOND DEGREE	2012-08-26	2012-08-05	2012-09-01	Placebo	Placebo	NA	2012-09-02
01-701-1023	1	ERYTHEMA	2012-08-07	2012-08-05	2012-09-01	Placebo	Placebo	NA	2012-09-02
01-701-1023	2	ERYTHEMA	2012-08-07	2012-08-05	2012-09-01	Placebo	Placebo	NA	2012-09-02
01-701-1023	4	ERYTHEMA	2012-08-07	2012-08-05	2012-09-01	Placebo	Placebo	NA	2012-09-02
01-703-1086	1	APPLICATION SITE IRRITATION	2012-09-13	2012-09-02	2012-12-04	Xanomeline Low Dose	Xanomeline Low Dose	NA	2012-12-24
01-703-1086	2	APPLICATION SITE IRRITATION	2012-09-13	2012-09-02	2012-12-04	Xanomeline Low Dose	Xanomeline Low Dose	NA	2012-12-24
01-703-1086	3	APPLICATION SITE IRRITATION	2012-09-13	2012-09-02	2012-12-04	Xanomeline Low Dose	Xanomeline Low Dose	NA	2012-12-24
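As a rough illustration of what this merge does, the following base R sketch performs the same kind of left join on toy data. The data frames and values (ae_toy, adsl_toy) are made up for illustration and are not part of the CDISC pilot data; derive_vars_merged() performs additional checks that this sketch omits.

```r
# Toy AE and ADSL data (hypothetical values, for illustration only)
ae_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("P01", "P01", "P02"),
  AESEQ = c(1, 2, 1)
)
adsl_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = c("P01", "P02"),
  TRTSDT = as.Date(c("2020-01-01", "2020-02-01")),
  AGE = c(63, 71)
)

# Keep only the keys and the variables needed for later derivations,
# then left-join so that every AE record is retained
adsl_sub <- adsl_toy[c("STUDYID", "USUBJID", "TRTSDT")]
adae_toy <- merge(ae_toy, adsl_sub, by = c("STUDYID", "USUBJID"), all.x = TRUE)
adae_toy
```

Selecting a subset first keeps the working dataset small; the remaining ADSL variables (AGE in this toy example) can be merged in once the derivations are done.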

Derive/Impute End and Start Analysis Date/time and Relative Day

This part derives ASTDTM, ASTDT, ASTDY, AENDTM, AENDT, and AENDY. The function derive_vars_dtm() can be used to derive ASTDTM and AENDTM where ASTDTM could be company-specific. ASTDT and AENDT can be derived from ASTDTM and AENDTM, respectively, using function derive_vars_dtm_to_dt(). derive_vars_dy() can be used to create ASTDY and AENDY.

adae <- adae %>%
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    min_dates = exprs(TRTSDT)
  ) %>%
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = exprs(DTHDT, EOSDT)
  ) %>%
  derive_vars_dtm_to_dt(exprs(ASTDTM, AENDTM)) %>%
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars = exprs(ASTDT, AENDT)
  )

USUBJID	AESTDTC	AEENDTC	ASTDTM	ASTDT	ASTDY	AENDTM	AENDT	AENDY
01-701-1015	2014-01-03	NA	2014-01-03	2014-01-03	2	NA	NA	NA
01-701-1015	2014-01-03	NA	2014-01-03	2014-01-03	2	NA	NA	NA
01-701-1015	2014-01-09	2014-01-11	2014-01-09	2014-01-09	8	2014-01-11 23:59:59	2014-01-11	10
01-701-1023	2012-08-26	NA	2012-08-26	2012-08-26	22	NA	NA	NA
01-701-1023	2012-08-07	2012-08-30	2012-08-07	2012-08-07	3	2012-08-30 23:59:59	2012-08-30	26
01-701-1023	2012-08-07	NA	2012-08-07	2012-08-07	3	NA	NA	NA
01-701-1023	2012-08-07	2012-08-30	2012-08-07	2012-08-07	3	2012-08-30 23:59:59	2012-08-30	26
01-703-1086	2012-09-13	2013-01-02	2012-09-13	2012-09-13	12	2013-01-02 23:59:59	2013-01-02	123
01-703-1086	2012-09-13	2013-01-02	2012-09-13	2012-09-13	12	2013-01-02 23:59:59	2013-01-02	123
01-703-1086	2012-09-13	2013-01-02	2012-09-13	2012-09-13	12	2013-01-02 23:59:59	2013-01-02	123
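The relative day values in the table are consistent with the usual convention that there is no day 0: the reference date itself is day 1 and the day before it is day -1. A minimal base R sketch of that rule, assuming complete dates, could look as follows. The helper rel_day() is hypothetical and is not an {admiral} function.

```r
# Hypothetical helper: relative day without a day 0
rel_day <- function(date, ref) {
  diff <- as.integer(date - ref)
  # On or after the reference date: add one; before it: plain difference
  ifelse(diff >= 0, diff + 1L, diff)
}

trtsdt <- as.Date("2014-01-02")
astdt <- as.Date(c("2014-01-01", "2014-01-02", "2014-01-09"))
astdy <- rel_day(astdt, trtsdt)
astdy # -1, 1, 8
```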

Derive Durations

The function derive_vars_duration() can be used to create the variables ADURN and ADURU.

adae <- adae %>%
  derive_vars_duration(
    new_var = ADURN,
    new_var_unit = ADURU,
    start_date = ASTDT,
    end_date = AENDT
  )

USUBJID	AESTDTC	AEENDTC	ASTDT	AENDT	ADURN	ADURU
01-701-1015	2014-01-03	NA	2014-01-03	NA	NA	NA
01-701-1015	2014-01-03	NA	2014-01-03	NA	NA	NA
01-701-1015	2014-01-09	2014-01-11	2014-01-09	2014-01-11	3	DAYS
01-701-1023	2012-08-26	NA	2012-08-26	NA	NA	NA
01-701-1023	2012-08-07	2012-08-30	2012-08-07	2012-08-30	24	DAYS
01-701-1023	2012-08-07	NA	2012-08-07	NA	NA	NA
01-701-1023	2012-08-07	2012-08-30	2012-08-07	2012-08-30	24	DAYS
01-703-1086	2012-09-13	2013-01-02	2012-09-13	2013-01-02	112	DAYS
01-703-1086	2012-09-13	2013-01-02	2012-09-13	2013-01-02	112	DAYS
01-703-1086	2012-09-13	2013-01-02	2012-09-13	2013-01-02	112	DAYS
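The durations shown above are consistent with counting both the start day and the end day. A small base R sketch of that arithmetic, using dates taken from the table, is given below; it only illustrates the calculation and does not reproduce the options of derive_vars_duration().

```r
start <- as.Date(c("2014-01-09", "2012-09-13", "2014-01-03"))
end <- as.Date(c("2014-01-11", "2013-01-02", NA))

# Duration in days, counting both the first and the last day
adurn <- as.integer(end - start) + 1L
# The unit is only populated when a duration could be calculated
aduru <- ifelse(is.na(adurn), NA_character_, "DAYS")

data.frame(start, end, ADURN = adurn, ADURU = aduru)
```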

Derive ATC variables

The function derive_vars_atc() can be used to derive ATC Class Variables.

It helps to add Anatomical Therapeutic Chemical class variables from FACM to ADCM.

The expected result is the input dataset with ATC variables added.

cm <- tibble::tribble(
  ~STUDYID,  ~USUBJID,       ~CMGRPID, ~CMREFID,  ~CMDECOD,
  "STUDY01", "BP40257-1001", "14",     "1192056", "PARACETAMOL",
  "STUDY01", "BP40257-1001", "18",     "2007001", "SOLUMEDROL",
  "STUDY01", "BP40257-1002", "19",     "2791596", "SPIRONOLACTONE"
)
facm <- tibble::tribble(
  ~STUDYID,  ~USUBJID,       ~FAGRPID, ~FAREFID,  ~FATESTCD,  ~FASTRESC,
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC1CD", "N",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC2CD", "N02",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC3CD", "N02B",
  "STUDY01", "BP40257-1001", "1",      "1192056", "CMATC4CD", "N02BE",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC1CD", "D",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC2CD", "D10",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC3CD", "D10A",
  "STUDY01", "BP40257-1001", "1",      "2007001", "CMATC4CD", "D10AA",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC1CD", "D",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC2CD", "D07",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC3CD", "D07A",
  "STUDY01", "BP40257-1001", "2",      "2007001", "CMATC4CD", "D07AA",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC1CD", "H",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC2CD", "H02",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC3CD", "H02A",
  "STUDY01", "BP40257-1001", "3",      "2007001", "CMATC4CD", "H02AB",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC1CD", "C",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC2CD", "C03",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC3CD", "C03D",
  "STUDY01", "BP40257-1002", "1",      "2791596", "CMATC4CD", "C03DA"
)

derive_vars_atc(cm, dataset_facm = facm, id_vars = exprs(FAGRPID))
#> # A tibble: 5 × 9
#>   STUDYID USUBJID      CMGRPID CMREFID CMDECOD       ATC1CD ATC2CD ATC3CD ATC4CD
#>   <chr>   <chr>        <chr>   <chr>   <chr>         <chr>  <chr>  <chr>  <chr> 
#> 1 STUDY01 BP40257-1001 14      1192056 PARACETAMOL   N      N02    N02B   N02BE 
#> 2 STUDY01 BP40257-1001 18      2007001 SOLUMEDROL    D      D10    D10A   D10AA 
#> 3 STUDY01 BP40257-1001 18      2007001 SOLUMEDROL    D      D07    D07A   D07AA 
#> 4 STUDY01 BP40257-1001 18      2007001 SOLUMEDROL    H      H02    H02A   H02AB 
#> 5 STUDY01 BP40257-1002 19      2791596 SPIRONOLACTO… C      C03    C03D   C03DA
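Conceptually, this step turns the long FACM records into one row per medication and group, and then joins that onto CM. The base R sketch below shows the idea on a reduced toy dataset (cm_toy and facm_toy are made-up names and values); it is an illustration of the transpose-and-merge pattern, not the implementation used by derive_vars_atc().

```r
# Reduced toy data (hypothetical values)
cm_toy <- data.frame(
  USUBJID = "P01",
  CMREFID = c("A", "B"),
  CMDECOD = c("PARACETAMOL", "SOLUMEDROL")
)
facm_toy <- data.frame(
  USUBJID = "P01",
  FAREFID = c("A", "A", "B", "B"),
  FAGRPID = "1",
  FATESTCD = c("CMATC1CD", "CMATC2CD", "CMATC1CD", "CMATC2CD"),
  FASTRESC = c("N", "N02", "D", "D10")
)

# Long to wide: one column per FATESTCD value
wide <- reshape(
  facm_toy,
  idvar = c("USUBJID", "FAREFID", "FAGRPID"),
  timevar = "FATESTCD",
  v.names = "FASTRESC",
  direction = "wide"
)
# "FASTRESC.CMATC1CD" becomes "ATC1CD", and so on
names(wide) <- sub("^FASTRESC\\.CM", "", names(wide))

# Join the ATC columns onto CM via the reference identifier
cm_atc <- merge(
  cm_toy, wide,
  by.x = c("USUBJID", "CMREFID"),
  by.y = c("USUBJID", "FAREFID"),
  all.x = TRUE
)
cm_atc
```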

Derive Planned and Actual Treatment

TRTA and TRTP must match at least one value of the character treatment variables in ADSL (e.g., TRTxxA/TRTxxP, TRTSEQA/TRTSEQP, TRxxAGy/TRxxPGy).

An example of a simple implementation for a study without periods could be:

adae <- mutate(adae, TRTP = TRT01P, TRTA = TRT01A)

count(adae, TRTP, TRTA, TRT01P, TRT01A)
#> # A tibble: 2 × 5
#>   TRTP                TRTA                TRT01P              TRT01A           n
#>   <chr>               <chr>               <chr>               <chr>        <int>
#> 1 Placebo             Placebo             Placebo             Placebo         10
#> 2 Xanomeline Low Dose Xanomeline Low Dose Xanomeline Low Dose Xanomeline …     6

For studies with periods see the “Visit and Period Variables” vignette.

Derive Date/Date-time of Last Dose

The function derive_vars_joined() can be used to derive the last dose date before the start of the event.

ex_single <- derive_vars_dtm(
  ex_single,
  dtc = EXSTDTC,
  new_vars_prefix = "EXST",
  flag_imputation = "none"
)

adae <- derive_vars_joined(
  adae,
  ex_single,
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(LDOSEDTM = EXSTDTM),
  join_vars = exprs(EXSTDTM),
  join_type = "all",
  order = exprs(EXSTDTM),
  filter_add = (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) & !is.na(EXSTDTM),
  filter_join = EXSTDTM <= ASTDTM,
  mode = "last"
)

USUBJID	AEDECOD	AESEQ	AESTDTC	AEENDTC	ASTDT	AENDT	LDOSEDTM
01-701-1015	APPLICATION SITE ERYTHEMA	1	2014-01-03	NA	2014-01-03	NA	2014-01-03
01-701-1015	APPLICATION SITE PRURITUS	2	2014-01-03	NA	2014-01-03	NA	2014-01-03
01-701-1015	DIARRHOEA	3	2014-01-09	2014-01-11	2014-01-09	2014-01-11	2014-01-09
01-701-1023	ATRIOVENTRICULAR BLOCK SECOND DEGREE	3	2012-08-26	NA	2012-08-26	NA	2012-08-26
01-701-1023	ERYTHEMA	1	2012-08-07	2012-08-30	2012-08-07	2012-08-30	2012-08-07
01-701-1023	ERYTHEMA	2	2012-08-07	NA	2012-08-07	NA	2012-08-07
01-701-1023	ERYTHEMA	4	2012-08-07	2012-08-30	2012-08-07	2012-08-30	2012-08-07
01-703-1086	APPLICATION SITE IRRITATION	1	2012-09-13	2013-01-02	2012-09-13	2013-01-02	2012-09-13
01-703-1086	APPLICATION SITE IRRITATION	2	2012-09-13	2013-01-02	2012-09-13	2013-01-02	2012-09-13
01-703-1086	APPLICATION SITE IRRITATION	3	2012-09-13	2013-01-02	2012-09-13	2013-01-02	2012-09-13
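The selection logic of the join can be spelled out on toy data: for each event, take the latest dose start that is not after the event start. The sketch below uses dates rather than datetimes and made-up records (ae_toy, ex_toy); it is meant to show the rule, not to replace derive_vars_joined().

```r
# Toy events and exposure records (hypothetical values)
ae_toy <- data.frame(
  USUBJID = c("P01", "P01", "P02"),
  ASTDT = as.Date(c("2020-01-05", "2020-01-20", "2020-02-01"))
)
ex_toy <- data.frame(
  USUBJID = c("P01", "P01", "P02"),
  EXSTDT = as.Date(c("2020-01-01", "2020-01-15", "2020-02-10"))
)

# For each event: latest dose start on or before the event start, NA if none
last_dose_num <- vapply(seq_len(nrow(ae_toy)), function(i) {
  same_subject <- ex_toy$USUBJID == ae_toy$USUBJID[i]
  not_after <- ex_toy$EXSTDT <= ae_toy$ASTDT[i]
  doses <- ex_toy$EXSTDT[same_subject & not_after]
  if (length(doses) == 0) NA_real_ else as.numeric(max(doses))
}, numeric(1))

ae_toy$LDOSEDT <- as.Date(last_dose_num, origin = "1970-01-01")
ae_toy
```

The event of subject P02 starts before that subject's only dose, so no last dose date is assigned to it.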

Derive Treatment Dose and Unit

In a similar manner, you could derive the treatment dose and unit at the time of the event. Please note that it is assumed that the dosing intervals do not overlap. If they do overlap, the derive_vars_joined() call below will throw an error, as handling this case is study-specific.

ex_single <- derive_vars_dtm(
  ex_single,
  dtc = EXENDTC,
  new_vars_prefix = "EXEN",
  time_imputation = "last",
  flag_imputation = "none"
)

adae <- derive_vars_joined(
  adae,
  ex_single,
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(DOSEON = EXDOSE, DOSEU = EXDOSU),
  join_vars = exprs(EXSTDTM, EXENDTM),
  join_type = "all",
  filter_add = (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) & !is.na(EXSTDTM),
  filter_join = EXSTDTM <= ASTDTM & (ASTDTM <= EXENDTM | is.na(EXENDTM))
)

USUBJID	AEDECOD	AESEQ	AESTDTC	AEENDTC	ASTDT	AENDT	DOSEON	DOSEU
01-701-1015	APPLICATION SITE ERYTHEMA	1	2014-01-03	NA	2014-01-03	NA	0	mg
01-701-1015	APPLICATION SITE PRURITUS	2	2014-01-03	NA	2014-01-03	NA	0	mg
01-701-1015	DIARRHOEA	3	2014-01-09	2014-01-11	2014-01-09	2014-01-11	0	mg
01-701-1023	ATRIOVENTRICULAR BLOCK SECOND DEGREE	3	2012-08-26	NA	2012-08-26	NA	0	mg
01-701-1023	ERYTHEMA	1	2012-08-07	2012-08-30	2012-08-07	2012-08-30	0	mg
01-701-1023	ERYTHEMA	2	2012-08-07	NA	2012-08-07	NA	0	mg
01-701-1023	ERYTHEMA	4	2012-08-07	2012-08-30	2012-08-07	2012-08-30	0	mg
01-703-1086	APPLICATION SITE IRRITATION	1	2012-09-13	2013-01-02	2012-09-13	2013-01-02	54	mg
01-703-1086	APPLICATION SITE IRRITATION	2	2012-09-13	2013-01-02	2012-09-13	2013-01-02	54	mg
01-703-1086	APPLICATION SITE IRRITATION	3	2012-09-13	2013-01-02	2012-09-13	2013-01-02	54	mg
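Since the derivation relies on non-overlapping dosing intervals, it can be useful to check the exposure data beforehand. The following base R sketch flags intervals that start on or before the end of the previous interval of the same subject. The data, the variable OVERLAP, and the date-only comparison are assumptions made for illustration.

```r
# Toy exposure records (hypothetical values)
ex_toy <- data.frame(
  USUBJID = c("P01", "P01", "P02", "P02"),
  EXSTDT = as.Date(c("2020-01-01", "2020-01-15", "2020-01-01", "2020-01-10")),
  EXENDT = as.Date(c("2020-01-14", "2020-01-28", "2020-01-12", "2020-01-20"))
)

# Sort by subject and start date, then look up the previous end date per subject
ex_toy <- ex_toy[order(ex_toy$USUBJID, ex_toy$EXSTDT), ]
prev_end <- ave(
  as.numeric(ex_toy$EXENDT), ex_toy$USUBJID,
  FUN = function(x) c(NA, head(x, -1))
)

# An interval overlaps if it starts on or before the previous interval's end
ex_toy$OVERLAP <- !is.na(prev_end) & as.numeric(ex_toy$EXSTDT) <= prev_end
ex_toy
```

In this toy data only the second interval of subject P02 is flagged, because it starts two days before the first interval ends.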

Derive Severity, Causality, and Toxicity Grade

The variables ASEV, AREL, and ATOXGR can be added using simple dplyr::mutate() assignments, if no imputation is required.

adae <- adae %>%
  mutate(
    ASEV = AESEV,
    AREL = AEREL
  )

Derive Treatment Emergent Flag

To derive the treatment emergent flag TRTEMFL, one can call derive_var_trtemfl(). In the example below, end_window = 30 extends the treatment-emergent period to 30 days after the end of treatment.

adae <- adae %>%
  derive_var_trtemfl(
    trt_start_date = TRTSDT,
    trt_end_date = TRTEDT,
    end_window = 30
  )

USUBJID	TRTSDT	TRTEDT	AESTDTC	ASTDT	TRTEMFL
01-701-1015	2014-01-02	2014-07-02	2014-01-03	2014-01-03	Y
01-701-1015	2014-01-02	2014-07-02	2014-01-03	2014-01-03	Y
01-701-1015	2014-01-02	2014-07-02	2014-01-09	2014-01-09	Y
01-701-1023	2012-08-05	2012-09-01	2012-08-26	2012-08-26	Y
01-701-1023	2012-08-05	2012-09-01	2012-08-07	2012-08-07	Y
01-701-1023	2012-08-05	2012-09-01	2012-08-07	2012-08-07	Y
01-701-1023	2012-08-05	2012-09-01	2012-08-07	2012-08-07	Y
01-703-1086	2012-09-02	2012-12-04	2012-09-13	2012-09-13	Y
01-703-1086	2012-09-02	2012-12-04	2012-09-13	2012-09-13	Y
01-703-1086	2012-09-02	2012-12-04	2012-09-13	2012-09-13	Y
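The core of the window logic can be sketched in base R as shown below. This is a simplified illustration on made-up data that only considers complete start dates; derive_var_trtemfl() covers further cases (for example missing dates) that are not reproduced here.

```r
# Toy data (hypothetical values): one treatment period, four event start dates
adae_toy <- data.frame(
  ASTDT = as.Date(c("2019-12-31", "2020-02-01", "2020-03-31", "2020-04-01")),
  TRTSDT = as.Date("2020-01-01"),
  TRTEDT = as.Date("2020-03-01")
)
end_window <- 30

# Flag events starting on or after treatment start and no later than
# end_window days after treatment end
in_window <- adae_toy$ASTDT >= adae_toy$TRTSDT &
  adae_toy$ASTDT <= adae_toy$TRTEDT + end_window
adae_toy$TRTEMFL <- ifelse(in_window, "Y", NA_character_)
adae_toy
```

The event starting on 2020-03-31 is exactly 30 days after the end of treatment and is still flagged; the one starting a day later is not.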

To derive the on-treatment flag (ONTRTFL) in an ADaM dataset with a single occurrence date, use derive_var_ontrtfl().

The expected result is the input dataset with an additional column named ONTRTFL with a value of "Y" or NA.

If you also want to check an end date, you could add the end_date argument. Note that in this scenario you could set span_period = TRUE if you want occurrences that started prior to drug intake and were ongoing or ended after this time to be considered as on-treatment.

bds1 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-02-24"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-01-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2019-12-31"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds1,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT
)
#> # A tibble: 3 × 5
#>   USUBJID ADT        TRTSDT     TRTEDT     ONTRTFL
#>   <chr>   <date>     <date>     <date>     <chr>  
#> 1 P01     2020-02-24 2020-01-01 2020-03-01 Y      
#> 2 P02     2020-01-01 2020-01-01 2020-03-01 Y      
#> 3 P03     2019-12-31 2020-01-01 2020-03-01 <NA>

bds2 <- tibble::tribble(
  ~USUBJID, ~ADT,              ~TRTSDT,           ~TRTEDT,
  "P01",    ymd("2020-07-01"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P02",    ymd("2020-04-30"), ymd("2020-01-01"), ymd("2020-03-01"),
  "P03",    ymd("2020-03-15"), ymd("2020-01-01"), ymd("2020-03-01")
)
derive_var_ontrtfl(
  bds2,
  start_date = ADT,
  ref_start_date = TRTSDT,
  ref_end_date = TRTEDT,
  ref_end_window = 60
)
#> # A tibble: 3 × 5
#>   USUBJID ADT        TRTSDT     TRTEDT     ONTRTFL
#>   <chr>   <date>     <date>     <date>     <chr>  
#> 1 P01     2020-07-01 2020-01-01 2020-03-01 <NA>   
#> 2 P02     2020-04-30 2020-01-01 2020-03-01 Y      
#> 3 P03     2020-03-15 2020-01-01 2020-03-01 Y

bds3 <- tibble::tribble(
  ~ADTM,              ~TRTSDTM,           ~TRTEDTM,           ~TPT,
  "2020-01-02T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA,
  "2020-01-01T12:00", "2020-01-01T12:00", "2020-03-01T12:00", "PRE",
  "2019-12-31T12:00", "2020-01-01T12:00", "2020-03-01T12:00", NA
) %>%
  mutate(
    ADTM = ymd_hm(ADTM),
    TRTSDTM = ymd_hm(TRTSDTM),
    TRTEDTM = ymd_hm(TRTEDTM)
  )
derive_var_ontrtfl(
  bds3,
  start_date = ADTM,
  ref_start_date = TRTSDTM,
  ref_end_date = TRTEDTM,
  filter_pre_timepoint = TPT == "PRE"
)
#> # A tibble: 3 × 5
#>   ADTM                TRTSDTM             TRTEDTM             TPT   ONTRTFL
#>   <dttm>              <dttm>              <dttm>              <chr> <chr>  
#> 1 2020-01-02 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 <NA>  Y      
#> 2 2020-01-01 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 PRE   <NA>   
#> 3 2019-12-31 12:00:00 2020-01-01 12:00:00 2020-03-01 12:00:00 <NA>  <NA>

Derive Occurrence Flags

The function derive_var_extreme_flag() can help derive variables such as AOCCIFL, AOCCPIFL, AOCCSIFL, and AOCCzzFL.

If grades were collected, the following can be used to flag first occurrence of maximum toxicity grade.

adae <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(USUBJID),
      order = exprs(desc(ATOXGR), ASTDTM, AESEQ),
      new_var = AOCCIFL,
      mode = "first"
    ),
    filter = TRTEMFL == "Y"
  )

Similarly, ASEV can also be used to derive the occurrence flags, if severity is collected. In this case, the variable will need to be recoded to a numeric variable. Flag first occurrence of most severe adverse event:

adae <- adae %>%
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars = exprs(USUBJID),
      order = exprs(
        as.integer(factor(
          ASEV,
          levels = c("DEATH THREATENING", "SEVERE", "MODERATE", "MILD")
        )),
        ASTDTM, AESEQ
      ),
      new_var = AOCCIFL,
      mode = "first"
    ),
    filter = TRTEMFL == "Y"
  )

USUBJID	ASTDTM	ASEV	AESEQ	TRTEMFL	AOCCIFL
01-701-1015	2014-01-03	MILD	1	Y	Y
01-701-1015	2014-01-03	MILD	2	Y	NA
01-701-1015	2014-01-09	MILD	3	Y	NA
01-701-1023	2012-08-07	MODERATE	2	Y	Y
01-701-1023	2012-08-07	MILD	1	Y	NA
01-701-1023	2012-08-07	MILD	4	Y	NA
01-701-1023	2012-08-26	MILD	3	Y	NA
01-703-1086	2012-09-13	SEVERE	3	Y	Y
01-703-1086	2012-09-13	MODERATE	2	Y	NA
01-703-1086	2012-09-13	MILD	1	Y	NA
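The flagging rule itself (sort within subject by severity, start date, and sequence number, then flag the first record) can be illustrated in base R. The data below are made up, the severity levels are reduced to three, and the restriction to treatment-emergent records is left out for brevity.

```r
# Toy data (hypothetical values)
adae_toy <- data.frame(
  USUBJID = c("P01", "P01", "P01", "P02", "P02"),
  AESEQ = c(1, 2, 3, 1, 2),
  ASEV = c("MILD", "SEVERE", "SEVERE", "MODERATE", "MILD"),
  ASTDT = as.Date(c(
    "2020-01-03", "2020-01-10", "2020-01-05", "2020-02-01", "2020-01-20"
  ))
)

# Recode severity so that the most severe level sorts first
sev_rank <- match(adae_toy$ASEV, c("SEVERE", "MODERATE", "MILD"))

# Row indices in the desired order: subject, severity, start date, sequence
ord <- order(adae_toy$USUBJID, sev_rank, adae_toy$ASTDT, adae_toy$AESEQ)

# The first row per subject in that order gets the flag
first_rows <- ord[!duplicated(adae_toy$USUBJID[ord])]
adae_toy$AOCCIFL <- NA_character_
adae_toy$AOCCIFL[first_rows] <- "Y"
adae_toy
```

For subject P01 both severe events compete, and the earlier one (AESEQ 3) is flagged.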

Derive Query Variables

For deriving query variables SMQzzNAM, SMQzzCD, SMQzzSC, SMQzzSCN, or CQzzNAM the derive_vars_query() function can be used. As input it expects a queries dataset, which provides the definition of the queries. See Queries dataset documentation for a detailed description of the queries dataset. The create_query_data() function can be used to create queries datasets.

The following example shows how to derive query variables for Standardized MedDRA Queries (SMQs) in ADAE.

queries <- admiral::queries

PREFIX	GRPNAME	GRPID	SCOPE	SCOPEN	SRCVAR	TERMCHAR	TERMNUM
CQ01	Dermatologic events	NA	NA	NA	AELLT	APPLICATION SITE ERYTHEMA	NA
CQ01	Dermatologic events	NA	NA	NA	AELLT	APPLICATION SITE PRURITUS	NA
CQ01	Dermatologic events	NA	NA	NA	AELLT	ERYTHEMA	NA
CQ01	Dermatologic events	NA	NA	NA	AELLT	LOCALIZED ERYTHEMA	NA
CQ01	Dermatologic events	NA	NA	NA	AELLT	GENERALIZED PRURITUS	NA
SMQ02	Immune-Mediated Hypothyroidism	20000160	BROAD	1	AEDECOD	BIOPSY THYROID GLAND ABNORMAL	NA
SMQ02	Immune-Mediated Hypothyroidism	20000160	BROAD	1	AEDECOD	BLOOD THYROID STIMULATING HORMONE ABNORMAL	NA
SMQ02	Immune-Mediated Hypothyroidism	20000160	NARROW	2	AEDECOD	BIOPSY THYROID GLAND INCREASED	NA
SMQ03	Immune-Mediated Guillain-Barre Syndrome	20000131	NARROW	2	AEDECOD	GUILLAIN-BARRE SYNDROME	NA
SMQ03	Immune-Mediated Guillain-Barre Syndrome	20000131	NARROW	2	AEDECOD	MILLER FISHER SYNDROME	NA

adae1 <- tibble::tribble(
  ~USUBJID, ~ASTDTM, ~AETERM, ~AESEQ, ~AEDECOD, ~AELLT, ~AELLTCD,
  "01", "2020-06-02 23:59:59", "ALANINE AMINOTRANSFERASE ABNORMAL",
  3, "Alanine aminotransferase abnormal", NA_character_, NA_integer_,
  "02", "2020-06-05 23:59:59", "BASEDOW'S DISEASE",
  5, "Basedow's disease", NA_character_, 1L,
  "03", "2020-06-07 23:59:59", "SOME TERM",
  2, "Some query", "Some term", NA_integer_,
  "05", "2020-06-09 23:59:59", "ALVEOLAR PROTEINOSIS",
  7, "Alveolar proteinosis", NA_character_, NA_integer_
)

adae_query <- derive_vars_query(dataset = adae1, dataset_queries = queries)

USUBJID	ASTDTM	AETERM	AESEQ	AEDECOD	AELLT	AELLTCD	SMQ02NAM	SMQ02CD	SMQ02SC	SMQ02SCN	SMQ03NAM	SMQ03CD	SMQ03SC	SMQ03SCN	SMQ05NAM	SMQ05CD	SMQ05SC	SMQ05SCN	CQ01NAM	CQ04NAM	CQ04CD	CQ06NAM	CQ06CD
01	2020-06-02 23:59:59	ALANINE AMINOTRANSFERASE ABNORMAL	3	Alanine aminotransferase abnormal	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
02	2020-06-05 23:59:59	BASEDOW'S DISEASE	5	Basedow's disease	NA	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Immune-Mediated Colitis	10009888
03	2020-06-07 23:59:59	SOME TERM	2	Some query	Some term	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
05	2020-06-09 23:59:59	ALVEOLAR PROTEINOSIS	7	Alveolar proteinosis	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	Immune-Mediated Pneumonitis	20000042	NARROW	2	NA	NA	NA	NA	NA
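To make the role of the queries dataset more concrete, the base R sketch below matches terms against a much reduced, made-up query definition and creates one name variable per query prefix. It assumes exact matching on a single character source variable per query and ignores codes, scopes, and numeric terms, so it should be read as an illustration of the idea rather than as a description of how derive_vars_query() works internally.

```r
# Reduced toy query definition (hypothetical terms and names)
queries_toy <- data.frame(
  PREFIX = c("CQ01", "CQ01", "SMQ02"),
  GRPNAME = c("Dermatologic events", "Dermatologic events", "Hypothetical SMQ"),
  SRCVAR = "AEDECOD",
  TERMCHAR = c("ERYTHEMA", "PRURITUS", "HEADACHE")
)
adae_toy <- data.frame(
  USUBJID = c("P01", "P02", "P03"),
  AEDECOD = c("ERYTHEMA", "HEADACHE", "NAUSEA")
)

# One <PREFIX>NAM variable per query, populated where a term matches
for (prefix in unique(queries_toy$PREFIX)) {
  q <- queries_toy[queries_toy$PREFIX == prefix, ]
  hit <- adae_toy[[q$SRCVAR[1]]] %in% q$TERMCHAR
  adae_toy[[paste0(prefix, "NAM")]] <- ifelse(hit, q$GRPNAME[1], NA_character_)
}
adae_toy
```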

Similarly to SMQ, the derive_vars_query() function can be used to derive Standardized Drug Groupings (SDG).

sdg <- tibble::tribble(
  ~PREFIX, ~GRPNAME,          ~GRPID, ~SCOPE,  ~SCOPEN, ~SRCVAR,   ~TERMCHAR,          ~TERMNUM,
  "SDG01", "Diuretics",           11, "BROAD", 1,       "CMDECOD", "Diuretic 1",       NA,
  "SDG01", "Diuretics",           11, "BROAD", 1,       "CMDECOD", "Diuretic 2",       NA,
  "SDG02", "Corticosteroids",     12, "BROAD", 1,       "CMDECOD", "Corticosteroid 1", NA,
  "SDG02", "Corticosteroids",     12, "BROAD", 1,       "CMDECOD", "Corticosteroid 2", NA,
  "SDG02", "Corticosteroids",     12, "BROAD", 1,       "CMDECOD", "Corticosteroid 3", NA,
)
adcm <- tibble::tribble(
  ~USUBJID, ~ASTDTM,               ~CMDECOD,
  "01",     "2020-06-02 23:59:59", "Diuretic 1",
  "02",     "2020-06-05 23:59:59", "Diuretic 1",
  "03",     "2020-06-07 23:59:59", "Corticosteroid 2",
  "05",     "2020-06-09 23:59:59", "Diuretic 2"
)
adcm_query <- derive_vars_query(adcm, sdg)

USUBJID	ASTDTM	CMDECOD	SDG01NAM	SDG01CD	SDG01SC	SDG01SCN	SDG02NAM	SDG02CD	SDG02SC	SDG02SCN
01	2020-06-02 23:59:59	Diuretic 1	Diuretics	11	BROAD	1	NA	NA	NA	NA
02	2020-06-05 23:59:59	Diuretic 1	Diuretics	11	BROAD	1	NA	NA	NA	NA
03	2020-06-07 23:59:59	Corticosteroid 2	NA	NA	NA	NA	Corticosteroids	12	BROAD	1
05	2020-06-09 23:59:59	Diuretic 2	Diuretics	11	BROAD	1	NA	NA	NA	NA

Add ADSL variables

If needed, the other ADSL variables can now be added:

adae <- adae %>%
  derive_vars_merged(
    dataset_add = select(adsl, !!!negate_vars(adsl_vars)),
    by_vars = exprs(STUDYID, USUBJID)
  )

USUBJID	AEDECOD	ASTDTM	DTHDT	RFSTDTC	RFENDTC	AGE	AGEU	SEX
01-701-1015	APPLICATION SITE ERYTHEMA	2014-01-03	NA	2014-01-02	2014-07-02	63	YEARS	F
01-701-1015	APPLICATION SITE PRURITUS	2014-01-03	NA	2014-01-02	2014-07-02	63	YEARS	F
01-701-1015	DIARRHOEA	2014-01-09	NA	2014-01-02	2014-07-02	63	YEARS	F
01-701-1023	ERYTHEMA	2012-08-07	NA	2012-08-05	2012-09-02	64	YEARS	M
01-701-1023	ERYTHEMA	2012-08-07	NA	2012-08-05	2012-09-02	64	YEARS	M
01-701-1023	ERYTHEMA	2012-08-07	NA	2012-08-05	2012-09-02	64	YEARS	M
01-701-1023	ATRIOVENTRICULAR BLOCK SECOND DEGREE	2012-08-26	NA	2012-08-05	2012-09-02	64	YEARS	M
01-703-1086	APPLICATION SITE IRRITATION	2012-09-13	NA	2012-09-02	2012-12-24	71	YEARS	M
01-703-1086	APPLICATION SITE IRRITATION	2012-09-13	NA	2012-09-02	2012-12-24	71	YEARS	M
01-703-1086	APPLICATION SITE IRRITATION	2012-09-13	NA	2012-09-02	2012-12-24	71	YEARS	M

Derive Analysis Sequence Number

The function derive_var_obs_number() can be used to derive the ASEQ variable, which ensures the uniqueness of subject records within the dataset.

For example, there can be multiple records present in ADCM for a single subject with the same ASTDTM and CMSEQ variables. But these records still differ at ATC level:

adcm <- tibble::tribble(
  ~USUBJID,       ~ASTDTM,          ~CMSEQ, ~CMDECOD,         ~ATC1CD, ~ATC2CD, ~ATC3CD, ~ATC4CD,
  "BP40257-1001", "2013-07-05 UTC", "14",   "PARACETAMOL",    "N",     "N02",   "N02B",  "N02BE",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "D",     "D10",   "D10A",  "D10AA",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "D",     "D07",   "D07A",  "D07AA",
  "BP40257-1001", "2013-08-15 UTC", "18",   "SOLUMEDROL",     "H",     "H02",   "H02A",  "H02AB",
  "BP40257-1002", "2012-12-15 UTC", "19",   "SPIRONOLACTONE", "C",     "C03",   "C03D",  "C03DA"
)

adcm_aseq <- adcm %>%
  derive_var_obs_number(
    by_vars    = exprs(USUBJID),
    order      = exprs(ASTDTM, CMSEQ, ATC1CD, ATC2CD, ATC3CD, ATC4CD),
    new_var    = ASEQ,
    check_type = "error"
  )

USUBJID	ASTDTM	CMSEQ	CMDECOD	ATC1CD	ATC2CD	ATC3CD	ATC4CD	ASEQ
BP40257-1001	2013-07-05 UTC	14	PARACETAMOL	N	N02	N02B	N02BE	1
BP40257-1001	2013-08-15 UTC	18	SOLUMEDROL	D	D07	D07A	D07AA	2
BP40257-1001	2013-08-15 UTC	18	SOLUMEDROL	D	D10	D10A	D10AA	3
BP40257-1001	2013-08-15 UTC	18	SOLUMEDROL	H	H02	H02A	H02AB	4
BP40257-1002	2012-12-15 UTC	19	SPIRONOLACTONE	C	C03	C03D	C03DA	1
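The numbering can be pictured as "sort, check that the sort key is unique, then count within subject". A base R sketch of these three steps on reduced toy data follows; the dataset adcm_toy and the reduced key are assumptions for illustration.

```r
# Reduced toy data (hypothetical values)
adcm_toy <- data.frame(
  USUBJID = c("P01", "P01", "P01", "P02"),
  CMSEQ = c(18, 18, 14, 19),
  ATC2CD = c("D10", "D07", "N02", "C03")
)

# The sort key must identify each record uniquely, otherwise the
# sequence number would depend on the incoming record order
key <- c("USUBJID", "CMSEQ", "ATC2CD")
stopifnot(anyDuplicated(adcm_toy[key]) == 0)

# Sort by the key, then number the records within each subject
adcm_toy <- adcm_toy[order(adcm_toy$USUBJID, adcm_toy$CMSEQ, adcm_toy$ATC2CD), ]
adcm_toy$ASEQ <- ave(seq_len(nrow(adcm_toy)), adcm_toy$USUBJID, FUN = seq_along)
adcm_toy
```

The uniqueness check plays a role similar to check_type = "error" in the call above: without it, ties in the sort key would make the numbering arbitrary.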

Add Labels and Attributes

Note that attributes may not be preserved in some cases after processing with {admiral}. The recommended approach is to apply variable labels and other metadata as a final step in your data derivation process using packages like:

metacore: establish a common foundation for the use of metadata within an R session.
metatools: enable the use of metacore objects. Metatools can be used to build datasets or enhance columns in existing datasets as well as checking datasets against the metadata.
xportr: functionality to associate all metadata information with a local R data frame, perform dataset-level validation checks, and convert it into a transport v5 file (xpt).

NOTE: Together with {admiral}, these packages comprise an end-to-end pipeline under the umbrella of the pharmaverse. An example of applying metadata and performing the associated checks can be found in the pharmaverse E2E example.
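For readers who only want to see what "applying labels as a final step" amounts to, here is a minimal base R sketch that sets a "label" attribute on each column from a named vector. The dataset and the label lookup are made up for illustration; in practice the packages listed above would take this information from the specifications.

```r
# Toy dataset and label lookup (hypothetical values)
adae_toy <- data.frame(USUBJID = c("P01", "P02"), ASTDY = c(2, 8))
labels <- c(
  USUBJID = "Unique Subject Identifier",
  ASTDY = "Analysis Start Relative Day"
)

# Apply the labels after all derivations are complete
for (v in names(labels)) {
  attr(adae_toy[[v]], "label") <- labels[[v]]
}

attr(adae_toy$ASTDY, "label")
```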

Example Scripts

ADaM	Sourcing Command
ADAE	use_ad_template("ADAE")
ADCM	use_ad_template("ADCM")
