multimorbidity
Package
The multimorbidity package is a simple and transparent one-stop-shop for those working with claims or other administrative health care data who wish to obtain comorbidity, frailty, and/or multimorbidity measures. The goal of the package is to first clean and organize the data in a way that can then be easily used for various algorithms in a uniform and standard format.
We’ve created two sample datasets.
The first features 5 hypothetical patients with hypothetical claims coded in both ICD-9 and ICD-10.
claims <- i9_i10_comb
head(claims, 10)
#> patient_id sex date_of_serv visit_type dx1 dx2 dx3 dx4 dx5 hcpcs
#> 1 1001 male 2012-02-14 ip 2768 4019 3310 29620 2630 E2201
#> 2 1001 male 2013-05-15 ip 486 2768 99591 4019 3310 E2201
#> 3 1001 male 2013-01-10 ot 40290 29620 4019 <NA> <NA> E2201
#> 4 1001 male 2013-04-02 ot 3310 57149 4019 <NA> <NA> E2201
#> 5 1001 male 2013-05-06 ot 2449 4019 486 <NA> <NA> E2201
#> 6 1001 male 2013-06-04 ot 486 4019 29620 <NA> <NA> E2201
#> 7 1001 male 2013-10-01 ot 24920 3310 4019 <NA> <NA> E2201
#> 8 1001 male 2013-11-05 ot 430 4019 29620 7930 <NA> E2201
#> 9 1001 male 2014-02-01 ot 7241 3310 4019 430 <NA> E2201
#> 10 1001 male 2014-03-15 ot 24920 4011 486 29620 39891 E2201
#> icd_version
#> 1 9
#> 2 9
#> 3 9
#> 4 9
#> 5 9
#> 6 9
#> 7 9
#> 8 9
#> 9 9
#> 10 9
This is our one-row-per-patient dataset, which is only needed if we intend to use the function to limit our time window (comorbidity_window).
#> patient_id date_of_interest10 date_of_interest9
#> 1 1001 2021-06-04 2013-06-04
#> 2 1002 2021-03-11 2013-03-11
#> 3 1003 2021-08-02 2013-08-02
#> 4 1004 2021-01-20 2013-01-20
#> 5 1005 2021-02-14 2013-02-14
The first step is to “prepare” our data for the subsequent algorithms. The end-goal is to have a dataset that has 1 column with a patient ID, 1 column which contains the diagnosis code, and 1 column which will note if it’s ICD-9 (9) or ICD-10 (10). There are other variables that may be of interest depending on the specification including type (inpatient or outpatient) and date.
The arguments used here are (in order): telling it the name of our data, specifying the ID variable, noting if it’s wide or long (long would be if the data is in our final format), the prefix for the diagnosis columns (dx1, dx2, dx3 would be “dx”), noting if our data include a HCPCS/CPT column, specifying the variable which notes if it’s ICD-9 or ICD-10, specifying the variable which tells us the type of visit (inpatient or outpatient), and finally specifying which column is the date.
prepared_data <- prepare_data(dat = claims,
                              id = patient_id,
                              style = "wide",
                              prefix_dx = "dx",
                              hcpcs = "yes",
                              prefix_hcpcs = "hcpcs",
                              version_var = icd_version,
                              type_name = visit_type,
                              date = date_of_serv)
#> # A tibble: 10 × 5
#> patient_id claim_date dx version type
#> <fct> <date> <chr> <dbl> <fct>
#> 1 1001 2012-02-14 2768 9 ip
#> 2 1001 2012-02-14 4019 9 ip
#> 3 1001 2012-02-14 3310 9 ip
#> 4 1001 2012-02-14 29620 9 ip
#> 5 1001 2012-02-14 2630 9 ip
#> 6 1001 2013-05-15 486 9 ip
#> 7 1001 2013-05-15 2768 9 ip
#> 8 1001 2013-05-15 99591 9 ip
#> 9 1001 2013-05-15 4019 9 ip
#> 10 1001 2013-05-15 3310 9 ip
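To make the wide-to-long step concrete, here is a minimal base R sketch of the kind of reshaping described above. It is not the package's implementation: the toy `wide` table, the `dx_cols` and `long` names, and the choice to drop empty diagnosis slots are all assumptions made for illustration.

```r
# Hypothetical wide claims table (toy rows, not the package's sample data)
wide <- data.frame(
  patient_id   = c(1001, 1001),
  date_of_serv = as.Date(c("2012-02-14", "2013-01-10")),
  visit_type   = c("ip", "ot"),
  dx1          = c("2768", "40290"),
  dx2          = c("4019", NA),
  icd_version  = c(9, 9),
  stringsAsFactors = FALSE
)

# Every column whose name starts with the diagnosis prefix "dx"
dx_cols <- grep("^dx", names(wide), value = TRUE)

# Stack each diagnosis column into its own block of rows
long <- do.call(rbind, lapply(dx_cols, function(col) {
  data.frame(
    patient_id = wide$patient_id,
    claim_date = wide$date_of_serv,
    dx         = wide[[col]],
    version    = wide$icd_version,
    type       = wide$visit_type,
    stringsAsFactors = FALSE
  )
}))

# Assumption: empty diagnosis slots are dropped; then order by patient and date
long <- long[!is.na(long$dx), ]
long <- long[order(long$patient_id, long$claim_date), ]
rownames(long) <- NULL

print(long)
```

Each row of `long` now carries one diagnosis code together with its patient ID, claim date, ICD version, and visit type, which mirrors the column layout of the prepared data shown above.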
Oftentimes, we may be interested in limiting our claims to a specific window, such as the year before diagnosis. To accommodate this, the package includes a function that merges the two datasets and limits the claims to that window.
In the example below we supply, in order: the name of our ID dataset, the name of our claims data, the ID variable shared by both, the variable in the ID dataset that holds our “date of interest”, the variable in the claims data that holds the date of the claim, and the time window we are interested in (in this example, only time_pre is set). There is a complementary argument for the period after the date of interest (time_post), which defaults to infinity. With time_pre = 60 we therefore keep the claims that occur within the 60 days before our date of interest as well as all claims after it. A common variation is to keep only those claims that occurred before diagnosis. In that case we could omit the time_pre argument and set time_post = 0.
Note: in this example we ignore date_of_interest10 but this could be used instead as we include both ICD-9 and ICD-10 claims and dates.
limit_data <- comorbidity_window(id_dat = id, dat = prepared_data, id = patient_id,
                                 id_date = date_of_interest9, claims_date = claim_date,
                                 time_pre = 60)
#> # A tibble: 10 × 7
#> patient_id claim_date dx version type date_of_interest10 date_of_interes…
#> <fct> <date> <chr> <dbl> <fct> <date> <date>
#> 1 1001 2013-05-15 486 9 ip 2021-06-04 2013-06-04
#> 2 1001 2013-05-15 2768 9 ip 2021-06-04 2013-06-04
#> 3 1001 2013-05-15 99591 9 ip 2021-06-04 2013-06-04
#> 4 1001 2013-05-15 4019 9 ip 2021-06-04 2013-06-04
#> 5 1001 2013-05-15 3310 9 ip 2021-06-04 2013-06-04
#> 6 1001 2013-05-06 2449 9 ot 2021-06-04 2013-06-04
#> 7 1001 2013-05-06 4019 9 ot 2021-06-04 2013-06-04
#> 8 1001 2013-05-06 486 9 ot 2021-06-04 2013-06-04
#> 9 1001 2013-06-04 486 9 ot 2021-06-04 2013-06-04
#> 10 1001 2013-06-04 4019 9 ot 2021-06-04 2013-06-04
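The windowing logic can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `limit_window()` is a hypothetical helper, the toy data are invented, and treating both window bounds as inclusive is an assumption.

```r
# Hypothetical long-format claims and one-row-per-patient dates (toy data)
claims_long <- data.frame(
  patient_id = c(1001, 1001, 1001, 1002),
  claim_date = as.Date(c("2013-01-10", "2013-05-15", "2013-10-01", "2013-03-01")),
  dx         = c("40290", "486", "24920", "25000"),
  stringsAsFactors = FALSE
)
ids <- data.frame(
  patient_id       = c(1001, 1002),
  date_of_interest = as.Date(c("2013-06-04", "2013-03-11"))
)

# Hypothetical helper: keep claims dated within
# [date_of_interest - time_pre, date_of_interest + time_post] (bounds assumed inclusive)
limit_window <- function(claims, ids, time_pre = Inf, time_post = Inf) {
  merged <- merge(claims, ids, by = "patient_id")
  # Days between the claim and the date of interest (negative = before)
  offset <- as.numeric(merged$claim_date - merged$date_of_interest)
  merged[offset >= -time_pre & offset <= time_post, ]
}

# 60 days before the date of interest, plus everything after it
pre_60 <- limit_window(claims_long, ids, time_pre = 60)

# Only claims on or before the date of interest
pre_only <- limit_window(claims_long, ids, time_post = 0)
```

Leaving both defaults at infinity keeps every claim, so each argument only ever narrows one side of the window.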
The real advantage of this package is that, now that our data are in a standard format, we can apply a multitude of comorbidity indices to them using near-identical calls. More information about these indices can be found in the package documentation; the code below just demonstrates how to execute them.
The arguments are similar and include: the dataset name, the variable of our patient ID, the variable of our diagnoses, the version (9 = ICD-9 only, 10 = ICD-10 only, and 19 = both), the variable which specifies the version of that diagnosis code (9 or 10), and whether or not we want to require there to be two outpatient visits for an individual to be positively coded with a comorbidity. While not frequently used, this adaptation may limit rule-out diagnoses and the package was built with this flexibility in mind.
elix_data <- elixhauser(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version, outpatient_two = "yes")
#> Message: Specifying that your data uses both ICD-9 and ICD-10 will result in only the Elixhauser comorbidities
#> which are compatible with ICD-9, as the changes and additions which are seen in
#> ICD-10 have, to date, not been back-mapped to ICD-9.
#> Message: You have specified that for a comorbidity to be positvely coded, an individual must have two outpatient claims with it. Please make sure the levels of your variable denoting outpatient type are either 'ot' or 'OT'
#> # A tibble: 5 × 34
#> id chf valve pulmcirc perivasc elix_htn_uc elix_htn_c para neuro
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1001 0 0 0 0 1 0 0 1
#> 2 1002 0 0 0 0 0 0 0 0
#> 3 1003 0 0 0 0 1 0 0 1
#> 4 1004 0 0 0 0 0 0 0 0
#> 5 1005 0 0 0 0 0 0 1 0
#> # … with 25 more variables: chrnlung <dbl>, dm <dbl>, dmcx <dbl>,
#> # hypothy <dbl>, renlfail <dbl>, liver <dbl>, ulcer <dbl>, aids <dbl>,
#> # lymph <dbl>, mets <dbl>, tumor <dbl>, arth <dbl>, coag <dbl>, obese <dbl>,
#> # wghtloss <dbl>, lytes <dbl>, bldloss <dbl>, anemdef <dbl>, alcohol <dbl>,
#> # drug <dbl>, psych <dbl>, depress <dbl>, htn_c <dbl>, elix_death <dbl>,
#> # elix_readmit <dbl>
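The two-outpatient-visit requirement mentioned above can be illustrated with a small base R sketch. The exact rule the package applies is not spelled out here, so the following is an assumed reading: a patient is flagged if they have at least one inpatient claim with a matching code, or matching outpatient claims on at least two distinct dates. The `htn_codes` subset, the toy data, and `flag_patient()` are all hypothetical.

```r
# Illustrative subset of hypertension codes only, not a full definition
htn_codes <- c("4019", "4011")

# Toy long-format claims: type is "ip" (inpatient) or "ot" (outpatient)
claims_long <- data.frame(
  patient_id = c(1, 1, 2, 3),
  claim_date = as.Date(c("2013-01-05", "2013-02-10", "2013-03-01", "2013-04-01")),
  type       = c("ot", "ot", "ot", "ip"),
  dx         = c("4019", "4019", "4019", "4011"),
  stringsAsFactors = FALSE
)

# Keep only claims carrying a matching diagnosis code
hits <- claims_long[claims_long$dx %in% htn_codes, ]

# Assumed rule: one inpatient claim, or outpatient claims on two distinct dates
flag_patient <- function(d) {
  n_ip <- sum(d$type == "ip")
  n_ot <- length(unique(d$claim_date[d$type == "ot"]))
  as.integer(n_ip >= 1 || n_ot >= 2)
}

# One 0/1 flag per patient who has at least one matching claim
htn <- vapply(split(hits, hits$patient_id), flag_patient, integer(1))
print(htn)
```

In this toy data, patient 2 has a single outpatient claim and so is not flagged, which is how a requirement like this can screen out one-off rule-out diagnoses.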
charlson_data <- charlson(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version, outpatient_two = "yes")
#> Message: You have specified that for a comorbidity to be positvely coded, an individual must have two outpatient claims with it. Please make sure the levels of your variable denoting outpatient type are either 'ot' or 'OT'
#> # A tibble: 5 × 19
#> id charlson_myocar charlson_chf charlson_periph_vasc charlson_cerebro
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1001 0 0 0 1
#> 2 1002 0 0 0 1
#> 3 1003 0 0 0 0
#> 4 1004 0 0 0 0
#> 5 1005 0 0 0 0
#> # … with 14 more variables: charlson_dementia <dbl>,
#> # charlson_chronic_pulm <dbl>, charlson_rheum <dbl>,
#> # charlson_peptic_ulcer <dbl>, charlson_mild_liv <dbl>,
#> # charlson_diab_uc <dbl>, charlson_diab_c <dbl>, charlson_hemi_para <dbl>,
#> # charlson_renal <dbl>, charlson_malig <dbl>, charlson_mod_sev_liv <dbl>,
#> # charlson_met_solid <dbl>, charlson_hiv <dbl>, charlson_score <dbl>
cfi_data <- cfi(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version)
#> # A tibble: 5 × 2
#> id frailty_index
#> <fct> <dbl>
#> 1 1001 0.365
#> 2 1002 0.279
#> 3 1003 0.313
#> 4 1004 0.272
#> 5 1005 0.337
mwi_data <- mwi(dat = limit_data, id = patient_id, dx = dx, version = 9, version_var = version)
#> # A tibble: 5 × 2
#> id mwi
#> <fct> <dbl>
#> 1 1001 21.0
#> 2 1002 3.91
#> 3 1003 3.51
#> 4 1004 3.55
#> 5 1005 0.614
nf_data <- nicholsonfortin(dat = limit_data, id = patient_id, dx = dx, version = 19, version_var = version, outpatient_two = "yes")
#> Message: You have specified that for a comorbidity to be positvely coded, an individual must have two outpatient claims with it. Please make sure the levels of your variable denoting outpatient type must be either 'ot' or 'OT'
#> # A tibble: 5 × 21
#> id htn obesity diabetes clrd hyperlipid cancer cvd heartfail
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1001 1 0 0 1 0 0 0 0
#> 2 1002 0 0 1 0 0 0 0 0
#> 3 1003 1 0 1 0 0 0 0 0
#> 4 1004 0 0 0 0 0 1 0 0
#> 5 1005 0 0 0 0 0 1 1 0
#> # … with 12 more variables: anxietydepress <dbl>, arthritis <dbl>,
#> # stroketia <dbl>, thyroid <dbl>, ckd <dbl>, osteo <dbl>, dementia <dbl>,
#> # musculo <dbl>, stomach <dbl>, colon <dbl>, liver <dbl>, urinary <dbl>