diff --git a/DESCRIPTION b/DESCRIPTION
index 661854c6..29692a6b 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -29,4 +29,4 @@ Suggests:
 RoxygenNote: 7.3.3
 VignetteBuilder: knitr
 Roxygen: list(markdown = TRUE)
-Config/Needs/website: pkgdown, comorbidity, icdcomorbid, multimorbidity, dplyr, odbc, DBI, RSQLite
+Config/Needs/website: pkgdown, comorbidity, icdcomorbid, multimorbidity, dplyr, odbc, DBI, RSQLite, pccc
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 0c133793..9580c745 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -16,6 +16,8 @@ navbar:
         href: articles/elixhauser.html
       - text: Pediatric Complex Chronic Conditions
         href: articles/pccc.html
+      - text: Transition From the pccc Package to the medicalcoder Package
+        href: articles/transition-pccc-to-medicalcoder.html
       - text: -------
       - text: ICD Codes
       - text: ICD Utilities
diff --git a/vignettes/articles/transition-pccc-to-medicalcoder.Rmd b/vignettes/articles/transition-pccc-to-medicalcoder.Rmd
new file mode 100644
index 00000000..caa617ab
--- /dev/null
+++ b/vignettes/articles/transition-pccc-to-medicalcoder.Rmd
@@ -0,0 +1,501 @@
+---
+title: "Transition From the pccc Package to the medicalcoder Package"
+output:
+  rmarkdown::html_vignette:
+    toc: true
+    number_sections: false
+bibliography: ../references.bib
+vignette: >
+  %\VignetteIndexEntry{Transition From the pccc Package to the medicalcoder Package}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, label = "setup", include = FALSE}
+# IMPORTANT SYNTAX NOTE:
+# Vignettes cannot use the pipeOp `|>` so the package can depend on R >= 3.5.0
+# The _articles_ (website only) can use the pipeOp since these are not part of the source package.
+
+library(kableExtra)
+options(qwraps2_markup = "markdown")
+options(knitr.kable.NA = '')
+knitr::opts_chunk$set(collapse = TRUE, fig.align = "center")
+```
+
+# Introduction
+
+The R package [pccc](https://cran.r-project.org/package=pccc)
+[@feinstein2024pediatric;@dewitt2025pccc] was published to support version 2 of
+the Pediatric Complex Chronic Conditions (PCCC) [@feudtner2014pediatric]. This
+document helps users of pccc transition to medicalcoder.
+
+Major differences between `pccc::ccc()` and `medicalcoder::comorbidities()`:
+
+1. Input data format
+
+   * `pccc::ccc()` expects a data.frame with each row representing one patient
+     and/or encounter. There is a column for each diagnostic and procedure code.
+     For example, in a data set where the maximum number of diagnostic codes is
+     six and the maximum number of procedure codes is five, an entry for patient
+     XX could look like the following:
+
+```
+  patid   dx1   dx2  dx3   dx4 dx5 dx6     pr1     pr2     pr3     pr4 pr5
+  patXX T8619 E8809 E876 Z7982  NA  NA 02PAX3Z 5A1D70Z 04Q90ZZ 0TS60ZZ  NA
+```
+
+
+   * `medicalcoder::comorbidities()` expects the input data to be a data.frame
+     where each row is a single ICD code. For example, the record for patient
+     XX above becomes the three columns below: one row for each code, with a
+     column to identify the patient/encounter, a column for the code, and a
+     column to denote if the code is a diagnostic (dx = 1) or procedure (dx = 0).
+
+```
+  patid    code dx
+  patXX   T8619  1
+  patXX   E8809  1
+  patXX    E876  1
+  patXX   Z7982  1
+  patXX 02PAX3Z  0
+  patXX 5A1D70Z  0
+  patXX 04Q90ZZ  0
+  patXX 0TS60ZZ  0
+```
+
+
+2. ICD Version
+
+   * `pccc::ccc()` considers only one ICD version per call, either ICD-9 or
+     ICD-10. If the input data contains both ICD-9 and ICD-10 codes, false
+     negatives are inevitable. The version is set by the `icdv` argument.
+
+   * `medicalcoder::comorbidities()` considers both ICD-9 and ICD-10 at the same
+     time. A column added to the input data to identify the code version allows
+     a single patient/encounter record to consist of both versions and to have
+     PCCC flagged accordingly. Users specify the ICD version via the arguments
+     `icdv` and `icdv.var` to `medicalcoder::comorbidities()`.
+
+
+3. PCCC Versions
+
+   * `pccc::ccc()` only implements PCCC version 2 [@feudtner2014pediatric].
+
+   * `medicalcoder::comorbidities()` implements:
+
+     * `pccc_v2.0`: results consistent with `pccc::ccc()` from pccc version 1.0.6.
+     * `pccc_v2.1`: improved mappings of ICD codes to PCCC using the PCCC v2
+       scoring algorithm.
+     * `pccc_v3.0`: consistent with the SAS code published with PCCC version 3
+       [@feinstein2024pediatric].
+     * `pccc_v3.1`: extended set of ICD code to condition mappings.
+
+   * Note: medicalcoder also provides several variants of the Charlson and
+     Elixhauser comorbidities.
+
+4. Subconditions
+
+   * `pccc::ccc()` only returns flags for primary conditions.
+
+   * `medicalcoder::comorbidities()`: when the argument `subconditions = TRUE` is
+     passed, both the primary conditions and the subconditions are flagged for
+     PCCC. Examples follow below.
+
+5. Present-on-Admission and Longitudinal data
+
+   * `pccc::ccc()` only considers single encounters and treats all codes as
+     present-on-admission.
+
+   * `medicalcoder::comorbidities()` can account for present-on-admission flags
+     and longitudinal flagging of comorbidities within a patient over multiple
+     encounters.
+
+
+# `pccc::ccc()` vs `medicalcoder::comorbidities()`
+
+```{r, message=FALSE}
+library(pccc)
+packageVersion("pccc")
+library(medicalcoder)
+```
+
+## Prepare Data
+
+We'll use the `mdcr` data set from the medicalcoder package.
+
+```{r}
+head(mdcr)
+```
+
+We will split the data set into two sets, one for ICD-9 and one for ICD-10.
+
+Using the tidyverse, we can build the wide input data sets `pccc::ccc()` needs:
+
+```{r}
+mdcr_tbls <-
+  mdcr |>
+  dplyr::group_by(patid, icdv, dx) |>
+  dplyr::mutate(n = seq_len(dplyr::n())) |>
+  dplyr::ungroup() |>
+  dplyr::mutate(dxv = dplyr::if_else(dx == 1, "dx", "pr")) |>
+  dplyr::group_by(icdv) |>
+  dplyr::group_split()
+mdcr_tbls <-
+  lapply(mdcr_tbls,
+    tidyr::pivot_wider,
+    id_cols = "patid",
+    names_from = c("dxv", "n"),
+    names_sep = "",
+    values_from = "code"
+  )
+```
+
+A data.table approach:
+
+```{r}
+mdcr_DTs <- data.table::as.data.table(data.table::copy(mdcr))
+mdcr_DTs[
+  ,
+  dxv := paste0(data.table::fifelse(dx == 1, "dx", "pr"), seq_len(.N)),
+  by = .(patid, icdv, dx)
+]
+mdcr_DTs <- split(mdcr_DTs, by = "icdv")
+mdcr_DTs <-
+  lapply(
+    mdcr_DTs,
+    data.table::dcast,
+    formula = patid ~ dxv,
+    value.var = "code",
+    na.rm = FALSE
+  )
+```
+
+## Applying `pccc::ccc()`
+
+To flag PCCC via `pccc::ccc()` we need to call it twice, once per ICD version,
+and then aggregate the results.
+
+```{r}
+tic <- Sys.time()
+
+pccc_9_results_tbl <-
+  pccc::ccc(
+    data = mdcr_tbls[[1]],
+    id = patid,
+    dx_cols = grep("dx", names(mdcr_tbls[[1]]), value = TRUE),
+    pc_cols = grep("pr", names(mdcr_tbls[[1]]), value = TRUE),
+    icdv = 9
+  )
+
+pccc_10_results_tbl <-
+  pccc::ccc(
+    data = mdcr_tbls[[2]],
+    id = patid,
+    dx_cols = grep("dx", names(mdcr_tbls[[2]]), value = TRUE),
+    pc_cols = grep("pr", names(mdcr_tbls[[2]]), value = TRUE),
+    icdv = 10
+  )
+
+pccc_results_tbl <-
+  dplyr::bind_rows(pccc_9_results_tbl, pccc_10_results_tbl) |>
+  dplyr::group_by(patid) |>
+  dplyr::summarize_all(max) |>
+  dplyr::ungroup() |>
+  dplyr::arrange(patid)
+
+toc <- Sys.time()
+
+pccc_ccc_tbl_time <- difftime(toc, tic, units = "secs")
+```
+
+```{r}
+tic <- Sys.time()
+
+pccc_9_results_DT <-
+  pccc::ccc(
+    data = mdcr_DTs[[1]],
+    id = patid,
+    dx_cols = grep("dx", names(mdcr_DTs[[1]]), value = TRUE),
+    pc_cols = grep("pr", names(mdcr_DTs[[1]]), value = TRUE),
+    icdv = 9
+  )
+
+pccc_10_results_DT <-
+  pccc::ccc(
+    data = mdcr_DTs[[2]],
+    id = patid,
+    dx_cols = grep("dx", names(mdcr_DTs[[2]]), value = TRUE),
+    pc_cols = grep("pr", names(mdcr_DTs[[2]]), value = TRUE),
+    icdv = 10
+  )
+
+pccc_results_DT <- data.table::rbindlist(list(pccc_9_results_DT, pccc_10_results_DT))
+
+pccc_results_DT <-
+  pccc_results_DT[, lapply(.SD, max), by = .(patid), .SDcols = -"patid"]
+data.table::setkey(pccc_results_DT, patid)
+
+toc <- Sys.time()
+
+pccc_ccc_dt_time <- difftime(toc, tic, units = "secs")
+```
+
+A quick sanity check that we have the same results for both the tidyverse and
+data.table input data sets:
+```{r}
+stopifnot(
+  isTRUE(
+    all.equal(pccc_results_DT, pccc_results_tbl, check.attributes = FALSE)
+  )
+)
+```
+
+## Calling `medicalcoder::comorbidities()`
+
+```{r}
+tic <- Sys.time()
+
+medicalcoder_results <-
+  medicalcoder::comorbidities(
+    data = mdcr,
+    id.vars = "patid",
+    icd.codes = "code",
+    icdv.var = "icdv",
+    dx.var = "dx",
+    method = "pccc_v2.0",
+    poa = 1
+  )
+
+toc <- Sys.time()
+medicalcoder_df_time <- difftime(toc, tic, units = "secs")
+```
+
+## Differences in results?
+
+```{r}
+old_vs_new <-
+  merge(
+    x = pccc_results_DT,
+    y = medicalcoder_results,
+    all = TRUE,
+    by = "patid",
+    suffixes = c("_old", "_new")
+  )
+```
+
+Most importantly, the condition flags (`ccc_flag` from `pccc::ccc()` and
+`cmrb_flag` from `medicalcoder::comorbidities()`) are identical.
+
+```{r}
+stopifnot(
+  isTRUE(
+    with(old_vs_new, identical(ccc_flag, cmrb_flag))
+  )
+)
+```
+
+Second, all condition flags except technology dependence and transplant are
+identical.
+
+```{r}
+stopifnot(
+  with(old_vs_new, identical(neuromusc_old, neuromusc_new)),
+  with(old_vs_new, identical(cvd_old, cvd_new)),
+  with(old_vs_new, identical(respiratory_old, respiratory_new)),
+  with(old_vs_new, identical(renal_old, renal_new)),
+  with(old_vs_new, identical(gi_old, gi_new)),
+  with(old_vs_new, identical(hemato_immu_old, hemato_immu_new)),
+  with(old_vs_new, identical(metabolic_old, metabolic_new)),
+  with(old_vs_new, identical(congeni_genetic_old, congeni_genetic_new)),
+  with(old_vs_new, identical(malignancy_old, malignancy_new)),
+  with(old_vs_new, identical(neonatal_old, neonatal_new))
+)
+```
+
+Dropping the columns that match from the `old_vs_new` data.table lets us focus
+on the differences in the results.
+
+```{r}
+good <- c("neuromusc", "cvd", "respiratory", "renal", "gi", "hemato_immu",
+          "metabolic", "congeni_genetic", "malignancy", "neonatal", "ccc_flag",
+          "cmrb_flag")
+
+for (g in good) {
+  for (j in grep(g, names(old_vs_new), value = TRUE)) {
+    data.table::set(old_vs_new, j = j, value = NULL)
+  }
+}
+
+old_vs_new
+```
+
+First, the `num_cmrb` column is a count of the number of conditions and is
+reported by `medicalcoder::comorbidities()`. There is no similar column from
+`pccc::ccc()`.
+
+
+```{r}
+old_vs_new[, num_cmrb := NULL]
+```
+
+The `misc` column is the "miscellaneous" category reported by
+`medicalcoder::comorbidities()` and is not reported by `pccc::ccc()`. The
+existence of the `misc` column and some differences in the returned results
+between `pccc::ccc()` version 1.0.6 and `medicalcoder::comorbidities()` are due
+to how medicalcoder is implemented.
+
+```{r}
+old_vs_new
+```
+
+Several ICD codes need to be corrected in pccc.
+
+The relevant GitHub issues:
+
+* [ICD-9 349.1](https://github.com/CUD2V/pccc/issues/45)
+* [ICD-9 V56](https://github.com/CUD2V/pccc/issues/46)
+* [ICD-10 Z49](https://github.com/CUD2V/pccc/issues/47)
+* [ICD-9 86.06](https://github.com/CUD2V/pccc/issues/48)
+* [ICD-9 V45.85](https://github.com/CUD2V/pccc/issues/49)
+* [ICD-9 V53.3](https://github.com/CUD2V/pccc/issues/50)
+* [ICD-9 V53.91](https://github.com/CUD2V/pccc/issues/51)
+* [ICD-9 V65.46](https://github.com/CUD2V/pccc/issues/52)
+* [ICD-9 37.52](https://github.com/CUD2V/pccc/issues/53)
+* [ICD-9 V42.0](https://github.com/CUD2V/pccc/issues/54)
+* [ICD-10 Z94](https://github.com/CUD2V/pccc/issues/55)
+
+
+# Additional Benefits of medicalcoder
+
+## Computation Performance
+
+medicalcoder was built such that only base R is needed to install and use the
+package. That said, there is specific support for the tidyverse and data.table.
+For example, the same calls as above but with either a tibble or a data.table
+instead of a simple base R data.frame take less time to compute. The
+differences here are small. See
+[benchmarking](https://github.com/dewittpe/medicalcoder/tree/main/benchmarking)
+for more details.
+
+```{r}
+mdcr_tbl <- tibble::as_tibble(mdcr)
+tic <- Sys.time()
+medicalcoder_results <-
+  medicalcoder::comorbidities(
+    data = mdcr_tbl,
+    id.vars = "patid",
+    icd.codes = "code",
+    icdv.var = "icdv",
+    dx.var = "dx",
+    method = "pccc_v2.0",
+    poa = 1
+  )
+toc <- Sys.time()
+medicalcoder_tbl_time <- difftime(toc, tic, units = "secs")
+
+mdcr_DT <- data.table::as.data.table(data.table::copy(mdcr))
+tic <- Sys.time()
+medicalcoder_results <-
+  medicalcoder::comorbidities(
+    data = mdcr_DT,
+    id.vars = "patid",
+    icd.codes = "code",
+    icdv.var = "icdv",
+    dx.var = "dx",
+    method = "pccc_v2.0",
+    poa = 1
+  )
+toc <- Sys.time()
+medicalcoder_dt_time <- difftime(toc, tic, units = "secs")
+```
+
+```{r}
+pccc_ccc_tbl_time
+pccc_ccc_dt_time
+medicalcoder_df_time
+medicalcoder_tbl_time
+medicalcoder_dt_time
+```
+
+## Summary of results
+
+A simple call to `summary()` will return a data.frame with counts and
+percentages for each condition.
+
+```{r}
+summary(medicalcoder_results)
+```
+
+## Subconditions
+
+```{r, include = FALSE}
+cvd_subconditions <- sort(unique(subset(get_pccc_codes(), condition == "cvd")$subcondition))
+cvd_subconditions <- gsub("_", " ", cvd_subconditions)
+cvd_subconditions <- paste0(seq_len(length(cvd_subconditions)), ". ", cvd_subconditions)
+```
+
+In the documentation for both PCCC v2 and v3 there are subconditions. For
+example, there are `r length(cvd_subconditions)` subconditions under
+cardiovascular disease:
+
+```{r, echo = FALSE, results = 'asis'}
+cat(cvd_subconditions, sep = "\n")
+```
+
+Calling `medicalcoder::comorbidities()` with `subconditions = TRUE` when working
+with PCCC will flag these subconditions as well as the primary conditions.
+
+```{r}
+with_subconditions <-
+  medicalcoder::comorbidities(
+    data = mdcr,
+    id.vars = "patid",
+    icd.codes = "code",
+    icdv.var = "icdv",
+    dx.var = "dx",
+    method = "pccc_v2.0",
+    poa = 1,
+    subconditions = TRUE
+  )
+
+with_subconditions
+```
+
+The summary includes counts and percentages as before. Additionally, for a
+subcondition, the percentage is reported both as a percent of the cohort and
+as a percent of those with the primary condition.
+
+```{r}
+str(summary(with_subconditions))
+```
+
+Using tools such as kableExtra, these summaries can be formatted into
+publication-ready tables. For example, say we want to report on the
+cardiovascular and metabolic conditions and subconditions.
+
+```{r}
+cvd_and_metabolic <- subset(summary(with_subconditions), condition %in% c("cvd", "metabolic"))
+cvd_and_metabolic$subcondition[is.na(cvd_and_metabolic$subcondition)] <- "Any subcondition"
+
+kableExtra::kbl(
+  x = cvd_and_metabolic[, c("subcondition", "count", "percent_of_cohort", "percent_of_those_with_condition")],
+  caption = "Patients with cardiovascular and/or metabolic conditions and the associated subconditions.",
+  row.names = FALSE,
+  digits = 2,
+  col.names = c("Subcondition", "Patients", "% of cohort", "% of those with the primary condition")
+) |>
+kableExtra::kable_styling(bootstrap_options = "striped") |>
+kableExtra::pack_rows(index = table(cvd_and_metabolic$condition))
+```
+
+# PCCC version 3
+For more detail on the differences between PCCC v2 [@feudtner2014pediatric] and
+PCCC v3 [@feinstein2024pediatric] see the
+[PCCC article](https://www.peteredewitt.com/medicalcoder/articles/pccc.html#pccc-version-2-0-vs-pccc-version-3-0).
+
+
+# References
+
+
+
+