Generally, the function call to medExtractR is:
note <- paste(scan(filename, '', sep = '\n', quiet = TRUE), collapse = '\n')
medExtractR(note, drug_names, unit, window_length, max_dist, ...)
where ... refers to additional arguments to medExtractR. One of the key additional arguments is drug_list.
drug_list is a list of other drug names (besides the drug names of interest). This list is used to shorten the search window in which medExtractR looks for dosing entities, by truncating the window at the nearest mention of a competing drug name. By default, this calls rxnorm_druglist, a partially cleaned and processed list of brand name and generic drug names in the RxNorm database.¹ This list could also incorporate other competing information besides drug names, such as drug abbreviations, symptoms, procedures, or names of laboratory measurements.
The default rxnorm_druglist contains far more drug names than are likely to be needed, which results in slow run times for both medExtractR and medExtractR_tapering. This vignette demonstrates how to create your own drug_list for improved performance.
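To make the window truncation described above concrete, here is a minimal base-R sketch. It is a toy illustration under simplifying assumptions, not medExtractR's internal implementation: it only looks at the text after a drug mention, and the function name truncate_window, its arguments, and the toy note are all hypothetical.

```r
# Toy illustration only: NOT medExtractR's internal code.
# truncate_window() and its arguments are hypothetical names for this sketch.
truncate_window <- function(note, drug_end, window_length, competitors) {
  # Candidate window: up to window_length characters after the drug mention.
  win_start <- drug_end + 1
  win_end <- min(nchar(note), drug_end + window_length)
  window <- substr(note, win_start, win_end)
  # Position of each competing name inside the window (-1 if absent).
  # Matching is literal (fixed = TRUE) and case-insensitive via tolower().
  hits <- vapply(competitors, function(d) {
    as.integer(regexpr(tolower(d), tolower(window), fixed = TRUE))
  }, integer(1), USE.NAMES = FALSE)
  hits <- hits[hits > 0]
  # Cut the window just before the nearest competing drug name, if any.
  if (length(hits) > 0) {
    window <- substr(window, 1, min(hits) - 1)
  }
  window
}

toy_note <- "lamotrigine 150 mg twice a day. lorazepam 1 mg at bedtime."
drug_end <- nchar("lamotrigine")  # the mention ends at character 11

# With no competing names, the window also covers the lorazepam dose.
full_window <- truncate_window(toy_note, drug_end, 130, character(0))
# With "lorazepam" as a competing name, the window stops before it.
cut_window <- truncate_window(toy_note, drug_end, 130, "lorazepam")
```

In this toy note, the untruncated window still contains the "1 mg" that belongs to lorazepam, while the truncated window does not. This is the kind of false positive that a suitable drug_list is meant to prevent.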
library(medExtractR)
# note file names
fn <- c(
  system.file("examples", "tacpid1_2008-06-26_note1_1.txt", package = "medExtractR"),
  system.file("examples", "tacpid1_2008-06-26_note2_1.txt", package = "medExtractR"),
  system.file("examples", "tacpid1_2008-12-16_note3_1.txt", package = "medExtractR"),
  system.file("examples", "lampid1_2016-02-05_note4_1.txt", package = "medExtractR"),
  system.file("examples", "lampid1_2016-02-05_note5_1.txt", package = "medExtractR"),
  system.file("examples", "lampid2_2008-07-20_note6_1.txt", package = "medExtractR"),
  system.file("examples", "lampid2_2012-04-15_note7_1.txt", package = "medExtractR")
)
getNote <- function(x) paste(scan(x, '', sep = '\n', quiet = TRUE), collapse = '\n')
notes <- vapply(fn, getNote, character(1))
Here’s an example run with the last note (note 7). We’re using the default argument for drug_list, the full RxNorm data.
medExtractR(note = notes[7], drug_names = c("lamotrigine", "lamictal"),
window_length = 130, unit = "mg", drug_list = "rxnorm")
## entity expr pos
## 1 DrugName lamotrigine 103:114
## 2 Strength 150 mg 115:121
## 3 DrugName Lamictal 141:149
## 4 DoseAmt 1 151:152
## 5 Route by mouth 160:168
## 6 Frequency twice a day 169:180
Let’s take a look at this note. We want to extract entities associated with the drug names highlighted in blue (i.e., “lamotrigine”, “lamictal”). The note also contains several other drug names (highlighted in yellow), which medExtractR should recognize so that it does not extract entities unrelated to our drug of interest.
To let medExtractR recognize drugs that are not of interest, we need to provide a list of drugs. Unless specified otherwise, we use the list of drugs in the RxNorm database (drug_list = "rxnorm").
You can examine this drug list by loading rxnorm_druglist.
## [1] 59320
## [1] "A & D" "A & L Laboratories 10 Mix 10"
## [3] "A & L Laboratories Protect" "A Thru Z Hi Potency Caplets"
## [5] "A Thru Z Select Plus Lutein" "A+D Diaper Rash"
We can pass the full drug list directly to medExtractR.
Note that the result is identical to that of the previous example.
medExtractR(note = notes[7], drug_names = c("lamotrigine", "lamictal"),
window_length = 130, unit = "mg", drug_list = rxnorm_druglist)
## entity expr pos
## 1 DrugName lamotrigine 103:114
## 2 Strength 150 mg 115:121
## 3 DrugName Lamictal 141:149
## 4 DoseAmt 1 151:152
## 5 Route by mouth 160:168
## 6 Frequency twice a day 169:180
We can even set drug_list to be empty (with NULL), though this would lead to many false positives.
medExtractR(note = notes[7], drug_names = c("lamotrigine", "lamictal"),
window_length = 130, unit = "mg", drug_list = NULL)
## entity expr pos
## 1 DrugName lamotrigine 103:114
## 2 Strength 150 mg 115:121
## 3 DrugName Lamictal 141:149
## 4 DoseAmt 1 151:152
## 5 Route by mouth 160:168
## 6 Frequency twice a day 169:180
## 7 Strength 1 mg 191:195
## 8 DoseAmt 1 203:204
## 9 DoseAmt 1 225:226
## 10 DoseAmt 1 246:247
## 11 DoseAmt 1 271:272
In this case, supplying the drug “lorazepam” in drug_list is enough to correct our output.
medExtractR(note = notes[7], drug_names = c("lamotrigine", "lamictal"),
window_length = 130, unit = "mg", drug_list = 'lorazepam')
## entity expr pos
## 1 DrugName lamotrigine 103:114
## 2 Strength 150 mg 115:121
## 3 DrugName Lamictal 141:149
## 4 DoseAmt 1 151:152
## 5 Route by mouth 160:168
## 6 Frequency twice a day 169:180
Before running medExtractR, we can search for the drug names that are present in our notes. If we restrict our drug_list to only these values, the medExtractR function will run much faster. To do this, we can use the string_occurs function. The first argument is a vector of character strings to find (i.e., the full drug list). The second argument is a vector of text to search (i.e., all of our notes). This function also has an argument for ignoring case (ignore.case) and one for the number of cores available for parallel processing (nClust, which requires the parallel package).
## socket cluster with 2 nodes on host 'localhost'
## [1] "TRUE" "FALSE"
## TRUE FALSE
## 61 59259
## [1] "Acetaminophen" "Alert" "Amitriptyline" "AMYLASE"
## [5] "Ativan" "Avapro" "Bactrim" "Catapres"
## [9] "Cellcept" "Clonidine" "Cyclobenzaprine" "Elavil"
## [13] "ENOXAPARIN" "ENSURE" "Flexeril" "Furosemide"
## [17] "Hydrocodone" "INFORMATION" "Keppra" "Lamictal"
## [21] "Lamotrigine" "Lasix" "Levetiracetam" "Lipase"
## [25] "Lipitor" "Loratadine" "Lorazepam" "Lovenox"
## [29] "Lyrica" "Myfortic" "Nifedipine" "Omeprazole"
## [33] "Os-Cal" "Penicillin" "Prednisone" "Pregabalin"
## [37] "Prevacid" "Prilosec" "Procrit" "Prograf"
## [41] "Seizure" "Seizures" "Simvastatin" "Tacrolimus"
## [45] "Tobacco" "Topamax" "Topiramate" "Valacyclovir"
## [49] "Valcyte" "Valtrex" "VITAL" "Vitamin C"
## [53] "wheelchair" "Zithromax" "Zithromax Z-Pak" "Zocor"
## [57] "FK" "cellcept" "myfortic" "mvi"
## [61] "LTG"
medExtractR(note = notes[7], drug_names = c("lamotrigine", "lamictal"),
window_length = 130, unit = "mg", drug_list = fnd_drugs)
## entity expr pos
## 1 DrugName lamotrigine 103:114
## 2 Strength 150 mg 115:121
## 3 DrugName Lamictal 141:149
## 4 DoseAmt 1 151:152
## 5 Route by mouth 160:168
## 6 Frequency twice a day 169:180
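The idea behind restricting the drug list to names that occur in the notes can be sketched in base R. This is only an illustration of the approach, not the implementation of string_occurs: the helper find_in_notes, the toy list, and the toy notes are hypothetical, and the sketch uses plain substring matching without parallel processing.

```r
# Base-R illustration only: NOT the implementation of string_occurs().
# find_in_notes() and the toy data below are hypothetical.
find_in_notes <- function(candidates, notes, ignore.case = TRUE) {
  # Lower-case the notes once, rather than once per candidate.
  hay <- if (ignore.case) tolower(notes) else notes
  found <- vapply(candidates, function(term) {
    needle <- if (ignore.case) tolower(term) else term
    # fixed = TRUE matches the term literally, so names such as "A+D"
    # are not interpreted as regular expressions.
    any(grepl(needle, hay, fixed = TRUE))
  }, logical(1), USE.NAMES = FALSE)
  # Split the candidates into those that occur in the notes and the rest.
  list(found = candidates[found], not_found = candidates[!found])
}

toy_list <- c("Lamictal", "Lorazepam", "Prograf", "A+D Diaper Rash")
toy_notes <- c("Pt takes lamictal 150 mg twice a day.",
               "Also on lorazepam 1 mg at bedtime.")

toy_res <- find_in_notes(toy_list, toy_notes)
toy_res$found  # the shortened list one would pass as drug_list
```

Because this sketch matches substrings rather than whole words, short entries can match inside longer words. Any list produced this way is worth a quick manual review before use.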
Additionally, we may want to search for potential drug name misspellings in our data. If we find any, we can add these to our drug_list. We can look for misspellings with the string_suggestions function. Its output should be reviewed manually, as many of its suggestions should be discarded.
## suggestion match
## [1,] "os cal" "os-cal"
## [2,] "porgraf" "prograf"
In this case, it finds two values we should include.
all_drugs <- c(fnd_drugs, sug_drugs[,'suggestion'])
medExtractR(note = notes[7], drug_names = c("lamotrigine", "lamictal"),
window_length = 130, unit = "mg", drug_list = all_drugs)
## entity expr pos
## 1 DrugName lamotrigine 103:114
## 2 Strength 150 mg 115:121
## 3 DrugName Lamictal 141:149
## 4 DoseAmt 1 151:152
## 5 Route by mouth 160:168
## 6 Frequency twice a day 169:180
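As a rough illustration of how a misspelling search like the one described above might work, the following base-R sketch compares words in the notes with known drug names by edit distance. It is an assumption-laden toy, not the implementation of string_suggestions: the helper suggest_misspellings, the max_dist threshold, the minimum word length, and the toy note are all hypothetical.

```r
# Base-R illustration only: NOT the implementation of string_suggestions().
# suggest_misspellings() and the toy note below are hypothetical.
suggest_misspellings <- function(known, notes, max_dist = 2) {
  # Split the notes into unique lower-case words.
  text <- tolower(paste(notes, collapse = " "))
  words <- unique(unlist(strsplit(text, "[^a-z0-9-]+")))
  # Skip very short words, which match too many names by chance.
  words <- words[nchar(words) > 3]
  known_lc <- unique(tolower(known))
  # Edit distance between every word (rows) and every known name (columns).
  d <- adist(words, known_lc)
  # Keep near matches, excluding exact matches (distance 0).
  idx <- which(d > 0 & d <= max_dist, arr.ind = TRUE)
  out <- cbind(suggestion = words[idx[, "row"]],
               match = known_lc[idx[, "col"]])
  # Drop words that are themselves correctly spelled known names.
  out[!out[, "suggestion"] %in% known_lc, , drop = FALSE]
}

toy_note <- "Continue porgraf 2 mg twice daily and prednisone 5 mg daily."
toy_sug <- suggest_misspellings(c("Prograf", "Prednisone"), toy_note)
toy_sug
```

A small edit-distance threshold keeps the number of candidates manageable, but ordinary words that happen to resemble a drug name will still appear, which is why the suggestions need manual review.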