Title: | Electronic Health Record (EHR) Data Processing and Analysis Tool |
---|---|
Description: | Process and analyze electronic health record (EHR) data. The 'EHR' package provides modules to perform diverse medication-related studies using data from EHR databases. Especially, the package includes modules to perform pharmacokinetic/pharmacodynamic (PK/PD) analyses using EHRs, as outlined in Choi, Beck, McNeer, Weeks, Williams, James, Niu, Abou-Khalil, Birdwell, Roden, Stein, Bejan, Denny, and Van Driest (2020) <doi:10.1002/cpt.1787>. Additional modules will be added in future. In addition, this package provides various functions useful to perform Phenome Wide Association Study (PheWAS) to explore associations between drug exposure and phenotypes obtained from EHR data, as outlined in Choi, Carroll, Beck, Mosley, Roden, Denny, and Van Driest (2018) <doi:10.1093/bioinformatics/bty306>. |
Authors: | Leena Choi [aut, cre] , Cole Beck [aut] , Hannah Weeks [aut] , Elizabeth McNeer [aut], Nathan James [aut] , Michael Williams [aut] |
Maintainer: | Leena Choi <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4-11 |
Built: | 2024-12-12 07:11:37 UTC |
Source: | CRAN |
The 'EHR' package provides modules to perform diverse medication-related studies using data from EHR databases.
Package functionality:
Process and analyze Electronic Health Record (EHR) data.
Implement modules to perform diverse medication-related studies using data from EHR databases. Especially, the package includes modules to perform pharmacokinetic/pharmacodynamic (PK/PD) analyses using EHRs, as outlined in Choi et al. (2020).
Implement three statistical methods for Phenome Wide Association Study (PheWAS). Contingency tables for many binary outcomes (e.g., phenotypes) and a binary covariate (e.g., drug exposure) can be efficiently generated by zeroOneTable, and three commonly used statistical methods to analyze data for PheWAS are implemented by analysisPheWAS.
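As a rough illustration of what a contingency table for a binary outcome and a binary covariate contains, the following sketch uses only base R on made-up data. It does not call zeroOneTable and is not meant to reproduce its implementation or output layout; the data frame `toy` and the cell names `n00` to `n11` are assumptions for illustration.

```r
# Toy data: one binary exposure and two binary phenotypes (all values made up).
set.seed(1)
toy <- data.frame(
  exposure = rep(c(0, 1), each = 10),
  pheno_a  = rbinom(20, 1, 0.3),
  pheno_b  = rbinom(20, 1, 0.1)
)

# One 2 x 2 table per phenotype, flattened to four cell counts.
# nXY = number of subjects with exposure X and phenotype Y.
cells <- sapply(c("pheno_a", "pheno_b"), function(p) {
  tab <- table(
    factor(toy$exposure, levels = 0:1),
    factor(toy[[p]], levels = 0:1)
  )
  c(n00 = tab[1, 1], n01 = tab[1, 2], n10 = tab[2, 1], n11 = tab[2, 2])
})

# One row per phenotype, one column per cell.
t(cells)
```

Fixing the factor levels to 0 and 1 keeps every table 2 x 2 even when a sparse phenotype has no events in one group.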
Maintainer: Leena Choi [email protected] (ORCID)
Authors:
Cole Beck [email protected] (ORCID)
Hannah Weeks [email protected] (ORCID)
Elizabeth McNeer [email protected]
Nathan James [email protected] (ORCID)
Michael Williams [email protected]
Development of a system for postmarketing population pharmacokinetic
and pharmacodynamic studies using real-world data from electronic health
records.
Choi L, Beck C, McNeer E, Weeks HL, Williams ML, James NT, Niu X,
Abou-Khalil BW, Birdwell KA, Roden DM, Stein CM, Bejan CA, Denny JC,
Van Driest SL.
Clin Pharmacol Ther. 2020 Apr;107(4):934-943. doi: 10.1002/cpt.1787.
Evaluating statistical approaches to leverage large clinical datasets
for uncovering therapeutic and adverse medication effects.
Choi L, Carroll RJ, Beck C, Mosley JD, Roden DM, Denny JC, Van Driest SL.
Bioinformatics. 2018 Sep 1;34(17):2988-2996. doi: 10.1093/bioinformatics/bty306.
Add lastdose data to data set from the buildDose process.
addLastDose(buildData, lastdoseData)
buildData |
data.frame, output of |
lastdoseData |
data.frame with columns filename, ld_start, lastdose, raw_time, time_type |
Lastdose is a datetime string associated with dose data. Information on time of last dose can be extracted within the extractMed function (i.e., medExtractR) using the argument lastdose=TRUE. Raw extracted times should first be processed using the processLastDose function to convert to datetime format before providing to addLastDose. This function then combines the processed last dose times with output from the buildDose process by file name to pair last dose times with dosing regimens based on position.
Alternatively, the user can provide their own table of lastdose data. In this case, with position
information absent, the lastdose data should be restricted to one unique last dose time per
unique patient ID-date identifier.
In the case where lastdoseData is output from processLastDose, it is possible to have more than one extracted last dose time. In this case, rules are applied to determine which time should be kept. First, we give preference to an explicit time expression (e.g., "10:30pm") over a duration expression (e.g., "14 hour level"). Then, we pair last dose times with drug regimens based on minimum distance between last dose time start position and drug name start position.
See EHR Vignette for Extract-Med and Pro-Med-NLP for details.
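The two selection rules described above can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the candidate table, the `time_type` labels ("time" and "duration"), and the drug name start position are all made up, and the code is not the package's implementation.

```r
# Hypothetical last dose candidates extracted from one note (values are made up).
cand <- data.frame(
  lastdose  = c("14 hour level", "10:30pm", "9am"),
  time_type = c("duration", "time", "time"),  # assumed labels
  ld_start  = c(40, 120, 310),
  stringsAsFactors = FALSE
)
drug_start <- 100  # hypothetical start position of the drug name

# Rule 1: prefer explicit time expressions when any exist.
is_time <- cand$time_type == "time"
if (any(is_time)) cand <- cand[is_time, ]

# Rule 2: keep the candidate whose start position is closest to the drug name.
best <- cand[which.min(abs(cand$ld_start - drug_start)), ]
best$lastdose
```

Here the duration expression is dropped first, and "10:30pm" is kept because it starts 20 characters from the drug name while "9am" starts 210 characters away.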
a data.frame with the ‘lastdose’ column added.
# Get build data
data(tac_mxr_parsed)
# don't combine lastdose at this stage
tac_build <- buildDose(tac_mxr_parsed, preserve = 'lastdose')
# Get processed last dose data
tac_mxr <- read.csv(system.file("examples", "tac_mxr.csv", package = "EHR"))
data(tac_metadata)
data(tac_lab)
ld_data <- processLastDose(tac_mxr, tac_metadata, tac_lab)
addLastDose(tac_build, ld_data)
Implement three commonly used statistical methods to analyze data for Phenome Wide Association Study (PheWAS)
analysisPheWAS( method = c("firth", "glm", "lr"), adjust = c("PS", "demo", "PS.demo", "none"), Exposure, PS, demographics, phenotypes, data )
method |
define the statistical analysis method from 'firth', 'glm', and 'lr'. 'firth': Firth's penalized-likelihood logistic regression; 'glm': logistic regression with Wald test, 'lr': logistic regression with likelihood ratio test. |
adjust |
define the adjustment method from 'PS','demo','PS.demo', and 'none'. 'PS': adjustment of PS only; 'demo': adjustment of demographics only; 'PS.demo': adjustment of PS and demographics; 'none': no adjustment. |
Exposure |
define the variable name of exposure variable. |
PS |
define the variable name of propensity score. |
demographics |
define the list of demographic variables. |
phenotypes |
define the list of phenotypes that need to be analyzed. |
data |
define the data. |
Implements three commonly used statistical methods to analyze the associations between exposure (e.g., drug exposure, genotypes) and various phenotypes in PheWAS. Firth's penalized-likelihood logistic regression is the default method because it avoids the problem of separation in logistic regression, which often arises when analyzing sparse binary outcomes and exposures. Logistic regression with a likelihood ratio test and conventional logistic regression with a Wald test can also be performed.
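To see why separation matters, the sketch below fits a conventional logistic regression in base R to made-up data in which no unexposed subject has the outcome. It does not call analysisPheWAS or fit a Firth model; it only contrasts the Wald test with the likelihood ratio test on a sparse outcome. All variable names are illustrative.

```r
# Toy data with a sparse outcome: no events among the unexposed.
exposure <- rep(c(0, 1), each = 50)
outcome  <- c(rep(0, 50), rep(1, 5), rep(0, 45))

# Conventional logistic regression; glm() warns that fitted
# probabilities of 0 or 1 occurred, so the warning is suppressed here.
fit1 <- suppressWarnings(glm(outcome ~ exposure, family = binomial))
fit0 <- glm(outcome ~ 1, family = binomial)

# Wald test: the estimate diverges and its standard error becomes very large.
wald_p <- summary(fit1)$coefficients["exposure", "Pr(>|z|)"]

# Likelihood ratio test from the change in deviance (1 degree of freedom).
lr_p <- pchisq(deviance(fit0) - deviance(fit1), df = 1, lower.tail = FALSE)

c(wald = wald_p, lr = lr_p)
```

With an empty cell the Wald p-value is close to 1 even though the likelihood ratio test detects the association, which is the kind of behavior a penalized likelihood is intended to avoid.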
estimate |
the estimate of log odds ratio. |
stdError |
the standard error. |
statistic |
the test statistic. |
pvalue |
the p-value. |
Leena Choi [email protected] and Cole Beck [email protected]
## use small datasets to run this example
data(dataPheWASsmall)
## make dd.base with subset of covariates from baseline data (dd.baseline.small)
## or select covariates with upper code as shown below
upper.code.list <- unique(sub("[.][^.]*(.).*", "", colnames(dd.baseline.small)) )
upper.code.list <- intersect(upper.code.list, colnames(dd.baseline.small))
dd.base <- dd.baseline.small[, upper.code.list]
## perform regularized logistic regression to obtain propensity score (PS)
## to adjust for potential confounders at baseline
phenos <- setdiff(colnames(dd.base), c('id', 'exposure'))
data.x <- as.matrix(dd.base[, phenos])
glmnet.fit <- glmnet::cv.glmnet(x=data.x, y=dd.base[,'exposure'], family="binomial", standardize=TRUE, alpha=0.1)
dd.base$PS <- c(predict(glmnet.fit, data.x, s='lambda.min'))
data.ps <- dd.base[,c('id', 'PS')]
dd.all.ps <- merge(data.ps, dd.small, by='id')
demographics <- c('age', 'race', 'gender')
phenotypeList <- setdiff(colnames(dd.small), c('id','exposure','age','race','gender'))
## run with a subset of phenotypeList to get quicker results
phenotypeList.sub <- sample(phenotypeList, 5)
results.sub <- analysisPheWAS(method='firth', adjust='PS', Exposure='exposure', PS='PS', demographics=demographics, phenotypes=phenotypeList.sub, data=dd.all.ps)
## run with the full list of phenotype outcomes (i.e., phenotypeList)
results <- analysisPheWAS(method='firth', adjust='PS',Exposure='exposure', PS='PS', demographics=demographics, phenotypes=phenotypeList, data=dd.all.ps)
Output from parse process is taken and converted into a wide format, grouping drug entity information together based on various steps and rules.
buildDose( dat, dn = NULL, preserve = NULL, dist_method, na_penalty, neg_penalty, greedy_threshold, checkForRare = FALSE )
dat |
data.table object from the output of |
dn |
Regular expression specifying drug name(s) of interest. |
preserve |
Column names to include in output, whose values should not be combined with other rows. If present, dosechange is always preserved. |
dist_method |
Distance method to use for calculating distance of various paths. Alternatively set the ‘ehr.dist_method’ option, which defaults to ‘minEntEnd’. |
na_penalty |
Penalty for matching extracted entities with NA. Alternatively set the |
neg_penalty |
Penalty for negative distances between frequency/intake time and dose amounts. Alternatively set the ‘ehr.neg_penalty’ option, which defaults to 0.5. |
greedy_threshold |
Threshold to use greedy matching; increasing this value too high could lead to the
algorithm taking a long time to finish. Alternatively set the |
checkForRare |
Indicate if rare values for each entity should be found and displayed. |
The buildDose function takes as its main input (dat), a data.table object that is the output of a parse process function (parseMedExtractR, parseMedXN, parseMedEx, or parseCLAMP). Broadly, the parsed extractions are grouped together to form wide, more complete drug regimen information. This reformatting facilitates calculation of dose given intake and daily dose in the collapseDose process.
The process of creating this output is broken down into multiple steps:
Removing rows for any drugs not of interest. Drugs of interest are specified with the dn argument.
Determining whether extractions are "simple" (only one drug mention and at most one extraction per entity) or complex. Complex cases can be more straightforward if they contain at most one extraction per entity, or require a pairing algorithm to determine the best pairing if there are multiple extractions for one or more entities.
Drug entities are anchored by drug name mention within the parse process. For complex cases, drug entities are further grouped together anchored at each strength (and dose with medExtractR) extraction.
For strength groups with multiple extractions for at least one entity, these groups go through a path searching algorithm, which computes the cost for each path (based on a chosen distance method) and chooses the path with the lowest cost.
The chosen paths for each strength group are returned as the final pairings. If route is unique within a strength group, it is standardized and added to all entries for that strength group.
The user can specify additional arguments including:
dist_method: The distance method is the metric used to determine which entity path is the most likely to be correct based on minimum cost.
na_penalty: NA penalties are incurred when extractions are paired with nothing (i.e., an NA), requiring that entities be sufficiently far apart from one another before being left unpaired.
neg_penalty: When working with dose amount (DA) and frequency/intake time (FIT), it is much more common for the ordering to be DA followed by FIT. Thus, when we observe FIT followed by DA, we apply a negative penalty to make such pairings less likely.
greedy_threshold: When there are many extractions from a clinical note, the number of possible combinations for paths can grow exponentially, particularly when the medication extraction natural language processing system is incorrect. The greedy threshold puts an upper bound on the number of entity pairings to prevent the function from stalling in such cases.
If none of the optional arguments are specified, then the buildDose process uses the default option values specified in the EHR package documentation.
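The argument descriptions mention that these settings can alternatively be supplied as R options. A minimal sketch of setting and reading such options with base R follows; the option names ‘ehr.dist_method’ and ‘ehr.neg_penalty’ and their values are the ones quoted in the argument descriptions, while the variable names `old`, `dm`, and `np` are illustrative.

```r
# Set the options named in the argument descriptions, keeping the old values.
old <- options(ehr.dist_method = "minEntEnd", ehr.neg_penalty = 0.5)

# Read them back the way any function could via getOption().
dm <- getOption("ehr.dist_method")
np <- getOption("ehr.neg_penalty")
c(dist_method = dm, neg_penalty = np)

# Restore whatever was set before.
options(old)
```

Saving the return value of options() and restoring it afterwards keeps the change local to one analysis step.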
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details. For additional details, see McNeer, et al. 2020.
A data.frame object that contains columns for filename (of the clinical note, inherited from the parse output object dat), drugname, strength, dose, route, freq, duration, and drugname_start.
data(lam_mxr_parsed)
buildDose(lam_mxr_parsed)
Splits drug data and calls makeDose to collapse at the note and date level.
collapseDose(x, noteMetaData, naFreq = "most", ...)
x |
data.frame containing the output of |
noteMetaData |
data.frame containing identifying meta data for each note, including patient ID, date of the note, and note ID. Column names should be set to ‘filename’, ‘pid’, ‘date’, ‘note’. Date should have format YYYY-MM-DD. |
naFreq |
Expression used to replace missing frequencies; by default, the most common frequency is used. |
... |
drug formulations to split by |
If different formulations of the drug (e.g., extended release) exist, they can be separated using a regular expression (e.g., ‘xr|er’). This function will call makeDose on parsed and paired medication data to calculate dose intake and daily dose and remove redundancies at the note and date level.
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.
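The idea of separating formulations with a regular expression can be sketched in base R as follows. The drug name strings are made up, and the word boundaries added around ‘xr|er’ are an assumption of this sketch rather than something the package is documented to do; it does not show how collapseDose applies the pattern internally.

```r
# Hypothetical drug name mentions (values are made up).
drugname <- c("lamotrigine", "Lamictal XR", "lamotrigine ER", "Lamictal")

# Flag extended-release mentions; word boundaries avoid matching
# "er" in the middle of an unrelated word.
is_xr <- grepl("\\b(xr|er)\\b", drugname, ignore.case = TRUE, perl = TRUE)

# Split the mentions into two groups by formulation.
split(drugname, ifelse(is_xr, "xr|er", "other"))
```

Each group could then be collapsed separately so that doses of different formulations are not merged.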
A list containing two dataframes, one with the note level and one with the date level collapsed data.
data(lam_mxr_parsed)
data(lam_metadata)
lam_build_out <- buildDose(lam_mxr_parsed)
lam_collapsed <- collapseDose(lam_build_out, lam_metadata, naFreq = 'most', 'xr|er')
lam_collapsed$note # Note level collapsing
lam_collapsed$date # Date level collapsing
Convenience function for making small modifications to a data.frame.
dataTransformation(x, select, rename, modify)
x |
a data.frame |
select |
columns to select |
rename |
character vector with names for all columns |
modify |
list of expressions used to transform data set |
The modified data.frame
Simulated outcome data example from Phenome Wide Association Study (PheWAS) that examines associations between drug exposure and various phenotypes at follow-up after the drug exposure. The dataset includes 1505 variables: subject identification number ('id'), drug exposure ('exposure'), 3 demographic variables ('age', 'race', 'gender'), and 1500 phenotypes.
data(dataPheWAS, package = 'EHR')
A data frame with 10000 observations on 1505 variables.
data(dataPheWAS)
Simulated baseline data example from a Phenome Wide Association Study (PheWAS) obtained at baseline before drug exposure. The dataset includes 1505 variables: subject identification number ('id'), drug exposure ('exposure'), 3 demographic variables ('age', 'race', 'gender'), and 1500 phenotypes.
data(dataPheWAS, package = 'EHR')
A data frame with 10000 observations on 1505 variables.
data(dataPheWAS)
A smaller subset of baseline data example, dd.baseline. The dataset includes 55 variables: subject identification number ('id'), drug exposure ('exposure'), 3 demographic variables ('age', 'race', 'gender'), and 50 phenotypes.
data(dataPheWASsmall, package = 'EHR')
A data frame with 2000 observations on 55 variables.
data(dataPheWASsmall)
A smaller subset of outcome data example, 'dd'. The dataset includes 55 variables: subject identification number ('id'), drug exposure ('exposure'), 3 demographic variables ('age', 'race', 'gender'), and 50 phenotypes.
data(dataPheWASsmall, package = 'EHR')
A data frame with 2000 observations on 55 variables.
data(dataPheWASsmall)
This function is an interface to the medExtractR function within the medExtractR package, and allows drug dosing information to be extracted from free-text sources, e.g., clinical notes.
extractMed(note_fn, drugnames, drgunit, windowlength, max_edit_dist = 0, ...)
note_fn |
File name(s) for the text file(s) containing the clinical notes. Can be a character string for an individual note, or a vector or list of file names for multiple notes. |
drugnames |
Vector of drug names for which dosing information should be extracted. Can include various forms (e.g., generic, brand name) as well as abbreviations. |
drgunit |
Unit of the drug being extracted, e.g., 'mg' |
windowlength |
Length of the search window (in characters) around the drug name in which to search for dosing entities |
max_edit_dist |
Maximum edit distance allowed when attempting to extract |
... |
Additional arguments to |
Medication information, including dosing data, is often stored in free-text sources such as clinical notes. The extractMed function serves as a convenient wrapper for the medExtractR package, a natural language processing system written in R for extracting medication data. Within extractMed, the medExtractR function identifies dosing data for drug(s) of interest, specified by the drugnames argument, using rule-based and dictionary-based approaches. Relevant dosing entities include medication strength (identified using the drgunit argument), dose amount, dose given intake, intake time or frequency of dose, dose change keywords (e.g., 'increase' or 'decrease'), and time of last dose. After applying medExtractR to extract drug dosing information, extractMed appends the file name to results to ensure they are appropriately labeled.
See EHR Vignette for Extract-Med and Pro-Med-NLP. For more details, see Weeks, et al. 2020.
A data.frame with the extracted dosing information, labeled with file name as an identifier
Sample output:
filename | entity | expr | pos |
note_file1.txt | DoseChange | decrease | 66:74 |
note_file1.txt | DrugName | Prograf | 78:85 |
note_file1.txt | Strength | 2 mg | 86:90 |
note_file1.txt | DoseAmt | 1 | 91:92 |
note_file1.txt | Frequency | bid | 101:104 |
note_file1.txt | LastDose | 2100 | 121:125 |
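The pos column in the sample output stores positions as "start:stop". As a small illustration, the base R sketch below rebuilds a few rows of that sample by hand and splits the positions into numeric columns; the data frame `ex` and the derived column names are assumptions for illustration, not part of the package.

```r
# Rows copied by hand from the sample output.
ex <- data.frame(
  entity = c("DrugName", "Strength", "DoseAmt", "Frequency"),
  expr   = c("Prograf", "2 mg", "1", "bid"),
  pos    = c("78:85", "86:90", "91:92", "101:104"),
  stringsAsFactors = FALSE
)

# Split "start:stop" into two integer columns.
pos_parts <- do.call(rbind, strsplit(ex$pos, ":", fixed = TRUE))
ex$start <- as.integer(pos_parts[, 1])
ex$stop  <- as.integer(pos_parts[, 2])

# Distance of each entity start from the drug name start.
ex$dist_from_drug <- ex$start - ex$start[ex$entity == "DrugName"]
ex
```

In these rows the difference between stop and start equals the number of characters in the extracted expression, and the distances from the drug name are the kind of positional information later steps use to pair entities.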
tac_fn <- list(system.file("examples", "tacpid1_2008-06-26_note1_1.txt", package = "EHR"),
               system.file("examples", "tacpid1_2008-06-26_note2_1.txt", package = "EHR"),
               system.file("examples", "tacpid1_2008-12-16_note3_1.txt", package = "EHR"))
extractMed(tac_fn,
           drugnames = c("tacrolimus", "prograf", "tac", "tacro", "fk", "fk506"),
           drgunit = "mg", windowlength = 60, max_edit_dist = 2, lastdose=TRUE)
This function converts the frequency entity to numeric.
freqNum(x)
x |
character vector of extracted frequency values |
numeric vector
f <- stdzFreq(c('in the morning', 'four times a day', 'with meals'))
freqNum(f)
Link ID columns from multiple data sets. De-identified columns are created to make a crosswalk.
idCrosswalk(data, idcols, visit.id = "subject_id", uniq.id = "subject_uid")
data |
list of data.frames |
idcols |
list of character vectors, indicating ID columns found in each data set given in ‘data’ |
visit.id |
character string indicating visit-level ID variable (default is "subject_id") |
uniq.id |
character string indicating subject-level ID variable (default is "subject_uid") |
‘visit.id’ and ‘uniq.id’ may occur multiple times, but should have a one-to-one linkage defined by at least one of the input data sets. A new visit number is generated for each repeated ‘uniq.id’.
crosswalk of ID columns and their de-identified versions
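The idea of de-identified IDs with a visit number per repeated subject can be sketched in base R as below. This is only an illustration under stated assumptions: the column names `mod_id`, `mod_visit`, and `mod_id_visit` are chosen for this sketch, and the numbering scheme is not claimed to match what idCrosswalk produces.

```r
# ID values in the style of this function's example data.
ids <- data.frame(
  subj_id = c(4.1, 4.2, 5.1, 6.1),
  pat_id  = c(14872, 14872, 24308, 37143)
)

# De-identified subject ID: order of first appearance of each pat_id.
ids$mod_id <- match(ids$pat_id, unique(ids$pat_id))

# Visit number: running count within each subject.
ids$mod_visit <- ave(seq_along(ids$pat_id), ids$pat_id, FUN = seq_along)

# Combined visit-level ID, e.g. "1.2" for the second visit of subject 1.
ids$mod_id_visit <- paste(ids$mod_id, ids$mod_visit, sep = ".")
ids
```

The first subject appears twice, so it receives visit numbers 1 and 2, while the other subjects each receive a single visit.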
demo_data <- data.frame(subj_id=c(4.1,4.2,5.1,6.1),
                        pat_id=c(14872,14872,24308,37143),
                        gender=c(1,1,0,1),
                        weight=c(34,42,28,63),
                        height=c(142,148,120,167))
conc_data <- data.frame(subj_id=rep((4:6)+0.1,each=5),
                        event=rep(1:5,times=3),
                        conc.level=15*exp(-1*rep(1:5,times=3))+rnorm(15,0,0.1))
data <- list(demo_data, conc_data)
idcols <- list(c('subj_id', 'pat_id'), 'subj_id')
idCrosswalk(data, idcols, visit.id='subj_id', uniq.id='pat_id')
An example of the metadata needed for the processLastDose, makeDose, and collapseDose functions.
data(lam_metadata, package = 'EHR')
A data frame with 5 observations on the following variables.
filename |
A character vector, filename for the clinical note |
pid |
A character vector, patient ID associated with the filename |
date |
A character vector, date associated with the filename |
note |
A character vector, note ID associated with the filename |
data(lam_metadata)
The output after running parseMedExtractR on 4 example clinical notes.
data(lam_mxr_parsed, package = 'EHR')
A data frame with 10 observations on the following variables.
filename |
A character vector, filename for the clinical note |
drugname |
A character vector, drug name extracted from the clinical note along with start and stop positions |
strength |
A character vector, strengths extracted from the clinical note along with start and stop positions |
dose |
A character vector, dose amounts extracted from the clinical note along with start and stop positions |
route |
A character vector, routes extracted from the clinical note along with start and stop positions |
freq |
A character vector, frequencies extracted from the clinical note along with start and stop positions |
dosestr |
A character vector, dose intakes extracted from the clinical note along with start and stop positions |
dosechange |
A character vector, dose change keywords extracted from the clinical note along with start and stop positions |
lastdose |
A character vector, last dose times extracted from the clinical note along with start and stop positions |
data(lam_mxr_parsed)
logistf function in the R package ‘logistf’
Adapted from logistf in the R package ‘logistf’, this is the same as logistf except that it provides more decimal places of p-value that would be useful for Genome-Wide Association Study (GWAS) or Phenome Wide Association Study (PheWAS).
Logistf( formula = attr(data, "formula"), data = sys.parent(), pl = TRUE, alpha = 0.05, control, plcontrol, firth = TRUE, init, weights, plconf = NULL, dataout = TRUE, ... )
formula |
a formula object, with the response on the left of the
operator, and the model terms on the right. The response must be a vector
with 0 and 1 or FALSE and TRUE for the outcome, where the higher value (1 or
TRUE) is modeled. It is possible to include contrasts, interactions, nested
effects, cubic or polynomial splines and all S features as well, e.g.
|
data |
a data.frame where the variables named in the formula can be found, i. e. the variables containing the binary response and the covariates. |
pl |
specifies if confidence intervals and tests should be based on the
profile penalized log likelihood ( |
alpha |
the significance level (1- |
control |
Controls Newton-Raphson iteration. Default is |
plcontrol |
Controls Newton-Raphson iteration for the estimation of the
profile likelihood confidence intervals. Default is |
firth |
use of Firth's penalized maximum likelihood ( |
init |
specifies the initial values of the coefficients for the fitting algorithm. |
weights |
specifies case weights. Each line of the input data set is
multiplied by the corresponding element of |
plconf |
specifies the variables (as vector of their indices) for which profile likelihood confidence intervals should be computed. Default is to compute for all variables. |
dataout |
If TRUE, copies the |
... |
Further arguments to be passed to logistf. |
same as logistf except for providing more decimal places of p-value.
Leena Choi [email protected] and Cole Beck [email protected]
same as those provided in the R package ‘logistf’.
data(dataPheWAS)
fit <- Logistf(X264.3 ~ exposure + age + race + gender, data=dd)
summary(fit)
Takes parsed and paired medication data, calculates dose intake and daily dose, and removes redundant information at the note and date level.
makeDose(x, noteMetaData, naFreq = "most")
x |
data.frame containing the output of |
noteMetaData |
data.frame containing identifying meta data for each note, including patient ID, date of the note, and note ID. Column names should be set to ‘filename’, ‘pid’, ‘date’, ‘note’. Date should have format YYYY-MM-DD. |
naFreq |
Replacing missing frequencies with this value, or by default the most common value across
the entire set in |
This function standardizes frequency, route, and duration entities. Dose amount, strength, and frequency entities are converted to numeric. Rows with only drug name and/or route are removed. If there are drug name changes in adjacent rows (e.g., from a generic to brand name), these rows are collapsed into one row if there are no conflicts. Missing strengths, dose amounts, frequencies, and routes are borrowed or imputed using various rules (see McNeer et al., 2020 for details). Dose given intake and daily dose are calculated. Redundancies are removed at the date and note level. If time of last dose is being used and it is unique within the level of collapsing, it is borrowed across all rows.
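The numeric conversion and dose calculation described above can be sketched with base R on made-up values. The sketch assumes that dose given intake is strength times dose amount and that daily dose is dose given intake times the numeric frequency; the data frame `regimen` and its column names are illustrative, and none of the borrowing or imputation rules are shown.

```r
# Hypothetical parsed entities for three mentions (values are made up).
regimen <- data.frame(
  strength = c("100 mg", "200 mg", "25 mg"),
  dose     = c("2", "1", "1.5"),
  freq_num = c(2, 2, 3),  # e.g., twice daily = 2, three times daily = 3
  stringsAsFactors = FALSE
)

# Convert text entities to numeric.
strength_num <- as.numeric(sub("\\s*mg$", "", regimen$strength))
dose_num     <- as.numeric(regimen$dose)

# Dose given intake, then daily dose (assumed formulas, see lead-in).
regimen$dose_intake <- strength_num * dose_num
regimen$daily_dose  <- regimen$dose_intake * regimen$freq_num
regimen
```

For example, "100 mg" taken as 2 tablets twice daily gives a dose intake of 200 and a daily dose of 400.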
A list containing two dataframes, one with the note level and one with the date level collapsed data.
data(lam_mxr_parsed)
data(lam_metadata)
lam_build_out <- buildDose(lam_mxr_parsed)
lam_collapsed <- makeDose(lam_build_out, lam_metadata)
lam_collapsed[[1]] # Note level collapsing
lam_collapsed[[2]] # Date level collapsing
Takes files with the raw medication extraction output generated by the CLAMP natural language processing system and converts it into a standardized format.
parseCLAMP(filename)
filename |
File name for a single file containing CLAMP output. |
Output from different medication extraction systems is formatted in different ways. In order to be able to process the extracted information, we first need to convert the output from different systems into a standardized format. Extracted expressions for various drug entities (e.g., drug name, strength, frequency, etc.) each receive their own column formatted as "extracted expression::start position::stop position". If multiple expressions are extracted for the same entity, they will be separated by backticks.
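The standardized cell format can be unpacked with a few lines of base R. The sketch below uses a made-up cell holding two extractions for one entity; the variable names are illustrative and the code is not taken from the package.

```r
# Hypothetical cell from a parsed 'freq' column holding two extractions.
cell <- "bid::101::104`daily::140::145"

# Split on backticks (one piece per extraction), then on "::".
pieces <- strsplit(cell, "`", fixed = TRUE)[[1]]
parts  <- strsplit(pieces, "::", fixed = TRUE)

# One row per extraction: expression, start position, stop position.
parsed <- data.frame(
  expr  = vapply(parts, function(p) p[1], character(1)),
  start = as.integer(vapply(parts, function(p) p[2], character(1))),
  stop  = as.integer(vapply(parts, function(p) p[3], character(1))),
  stringsAsFactors = FALSE
)
parsed
```

Keeping positions next to each expression is what allows later steps to pair entities by their distance in the note.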
CLAMP output files anchor extractions to a specific drug name extraction through semantic relations.
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.
A data.table object with columns for filename, drugname, strength, dose, route, and freq. The filename column contains the file name corresponding to the clinical note. Each of the entity columns is of the format "extracted expression::start position::stop position".
Takes files with the raw medication extraction output generated by the MedEx natural language processing system and converts it into a standardized format.
parseMedEx(filename)
filename |
File name for a single file containing MedEx output. |
Output from different medication extraction systems is formatted in different ways. In order to be able to process the extracted information, we first need to convert the output from different systems into a standardized format. Extracted expressions for various drug entities (e.g., drug name, strength, frequency, etc.) each receive their own column formatted as "extracted expression::start position::stop position". If multiple expressions are extracted for the same entity, they will be separated by backticks.
MedEx output files anchor extractions to a specific drug name extraction.
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.
A data.table object with columns for filename, drugname, strength, dose, route, and freq. The filename column contains the file name corresponding to the clinical note. Each of the entity columns is of the format "extracted expression::start position::stop position".
Takes files with the raw medication extraction output generated by the medExtractR natural language processing system and converts it into a standardized format.
parseMedExtractR(filename)
filename |
File name for a single file containing medExtractR output. |
Output from different medication extraction systems is formatted in different ways. In order to be able to process the extracted information, we first need to convert the output from different systems into a standardized format. Extracted expressions for various drug entities (e.g., drug name, strength, frequency, etc.) each receive their own column formatted as "extracted expression::start position::stop position". If multiple expressions are extracted for the same entity, they will be separated by backticks.
The medExtractR system returns extractions in a long table format, indicating the entity, extracted expression, and start:stop position of the extraction. To perform this initial parsing, entities are paired with the closest preceding drug name. The one exception to this is the dose change entity, which can occur before the drug name (see Weeks, et al. 2020 for details).
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.
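The pairing rule described above (each entity goes with the closest preceding drug name) can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the column names `entity`, `expr`, and `start` and the sample rows are made up, and the dose change exception is not handled.

```r
# Made-up long-format extractions; column names are assumptions
long <- data.frame(
  entity = c("DrugName", "Strength", "Frequency", "DrugName", "Dose"),
  expr   = c("lamotrigine", "100 mg", "bid", "Lamictal", "2 tabs"),
  start  = c(10L, 22L, 29L, 60L, 69L),
  stringsAsFactors = FALSE
)

# order by position so "preceding" is well defined
long <- long[order(long$start), ]
is_drug <- long$entity == "DrugName"

# running count of drug names seen so far = index of the closest preceding one
anchor <- cumsum(is_drug)
drug_expr <- long$expr[is_drug]

# rows that occur before any drug name (anchor == 0) get NA
long$drugname <- ifelse(anchor > 0, drug_expr[pmax(anchor, 1)], NA_character_)

# entities with the drug name they were paired with
long[!is_drug, c("drugname", "entity", "expr")]
```

A dose change expression that appears before its drug name would need a look-ahead step, which this sketch deliberately leaves out.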
A data.table object with columns for filename, drugname, strength, dose, route, freq, dosestr, dosechange and lastdose. The filename contains the file name corresponding to the clinical note. Each of the entity columns is of the format "extracted expression::start position::stop position".
mxr_output <- system.file("examples", "lam_mxr.csv", package = "EHR")
mxr_parsed <- parseMedExtractR(mxr_output)
mxr_parsed
Takes files with the raw medication extraction output generated by the MedXN natural language processing system and converts it into a standardized format.
parseMedXN(filename, begText = "^[R0-9]+_[0-9-]+_[0-9]+_")
filename |
File name for a single file containing MedXN output. |
begText |
A regular expression that would indicate the beginning of a new observation (i.e., extracted clinical note). |
Output from different medication extraction systems is formatted in different ways. In order to be able to process the extracted information, we first need to convert the output from different systems into a standardized format. Extracted expressions for various drug entities (e.g., drug name, strength, frequency, etc.) each receive their own column formatted as "extracted expression::start position::stop position". If multiple expressions are extracted for the same entity, they will be separated by backticks.
MedXN output files anchor extractions to a specific drug name extraction.
In MedXN output files, the results from multiple clinical notes can be combined into a single output file. The beginning of some lines of the output file can indicate when output for a new observation (or new clinical note) begins. The user should specify the argument begText to be a regular expression used to identify the lines where output for a new clinical note begins.
See EHR Vignette for Extract-Med and Pro-Med-NLP as well as Dose Building Using Example Vanderbilt EHR Data for details.
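To see how a begText pattern separates notes within one combined file, the regular expression can be tried on a few lines with base R. The lines below are made up for illustration and are not claimed to match the real MedXN layout; only the pattern is taken from the package example.

```r
# Made-up lines; the real MedXN layout may differ
lines <- c(
  "ID001_2012-01-05_Note1.txt|lamotrigine::10::21|100 mg::22::28",
  "continuation of the same note|bid::29::32",
  "ID002_2013-03-10_Note1.txt|Lamictal::5::13|200mg::14::19"
)

# pattern used in the package example for file names such as ID001_2012-01-05_...
begText <- "^ID[0-9]+_[0-9-]+_"

# TRUE where a line starts the output for a new clinical note
starts <- grepl(begText, lines)

# running note index: every line belongs to the most recent start line
note_id <- cumsum(starts)
split(lines, note_id)
```

Checking `starts` on a sample of the real file before parsing is a quick way to confirm the pattern flags exactly one line per note.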
A data.table object with columns for filename, drugname, strength, dose, route, freq, and duration. The filename contains the file name corresponding to the clinical note. Each of the entity columns is of the format "extracted expression::start position::stop position".
mxn_output <- system.file("examples", "lam_medxn.csv", package = "EHR")
mxn_parsed <- parseMedXN(mxn_output, begText = "^ID[0-9]+_[0-9-]+_")
mxn_parsed
This function takes last dose times extracted using the medExtractR system and processes the times into standardized datetime objects using recorded lab data where necessary. The raw output from extractMed is filtered to just the LastDose extractions. Time expressions are standardized into HH:MM:SS format based on what category they fall into (e.g., a time represented with AM/PM, 24-hour military time, etc.). When the last dose time is after 12pm, it is assumed to have been taken one day previous to the note's date. For any duration extractions (e.g., "14 hour level"), the last dose time is calculated from the labtime by subtracting the appropriate number of hours. The final dataset is returned with last dose time formatted into a POSIXct variable.
processLastDose(mxrData, noteMetaData, labData)
mxrData |
data.frame containing output from the |
noteMetaData |
data.frame with meta data ( |
labData |
data.frame that contains lab dates and times associated with the file names
within |
See EHR Vignette for Extract-Med and Pro-Med-NLP for details.
data.frame with identifying information (e.g., filename, etc) as well as processed and standardized last dose times as a POSIXct column
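The two date rules in the description of processLastDose can be illustrated with base R date-time arithmetic. This is a sketch of the stated rules only, not the package's code: the helper names `last_dose_from_clock` and `last_dose_from_duration` are hypothetical, the dates are made up, and it assumes a duration is subtracted from the lab time.

```r
# Case 1: a clock time already standardized to HH:MM:SS.
# A time after 12pm is assumed to fall on the day before the note's date.
last_dose_from_clock <- function(hms, note_date, tz = "UTC") {
  parts <- as.integer(strsplit(hms, ":", fixed = TRUE)[[1]])
  secs <- sum(parts * c(3600, 60, 1))          # seconds since midnight
  day <- if (secs > 12 * 3600) note_date - 1 else note_date
  as.POSIXct(paste(format(day), hms), tz = tz)
}

# Case 2: a duration such as "14 hour level".
# The last dose time is the lab time minus that many hours.
last_dose_from_duration <- function(hours, labtime) {
  labtime - hours * 3600
}

note_date <- as.Date("2020-03-15")
labtime <- as.POSIXct("2020-03-15 09:00:00", tz = "UTC")

ld_evening <- last_dose_from_clock("20:30:00", note_date)  # previous day
ld_morning <- last_dose_from_clock("08:00:00", note_date)  # same day
ld_level   <- last_dose_from_duration(14, labtime)         # 14 hours before lab
```

All three results are POSIXct values, matching the type of the last dose column described in the return value.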
tac_mxr <- read.csv(system.file("examples", "tac_mxr.csv", package = "EHR"))
data(tac_metadata)
data(tac_lab)
processLastDose(mxrData = tac_mxr, noteMetaData = tac_metadata, labData = tac_lab)
Replace IDs with de-identified version pulled from a crosswalk.
pullFakeId(dat, xwalk, firstCols = NULL, orderBy = NULL, uniq.id = "subject_uid")
dat |
a data.frame |
xwalk |
a data.frame providing linkage for each ID, e.g. output from |
firstCols |
name of columns to put at front of output data set |
orderBy |
name of columns used to reorder output data set |
uniq.id |
character string indicating subject-level id variable (default is "subject_uid") |
The modified data.frame
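Conceptually, replacing IDs through a crosswalk is a join followed by dropping the identified columns. The base R sketch below shows that idea with `merge`; it is not how pullFakeId is implemented, and the small data sets are made up to mirror the crosswalk layout used in the package example.

```r
demo_data <- data.frame(
  subj_id = c(4.1, 4.2, 5.1, 6.1),
  pat_id  = c(14872, 14872, 24308, 37143),
  weight  = c(34, 42, 28, 63)
)

# crosswalk linking identified IDs to de-identified IDs
xwalk <- data.frame(
  subj_id      = c(4.1, 4.2, 5.1, 6.1),
  pat_id       = c(14872, 14872, 24308, 37143),
  mod_id       = c(1, 1, 2, 3),
  mod_id_visit = c(1.1, 1.2, 2.1, 3.1)
)

# join on the identified keys
merged <- merge(demo_data, xwalk, by = c("subj_id", "pat_id"))

# keep only de-identified IDs plus the data columns, IDs first
deident <- merged[, c("mod_id", "mod_id_visit", "weight")]
deident <- deident[order(deident$mod_id_visit), ]
deident
```

Going the other way (as pullRealId does) would be the same join, keeping `subj_id` and `pat_id` instead.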
demo_data <- data.frame(subj_id=c(4.1,4.2,5.1,6.1),
  pat_id=c(14872,14872,24308,37143),
  gender=c(1,1,0,1),
  weight=c(34,42,28,63),
  height=c(142,148,120,167))
# crosswalk w/ same format as idCrosswalk() output
xwalk <- data.frame(subj_id=c(4.1,4.2,5.1,6.1),
  pat_id=c(14872,14872,24308,37143),
  mod_visit=c(1,2,1,1),
  mod_id=c(1,1,2,3),
  mod_id_visit=c(1.1,1.2,2.1,3.1))
demo_data_deident <- pullFakeId(demo_data, xwalk,
  firstCols = c('mod_id','mod_id_visit','mod_visit'),
  uniq.id='pat_id')
Replace de-identified IDs with identified version pulled from a crosswalk.
pullRealId(dat, xwalk = NULL, remove.mod.id = FALSE)
dat |
a data.frame |
xwalk |
a data.frame providing linkage for each ID, e.g. output from |
remove.mod.id |
logical, should the de-identified IDs – mod_id, mod_visit, mod_id_visit – be removed (default=FALSE) |
The modified data.frame
demo_data_deident <- data.frame(mod_id=c(1,1,2,3),
  mod_id_visit=c(1.1,1.2,2.1,3.1),
  mod_visit=c(1,2,1,1),
  gender=c(1,1,0,1),
  weight=c(34,42,28,63),
  height=c(142,148,120,167))
# crosswalk w/ same format as idCrosswalk() output
xwalk <- data.frame(subj_id=c(4.1,4.2,5.1,6.1),
  pat_id=c(14872,14872,24308,37143),
  mod_visit=c(1,2,1,1),
  mod_id=c(1,1,2,3),
  mod_id_visit=c(1.1,1.2,2.1,3.1))
pullRealId(demo_data_deident, xwalk)
pullRealId(demo_data_deident, xwalk, remove.mod.id=TRUE)
Convenience function for reading in a CSV file, and making small modifications to a data.frame.
readTransform(file, ...)
file |
filename of a CSV file |
... |
additional information passed to |
If read.csv needs additional arguments (or the file is in a different format), the user should load the data first, then directly call dataTransformation.
The modified data.frame
This module builds PK data for intravenously (IV) administered medications.
run_Build_PK_IV(
  conc,
  conc.columns = list(),
  dose,
  dose.columns = list(),
  censor = NULL,
  censor.columns = list(),
  demo.list = NULL,
  demo.columns = list(),
  lab.list = NULL,
  lab.columns = list(),
  dosePriorWindow = 7,
  labPriorWindow = 7,
  postWindow = NA,
  pk.vars = NULL,
  drugname = NULL,
  check.path = NULL,
  missdemo_fn = "-missing-demo",
  faildupbol_fn = "DuplicateBolus-",
  date.format = "%m/%d/%y %H:%M:%S",
  date.tz = "America/Chicago",
  isStrict = FALSE
)
conc |
concentration data, the output of |
conc.columns |
a named list that should specify columns in concentration data; ‘id’, ‘datetime’, ‘druglevel’ are required. ‘idvisit’ may also be specified; ‘idvisit’ can be used when there are multiple visits (i.e., several occasions) for the same subject. ‘datetime’ is date and time for concentration measurement, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
dose |
dose data, the output of |
dose.columns |
a named list that should specify columns in dose data; ‘id’ is required. ‘infuseDatetime’ and ‘infuseDose’ should be set if infusion dose data is present. ‘infuseTimeExact’ may also be specified for infusion data – this variable represents a precise time, for example if the ‘infuseDatetime’ variable is rounded. ‘bolusDatetime’ and ‘bolusDose’ should be set if bolus dose data is present. A generic ‘date’ variable may be provided, agnostic to either infusion or bolus dosing. ‘gap’ and ‘weight’ column names may also be set. Any of the date-time variables can be specified as a single date-time variable (infuseDatetime = ‘date_time’) or two variables holding date and time separately (e.g., infuseDatetime = c(‘Date’, ‘Time’)). |
censor |
censoring information, if available; this will censor concentration and dose data for dates occurring after the censor datetime variable. |
censor.columns |
a named list that should specify columns in censoring data; ‘id’ and ‘datetime’ are required. ‘datetime’ is the date and time when data should be censored. This can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
demo.list |
demographic information, if available; the output from
|
demo.columns |
a named list that should specify columns in demographic data; ‘id’ is required. ‘weight’ and ‘idvisit’ may also be used to specify columns for weight or the unique idvisit. Any other columns present in the demographic data are treated as covariates. |
lab.list |
lab data, if available; the output from |
lab.columns |
a named list that should specify columns in lab data; ‘id’ and ‘datetime’ are required. ‘datetime’ is the date and time when the lab data was obtained, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). Any other columns present in lab data are treated as lab values. |
dosePriorWindow |
Dose data is merged with drug level data. This value sets the time frame window with the number of days prior to the first drug level data; defaults to 7. |
labPriorWindow |
Lab data is merged with drug level data. This value sets the time frame window with the number of days prior to the first drug level data; defaults to 7. |
postWindow |
Data is merged with drug level data. This postWindow can set the end time for the drug level data, being the number of days after the first drug level data. The default (NA) will use the date of the last drug level data. |
pk.vars |
variables to include in the returned PK data. The variable ‘date’ is a special case; when included, it maps the ‘time’ offset to its original date-time. Other named variables will be merged from the concentration data set. For example, rather than being separate data sets, labs or demographics may already be present in the concentration data. These columns should be named here. |
drugname |
drug of interest, included in filename of check files. The default (NULL) will produce filenames without drugname included. |
check.path |
path to ‘check’ directory, where check files are created. The default (NULL) will not produce any check files. |
missdemo_fn |
filename for checking NA frequency among demographic data |
faildupbol_fn |
filename for duplicate bolus data |
date.format |
output format for ‘date’ variable |
date.tz |
output time zone for ‘date’ variable |
isStrict |
logical; when TRUE dose amount totals are strictly summed rather than repeated hourly until stopped |
See EHR Vignette for Structured Data.
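The window arguments (dosePriorWindow and postWindow) describe a time frame around the drug level data. The base R sketch below illustrates one reading of that description with made-up times; whether the package compares exact times or whole dates at the boundaries is an assumption not confirmed here.

```r
# made-up first and last drug level times for one subject
first_conc <- as.POSIXct("2020-02-10 09:00:00", tz = "UTC")
last_conc  <- as.POSIXct("2020-02-12 09:00:00", tz = "UTC")

# made-up dose record times
dose_times <- as.POSIXct(c("2020-02-01 08:00:00", "2020-02-05 08:00:00",
                           "2020-02-11 08:00:00", "2020-02-20 08:00:00"),
                         tz = "UTC")

dosePriorWindow <- 7   # days before the first drug level
postWindow <- NA       # NA: end at the last drug level

win_start <- first_conc - dosePriorWindow * 86400
win_end <- if (is.na(postWindow)) last_conc else first_conc + postWindow * 86400

# dose records that fall inside the window
keep <- dose_times >= win_start & dose_times <= win_end
dose_times[keep]
```

With these values the window runs from 2020-02-03 09:00 to 2020-02-12 09:00, so only the two middle dose records are retained.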
Regarding the ‘gap’ variable in the dose dataset: if ‘gap’ is specified in ‘dose.columns’, a continuous infusion can be assumed when there are missing records between infusion dosing records. For example, suppose that ‘gap’ = 60 is defined (a typical gap size when infusion dosing is supposed to be recorded hourly for inpatients) and the time between two records is greater than 1 hour (i.e., there are missing records). If the time between the two records is less than or equal to twice the gap (i.e., 2*60 = 120 min), a continuous infusion is assumed until the 2nd dose record; otherwise, the first infusion is assumed to be stopped (i.e., zero doses are added) after 60 min (i.e., equal to the gap size) and a new infusion (the 2nd record) starts at its recorded time.
PK data set
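The ‘gap’ rule for infusion records can be written as a small decision function. This is a sketch of the rule as described in this section, not the package's code: `infusion_end` is a hypothetical helper, and the record times are made up.

```r
# Hypothetical helper: given two consecutive infusion record times and a gap
# (in minutes), return the time until which the first infusion is assumed to run.
infusion_end <- function(t1, t2, gap = 60) {
  elapsed <- as.numeric(difftime(t2, t1, units = "mins"))
  if (elapsed <= 2 * gap) {
    t2              # within twice the gap: continuous until the 2nd record
  } else {
    t1 + gap * 60   # otherwise: stopped one gap length after the 1st record
  }
}

t1 <- as.POSIXct("2020-01-01 08:00:00", tz = "UTC")

end_short <- infusion_end(t1, t1 + 90 * 60)    # records 90 min apart
end_long  <- infusion_end(t1, t1 + 180 * 60)   # records 180 min apart
```

With ‘gap’ = 60, records 90 minutes apart are bridged (the infusion runs to 09:30), while records 180 minutes apart are not (the first infusion ends at 09:00).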
# make fake data
set.seed(6543)
build_date <- function(x) format(seq(x, length.out=5, by="1 hour"), "%Y-%m-%d %H:%M")
dates <- unlist(lapply(rep(Sys.time(),3), build_date))
plconc <- data.frame(mod_id = rep(1:3,each=5),
  mod_id_visit = rep(1:3,each=5)+0.1,
  event = rep(1:5,times=3),
  conc.level = 15*exp(-1*rep(1:5,times=3))+rnorm(15,0,0.1),
  date.time = as.POSIXct(dates))
ivdose <- data.frame(mod_id = 1:3,
  date.dose = substr(dates[seq(1,15,by=5)],1,10),
  infuse.time.real = NA,
  infuse.time = NA,
  infuse.dose = NA,
  bolus.time = as.POSIXct(dates[seq(1,15,by=5)])-300,
  bolus.dose = 90,
  maxint = 0L,
  weight = 45)
run_Build_PK_IV(conc = plconc,
  conc.columns = list(id = 'mod_id', datetime = 'date.time',
    druglevel = 'conc.level', idvisit = 'mod_id_visit'),
  dose = ivdose,
  dose.columns = list(id = 'mod_id', date = 'date.dose',
    bolusDatetime = 'bolus.time', bolusDose = 'bolus.dose',
    gap = 'maxint', weight = 'weight'),
  pk.vars = 'date')
This module builds PK data for orally administered medications.
run_Build_PK_Oral(
  x,
  idCol = "id",
  dtCol = "dt",
  doseCol = "dose",
  concCol = "conc",
  ldCol = NULL,
  first_interval_hours = 336,
  imputeClosest = NULL
)
x |
a data.frame or file saved as either CSV, RData, or RDS |
idCol |
data.frame id column name |
dtCol |
data.frame date column name |
doseCol |
dose column name |
concCol |
concentration column name |
ldCol |
last-dose time column name |
first_interval_hours |
number of hours before the first concentration to start time=0; the default is 336 hours = 14 days |
imputeClosest |
columns to impute missing data with next observation propagated backward; this is in addition to all covariates receiving imputation using last observation carried forward |
See EHR Vignette for Build-PK-Oral.
data.frame
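The two imputation strategies named under imputeClosest (last observation carried forward, then next observation propagated backward) can be sketched in base R. The helpers `locf` and `nocb` are hypothetical and operate on a single vector; in practice the imputation would be applied within each subject, which this sketch does not show.

```r
# last observation carried forward; leading NAs stay NA
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the latest non-missing value
  c(NA, x[!is.na(x)])[idx + 1]
}

# next observation propagated backward = LOCF applied to the reversed vector
nocb <- function(x) rev(locf(rev(x)))

# made-up covariate values for one subject, ordered by time
hgb <- c(NA, 10.2, NA, NA, 11.5, NA)

hgb_locf <- locf(hgb)        # NA 10.2 10.2 10.2 11.5 11.5
hgb_both <- nocb(hgb_locf)   # 10.2 10.2 10.2 10.2 11.5 11.5
```

LOCF alone leaves the first value missing; applying the backward step afterwards fills it from the next available observation.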
## Data Generating Function
mkdat <- function() {
  npat <- 3
  visits <- floor(runif(npat, min=2, max=6))
  id <- rep(1:npat, visits)
  dt_samp <- as.Date(sort(sample(700, sum(visits))), origin = '2019-01-01')
  tm_samp <- as.POSIXct(paste(dt_samp, '10:00:00'), tz = 'UTC')
  dt <- tm_samp + rnorm(sum(visits), 0, 1*60*60)
  dose_morn <- sample(c(2.5,5,7.5,10), sum(visits), replace = TRUE)
  conc <- round(rnorm(sum(visits), 1.5*dose_morn, 1),1)
  ld <- dt - sample(10:16, sum(visits), replace = TRUE) * 3600
  ld[rnorm(sum(visits)) < .3] <- NA
  age <- rep(sample(40:75, npat), visits)
  gender <- rep(sample(0:1, npat, replace=TRUE), visits)
  weight <- rep(round(rnorm(npat, 180, 20)),visits)
  hgb <- rep(rnorm(npat, 10, 2), visits)
  data.frame(id, dt, dose_morn, conc, ld, age, gender, weight, hgb)
}
# Make raw data
set.seed(30)
dat <- mkdat()
# Process data without last-dose times
run_Build_PK_Oral(x = dat,
  idCol = "id",
  dtCol = "dt",
  doseCol = "dose_morn",
  concCol = "conc",
  ldCol = NULL,
  first_interval_hours = 336,
  imputeClosest = NULL)
# Process data with last-dose times
run_Build_PK_Oral(x = dat, doseCol = "dose_morn", ldCol = "ld")
This module will load and modify demographic data.
run_Demo(demo.path, demo.columns = list(), toexclude, demo.mod.list)
demo.path |
filename of demographic file (CSV, RData, RDS) or data.frame |
demo.columns |
a named list that should specify columns in demo data; ‘id’ is required. |
toexclude |
expression that should evaluate to a logical, indicating if the observation should be excluded |
demo.mod.list |
list of expressions, giving modifications to make |
See EHR Vignette for Structured Data.
list with two components
demo |
demographic data |
exclude |
vector of excluded visit IDs |
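The toexclude and demo.mod.list arguments take R expressions that refer to column names. One plausible way to picture this is evaluating each expression with the data.frame as its environment, as in the base R sketch below. Whether run_Demo does exactly this internally is an assumption; the data are made up.

```r
demo <- data.frame(
  id = 1:4,
  weight.lbs = c(140, 165, 180, 155),
  age = c(45, 62, 58, 50)
)

# exclusion rule: evaluates to one logical per row
toexclude <- expression(weight.lbs < 150 | age > 60)

# modifications: each named expression becomes a new column
demo.mod.list <- list(highrisk = expression(weight.lbs > 170 & age > 55))

# eval() looks up column names in the data.frame first
excl <- eval(toexclude, demo)

for (nm in names(demo.mod.list)) {
  demo[[nm]] <- eval(demo.mod.list[[nm]], demo)
}

excluded_ids <- demo$id[excl]
demo
excluded_ids
```

Here subjects 1 and 2 meet the exclusion rule, and subject 3 is the only one flagged as high risk.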
set.seed(2525)
dateSeq <- seq(as.Date('2019/01/01'), as.Date('2020/01/01'), by="day")
demo <- data.frame(mod_id_visit = 1:10,
  weight.lbs = rnorm(10,160,20),
  age = rnorm(10, 50, 10),
  enroll.date = sample(dateSeq, 10))
tmpfile <- paste0(tempfile(), '.rds')
saveRDS(demo, file = tmpfile)
# exclusion functions
exclude_wt <- function(x) x < 150
exclude_age <- function(x) x > 60
ind.risk <- function(wt, age) wt>170 & age>55
exclude_enroll <- function(x) x < as.Date('2019/04/01')
# make demographic data that:
# (1) excludes ids with weight.lbs < 150, age > 60, or enroll.date before 2019/04/01
# (2) creates new 'highrisk' variable for subjects with weight.lbs>170 and age>55
out <- run_Demo(demo.path = tmpfile,
  demo.columns = list(id = 'mod_id_visit'),
  toexclude = expression(
    exclude_wt(weight.lbs)|exclude_age(age)|exclude_enroll(enroll.date)
  ),
  demo.mod.list = list(highrisk = expression(ind.risk(weight.lbs, age))))
out
This module will load and modify drug-level data.
run_DrugLevel(
  conc.path,
  conc.columns = list(),
  conc.select,
  conc.rename,
  conc.mod.list = NULL,
  samp.path = NULL,
  samp.columns = list(),
  samp.mod.list = NULL,
  check.path = NULL,
  failmiss_fn = "MissingConcDate-",
  multsets_fn = "multipleSetsConc-",
  faildup_fn = "DuplicateConc-",
  drugname = NULL,
  LLOQ = NA,
  demo.list = NULL,
  demo.columns = list()
)
conc.path |
filename of concentration data (CSV, RData, RDS), or data.frame |
conc.columns |
a named list that should specify columns in concentration data. ‘id’ and ‘conc’ are required. ‘idvisit’ may also be specified. If linking with sampling data, ‘samplinkid’ is required. Otherwise ‘datetime’ is required. This is the date and time when blood samples were obtained. This can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
conc.select |
columns to select from concentration data |
conc.rename |
new column names for concentration data |
conc.mod.list |
list of expressions, giving modifications to make |
samp.path |
filename of data with sampling time (CSV, RData, RDS), or data.frame |
samp.columns |
a named list that should specify columns in sampling data. ‘conclinkid’ and ‘datetime’ are required to link sampling data to concentration data. ‘conclinkid’ should match the id variable provided as ‘samplinkid’ in the ‘conc.columns’ argument. ‘datetime’ is the date and time when blood samples were obtained. This can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
samp.mod.list |
list of expressions, giving modifications to make |
check.path |
path to ‘check’ directory, where check files are created. The default (NULL) will not produce any check files. |
failmiss_fn |
filename for data missing concentration date |
multsets_fn |
filename for data with multiple concentration sets |
faildup_fn |
filename for data with duplicate concentration observations |
drugname |
drug of interest, included in filename of check files. The default (NULL) will produce filenames without drugname included. |
LLOQ |
lower limit of concentration values; values below this are invalid |
demo.list |
demographic information; if available, concentration records must have a valid demo record |
demo.columns |
a named list that should specify columns in demographic data; ‘id’ is required. If ‘idvisit’ is present in the concentration data, then it is required here too. |
See EHR Vignette for Structured Data.
drug-level data set
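Several arguments accept ‘datetime’ either as one column or as separate date and time columns. The base R sketch below shows how two such columns can be combined into a single POSIXct value, and how values below an LLOQ can be flagged. The column names, formats, and values are made up, and how the package treats values below LLOQ beyond calling them invalid is not stated here.

```r
conc <- data.frame(
  Date = c("2020-01-05", "2020-01-05"),
  Time = c("08:15", "14:40"),
  conc.level = c(0.02, 1.30),
  stringsAsFactors = FALSE
)

# combine separate date and time columns into one date-time variable
conc$date_time <- as.POSIXct(paste(conc$Date, conc$Time),
                             format = "%Y-%m-%d %H:%M", tz = "UTC")

# flag concentrations below the lower limit
LLOQ <- 0.05
conc$below_lloq <- conc$conc.level < LLOQ
conc
```

Building the combined column up front is one way to check that the date and time formats parse as expected before passing the data to a module.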
# concentrations
conc_data <- data.frame(mod_id = rep(1:3,each=4),
  mod_visit = rep(c(2,1,1),each=4),
  mod_id_visit = as.numeric(paste(rep(1:3,each=4), rep(c(2,1,1),each=4), sep=".")),
  samp = rep(1:4,times=3),
  drug_calc_conc=15*exp(-1*rep(1:4,times=3))+rnorm(12,0,0.1))
# sample times
build_date <- function(x) format(seq(x, length.out=4, by="1 hour"), "%Y-%m-%d %H:%M")
dates <- unlist(lapply(rep(Sys.time(),3), build_date))
samp_data <- data.frame(mod_id = rep(1:3,each=4),
  mod_visit = rep(c(2,1,1),each=4),
  mod_id_visit = as.numeric(paste(rep(1:3,each=4), rep(c(2,1,1),each=4), sep=".")),
  samp = rep(1:4,times=3),
  Sample.Collection.Date.and.Time = dates)
run_DrugLevel(
  conc.path = conc_data,
  conc.columns = list(
    id = 'mod_id', idvisit = 'mod_id_visit', samplinkid = 'mod_id_event', conc = 'conc.level'
  ),
  conc.select = c('mod_id','mod_id_visit','samp','drug_calc_conc'),
  conc.rename = c(drug_calc_conc= 'conc.level', samp='event'),
  conc.mod.list = list(mod_id_event = expression(paste(mod_id_visit, event, sep = "_"))),
  samp.path = samp_data,
  samp.columns = list(conclinkid = 'mod_id_event', datetime = 'Sample.Collection.Date.and.Time'),
  samp.mod.list = list(mod_id_event = expression(paste(mod_id_visit, samp, sep = "_"))),
  drugname = 'drugnm',
  LLOQ = 0.05
)
# minimal example with data in required format
conc_data <- conc_data[,c('mod_id','mod_id_visit','samp','drug_calc_conc')]
conc_data[,'mod_id_event'] <- paste(conc_data[,'mod_id_visit'], conc_data[,'samp'], sep = "_")
names(conc_data)[3:4] <- c('event','conc.level')
samp_data[,'mod_id_event'] <- paste(samp_data[,'mod_id_visit'], samp_data[,'samp'], sep = "_")
conc_samp_link <- match(conc_data[,'mod_id_event'], samp_data[,'mod_id_event'])
conc_date <- samp_data[conc_samp_link, 'Sample.Collection.Date.and.Time']
conc_data[,'date.time'] <- as.POSIXct(conc_date)
run_DrugLevel(conc_data, conc.columns = list(
  id = 'mod_id', idvisit = 'mod_id_visit', datetime = 'date.time', conc = 'conc.level'
))
This module will load and modify laboratory data.
run_Labs(lab.path, lab.select, lab.mod.list)
lab.path |
filename of a lab file (CSV, RData, RDS), or data.frame |
lab.select |
columns to select |
lab.mod.list |
list of expressions giving modifications to make;
passed to |
See EHR Vignette for Structured Data.
lab data set
lab_data <- data.frame(mod_id=rep(1:3,each=3),
  date=rep(c("01/12/17","05/05/18","11/28/16"),each=3),
  time=rep(c("1:30","2:30","3:30"),3),
  creat=rnorm(9,0.5,0.05))
run_Labs(lab_data, lab.mod.list=list(log_creat=expression(log(creat))))
This module will load and modify structured intravenous (IV) infusion and bolus medication data.
run_MedStrI(
  mar.path,
  mar.columns = list(),
  medGivenReq = FALSE,
  flow.path = NULL,
  flow.columns = list(),
  medchk.path = NULL,
  demo.list = NULL,
  demo.columns = list(),
  missing.wgt.path = NULL,
  wgt.columns = list(),
  check.path = NULL,
  failflow_fn = "FailFlow",
  failnounit_fn = "NoUnit",
  failunit_fn = "Unit",
  failnowgt_fn = "NoWgt",
  censor_date_fn = "CensorTime",
  infusion.unit = "mcg/kg/hr",
  bolus.unit = "mcg",
  bol.rate.thresh = Inf,
  rateunit = "mcg/hr",
  ratewgtunit = "mcg/kg/hr",
  weightunit = "kg",
  drugname = NULL
)
mar.path |
filename of MAR data (CSV, RData, RDS), or data.frame |
mar.columns |
a named list that should specify columns in MAR data; ‘id’, ‘datetime’ and ‘dose’ are required. ‘drug’, ‘weight’, ‘given’ may also be specified. ‘datetime’ is date and time for data measurement, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). ‘dose’ can also be given as a single variable or two variables. If given as a single column, the column's values should contain dose and units such as ‘25 mcg’. If given as two column names, the dose column should come before the unit column (e.g., dose = c(‘doseamt’, ‘unit’)). ‘drug’ can provide a list of acceptable drug names. If ‘drug’ is present, the ‘medchk.path’ argument should also be provided. ‘given’ is a variable that flags whether the medication (inpatient) was given. When the medication was given, values should be “Given”; this should be used in conjunction with the ‘medGivenReq’ argument. |
medGivenReq |
if TRUE, values in ‘given’ column in MAR data should equal “Given”; if this is FALSE (the default), NA values are also acceptable. |
flow.path |
filename of flow data (CSV, RData, RDS), or data.frame |
flow.columns |
a named list that should specify columns in flow data; ‘id’, ‘datetime’, ‘finalunits’, ‘unit’, ‘rate’, ‘weight’ are required. ‘idvisit’ may also be specified. ‘datetime’ is date and time for data measurement, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
medchk.path |
filename containing data set (CSV, RData, RDS), or data.frame; should have the column ‘medname’ with list of acceptable drug names (e.g., brand and generic name, abbreviations) to subset drugs of interest using ‘drug’ column in MAR data. This argument can be used when MAR data contains different drugs that should be excluded. |
demo.list |
demographic information; if available, the output from 'run_Demo' or a correctly formatted data.frame, which can be used to impute weight when missing |
demo.columns |
a named list that should specify columns in demographic data; ‘id’, ‘datetime’, and ‘weight’ are required. ‘datetime’ is the date and time when the demographic data were obtained, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
missing.wgt.path |
filename containing additional weight data (CSV, RData, RDS), or data.frame. The variables in this file should be defined in the ‘wgt.columns’ argument. |
wgt.columns |
a named list that should specify columns in additional weight data; ‘id’, ‘datetime’, and ‘weight’ are required. ‘datetime’ is date and time for weight measurement, which can refer to a single date-time variable (datetime = ‘date_time’) or two variables holding date and time separately (e.g., datetime = c(‘Date’, ‘Time’)). |
check.path |
path to ‘check’ directory, where check files are created. The default (NULL) will not produce any check files. |
failflow_fn |
filename for duplicate flow data with rate zero |
failnounit_fn |
filename for MAR data with missing unit |
failunit_fn |
filename for MAR data with invalid unit |
failnowgt_fn |
filename for infusion data with missing weight where unit indicates weight is required |
censor_date_fn |
filename containing censor times created with invalid dose data |
infusion.unit |
acceptable unit for infusion data |
bolus.unit |
acceptable unit for bolus data |
bol.rate.thresh |
upper limit for bolus rate; values above this are invalid |
rateunit |
acceptable unit for hourly rate; defaults to ‘mcg/hr’ |
ratewgtunit |
acceptable unit for hourly rate by weight; defaults to ‘mcg/kg/hr’ |
weightunit |
acceptable unit for weight; defaults to ‘kg’ |
drugname |
drug of interest, included in filename of check files. The default (NULL) will produce filenames without drugname included. |
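The single-column ‘dose’ convention and the ‘medGivenReq’ rule described in the argument table can be illustrated with a small base R sketch. This is not the package's internal code: the toy data frame, its column names, and the helper `keep_given` are hypothetical and exist only for illustration.

```r
# Toy MAR-style records; column names are illustrative only.
mar_toy <- data.frame(
  dose_raw = c("25 mcg", "0.5 mcg", "3 mcg/kg/min"),
  given = c("Given", NA, "Given"),
  stringsAsFactors = FALSE
)

# Amount is the leading number; unit is everything after the last space.
mar_toy$amount <- as.numeric(sub("^([0-9.]+).*", "\\1", mar_toy$dose_raw))
mar_toy$unit <- sub(".*[ ]", "", mar_toy$dose_raw)

# Hypothetical helper mirroring the documented medGivenReq rule:
# TRUE requires "Given"; FALSE also accepts NA.
keep_given <- function(given, medGivenReq = FALSE) {
  ok <- !is.na(given) & given == "Given"
  if (!medGivenReq) ok <- ok | is.na(given)
  ok
}

keep_given(mar_toy$given, medGivenReq = TRUE)
keep_given(mar_toy$given, medGivenReq = FALSE)
```

With ‘medGivenReq = TRUE’ the row whose ‘given’ value is NA is dropped; with the default it is retained.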
See EHR Vignette for Structured Data.
structured data set
# flow data for 'Fakedrug1'
flow <- data.frame(mod_id=c(1,1,2,2,2),
  mod_id_visit=c(46723,46723,84935,84935,84935),
  record.date=c("07/05/2019 5:25","07/05/2019 6:01",
                "09/04/2020 3:21", "09/04/2020 4:39",
                "09/04/2020 5:32"),
  Final.Weight=c(6.75,6.75,4.5,4.5,4.5),
  Final.Rate=c(rep("1 mcg/kg/hr",2),
               rep("0.5 mcg/kg/hr",3)),
  Final.Units=c("3.375","6.5",
                "2.25","2.25","2.25"))
flow[,'Perform.Date'] <- pkdata::parse_dates(flow[,'record.date'])
flow[,'unit'] <- sub('.*[ ]', '', flow[,'Final.Rate'])
flow[,'rate'] <- as.numeric(sub('([0-9.]+).*', '\\1', flow[,'Final.Rate']))

# mar data for 4 fake drugs
mar <- data.frame(mod_id=rep(1,5),
  Date=rep("2019-07-05",5),
  Time=c("07:12","07:31","08:47","09:16","10:22"),
  `med:mDrug`=c("Fakedrug2","Fakedrug1","Fakedrug2",
                "Fakedrug3","Fakedrug4"),
  `med:dosage`=c("30 mg","0.5 mcg","1 mg",
                 "20 mg","3 mcg/kg/min"),
  `med:route`=rep("IV",5),
  `med:given`=rep("Given",5),
  check.names=FALSE)

# medcheck file for drug of interest ('Fakedrug1')
medcheck <- data.frame(medname="Fakedrug1",freq=4672)

run_MedStrI(mar.path = mar,
  mar.columns = list(id = 'mod_id', datetime = c('Date','Time'),
                     dose = 'med:dosage', drug = 'med:mDrug',
                     given = 'med:given'),
  flow.path = flow,
  flow.columns = list(id = 'mod_id', datetime = 'Perform.Date',
                      finalunits = 'Final.Units', unit = 'unit',
                      rate = 'rate', weight = 'Final.Weight'),
  medchk.path = medcheck,
  check.path = tempdir(),
  drugname = 'fakedrg1')
This module will load and modify structured e-prescription data.
run_MedStrII(file, dat.columns = list())
file |
filename of prescription data (CSV, RData, RDS), or data.frame |
dat.columns |
a named list that should specify columns in data; ‘id’, ‘dose’, ‘freq’, ‘date’, and ‘str’ are required. ‘desc’ may also be specified. |
See EHR Vignette for Structured Data.
str data set
erx_data <- data.frame(GRID=paste0("ID",c(1,1,2,2,2,2)),
  MED_NAME=c("fakedrug","fakedrug","fakedrug",
             "Brandname","fakedrug","fakedrug"),
  RX_DOSE=c(1,2,1,'2 tabs',1,'1+1.5+1'),
  FREQUENCY=c(rep("bid",3),"qam","bid",
              "brkfst,lunch,dinner"),
  ENTRY_DATE=c("2018-02-15","2018-03-14","2017-07-01",
               "2017-07-01","2017-09-15","2017-11-01"),
  STRENGTH_AMOUNT=c("100","100","200",
                    "100mg","100","100"),
  DESCRIPTION=c("fakedrug 100 mg tablet","fakedrug 100 mg tablet",
                "fakedrug 200 mg tablet (also known as brandname)",
                "Brandname 100mg tablet", "fakedrug 100 mg tablet",
                "fakedrug 100 mg tablet"))

run_MedStrII(erx_data, list(id = 'GRID', dose = 'RX_DOSE',
                            freq = 'FREQUENCY', date = 'ENTRY_DATE',
                            str = 'STRENGTH_AMOUNT', desc = 'DESCRIPTION'))
This function standardizes the dose entity.
stdzDose(x)
x |
character vector of extracted dose values |
Some dose strings may include multiple values, and additional interpretation may be needed. For example, ‘2-1’ likely indicates a dose of 2 followed by a dose of 1. Currently it is converted to the average, 1.5.
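The averaging behavior described above can be sketched in base R. The helper `average_dose_range` is hypothetical and is not the implementation used by stdzDose; it only shows the arithmetic applied to a hyphen-separated string.

```r
# Hypothetical helper, not part of the EHR package:
# split each string on '-' and average the numeric pieces.
average_dose_range <- function(x) {
  vapply(strsplit(x, "-", fixed = TRUE), function(parts) {
    mean(as.numeric(parts))
  }, numeric(1))
}

average_dose_range(c("2-1", "1-3", "2"))
```

A string such as ‘2-1’ therefore loses the ordering of the two doses, which is why additional interpretation may be needed.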
numeric vector
stdzDose(c('one tablet', '1/2 pill', '1-3 tabs'))
This function standardizes the dose change entity.
stdzDoseChange(x)
x |
character vector of extracted dose change values |
character vector
stdzDoseChange(c('decreasing','dropped','increased'))
This function standardizes the dose schedule entity.
stdzDoseSchedule(x)
x |
character vector of extracted dose schedule values |
character vector
stdzDoseSchedule(c('tapered','weaned','TAPER'))
This function standardizes the duration entity.
stdzDuration(x)
x |
character vector of extracted duration values |
character vector
stdzDuration(c('1 month', 'three days', 'two-weeks'))
This function standardizes the frequency entity.
stdzFreq(x)
x |
character vector of extracted frequency values |
character vector
stdzFreq(c('in the morning', 'four times a day', 'with meals'))
This function standardizes the route entity.
stdzRoute(x)
x |
character vector of extracted route values |
character vector
stdzRoute(c('oral', 'po', 'subcut'))
This function standardizes the strength entity.
stdzStrength(str, freq)
str |
character vector of extracted strength values |
freq |
character vector of extracted frequency values |
Some strength strings may include multiple values, and additional interpretation may be needed. For example, ‘2-1’ likely indicates a strength of 2 followed by a strength of 1. Thus a single element may need to be standardized into two elements. This can only happen if the frequency entity is missing or in agreement (‘bid’ for example). See the ‘addl_data’ attribute of the returned vector.
numeric vector
stdzStrength(c('1.5', '1/2', '1/1/1'))
stdzStrength(c('1.5', '1/2', '1/1/1'), c('am', 'daily', NA))
stdzStrength(c('1.5', '1/2', '1/1/1'), FALSE)
An example dataset used in processLastDose that contains lab time data. This dataset should have one row per patient ID-date pair, and contain the time a lab was performed as a datetime variable.
data(tac_lab, package = 'EHR')
A data frame with 2 observations on the following variables.
A character vector, patient ID associated with the lab value
A character vector, date associated with the lab value
A POSIXct vector, datetime at which the lab was performed formatted as YYYY-MM-DD HH:MM:SS
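The one-row-per-ID-date requirement and the datetime type can be checked before passing lab data on. The sketch below uses a toy data frame; the column names `pid`, `date`, and `labtime` are assumed for illustration and should be replaced with the names in your own data.

```r
# Toy lab-time data; column names are assumptions for illustration.
lab_toy <- data.frame(
  pid = c("A1", "A1", "B2"),
  date = c("2020-01-01", "2020-01-02", "2020-01-01"),
  labtime = as.POSIXct(
    c("2020-01-01 08:30:00", "2020-01-02 09:15:00", "2020-01-01 07:45:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when every ID-date pair is unique.
n_dup <- anyDuplicated(lab_toy[, c("pid", "date")])

# The lab time should be stored as a datetime (POSIXct) variable.
is_datetime <- inherits(lab_toy$labtime, "POSIXct")
```

A nonzero `n_dup` gives the index of the first repeated ID-date pair, which would need to be resolved first.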
data(tac_lab)
An example of the metadata needed for the processLastDose, makeDose, and collapseDose functions.
data(tac_metadata, package = 'EHR')
A data frame with 5 observations on the following variables.
A character vector, filename for the clinical note
A character vector, patient ID associated with the filename
A character vector, date associated with the filename
A character vector, note ID associated with the filename
data(tac_metadata)
The output after running parseMedExtractR on 3 example clinical notes.
data(tac_mxr_parsed, package = 'EHR')
A data frame with 7 observations on the following variables.
A character vector, filename for the clinical note
A character vector, drug name extracted from the clinical note along with start and stop positions
A character vector, strengths extracted from the clinical note along with start and stop positions
A character vector, dose amounts extracted from the clinical note along with start and stop positions
A character vector, routes extracted from the clinical note along with start and stop positions
A character vector, frequencies extracted from the clinical note along with start and stop positions
A character vector, dose intakes extracted from the clinical note along with start and stop positions
A character vector, dose change keywords extracted from the clinical note along with start and stop positions
A character vector, last dose times extracted from the clinical note along with start and stop positions
data(tac_mxr_parsed)
Make contingency tables for many binary outcomes and a binary covariate.
zeroOneTable(EXPOSURE, phenotype)
EXPOSURE |
binary covariate (e.g., exposure). |
phenotype |
binary outcome (e.g., phenotype). |
Efficiently generates frequency and contingency tables for many binary outcomes (e.g., a large number of phenotypes) and a binary covariate (e.g., drug exposure, genotypes).
t00 |
frequency for non-exposed group and non-case outcome. |
t01 |
frequency for non-exposed group and case outcome. |
t10 |
frequency for exposed group and non-case outcome. |
t11 |
frequency for exposed group and case outcome. |
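The four cells listed above correspond to an ordinary 2x2 cross-tabulation. As a point of reference, the same counts can be obtained for a single phenotype with base R's `table()`; this sketch uses made-up vectors and does not call zeroOneTable or reproduce its faster internals.

```r
# Made-up binary vectors for illustration.
exposure  <- c(0, 0, 1, 1, 1, 0)
phenotype <- c(0, 1, 0, 1, 1, 0)

# Fixing the factor levels keeps all four cells even if one is empty.
tab <- table(factor(exposure, levels = 0:1),
             factor(phenotype, levels = 0:1))

# Rows index exposure (0/1); columns index phenotype (0/1).
counts <- c(t00 = tab[1, 1], t01 = tab[1, 2],
            t10 = tab[2, 1], t11 = tab[2, 2])
counts
```

Repeating `table()` for every phenotype becomes slow when there are many outcomes, which is the case zeroOneTable is meant to handle.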
Leena Choi [email protected] and Cole Beck [email protected]
## full example data
data(dataPheWAS)
demo.covariates <- c('id','exposure','age','race','gender')
phenotypeList <- setdiff(colnames(dd), demo.covariates)
tablePhenotype <- matrix(NA, ncol=4, nrow=length(phenotypeList),
  dimnames=list(phenotypeList,
                c("n.nocase.nonexp", "n.case.nonexp",
                  "n.nocase.exp", "n.case.exp")))
for(i in seq_along(phenotypeList)) {
  tablePhenotype[i, ] <- zeroOneTable(dd[, 'exposure'], dd[, phenotypeList[i]])
}