Title: | Benchmark Suite for Indirect Methods for RI Estimation |
---|---|
Description: | The provided benchmark suite enables the automated, systematic evaluation and comparison of any existing or novel indirect method for reference interval ('RI') estimation. Indirect methods take routine measurements of diagnostic tests, which contain both pathological and non-pathological samples, as input and use sophisticated statistical methods to derive a model describing the distribution of the non-pathological samples; this model can then be used to derive reference intervals. The benchmark suite contains 5,760 simulated test sets of varying difficulty. To include any indirect method, a custom wrapper function needs to be provided. The package offers functions for generating the test sets, executing the indirect method, and evaluating the results. See ?RIbench or vignette("RIbench_package") for a more comprehensive description of the features. A detailed description and an application are given in Ammer T., Schuetzenmeister A., Prokosch H.-U., Zierk J., Rank C.M., Rauh M. "RIbench: A Proposed Benchmark for the Standardized Evaluation of Indirect Methods for Reference Interval Estimation". Clinical Chemistry (2022) <doi:10.1093/clinchem/hvac142>. |
Authors: | Tatjana Ammer [aut, cre], Christopher Rank [aut], Andre Schuetzenmeister [aut] |
Maintainer: | Tatjana Ammer <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.2 |
Built: | 2024-10-31 06:45:45 UTC |
Source: | CRAN |
RIbench enables the automated, systematic evaluation and comparison of any existing or novel indirect method.
Indirect methods take routine measurements of diagnostic tests, which contain both pathological and non-pathological samples,
as input and use sophisticated statistical methods to derive a model describing the distribution of the non-pathological
samples, which can then be used to derive reference intervals. The benchmark suite contains 5,760 simulated data sets
of varying difficulty.
To include any indirect method, a custom wrapper function needs to be provided.
The package offers functions for generating the test sets (generateBiomarkerTestSets), executing the indirect method
(evaluateBiomarkerTestSets), and evaluating the results (evaluateAlgorithmResults).
Package: | RIbench |
Type: | Package |
Version: | 1.0.2 |
Date: | 2022-11-25 |
License: | GPL (>=3) |
LazyLoad: | yes |
Tatjana Ammer [email protected], Christopher M Rank [email protected], Andre Schuetzenmeister [email protected]
Ammer, T., Schuetzenmeister, A., Prokosch, HU., Zierk, J., Rank, C.M., Rauh, M. RIbench: A Proposed Benchmark for the Standardized Evaluation of Indirect Methods for Reference Interval Estimation. Clin Chem (2022) [Accepted, July 12].
It is possible to use automatically determined grid lines (x = NULL, y = NULL) or to specify the number
of cells (x = 3, y = 4), as done by grid. Additionally, the x- and y-locations of grid lines can be specified,
e.g. x = 1:10, y = seq(0, 10, 2).
addGrid(x = NULL, y = NULL, col = "lightgray", lwd = 1L, lty = 3L)
x |
(integer, numeric) single integer specifies number of cells, numeric vector specifies vertical grid-lines |
y |
(integer, numeric) single integer specifies number of cells, numeric vector specifies horizontal grid-lines |
col |
(character) color of grid-lines |
lwd |
(integer) line width of grid-lines |
lty |
(integer) line type of grid-lines |
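A minimal sketch of the behaviour described above, written with base R graphics. It is not RIbench's own implementation: the name addGridSketch is hypothetical, and treating NULL or a single number as input for grid() while treating a longer vector as explicit line positions for abline() is an assumption based on the argument descriptions.

```r
# Hypothetical re-implementation for illustration only; not RIbench's source.
addGridSketch <- function(x = NULL, y = NULL, col = "lightgray", lwd = 1L, lty = 3L) {
  # NULL: grid() places lines at the default tick positions.
  # Single number: grid() splits the plot region into that many cells.
  if (length(x) <= 1) {
    graphics::grid(nx = x, ny = NA, col = col, lwd = lwd, lty = lty)
  } else {
    # A numeric vector gives explicit positions of vertical grid lines.
    graphics::abline(v = x, col = col, lwd = lwd, lty = lty)
  }
  if (length(y) <= 1) {
    graphics::grid(nx = NA, ny = y, col = col, lwd = lwd, lty = lty)
  } else {
    # A numeric vector gives explicit positions of horizontal grid lines.
    graphics::abline(h = y, col = col, lwd = lwd, lty = lty)
  }
  invisible(NULL)
}
```

Usage mirrors the three cases above, e.g. addGridSketch(), addGridSketch(x = 3, y = 4), or addGridSketch(x = 1:10, y = seq(0, 10, 2)) after an existing plot() call.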
Andre Schuetzenmeister [email protected]
Function takes the name of a color and converts it into RGB space. Parameter "alpha" specifies the transparency within [0, 1], with 0 meaning completely transparent and 1 meaning completely opaque. If an RGB code is provided and alpha != 1, the RGB code of the transparency-adapted color is returned.
as.rgb(col = "black", alpha = 1)
col |
(character) name of the color to be converted/transformed into RGB-space (code). Only those colors can be used which are part of the set returned by function colors(). Defaults to "black". |
alpha |
(numeric) value specifying the transparency to be used, 0 = completely transparent, 1 = opaque. |
RGB-code
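A minimal sketch of such a conversion using grDevices::col2rgb() and grDevices::rgb(). The name asRgbSketch is hypothetical, and it always returns an eight-digit "#RRGGBBAA" code (the alpha channel is included even when alpha = 1), which may differ from the exact format returned by as.rgb.

```r
# Hypothetical sketch; not the package's implementation.
asRgbSketch <- function(col = "black", alpha = 1) {
  stopifnot(alpha >= 0, alpha <= 1)
  # col2rgb() accepts color names from colors() as well as "#RRGGBB" codes
  # and returns a 3 x n integer matrix with values in 0..255.
  rgbVals <- grDevices::col2rgb(col)
  # Scale alpha from [0, 1] to 0..255 and build the hex code.
  grDevices::rgb(rgbVals[1, ], rgbVals[2, ], rgbVals[3, ],
                 alpha = round(alpha * 255), maxColorValue = 255)
}
```

For example, asRgbSketch("red", 0.5) yields a half-transparent red, which is convenient for overlapping boxplots or densities.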
Andre Schuetzenmeister [email protected]
One-parameter Box-Cox transformation.
BoxCox(x, lambda)
x |
(numeric) data to be transformed |
lambda |
(numeric) Box-Cox transformation parameter |
(numeric) vector with Box-Cox transformation of x
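The one-parameter Box-Cox transformation maps x > 0 to (x^lambda - 1) / lambda for lambda != 0 and to log(x) in the limit lambda = 0. The sketch below implements this textbook definition together with its inverse; the function names are hypothetical, and the threshold used to treat lambda as zero is an assumption rather than the value used by BoxCox.

```r
# Hypothetical sketch of the textbook one-parameter Box-Cox transformation.
boxCoxSketch <- function(x, lambda) {
  # Treat (numerically) zero lambda as the log limit.
  if (abs(lambda) < 1e-12) log(x) else (x^lambda - 1) / lambda
}

# Inverse transformation, mapping Box-Cox values back to the original scale.
invBoxCoxSketch <- function(y, lambda) {
  if (abs(lambda) < 1e-12) exp(y) else (lambda * y + 1)^(1 / lambda)
}
```

Note that the inverse is only defined where lambda * y + 1 > 0; outside that range R returns NaN for non-integer powers.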
Andre Schuetzenmeister [email protected]
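Function to simulate the direct method: N samples are drawn from the simulated non-pathological distribution NIter times, and a reference interval is estimated from each draw.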
computeDirect( N = 120, analyte, params, seed = 123, NIter = 10000, RIperc = c(0.025, 0.975) )
N |
(integer) specifying the number of samples used as sample size for the direct method, default: 120 |
analyte |
(character) specifying the biomarker that is currently simulated |
params |
(list) of parameters for non-pathological distribution (nonp_mu, nonp_sigma, nonp_lambda, and nonp_shift) |
seed |
(integer) specifying the seed used for the simulation, default: 123 |
NIter |
(integer) specifying the number of times N samples should be drawn from the simulated non-pathological distribution (default: 10,000) |
RIperc |
(numeric) vector specifying the percentiles that define the reference interval, default: 0.025 and 0.975 |
(data frame) with the estimated reference intervals for NIter iterations
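A sketch of the simulation described above. It assumes (this manual does not confirm it) that the non-pathological distribution is normal with mean nonp_mu and standard deviation nonp_sigma in Box-Cox space, back-transformed with nonp_lambda and shifted by nonp_shift, and that the direct method takes R's default empirical quantiles of each draw. The function name simulateDirectSketch and the output column names are hypothetical.

```r
# Hypothetical sketch of simulating the direct method; not computeDirect itself.
simulateDirectSketch <- function(N = 120, params, seed = 123, NIter = 10000,
                                 RIperc = c(0.025, 0.975)) {
  # Inverse one-parameter Box-Cox transformation (log limit for lambda = 0).
  invBoxCox <- function(y, lambda) {
    if (abs(lambda) < 1e-12) exp(y) else (lambda * y + 1)^(1 / lambda)
  }
  set.seed(seed)  # reproducible draws; note: resets the global RNG stream
  ris <- replicate(NIter, {
    # ASSUMPTION: values are normal in Box-Cox space, back-transformed, then shifted.
    y <- rnorm(N, mean = params$nonp_mu, sd = params$nonp_sigma)
    x <- invBoxCox(y, params$nonp_lambda) + params$nonp_shift
    # Direct method: empirical percentiles of the N drawn samples.
    quantile(x, probs = RIperc, names = FALSE)
  })
  # One row per iteration, one column per requested percentile.
  ris <- as.data.frame(t(matrix(ris, nrow = length(RIperc))))
  names(ris) <- paste0("P", RIperc * 100)
  ris
}
```

The spread of the resulting limits across iterations shows how much a direct estimate based on N = 120 samples varies, which is the role the direct method plays as a comparison in the evaluation plots.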
Tatjana Ammer [email protected]
Function for computing performance measurements
computePerfMeas( analyte, algo, resRIs, subTable, RIperc = c(0.025, 0.975), cutoffZ = 5 )
analyte |
(character) specifying the currently analyzed analyte |
algo |
(character) specifying used algorithm |
resRIs |
(data.frame) with all calculated reference intervals |
subTable |
(data.frame) containing all information about the simulated test sets |
RIperc |
(numeric) vector specifying the percentiles for the reference interval, default: 0.025 and 0.975 |
cutoffZ |
(numeric) specifying whether, and if so which, cutoff should be used to classify results as implausible and exclude them from the analysis (default: 5) |
updated data frame with computed performance measures
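To illustrate the three error measures named under evaluateAlgorithmResults, the sketch below computes them for a single test set whose true non-pathological distribution is assumed to be normal(mu, sigma). The definitions are assumptions, not taken from the package: each estimated limit is compared with the true percentile, and the lower- and upper-limit errors are averaged into an overall ("_Ov") value. The absolute z-score deviation is the quantity a cutoffZ threshold would be applied to.

```r
# Hypothetical sketch of the error measures for ONE test set whose true
# non-pathological distribution is assumed to be normal(mu, sigma).
perfMeasSketch <- function(estLL, estUL, mu, sigma, RIperc = c(0.025, 0.975)) {
  trueRI <- qnorm(RIperc, mean = mu, sd = sigma)  # true lower/upper limits
  est <- c(estLL, estUL)
  # z-score of each estimate under the true distribution minus the target z-score
  zDev <- (est - mu) / sigma - qnorm(RIperc)
  c(zzDevAbs_Ov     = mean(abs(zDev)),
    AbsPercError_Ov = mean(abs(est - trueRI) / abs(trueRI) * 100),
    AbsError_Ov     = mean(abs(est - trueRI)))
}
```

Expressing the error in z-score units makes results comparable across biomarkers with very different scales, which is why a single cutoff can be applied to all of them.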
Tatjana Ammer [email protected]
Function for computing performance measures for all markers
computePerfMeasAll(analytes, algo, risIn, tableTCs, cutoffZ = 5)
analytes |
(character) listing all analytes for which the result files should be parsed |
algo |
(character) specifying used algorithm |
risIn |
(list) with data frame of all calculated reference intervals |
tableTCs |
(data.frame) containing all information about the simulated test sets |
cutoffZ |
(integer) specifying whether, and if so which, cutoff should be used to classify results as implausible (default: 5) |
list with the calculated errors as data frame for each marker
Tatjana Ammer [email protected]
Function for computing reference intervals
computeRIs( analyte, algo, results, tableTCs, RIperc = c(0.025, 0.975), truncNormal = FALSE )
analyte |
(character) specifying the analyte |
algo |
(character) specifying used algorithm |
results |
(list) with all calculated results as RWDRI objects |
tableTCs |
(data frame) containing all information about the simulated test sets |
RIperc |
(numeric) vector specifying the percentiles for the reference interval, default: 0.025 and 0.975 |
truncNormal |
(logical) specifying if a normal distribution truncated at zero shall be assumed |
data frame with computed reference intervals
Tatjana Ammer [email protected]
Function for computing reference intervals for all markers
computeRIsAll(analytes, algo, resIn, tableTCs, truncNormal = FALSE)
analytes |
(character) listing all markers for which the result files should be parsed |
algo |
(character) specifying used algorithm |
resIn |
(list) with all calculated results for all markers as RWDRI objects |
tableTCs |
(data.frame) containing all information about the simulated test sets |
truncNormal |
(logical) specifying if a normal distribution truncated at zero shall be assumed |
list with the calculated reference intervals as data frame for each marker
Tatjana Ammer [email protected]
Function to compute runtime statistics for all analytes
computeRuntimeAll(analytes, algo, risIn, tableTCs)
analytes |
(character) listing all analytes for which the result files should be parsed |
algo |
(character) specifying used algorithm |
risIn |
(list) with data frame of all calculated reference intervals and runtime |
tableTCs |
(data.frame) containing all information about the simulated test cases |
(list) with runtime statistics per analyte and data frames with raw runtime overall and per analyte
Tatjana Ammer [email protected]
Helper function to compute the subscores for the distribution types and the specified categories
computeSubResults( errorDf, tableTCs, distCat, errorParam, catList, catLabels, perfCombination = "mean" )
errorDf |
(data frame) containing the estimated reference intervals and all computed error measures |
tableTCs |
(data.frame) containing all information about the simulated test sets |
distCat |
(character) specifying the distribution category |
errorParam |
(character) specifying for which error parameter the data frame should be generated |
catList |
(character) vector containing the categories to split the dataset |
catLabels |
(character) vector containing the labels that will be used for the categories |
perfCombination |
(character) specifying if mean (default), median or sum should be computed |
(data frame) containing the computed subscores
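A minimal sketch of how a subscore per category could be combined according to perfCombination, using tapply(). It simplifies catList to a single grouping column (catCol, a hypothetical argument) and ignores missing errors; the real helper's handling of catList, catLabels and distCat is not reproduced.

```r
# Hypothetical sketch; not computeSubResults itself.
subScoreSketch <- function(errorDf, catCol, errorParam,
                           perfCombination = c("mean", "median", "sum")) {
  perfCombination <- match.arg(perfCombination)
  combine <- switch(perfCombination, mean = mean, median = stats::median, sum = sum)
  # One combined value per category level; NA errors (e.g. failed runs) are ignored.
  scores <- tapply(errorDf[[errorParam]], errorDf[[catCol]], combine, na.rm = TRUE)
  data.frame(category = names(scores), score = as.vector(scores))
}
```

For example, subScoreSketch(errorDf, "distType", "zzDevAbs_Ov", "median") would give one median error per distribution type, assuming errorDf has such a column.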
Tatjana Ammer [email protected]
Function for defining a subset that is used for analyzing the computation time and can also be used for other subanalyses.
defineSubset(tableTCs = NULL, N = 50, seed = 123)
tableTCs |
(data frame) describing the pre-defined test cases |
N |
(integer) describing the number of test cases per biomarker contained in the subset (default: 50) |
seed |
(integer) specifying the seed used for defining the subset, default: 123 |
(data frame) describing the updated table with all test case definitions.
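A sketch of drawing N test cases per biomarker with a fixed seed. The column names Analyte and inSubset are hypothetical, as is the assumption that the returned table flags the selected rows instead of dropping the others.

```r
# Hypothetical sketch; not defineSubset itself.
defineSubsetSketch <- function(tableTCs, N = 50, seed = 123, analyteCol = "Analyte") {
  set.seed(seed)  # reproducible selection; note: resets the global RNG stream
  rowsPerAnalyte <- split(seq_len(nrow(tableTCs)), tableTCs[[analyteCol]])
  # Draw up to N row indices per biomarker without replacement.
  picked <- unlist(lapply(rowsPerAnalyte, function(rows) {
    rows[sample.int(length(rows), min(N, length(rows)))]
  }), use.names = FALSE)
  # Flag the selected rows and keep the full table.
  tableTCs$inSubset <- seq_len(nrow(tableTCs)) %in% picked
  tableTCs
}
```

Indexing with sample.int() rather than calling sample(rows) avoids R's special case where a single number is interpreted as 1:n.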
Tatjana Ammer [email protected]
Convenience function to generate all result plots and calculate the benchmark score
evaluateAlgorithmResults( workingDir = "", algoNames = NULL, subset = "all", evalFolder = "Evaluation", withDirect = TRUE, withMean = TRUE, outline = TRUE, errorParam = c("zzDevAbs_Ov", "AbsPercError_Ov", "AbsError_Ov"), cutoffZ = 5, cols = NULL, ... )
workingDir |
(character) specifying the working directory: Plots will be stored in 'workingDir/evalFolder' and results will be used from 'workingDir/Results/algoName/biomarker' |
algoNames |
(character) vector specifying all algorithms that should be part of the evaluation |
subset |
(character, numeric, or data.frame) to specify for which subset the algorithm should be evaluated. character options: 'all' (default) for all test sets, a distribution type: 'normal', 'skewed', 'heavilySkewed', 'shifted'; a biomarker: 'Hb', 'Ca', 'FT4', 'AST', 'LACT', 'GGT', 'TSH', 'IgE', 'CRP', 'LDH'; 'Runtime' for runtime analysis subset; numeric option: number of test sets per biomarker, e.g. 10; data.frame: customized subset of table with test set specifications |
evalFolder |
(character) specifying the name of the output directory; plots will be stored in workingDir/evalFolder, default: 'Evaluation' |
withDirect |
(logical) indicating whether the direct method should be simulated for comparison (default: TRUE) |
withMean |
(logical) indicating whether the mean should be plotted as well (default: TRUE) |
outline |
(logical) indicating whether outliers should be drawn (TRUE, default), or not (FALSE) |
errorParam |
(character) specifying for which error parameter the data frame should be generated, choose between absolute z-score deviation ("zzDevAbs_Ov"), absolute percentage error ("AbsPercError_Ov"), and absolute error ("AbsError_Ov") |
cutoffZ |
(integer) specifying whether, and if so which, cutoff for the absolute z-score deviation should be used to classify results as implausible and exclude them from the overall benchmark score (default: 5) |
cols |
(character) vector specifying the colors used for the different algorithms |
... |
additional arguments to be passed to the method, e.g. "truncNormal" (logical) specifying whether a normal distribution truncated at zero shall be assumed, either a single TRUE/FALSE or a vector with TRUE/FALSE for each algorithm; "colDirect" (character) specifying the color used for the direct method, default: "grey"; "ylab" (character) specifying the label for the y-axis |
(data frame) containing the computed benchmark results
Tatjana Ammer [email protected]
## Not run: 
# Ensure that 'generateBiomarkerTestSets()' and 'evaluateBiomarkerTestSets()' are called
# with the same workingDir and for all listed algorithms before calling this function.

# first example, evaluation for several algorithms
benchmarkScore <- evaluateAlgorithmResults(workingDir = tempdir(),
  algoNames = c("Hoffmann", "TML", "kosmic", "TMC", "refineR"))
# The function will create several plots saved in workingDir/Evaluation.

# second example, evaluation for only one algorithm and a defined subset
benchmarkScore <- evaluateAlgorithmResults(workingDir = tempdir(), algoNames = "refineR", subset = 'Ca')

# third example, saving the results in a different folder and setting a different cutoff
# for the absolute z-score deviation
benchmarkScore <- evaluateAlgorithmResults(workingDir = tempdir(), algoNames = "refineR",
  subset = 'Ca', cutoffZ = 4, evalFolder = "Eval_Test")
## End(Not run)
Wrapper function to evaluate all test sets or a specified subset for a specified algorithm.
evaluateBiomarkerTestSets( workingDir = "", algoName = "refineR", algoFunction = "findRI", libs = "refineR", sourceFiles = NULL, params = NULL, requireDecimals = FALSE, requirePercentiles = FALSE, subset = "all", timeLimit = 14400, verbose = TRUE, showWarnings = FALSE, ... )
workingDir |
(character) specifying the working directory: Results will be stored in 'workingDir/Results/algo/biomarker' and data will be used from 'workingDir/Data/biomarker' |
algoName |
(character) specifying the name of the algorithm that is evaluated |
algoFunction |
(character) specifying the name of the function needed for estimating RIs |
libs |
(list) containing all libraries needed for executing the algorithm |
sourceFiles |
(list) containing all source files needed for executing the algorithm |
params |
(list) with additional parameters needed for calling algoFunction |
requireDecimals |
(logical) indicating whether the algorithm needs the number of decimal places (TRUE) or not (FALSE, default) |
requirePercentiles |
(logical) indicating whether the algorithm estimates only percentiles instead of a model (TRUE) or not (FALSE, default) |
subset |
(character, numeric, or data.frame) to specify for which subset the algorithm should be executed. character options: 'all' (default) for all test sets; a distribution type: 'normal', 'skewed', 'heavilySkewed', 'shifted'; a biomarker: 'Hb', 'Ca', 'FT4', 'AST', 'LACT', 'GGT', 'TSH', 'IgE', 'CRP', 'LDH'; 'Runtime' for runtime analysis subset; numeric option: number of test sets per biomarker, e.g. 10; data.frame: customized subset of table with test set specifications |
timeLimit |
(integer) specifying the maximum amount of time in seconds allowed to execute one single estimation (default: 14400 sec (4h)) |
verbose |
(logical) indicating if the progress counter should be shown (default: TRUE) |
showWarnings |
(logical) indicating whether warnings from the call to the indirect method/algorithm should be shown (default: FALSE) |
... |
additional arguments to be passed to the method, e.g. specified in- and output directory ('inputDir', 'outputDir') |
(data frame) containing information about the test sets where the algorithm terminated the R session or failed to report a result
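A hypothetical wrapper of the kind used in the eighth example (algoFunction = "estimateRIs", requirePercentiles = TRUE). Its interface is an assumption: it takes the measurements as a numeric vector and returns the requested percentiles; consult vignette("RIbench_package") for what RIbench actually passes to and expects from a wrapper. As a stand-in indirect method it uses naive iterative Tukey-fence trimming on the log scale, which is far simpler than published methods such as refineR.

```r
# Hypothetical wrapper for illustration; the argument names and return format
# RIbench expects are NOT confirmed here (see the package vignette).
estimateRIs <- function(Data, RIperc = c(0.025, 0.975), maxIter = 20) {
  x <- log(Data[Data > 0])  # work on the log scale (assumes positive data)
  for (i in seq_len(maxIter)) {
    q <- quantile(x, c(0.25, 0.75), names = FALSE)
    fence <- 1.5 * (q[2] - q[1])
    keep <- x >= q[1] - fence & x <= q[2] + fence
    if (all(keep)) break  # converged: no values left outside the Tukey fences
    x <- x[keep]
  }
  # Percentiles of the trimmed data, transformed back to the original scale.
  exp(quantile(x, RIperc, names = FALSE))
}
```

Trimming works reasonably here only because the simulated pathological values lie far from the non-pathological ones; with strongly overlapping distributions such a naive approach would be biased, which is exactly what the benchmark is designed to reveal.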
Tatjana Ammer [email protected]
## Not run: 
# The evaluation of all test sets can take several hours depending on
# the computation time of the algorithm.
# A wrapper function for the indirect method is required, see vignette("RIbench_package").
# Ensure that 'generateBiomarkerTestSets()' is called with the same workingDir
# before calling this function.

# first generic example
evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModel', libs = c('myOwnAlgo'),
  sourceFiles = list("C:\\Temp\\MyAlgoWrapper.R"), requireDecimals = FALSE,
  requirePercentiles = FALSE, subset = 'all', timeLimit = 14400)

# second example, evaluation for only 'Calcium' test sets
progress <- evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModel', libs = c('myOwnAlgo'), subset = "Ca")

# third example, evaluation for only the subset of test sets that follow a skewed distribution
progress <- evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModel', libs = c('myOwnAlgo'), subset = "skewed")

# fourth example, evaluation for a subset of 3 test sets per biomarker
progress <- evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModel', libs = c('myOwnAlgo'), subset = 3)

# fifth example, evaluation for a customized subset with all test sets that have
# a pathological fraction <= 30%
testsets <- loadTestsetDefinition()
progress <- evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModel', libs = c('myOwnAlgo'),
  subset = testsets[testsets$fractionPathol <= 0.3, ])

# sixth example, evaluation forwarding additional parameters to the 'algoFunction'
progress <- evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModel', libs = c('myOwnAlgo'),
  sourceFiles = list("Test_RIEst_2pBoxCox"), params = list("model='2pBoxCox'"))

# seventh example, evaluation for an indirect method that requires the number of
# decimal places as input
evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = 'myOwnAlgo',
  algoFunction = 'estimateModelDec', libs = c('myOwnAlgo'),
  sourceFiles = "C:\\Temp\\Test_RIEst_dec.R", requireDecimals = TRUE)

# eighth example, evaluation for an indirect method that directly estimates the percentiles
evaluateBiomarkerTestSets(workingDir = tempdir(), algoName = "myOwnAlgo",
  algoFunction = "estimateRIs", libs = "myOwnAlgo",
  sourceFiles = "C:\\Temp\\Test_RIEst.R", requirePercentiles = TRUE)
## End(Not run)
Rounding method with trailing zeros.
formatNumber(x, digits)
x |
(numeric) value that is rounded |
digits |
(integer) indicating the number of decimal places to be used |
Rounded value with trailing zeros
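A minimal sketch of rounding with trailing zeros using base R's formatC(). The name formatNumberSketch is hypothetical, and the package's own implementation may differ, e.g. in how exact halves are rounded.

```r
# Hypothetical sketch; not formatNumber itself.
formatNumberSketch <- function(x, digits) {
  # format = "f" gives fixed notation with exactly 'digits' decimal places,
  # so trailing zeros are kept (3.1 becomes "3.10" for digits = 2).
  formatC(as.numeric(x), format = "f", digits = digits)
}
```

Unlike round(), which returns a number and therefore drops trailing zeros when printed, this returns a character string, which is what plot labels and tables usually need.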
Christopher Rank [email protected]
Convenience function to generate simulated data and save each test set as a separate file
generateBiomarkerTestSets( workingDir = "", subset = "all", rounding = TRUE, verbose = TRUE )
workingDir |
(character) specifying the working directory where 'workingDir/Data/biomarker' folders will be generated containing the simulated data |
subset |
(character, numeric, or data.frame) to specify for which subset the data should be generated and the algorithms later applied to. character options: 'all' (default) for all test sets; a distribution type: 'normal', 'skewed', 'heavilySkewed', 'shifted'; a biomarker: 'Hb', 'Ca', 'FT4', 'AST', 'LACT', 'GGT', 'TSH', 'IgE', 'CRP', 'LDH'; 'Runtime' for runtime analysis subset; numeric option: number of test sets per biomarker, e.g. 10; data.frame: customized subset of table with test set specification |
rounding |
(logical) indicating whether the decimal places stated in the test set specification should be applied (TRUE, default); if FALSE, data will be rounded to 5 decimal places to mimic unrounded data |
verbose |
(logical) indicating if the progress counter should be shown (default: TRUE) |
No return value, instead the data files are generated and saved in the workingDir
Tatjana Ammer [email protected]
## Not run: 
workingDir <- "C:\\Temp\\RIbench\\"
generateBiomarkerTestSets(workingDir = workingDir)
## End(Not run)

# example generating a subset of 2 test sets per biomarker
generateBiomarkerTestSets(workingDir = tempdir(), subset = 2)
Wrapper function to generate one boxplot for a specified analyte
generateBoxPlotOneAnalyte( errorListAll, colList, nameList, catList, catLabels, a, errorParam, outline = TRUE, withMean = TRUE, withCats = TRUE, withDirect = TRUE, titlePart = NULL, outputDir, filenamePart = NULL, ylim1 = c(0, 100), ylim2 = c(100, 1000), ... )
errorListAll |
(list) containing the overall benchmark results per algorithm |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
catList |
(character) vector specifying the categories for which the boxes should be drawn |
catLabels |
(character) vector specifying the labels to the associated categories used for the x-axis |
a |
(character) specifying the analyte for which the boxplot should be generated |
errorParam |
(character) specifying for which error measure the plot should be generated |
outline |
(logical) indicating whether outliers should be drawn (TRUE, default), or not (FALSE) |
withMean |
(logical) indicating whether the mean should be plotted as well (default: TRUE) |
withCats |
(logical) set to TRUE if categories (e.g. pathological fraction) should be plotted (default: TRUE) |
withDirect |
(logical) indicating whether the box of the direct method should be elongated to facilitate comparison (default: TRUE) |
titlePart |
(character) specifying the latter part of the title |
outputDir |
(character) specifying an output directory |
filenamePart |
(character) specifying a filename for the plot |
ylim1 |
(numeric) vector specifying the limits in y-direction for the first granular scale |
ylim2 |
(numeric) vector specifying the limits in y-direction for the second less detailed scale |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer [email protected]
Wrapper function to generate all boxplots for the specified distribution types split by defined categories
generateBoxplotsDistTypes( errorListAll, colList, nameList, catList, catLabels, errorParam = "zzDevAbs_Ov", outline = TRUE, withMean = TRUE, withDirect = TRUE, withCats = TRUE, titlePart = NULL, outputDir = NULL, filenamePart = NULL, ylim1Vec = NULL, ylim2Vec = NULL, yticks1Vec = NULL, yticks2Vec = NULL, ... )
errorListAll |
(list) containing the overall benchmark results per algorithm |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
catList |
(character) vector specifying the categories for which the boxes should be drawn |
catLabels |
(character) vector specifying the labels to the associated categories used for the x-axis |
errorParam |
(character) specifying for which error measure the plot should be generated |
outline |
(logical) indicating whether outliers should be drawn (TRUE, default), or not (FALSE) |
withMean |
(logical) indicating whether the mean should be plotted as well (default: TRUE) |
withDirect |
(logical) indicating whether the box of the direct method should be elongated to facilitate comparison (default: TRUE) |
withCats |
(logical) set to TRUE if categories (e.g. pathological fraction) should be plotted (default: TRUE) |
titlePart |
(character) specifying the latter part of the title |
outputDir |
(character) specifying an output directory |
filenamePart |
(character) specifying a filename for the plot |
ylim1Vec |
(numeric) vector specifying the limits in y-direction for the first granular scale |
ylim2Vec |
(numeric) vector specifying the limits in y-direction for the second less detailed scale |
yticks1Vec |
(numeric) vector specifying the ticks in y-direction for the first granular scale |
yticks2Vec |
(numeric) vector specifying the ticks in y-direction for the second less detailed scale |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer [email protected]
Wrapper function to generate all boxplots for the specified analytes split by defined categories
generateBoxplotsMultipleCats( analytes, errorListAll, colList, nameList, category = c("fractionPathol", "fractionPathol_cum", "N", "N_cum", "OvFreq", "OvFreq_cum"), catList = NULL, catLabels = NULL, errorParam = "zzDevAbs_Ov", outline = TRUE, withMean = TRUE, withDirect = TRUE, withCats = TRUE, titlePart = NULL, outputDir = NULL, filenamePart = NULL, ylim1Vec = NULL, ylim2Vec = NULL, yticks1Vec = NULL, yticks2Vec = NULL, ... )
analytes |
(character) vector specifying for which analytes the plots should be generated |
errorListAll |
(named list) containing the overall benchmark results per algorithm (names of list elements should be the names of the algorithms) |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
category |
(character) defining the category used for creating the subsets. All defined sub-features are used for the categorization. Choose from "fractionPathol" (default), "N", or "OvFreq", individual or cumulative ("_cum"); if category is set this will be used to define catList and catLabels |
catList |
(character) vector specifying the categories for which the boxes should be drawn |
catLabels |
(character) vector specifying the labels to the associated categories used for the x-axis |
errorParam |
(character) specifying for which error measure the plot should be generated |
outline |
(logical) indicating whether outliers should be drawn (TRUE, default), or not (FALSE) |
withMean |
(logical) indicating whether the mean should be plotted as well (default: TRUE) |
withDirect |
(logical) indicating whether the box of the direct method should be elongated to facilitate comparison (default: TRUE) |
withCats |
(logical) set to TRUE if categories (e.g. pathological fraction) should be plotted (default: TRUE) |
titlePart |
(character) specifying the latter part of the title |
outputDir |
(character) specifying an output directory |
filenamePart |
(character) specifying a filename for the plot |
ylim1Vec |
(numeric) vector specifying the limits in y-direction for the first granular scale |
ylim2Vec |
(numeric) vector specifying the limits in y-direction for the second less detailed scale |
yticks1Vec |
(numeric) vector specifying the ticks in y-direction for the first granular scale |
yticks2Vec |
(numeric) vector specifying the ticks in y-direction for the second less detailed scale |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer
Generate simulated data with one start seed for each biomarker and save each test set as a separate file
generateDataFiles( tableTCs = NULL, outputDir = NULL, rounding = TRUE, verbose = TRUE )
tableTCs |
(data.frame) containing all information about the simulated test cases |
outputDir |
(character) specifying the output directory where the data files should be written to |
rounding |
(logical) indicating whether the decimal places stated in tableTCs should be applied (TRUE, default); if FALSE, data will be rounded to 5 decimal places to mimic unrounded data |
verbose |
(logical) indicating if the progress counter should be shown (default: TRUE) |
No return value. Instead, the data files are generated.
Tatjana Ammer
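The rounding rule described above can be illustrated with a minimal base-R sketch. The helper simulateTestSetSketch() is hypothetical, not part of RIbench, and draws from a plain normal distribution as a stand-in for the package's actual simulation model; it only shows how a fixed start seed makes a test set reproducible and how rounding = FALSE falls back to 5 decimal places.

```r
# Hypothetical helper, not RIbench's implementation: simulate one test set
# reproducibly and apply the rounding rule described for generateDataFiles().
simulateTestSetSketch <- function(n, mean, sd, seed, decimals, rounding = TRUE) {
  set.seed(seed)                        # fixed start seed -> identical data on every run
  x <- rnorm(n, mean = mean, sd = sd)   # stand-in for the simulated distribution
  if (rounding) {
    round(x, decimals)                  # apply the decimal places stated for the analyte
  } else {
    round(x, 5)                         # 5 decimal places mimic unrounded data
  }
}

hb <- simulateTestSetSketch(n = 1000, mean = 14, sd = 1, seed = 123, decimals = 1)
head(hb)
```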
Generate an MD5 hash sum for any R object.
generateMD5(x)
x |
(object) any R object. |
(character) MD5 hash sum of the input object.
Christopher Rank
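One way to hash an arbitrary R object with base R alone is to serialize it and compute the MD5 sum of the resulting bytes. This is a minimal sketch under that assumption; md5OfObjectSketch() is a hypothetical name, and generateMD5() may build its hash differently. Because the serialized bytes depend on R's serialization format, hashes are only guaranteed to match within the same R setup.

```r
# Hypothetical helper: MD5 hash of any R object via its serialized bytes.
md5OfObjectSketch <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))                           # remove the temporary file on exit
  writeBin(serialize(x, connection = NULL), tmp) # raw bytes of the object
  unname(tools::md5sum(tmp))                     # 32-character hexadecimal digest
}

md5OfObjectSketch(list(analyte = "Hb", N = 1000))
```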
Wrapper function to generate scatterplots for the specified analytes
generateScatterplotsAll( analytes, errorListAll, colList = NULL, nameList, tableTCs, errorParam = "zzDevAbs", withColorCat = NULL, titlePart = NULL, outputDir = NULL, filenamePart = NULL, ylim = NULL, xlim = NULL, xlab = NULL, ylab = NULL, ... )
analytes |
(character) vector specifying for which analytes the scatterplot should be generated |
errorListAll |
(list) containing the overall benchmark results per algorithm |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
tableTCs |
(data frame) containing all test case information |
errorParam |
(character) specifying for which error measure the plot should be generated |
withColorCat |
(character) indicating whether the plot should be colored according to the pathological fraction ("fractionPathol"), the sample size ("N"), or the left or right pathological overlap ("overlapPatholLeft", "overlapPatholRight") |
titlePart |
(character) specifying the latter part of the title |
outputDir |
(character) specifying an output directory |
filenamePart |
(character) specifying a filename for the plot |
ylim |
(numeric) vector specifying the limits in y-direction |
xlim |
(numeric) vector specifying the limits in x-direction |
xlab |
(character) specifying x-axis label |
ylab |
(character) specifying y-axis label |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer
Function to compute the benchmark table with the overall results (by default, the mean per category).
getBenchmarkResults( errorList, nameVec, tableTCs, errorParam = "zzDevAbsCutoff_Ov", cutoffZ = 5, catList = c("fractionPathol <= 0.20 & N <= 5000", "fractionPathol <= 0.20 & N > 5000", "fractionPathol > 0.20 & N <= 5000", "fractionPathol > 0.20 & N > 5000"), catLabels = c("lowPlowN", "lowPhighN", "highPlowN", "highPhighN"), perfCombination = c("mean", "median", "sum") )
errorList |
(list) containing the computed errors for the different (indirect) methods/algorithms |
nameVec |
(character) vector specifying the names of the different (indirect) methods/algorithms |
tableTCs |
(data.frame) containing all information about the simulated test sets |
errorParam |
(character) specifying for which error parameter the data frame should be generated |
cutoffZ |
(integer) specifying whether, and if so which, cutoff for the absolute z-score deviation should be used to classify results as implausible and exclude them from the overall benchmark score (default: 5) |
catList |
(character) vector containing the categories to split the dataset |
catLabels |
(character) vector containing the labels that will be used for the categories |
perfCombination |
(character) specifying which measure should be used to compute the overall benchmark score; choose from "mean" (default), "median", or "sum" |
(data frame) containing the computed benchmark results
Tatjana Ammer
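The category strings in catList are R expressions over the columns of tableTCs. The sketch below shows, under simplifying assumptions, how such expressions can split the test sets, how results above cutoffZ can be excluded as implausible, and how perfCombination selects the summary statistic. benchmarkScoreSketch() and its input (one absolute z-score deviation per test set) are hypothetical; the actual getBenchmarkResults() works on the error lists of several methods and returns a data frame.

```r
# Hypothetical helper: score one method per category from absolute z-score deviations.
benchmarkScoreSketch <- function(tableTCs, zDev, catList, catLabels, cutoffZ = 5,
                                 perfCombination = c("mean", "median", "sum")) {
  perfCombination <- match.arg(perfCombination)
  combine <- match.fun(perfCombination)                 # mean, median or sum
  keep <- !is.na(zDev) & zDev <= cutoffZ                # drop implausible results
  scores <- vapply(catList, function(expr) {
    inCat <- eval(parse(text = expr), envir = tableTCs) # logical membership per test set
    combine(zDev[inCat & keep])
  }, numeric(1))
  c(setNames(scores, catLabels), Overall = combine(zDev[keep]))
}

tc <- data.frame(fractionPathol = c(0.1, 0.1, 0.3, 0.3), N = c(1000, 10000, 1000, 10000))
z  <- c(0.2, 0.4, 6, 1.0)
res <- benchmarkScoreSketch(tc, z,
                            catList = c("fractionPathol <= 0.20", "fractionPathol > 0.20"),
                            catLabels = c("lowP", "highP"))
res
```

With this toy data, the deviation of 6 exceeds the cutoff and is excluded, so "highP" is scored from the single remaining test set. Since eval(parse()) runs arbitrary code, category strings should come from trusted input.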
Method to calculate reference intervals (percentiles) for objects of class 'RWDRI'
getRI( x, RIperc = c(0.025, 0.975), CIprop = 0.95, pointEst = c("fullDataEst", "medianBS", "meanBS"), truncNormal = FALSE, Scale = c("original", "transformed") )
x |
(object) of class 'RWDRI' |
RIperc |
(numeric) vector specifying the percentiles that define the reference interval |
CIprop |
(numeric) value specifying the central region for estimation of confidence intervals |
pointEst |
(character) specifying how the point estimate is determined: (1) using the full dataset ("fullDataEst"), (2) as the median of all bootstrap samples ("medianBS"; only if NBootstrap > 0), or (3) as the mean of all bootstrap samples ("meanBS"; only if NBootstrap > 0) |
truncNormal |
(logical) specifying if a normal distribution truncated at zero shall be assumed |
Scale |
(character) specifying whether percentiles are calculated on the original scale ("original") or the transformed scale ("transformed") |
(data.frame) with columns for percentile, point estimate and confidence intervals.
Christopher Rank, Tatjana Ammer
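Assuming, for illustration, a model in which the data are normally distributed with mean mu and standard deviation sigma after a one-parameter Box-Cox transformation with parameter lambda, the reference limits follow from the normal quantiles on the transformed scale, which are mapped back with the inverse Box-Cox transformation when Scale = "original". riFromBoxCoxModelSketch() is hypothetical and ignores bootstrap confidence intervals, truncNormal, and any further model parameters an 'RWDRI' object may carry.

```r
# Hypothetical helper: reference limits of a Box-Cox-normal model.
riFromBoxCoxModelSketch <- function(lambda, mu, sigma, RIperc = c(0.025, 0.975),
                                    Scale = c("original", "transformed")) {
  Scale <- match.arg(Scale)
  qTr <- qnorm(RIperc, mean = mu, sd = sigma)   # percentiles on the transformed scale
  if (Scale == "transformed") return(qTr)
  # inverse Box-Cox; yields NaN where lambda * qTr + 1 < 0 (outside the valid range)
  if (lambda == 0) exp(qTr) else (lambda * qTr + 1)^(1 / lambda)
}

riFromBoxCoxModelSketch(lambda = 0, mu = log(14), sigma = 0.08)
```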
Function for retrieving the reference intervals of algorithms that compute them directly, without estimating a model
getRIsAllwithoutModel(analytes, algo, resIn, tableTCs)
analytes |
(character) listing all markers for which the result files should be parsed |
algo |
(character) specifying used algorithm |
resIn |
(list) with all calculated results for all markers as RWDRI objects |
tableTCs |
(data.frame) containing all information about the simulated test sets |
list with the calculated reference intervals as data frame for each marker
Tatjana Ammer
Helper function to compute runtime statistics
getRuntime(x, analyte)
x |
(data.frame) with one column specifying the Runtime |
analyte |
(character) specifying the currently analyzed marker |
(data.frame) containing runtime statistics (min, mean, median, max)
Tatjana Ammer
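The summary returned by getRuntime() can be pictured with the following base-R sketch; runtimeStatsSketch() is hypothetical, and the column names of its output are an assumption, not necessarily those used by RIbench.

```r
# Hypothetical helper: runtime statistics (min, mean, median, max) for one analyte.
runtimeStatsSketch <- function(x, analyte) {
  rt <- x$Runtime                       # runtimes stored in the column "Runtime"
  data.frame(Analyte = analyte,
             min = min(rt, na.rm = TRUE), mean = mean(rt, na.rm = TRUE),
             median = stats::median(rt, na.rm = TRUE), max = max(rt, na.rm = TRUE))
}

runtimeStatsSketch(data.frame(Runtime = c(2, 4, 9)), analyte = "Hb")
```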
The feature can be the pathological fraction, the sample size, or the overlap (category), either individually or cumulatively ("_cum"). For an individualized categorisation, see getSubsetForDefinedCats.
getSubset( subsetDef, distType = FALSE, tableTCs, errorList, category = c("fractionPathol", "fractionPathol_cum", "N", "N_cum", "OvFreq", "OvFreq_cum"), restrict = NULL )
subsetDef |
(character) listing either the analytes or distribution types for which the result files should be parsed |
distType |
(logical) indicating if parameter subsetDef refers to analytes (FALSE, default) or distribution types (TRUE) |
tableTCs |
(data.frame) containing all information about the simulated test sets |
errorList |
(list) containing for each method the table with the computed error measurements |
category |
(character) defining the category used for creating the subsets. All defined sub-features are used for the categorization. Choose from "fractionPathol" (default), "N", or "OvFreq", individual or cumulative ("_cum") |
restrict |
(character) indicating whether test sets should be filtered according to the specified restriction, e.g. "fractionPathol <= 0.30" (default: NULL) |
(list) containing the performance measurements grouped according to specified subset definition and categories
Tatjana Ammer
Function to group the data according to a specified feature.
getSubsetForDefinedCats( subsetDef, distType = FALSE, tableTCs, errorList, catList = NULL, catLabels = NULL, restrict = NULL )
subsetDef |
(character) listing either the analytes or distribution types for which the result files should be parsed |
distType |
(logical) indicating if 'subsetDef' refers to analytes (FALSE, default) or distribution types (TRUE) |
tableTCs |
(data.frame) containing all information about the simulated test sets |
errorList |
(list) containing the table with the computed error measurements |
catList |
(list) containing the categories to split the dataset |
catLabels |
(list) containing the labels that will be used for the categories |
restrict |
(character) indicating whether test sets should be filtered according to the specified restriction, e.g. "fractionPathol <= 0.30" (default: NULL) |
(list) containing the performance measurements grouped according to specified subset definition and categories
Tatjana Ammer
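A minimal sketch of how catList, catLabels and restrict could interact, assuming the category and restriction strings are R expressions evaluated against the columns of tableTCs: restrict filters the test sets first, and each remaining test set is assigned to every category whose expression it satisfies. subsetByCatsSketch() and its single error vector are hypothetical simplifications of the per-method error tables used by the package.

```r
# Hypothetical helper: group one error vector by category expressions.
subsetByCatsSketch <- function(tableTCs, errors, catList, catLabels, restrict = NULL) {
  keep <- rep(TRUE, nrow(tableTCs))
  if (!is.null(restrict)) {
    keep <- eval(parse(text = restrict), envir = tableTCs)  # optional pre-filter
  }
  groups <- lapply(catList, function(expr) {
    inCat <- eval(parse(text = expr), envir = tableTCs)     # membership per test set
    errors[inCat & keep]
  })
  setNames(groups, catLabels)
}

tc <- data.frame(fractionPathol = c(0.1, 0.1, 0.3, 0.3), N = c(1000, 10000, 1000, 10000))
res <- subsetByCatsSketch(tc, errors = c(0.2, 0.4, 0.9, 1.1),
                          catList = c("N <= 5000", "N > 5000"),
                          catLabels = c("lowN", "highN"),
                          restrict = "fractionPathol <= 0.20")
str(res)
```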
Inverse of the one-parameter Box-Cox transformation.
invBoxCox(x, lambda)
x |
(numeric) data to be transformed |
lambda |
(numeric) Box-Cox transformation parameter |
(numeric) vector with the inverse Box-Cox transformation of x
Andre Schuetzenmeister
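For reference, the one-parameter Box-Cox transformation is y = (x^lambda - 1) / lambda for lambda != 0 and y = log(x) for lambda = 0, so its inverse is x = (lambda * y + 1)^(1 / lambda), or x = exp(y) when lambda = 0. The sketch below implements both directions and checks the round trip; the function names are hypothetical, and the package's invBoxCox() may treat edge cases (for example lambda * y + 1 <= 0) differently.

```r
# Hypothetical helpers for the one-parameter Box-Cox transformation (x > 0).
boxCoxSketch <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}
invBoxCoxSketch <- function(y, lambda) {
  if (lambda == 0) exp(y) else (lambda * y + 1)^(1 / lambda)
}

x <- c(0.5, 1, 2, 10)
invBoxCoxSketch(boxCoxSketch(x, lambda = 0.3), lambda = 0.3)   # recovers x
```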
Convenience function to load the table with the information about the pre-defined test sets
loadTestsetDefinition()
(data frame) containing the pre-defined parameter combinations to generate the simulations
Tatjana Ammer
testsets <- loadTestsetDefinition()
str(testsets)
The feature can be the pathological fraction, the sample size, or the overlap (category), either individually or cumulatively ("_cum"). For an individualized categorisation, see getSubsetForDefinedCats.
mergeAnalytes( tableTCs, errorList, catList = NULL, catLabels = NULL, distTypes = TRUE )
tableTCs |
(data.frame) containing all information about the simulated test sets |
errorList |
(list) containing for each method the table with the computed error measurements |
catList |
(list) containing the categories to split the dataset |
catLabels |
(list) containing the labels that will be used for the categories |
distTypes |
(logical) indicating if 'catList' refers to analytes (FALSE) or distribution types (TRUE, default) |
(list) containing the merged performance measurements grouped according to specified category
Tatjana Ammer
Helper function to combine all computed summary errors
mergeSummaryErrors( errorList, nameVec, errorParam = "MedianAbsPercErrorOV", cutoffZ = FALSE )
errorList |
(list) of the error lists for the different methods for which the summary errors should be combined |
nameVec |
(character) vector specifying the names of the methods |
errorParam |
(character) specifying for which error parameter the data frame should be generated |
cutoffZ |
(logical) indicating whether a cutoff was set (needed for the CRP case) |
(data frame) containing the summary errors per analyte per method
Tatjana Ammer
Plot method for generating a barplot from the benchmark results
plotBarplot( benchmarkRes, perDistType = FALSE, colList, nameList = NULL, withLabels = FALSE, withHorizLines = FALSE, title = NULL, xlim = NULL, xlab = "Mean of Absolute Z-Score Deviations", outputDir = NULL, filename = NULL, ... )
benchmarkRes |
(data frame) containing the overall benchmark results |
perDistType |
(logical) indicating whether one overall plot should be generated (FALSE, default) or whether it should be separated by distribution type (TRUE) |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
withLabels |
(logical) indicating whether the corresponding values should be plotted as well (default: FALSE) |
withHorizLines |
(logical) indicating whether horizontal lines should be plotted for a better visual separation of the different categories (default: FALSE) |
title |
(character) specifying plot title |
xlim |
(numeric) vector specifying the limits in x-direction |
xlab |
(character) specifying the x-axis label |
outputDir |
(character) specifying an output directory |
filename |
(character) specifying a filename for the plot |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer
Plot method for generating a boxplot of the benchmark results
plotBoxplot( errorList, colList, nameList, outline = TRUE, withMean = TRUE, withCats = FALSE, withDirect = TRUE, title = "", outputDir = NULL, filename = NULL, ylim1 = c(0, 100), ylim2 = c(100, 1000), ... )
errorList |
containing the overall benchmark results |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
outline |
(logical) indicating whether outliers should be drawn (TRUE, default), or not (FALSE) |
withMean |
(logical) indicating whether the mean should be plotted as well (default: TRUE) |
withCats |
(logical) set to TRUE if categories (e.g. pathological fraction) should be plotted (default: FALSE) |
withDirect |
(logical) indicating whether the box of the direct method should be elongated to facilitate comparison (default: TRUE) |
title |
(character) specifying plot title |
outputDir |
(character) specifying an output directory |
filename |
(character) specifying a filename for the plot |
ylim1 |
(numeric) vector specifying the limits in y-direction for the first granular scale |
ylim2 |
(numeric) vector specifying the limits in y-direction for the second less detailed scale |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer
Plot method for generating a scatterplot
plotScatterplot( errorList, colList, nameList, withColor = NULL, cats = NULL, title = "", outputDir = NULL, filename = NULL, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, ... )
errorList |
(data frame) containing the overall benchmark results |
colList |
(character) vector specifying the colors used for the different algorithms (should correspond to columns of benchmark results) |
nameList |
(character) vector specifying the names used in the legend (should correspond to columns of benchmark results), if NULL, colnames will be used |
withColor |
(character) indicating whether the plot should be colored according to the pathological fraction, the sample size, or the left or right pathological overlap |
cats |
(character) specifying the category labels |
title |
(character) specifying plot title |
outputDir |
(character) specifying an output directory |
filename |
(character) specifying a filename for the plot |
xlim |
(numeric) vector specifying the limits in x-direction |
ylim |
(numeric) vector specifying the limits in y-direction |
xlab |
(character) specifying x-axis label |
ylab |
(character) specifying y-axis label |
... |
additional arguments passed forward to other functions |
No return value. Instead, a plot is generated.
Tatjana Ammer
Standard print method for objects of class 'RWDRI'
## S3 method for class 'RWDRI'
print(x, RIperc = c(0.025, 0.975), CIprop = 0.95, pointEst = c("fullDataEst", "medianBS", "meanBS"), truncNormal = FALSE, ...)
x |
(object) of class 'RWDRI' |
RIperc |
(numeric) vector specifying the percentiles that define the reference interval |
CIprop |
(numeric) value specifying the central region for estimation of confidence intervals |
pointEst |
(character) specifying how the point estimate is determined: (1) using the full dataset ("fullDataEst"), (2) as the median of the bootstrap samples ("medianBS"; only if NBootstrap > 0), or (3) as the mean of the bootstrap samples ("meanBS"; only if NBootstrap > 0) |
truncNormal |
(logical) specifying if a normal distribution truncated at zero shall be assumed |
... |
additional arguments passed forward to other functions. |
No return value. Instead, a summary is printed.
Christopher Rank
Function for setting up the progress indicator.
progressInd(value, maxValue, nCharMsg = 0)
value |
(integer) indicating the current number |
maxValue |
(integer) indicating the maximum number |
nCharMsg |
(integer) indicating the number of characters the message already has |
(character) the generated progress message
Tatjana Ammer
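A progress counter of this kind can be sketched in base R by building a message such as "5/20 (25%)" and prefixing backspace characters that erase the previously printed message. Treating nCharMsg as the number of characters to erase is an assumption made for this sketch; progressMsgSketch() is hypothetical and its message format need not match the one printed by RIbench.

```r
# Hypothetical helper: progress message that overwrites the previous one.
progressMsgSketch <- function(value, maxValue, nCharMsg = 0) {
  msg <- sprintf("%d/%d (%.0f%%)", as.integer(value), as.integer(maxValue),
                 100 * value / maxValue)
  paste0(strrep("\b", nCharMsg), msg)   # backspaces erase the old message
}

prev <- 0
for (i in seq_len(5)) {
  msg <- progressMsgSketch(i, 5, prev)
  cat(msg)
  prev <- nchar(msg) - prev             # visible characters just printed
}
cat("\n")
```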
Function for reading in the result files for one marker
readResultFiles(analyte, algo, path = NULL, tableTCs = NULL)
analyte |
(character) specifying analyte |
algo |
(character) specifying used algorithm |
path |
(character) specifying the path to the Results directories |
tableTCs |
(data frame) containing all information about the simulated test sets |
(list) with the calculated results as RWDRI objects
Tatjana Ammer
Function for reading all results files.
readResultFilesAll(analytes, algo, baseDir = NULL, inputDir = NULL, tableTCs)
analytes |
(character) listing all analytes for which the result files should be parsed |
algo |
(character) specifying used algorithm |
baseDir |
(character) specifying the base directory: if baseDir is set, results are read from baseDir/Results/algo/marker and inputDir is ignored; if baseDir is NULL, the current working directory is used |
inputDir |
(character) specifying the path directly to the Results directories |
tableTCs |
(data frame) containing all information about the simulated test sets |
(list) with all calculated results as RWDRI objects for each marker
Tatjana Ammer
Function to read the result files and compute performance measures to create customized plots afterwards
readResultsAndComputeErrors( workingDir = getwd(), algoName = NULL, subset = "all", cutoffZ = 5, ... )
workingDir |
(character) specifying the working directory: plots will be stored in workingDir/evalFolder, and results will be read from workingDir/Results/algoName/biomarker |
algoName |
(character) vector specifying one algorithm for which the performance measures should be evaluated |
subset |
(character, numeric, or data.frame) to specify for which subset the algorithm should be executed. character options: 'all' (default) for all test sets, a distribution type: 'normal', 'skewed', 'heavilySkewed', 'shifted'; a biomarker: 'Hb', 'Ca', 'FT4', 'AST', 'LACT', 'GGT', 'TSH', 'IgE', 'CRP', 'LDH'; 'runtime' for runtime analysis subset; numeric option: number of test sets per biomarker, e.g. 10; data.frame: customized subset of table with test set specifications |
cutoffZ |
(integer) specifying whether, and if so which, cutoff for the absolute z-score deviation should be used to classify results as implausible and exclude them from the overall benchmark score (default: 5) |
... |
additional arguments to be passed to the method, e.g. truncNormal: (logical) specifying if a normal distribution truncated at zero shall be assumed |
(list) containing a (data frame) and a (list) with the computed performance measures
Tatjana Ammer
Function to get error subsets for defined category and restriction.
restrictSet(overallCat, tableTCs, errorList, distType = TRUE, restrict = NULL)
overallCat |
(list) containing the categories to split the dataset |
tableTCs |
(data.frame) containing all information about the simulated test sets |
errorList |
(list) containing for each method the table with the computed error measurements |
distType |
(logical) indicating if 'overallCat' refers to analytes (FALSE) or distribution types (TRUE, default) |
restrict |
(character) indicating whether test sets should be filtered according to the specified restriction, e.g. "fractionPathol <= 0.30" (default: NULL) |
(list) containing the merged performance measurements grouped according to specified category
Tatjana Ammer
Convenience function to simulate the direct method for the specified subset
runDirectMethod(tableTCs = NULL, N = 120, cutoffZ = 5)
tableTCs |
(data frame) containing the pre-defined parameter combinations to generate the simulations |
N |
(integer) specifying the number of samples used as sample size for the direct method, default: 120 |
cutoffZ |
(numeric) specifying whether, and if so which, cutoff should be used to classify results as implausible and exclude them from the analysis (default: 5) |
(data frame) with computed performance measures
Tatjana Ammer
# example to run direct method only for test sets for hemoglobin (Hb)
testsets <- loadTestsetDefinition()
directRes <- runDirectMethod(tableTCs = testsets[testsets$Analyte == "Hb", ], N = 120, cutoffZ = 5)
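The idea behind simulating the direct method can be sketched as follows, assuming a normally distributed reference population with known mu and sigma: draw N = 120 reference samples, estimate the limits as empirical percentiles, and express each estimate as the absolute deviation of its z-score from the target z-score. directMethodSketch() is hypothetical; it uses quantile() with R's default type 7 for simplicity, and this z-score comparison is an illustrative assumption rather than the exact error definition used by runDirectMethod().

```r
# Hypothetical helper: simulate the direct method for a normal reference population.
directMethodSketch <- function(mu, sigma, N = 120, RIperc = c(0.025, 0.975)) {
  x <- rnorm(N, mean = mu, sd = sigma)               # N reference samples
  est <- quantile(x, probs = RIperc, names = FALSE)  # empirical percentiles
  zEst <- (est - mu) / sigma                         # z-scores of the estimates
  zTarget <- qnorm(RIperc)                           # z-scores of the true limits
  list(RI = est, absZDev = abs(zEst - zTarget))
}

set.seed(1)
directMethodSketch(mu = 14, sigma = 1)
```

With only 120 samples the deviations are noticeably larger than with large N, which is why the direct method serves as a realistic reference point rather than an ideal one.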
Function for running the test sets of one algorithm for one marker, calling Rscript separately for each test set
runTC_usingRscript( biomarker = NULL, algoName = "myOwnAlgo", algoFunction = "estimateModel", sourceFiles = NULL, libs = NULL, params = NULL, decimals = FALSE, ris = FALSE, RIperc = c(0.025, 0.975), tableTCs = NULL, outputDir = NULL, inputDir = NULL, timeLimit = 14400, subsetDef = "all", verbose = TRUE, showWarnings = FALSE, ... )
biomarker |
(character) specifying the biomarker for which the algorithm should calculate RIs |
algoName |
(character) specifying the name of the algorithm that is evaluated |
algoFunction |
(character) specifying the name of the function needed for estimating RIs |
sourceFiles |
(list) containing all source files needed for executing the algorithm |
libs |
(list) containing all libraries needed for executing the algorithm |
params |
(list) with additional parameters needed for calling algoFunction |
decimals |
(logical) indicating whether the algorithm needs the number of decimal places (TRUE) or not (FALSE, default) |
ris |
(logical) indicating whether only percentiles and no model is estimated |
RIperc |
(numeric) vector specifying the percentiles that define the reference interval |
tableTCs |
(data.frame) with the information about the simulated test sets |
outputDir |
(character) specifying the outputDir: Results will be stored in outputDir/Results/algo/biomarker |
inputDir |
(character) specifying the inputDir: Data files should be stored in inputDir/Data/biomarker |
timeLimit |
(integer) specifying the maximum time in seconds allowed for one single estimation (default: 14400 s, i.e. 4 h) |
subsetDef |
(character) describing the specified subset of all test sets the algorithm is applied to, used for naming the progress file |
verbose |
(logical) indicating if the progress counter should be shown (default: TRUE) |
showWarnings |
(logical) indicating whether warnings from the call to the indirect method/algorithm should be shown (default: FALSE) |
... |
additional arguments to be passed to the method |
(data frame) containing information about the test sets where the algorithm terminated the R session or failed to report a result
Tatjana Ammer
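Running each test set in its own Rscript process means a crash or hang affects only that test set. The sketch below shows one way to do this in base R with system2(), whose timeout argument stops the child process after timeLimit seconds (the R documentation specifies exit status 124 in that case). runIsolatedSketch() and its status labels are hypothetical and are not the internals of runTC_usingRscript().

```r
# Hypothetical helper: evaluate R code in a separate Rscript process with a time limit.
runIsolatedSketch <- function(code, timeLimit = 14400) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- suppressWarnings(                # a timeout or failure also raises a warning
    system2(rscript, args = c("-e", shQuote(code)),
            stdout = FALSE, stderr = FALSE,  # discard the child's output
            timeout = timeLimit)
  )
  if (status == 0) "ok" else if (status == 124) "timeout" else "failed"
}

runIsolatedSketch("x <- rnorm(1e4); cat(mean(x))", timeLimit = 60)
```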
Convenience function to set up the directory structure used for storing data and results.
setupDirStructure( outputDir = NULL, onlyData = FALSE, onlyResults = FALSE, tableTCs = NULL, algoName = NULL )
outputDir |
(character) specifying the base output directory. From here, Data/biomarker and Results/algoName/biomarker directories are generated |
onlyData |
(logical) if set to TRUE, only the biomarker subdirectories are generated and the name of the output directory is used as is (default: FALSE) |
onlyResults |
(logical) if set to TRUE, only the algoName/biomarker subdirectories are generated and the name of the output directory is used as is (default: FALSE) |
tableTCs |
(data frame) containing the pre-defined parameter combinations to generate the simulations |
algoName |
(character) specifying the name of the algorithm used for creating the subdirectory |
No return value. Instead, the directory structure is set up.
Tatjana Ammer
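The directory layout described above (Data/biomarker for input files and Results/algoName/biomarker for result files, both below outputDir) can be reproduced with dir.create(). This minimal sketch uses a hypothetical helper, setupDirsSketch(), writes into a temporary directory, and ignores the onlyData and onlyResults options.

```r
# Hypothetical helper: create Data/<biomarker> and Results/<algoName>/<biomarker>.
setupDirsSketch <- function(outputDir, biomarkers, algoName) {
  dataDirs   <- file.path(outputDir, "Data", biomarkers)
  resultDirs <- file.path(outputDir, "Results", algoName, biomarkers)
  for (d in c(dataDirs, resultDirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)  # no-op if it already exists
  }
  invisible(c(dataDirs, resultDirs))
}

base <- file.path(tempdir(), "RIbenchSketch")
dirs <- setupDirsSketch(base, biomarkers = c("Hb", "Ca"), algoName = "myOwnAlgo")
dirs
```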
Helper function to write a result file when a timeout occurred or the R session terminated
writeResFile( algoName, biomarker, N = 0, error = NULL, runtime = NULL, filename = NULL, outputDir = NULL )
algoName |
(character) specifying the name of the algorithm that is evaluated |
biomarker |
(character) specifying the biomarker for which the algorithm should calculate RIs |
N |
(numeric) specifying the number of input data points |
error |
(character) specifying the type of error (e.g. timeout, RSessionTerminated) |
runtime |
(numeric) specifying the computation time until the error occurred |
filename |
(character) specifying the filename for which the algorithm failed |
outputDir |
(character) specifying the outputDir: Data files should be stored in outputDir/Data/biomarker and Results will be stored in outputDir/Results/algo/biomarker |
Tatjana Ammer