Title: | Interface to the Algorithm Selection Benchmark Library |
---|---|
Description: | Provides an interface to the algorithm selection benchmark library at <http://www.aslib.net> and the 'LLAMA' package (<https://cran.r-project.org/package=llama>) for building algorithm selection models; see Bischl et al. (2016) <doi:10.1016/j.artint.2016.04.003>. |
Authors: | Bernd Bischl, Lars Kotthoff, Pascal Kerschke [ctb], Damir Pulatov [ctb] |
Maintainer: | Lars Kotthoff |
License: | GPL-3 |
Version: | 0.1.2 |
Built: | 2024-12-08 07:13:47 UTC |
Source: | CRAN |
Object members
[character(1)] Name of scenario.
[character] Names of measures.
[character] Maximize measure?
[character] Either “runtime” or “solution_quality”.
[numeric(1)] Cutoff time for an algorithm run.
[numeric(1)] Cutoff memory for an algorithm run.
[numeric(1)] Cutoff time for an instance feature run.
[numeric(1)] Cutoff memory for an instance feature run.
[numeric(1)] Cutoff time for an algorithm feature run.
[numeric(1)] Cutoff memory for an algorithm feature run.
[list of character] Names of feature processing steps, the other feature steps they require, and the features they provide.
[list of lists of character] Names of algorithms and meta-information about them.
Potentially duplicated instances are detected by grouping all instances with equal feature vectors.
checkDuplicatedInstances(asscenario)
asscenario |
[ |
[list of character]. List of instance id vectors where the corresponding feature vectors are the same. Only groups of at least 2 elements are returned.
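The grouping idea can be pictured with a small base R sketch. This is only an illustration on a toy table, not the package's implementation; the data frame `feats` and its column names `f1` and `f2` are hypothetical.

```r
# Toy instance feature table (hypothetical names and values).
feats = data.frame(
  instance_id = c("i1", "i2", "i3", "i4"),
  f1 = c(1, 1, 2, 3),
  f2 = c(0.5, 0.5, 0.7, 0.9),
  stringsAsFactors = FALSE
)

# Build one key per instance by pasting its feature values together.
key = do.call(paste, c(feats[, c("f1", "f2")], sep = "\r"))

# Group instance ids by key and keep only groups with at least 2 members.
groups = unname(split(feats$instance_id, key))
dups = Filter(function(ids) length(ids) >= 2L, groups)
print(dups)
```

Here only "i1" and "i2" share a feature vector, so the result is a list with a single character vector.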
Converts the algo.runs object of a scenario to wide format. The first 2 columns are “instance_id” and “repetition”. The remaining ones are the measured performance values. The feature columns are in the same order as “features_deterministic”, “features_stochastic” in the description object. NA means the performance value is not available, possibly because the algorithm run was aborted. The data.frame is sorted by “instance_id”, then “repetition”.
convertAlgoPerfToWideFormat(desc, algo.runs, measure)
desc |
[ |
algo.runs |
[ |
measure |
[ |
[data.frame].
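As a rough illustration of what "wide format" means here, the following base R sketch reshapes a toy long-format table with `reshape`. It is not the package's code; the table `runs`, the measure name `runtime` and the algorithm names are made up for the example.

```r
# Toy long-format performance table (hypothetical names and values).
runs = data.frame(
  instance_id = c("i1", "i1", "i2", "i2"),
  repetition = c(1L, 1L, 1L, 1L),
  algorithm = c("a1", "a2", "a1", "a2"),
  runtime = c(3.2, NA, 1.5, 7.0),
  stringsAsFactors = FALSE
)

# One row per (instance_id, repetition), one column per algorithm.
wide = reshape(runs,
  idvar = c("instance_id", "repetition"),
  timevar = "algorithm",
  v.names = "runtime",
  direction = "wide"
)

# Strip the "runtime." prefix that reshape() adds to the new columns.
names(wide) = sub("^runtime\\.", "", names(wide))

# Sort by instance_id, then repetition.
wide = wide[order(wide$instance_id, wide$repetition), ]
rownames(wide) = NULL
print(wide)
```

The NA for algorithm "a2" on instance "i1" is carried over unchanged, mirroring the convention that NA marks an unavailable performance value.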
For features, mean values are computed across repetitions. For algorithms, repetitions are not supported at the moment and will result in an error.
convertToLlama(asscenario, measure, feature.steps)
asscenario |
[ |
measure |
[ |
feature.steps |
[ |
Note that feature step dependencies are currently not supported explicitly by LLAMA. The conversion checks that all dependencies are satisfied, but subsequent feature selection on the LLAMA data frame may not work as expected.
Result of calling input.
For features, mean values are computed across repetitions. For algorithms, repetitions are not supported at the moment and will result in an error.
convertToLlamaCVFolds( asscenario, measure, feature.steps, algorithm.feature.steps, cv.splits )
asscenario |
[ |
measure |
[ |
feature.steps |
[ |
algorithm.feature.steps |
[ |
cv.splits |
[ |
Result of calling input with data partitioned into folds.
Creates a data.frame that defines cross-validation splits for a scenario, and potentially stores it in an ARFF file.
The mlr package is used to generate the splits, see makeResampleDesc and makeResampleInstance.
createCVSplits(asscenario, reps = 1L, folds = 10L, file = NULL)
asscenario |
[ |
reps |
[ |
folds |
[ |
file |
[ |
[data.frame]. Splits as defined in the algorithm benchmark repository specification text.
Has columns: “instance_id”, “fold”, “rep”.
Defines which instances go into the test set for each replication / fold during CV.
The training set consists of the remaining instances, in exactly the order given by the data.frame for the current repetition.
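To make the test/training rule concrete, here is a minimal base R sketch on a hand-written splits table. The table `splits` and its values are hypothetical; only the column names follow the description above.

```r
# Toy splits table with the columns named above (hypothetical values).
splits = data.frame(
  instance_id = c("i1", "i2", "i3", "i4"),
  fold = c(1L, 2L, 1L, 2L),
  rep = 1L,
  stringsAsFactors = FALSE
)

r = 1L  # repetition of interest
i = 1L  # fold of interest

# Restrict to the current repetition, keeping the row order.
cur = splits[splits$rep == r, ]

# Instances with fold == i form the test set; the rest form the training set.
test = cur$instance_id[cur$fold == i]
train = cur$instance_id[cur$fold != i]
print(list(test = test, train = train))
```

Because the subsetting keeps row order, the training ids appear in the same order as in the data.frame for that repetition.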
If NAs occur, they are imputed (before aggregation) by base + 0.3 * range. Here, base is the cutoff value for runtime scenarios with a cutoff, or the worst performance for all others.
Stochastic replications are aggregated by the mean value.
findDominatedAlgos(asscenario, measure, reduce = FALSE, type = "logical")
asscenario |
[ |
measure |
[ |
reduce |
[ |
type |
[ |
[matrix]. See above.
Determines whether any of the feature groups in the LLAMA data frame presolve any of the instances. If so, the performances of all algorithms in the portfolio are set to the runtime of the first used feature group that presolves the respective instance. Furthermore, the success of all algorithms on those instances is set to true.
fixFeckingPresolve(asscenario, ldf)
asscenario |
[ |
ldf |
[ |
These modifications are done on the main LLAMA data and on any test splits. They are *not* done on the training data. This function should only ever be used to evaluate the performance of an actual selector that uses features (i.e. not VBS or single best). Using it in polite company is to be avoided.
The LLAMA data frame with presolving baked into the algorithm performances.
Returns algorithm names of scenario.
getAlgorithmNames(asscenario)
asscenario |
[ |
[character].
Uses subversion export to retrieve a specific scenario from the official Coseal GitHub repository. The scenario is checked out into a temporary directory and parsed with parseASScenario.
getCosealASScenario(name)
name |
[ |
[ASScenario]. Description object.
## Not run:
sc = getCosealASScenario("CSP-2010")
## End(Not run)
Return whether an instance was presolved and which step did it.
getCostsAndPresolvedStatus(asscenario, feature.steps, type)
asscenario |
[ |
feature.steps |
[ |
type |
[ |
[list]. Below, n is the number of instances. All following objects are ordered by “instance_id”.
is.presolved [logical(n)] |
Was instance presolved? Named by instance ids. |
solve.steps [character(n)] |
Which step solved it? NA if no step did it. Named by instance ids. |
costs [numeric(n)] |
Feature costs for using the steps. Named by instance ids. NULL if no costs are present. |
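The first two list elements can be illustrated with a small base R sketch that derives them from a toy feature runstatus table. This is an assumption-laden illustration, not the package's logic: the table `status`, the step names `step_a` and `step_b`, and the rule "the first step with status presolved is the solving step" are all made up for the example, and feature costs are left out.

```r
# Toy feature runstatus table (hypothetical step names and values).
status = data.frame(
  instance_id = c("i1", "i2", "i3"),
  step_a = c("ok", "presolved", "ok"),
  step_b = c("ok", "ok", "presolved"),
  stringsAsFactors = FALSE
)
steps = c("step_a", "step_b")

# Logical matrix: TRUE where a step presolved the instance.
pres = status[, steps, drop = FALSE] == "presolved"

# Was the instance presolved by any of the steps?
is.presolved = setNames(rowSums(pres) > 0, status$instance_id)

# Assumed rule: the first presolving step counts; NA if none did.
solve.steps = setNames(
  apply(pres, 1, function(row) {
    if (any(row)) steps[which(row)[1]] else NA_character_
  }),
  status$instance_id
)
print(is.presolved)
print(solve.steps)
```

Both vectors are named by instance id, matching the layout described in the table above.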
Returns the default feature step names of scenario.
getDefaultFeatureStepNames(asscenario)
asscenario |
[ |
[character].
Returns feature names of scenario.
getFeatureNames(asscenario, type)
asscenario |
[ |
type |
[ |
[character].
Returns feature step names of scenario.
getFeatureStepNames(asscenario, type)
asscenario |
[ |
type |
[ |
[character].
Returns instance names of scenario.
getInstanceNames(asscenario)
asscenario |
[ |
[character].
Returns number of CV folds.
getNumberOfCVFolds(asscenario)
asscenario |
[ |
[integer(1)].
Returns number of CV repetitions.
getNumberOfCVReps(asscenario)
asscenario |
[ |
[integer(1)].
Returns the features that are usable for a given set of feature steps.
getProvidedFeatures(asscenario, steps, type)
asscenario |
[ |
steps |
[ |
type |
[ |
[character].
Returns feature costs of scenario, summed over all instances.
getSummedFeatureCosts(asscenario, feature.steps)
asscenario |
[ |
feature.steps |
[ |
[character].
The following formula is used for imputation:
base +- range.scalar * range.span + N(0, sd = jitter * range.span)
with range.span = max - min.
Returns an object like algo.runs of asscenario, but drops the runstatus and all other measures.
imputeAlgoPerf( asscenario, measure, base = NULL, range.scalar = 0.3, jitter = 0, impute.zero.vals = FALSE )
asscenario |
[ |
measure |
[ |
base |
[ |
range.scalar |
[ |
jitter |
[ |
impute.zero.vals |
[ |
[data.frame].
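The imputation formula can be tried out on a plain numeric vector with the base R sketch below. It is a simplified illustration and not the package's code: the helper `impute_sketch` is hypothetical, it assumes that "+-" means plus for measures to be minimized and minus for measures to be maximized, and it assumes that range.span is taken over the observed (non-NA) values.

```r
# Hypothetical helper that applies the formula to a numeric vector.
impute_sketch = function(x, base, range.scalar = 0.3, jitter = 0,
                         maximize = FALSE) {
  ok = !is.na(x)
  # range.span = max - min, assumed to use the observed values only.
  range.span = max(x[ok]) - min(x[ok])
  # Assumed reading of "+-": add when minimizing, subtract when maximizing.
  sgn = if (maximize) -1 else 1
  n.miss = sum(!ok)
  x[!ok] = base + sgn * range.scalar * range.span +
    rnorm(n.miss, mean = 0, sd = jitter * range.span)
  x
}

perf = c(2, 10, NA, 6)
# range.span = 10 - 2 = 8, so the NA becomes 20 + 0.3 * 8 = 22.4.
imputed = impute_sketch(perf, base = 20)
print(imputed)
```

With the default jitter = 0 the noise term vanishes, so the result is deterministic; a positive jitter spreads the imputed values around that point.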
Object members
Let n be the number of (replicated) instances, m the number of unique instances, p the number of features, s the number of feature steps and k the number of algorithms.
[ASScenarioDesc] Description object, containing further info.
[data.frame(n, s + 2)] Runstatus of instance feature computation steps.
The first 2 columns are “instance_id” and “repetition”, the remaining are the status factors.
The step columns are in the same order as the feature steps in the description object.
The factor levels are always: ok, presolved, crash, timeout, memout, other.
No entry can be NA.
The data.frame is sorted by “instance_id”, then “repetition”.
[data.frame(k, s + 1)] Runstatus of algorithm feature computation steps.
The first column is “algorithm”, the remaining are the status factors.
The step columns are in the same order as the feature steps in the description object.
The factor levels are always: ok, crash, timeout, memout, other.
No entry can be NA.
The data.frame is sorted by “algorithm”.
[data.frame(n, s + 2)] Costs of instance feature computation steps.
The first 2 columns are “instance_id” and “repetition”, the remaining are numeric costs of the instance feature steps.
The step columns are in the same order as the feature steps in the description object.
NA means the cost is not available, possibly because the feature computation was aborted.
The data.frame is sorted by “instance_id”, then “repetition”.
If no cost file is available at all, NULL is stored.
[data.frame(n, s + 1)] Costs of algorithm feature computation steps.
The first column is “algorithm”, the remaining are numeric costs of the algorithmic feature steps.
The step columns are in the same order as the feature steps in the description object.
NA means the cost is not available, possibly because the feature computation was aborted.
The data.frame is sorted by “algorithm”.
If no cost file is available at all, NULL is stored.
[data.frame(n, p + 2)] Measured feature values of instances. The first 2 columns are “instance_id” and “repetition”. The remaining ones are the measured instance features. The feature columns are in the same order as “instance_features_deterministic”, “features_stochastic” in the description object. NA means the feature is not available, possibly because the feature computation was aborted. The data.frame is sorted by “instance_id”, then “repetition”.
[data.frame(k, p + 1)] Measured feature values of algorithms. The first column is “algorithm”. The remaining ones are the measured algorithmic features. The feature columns are in the same order as “algorithm_features_deterministic”, “algorithm_features_stochastic” in the description object. NA means the feature is not available, possibly because the feature computation was aborted. The data.frame is sorted by “algorithm”.
[data.frame] Runstatus and performance information of the algorithms. Simply the parsed ARFF file.
See convertAlgoPerfToWideFormat for a more convenient format.
[data.frame(n, k + 2)] Runstatus of algorithm runs.
The first 2 columns are “instance_id” and “repetition”, the remaining are the status factors.
The step columns are in the same order as the feature steps in the description object.
The factor levels are always: ok, presolved, crash, timeout, memout, other.
No entry can be NA.
The data.frame is sorted by “instance_id”, then “repetition”.
[data.frame(m, 3)] Definition of cross-validation splits for each replication of a repeated CV with folds.
Has columns “instance_id”, “repetition” and “fold”.
The instances with fold = i for a replication r constitute the i-th test set for the r-th CV.
The training set is the “instance_id” column with repetition = r, in the same order, when the test set is removed.
The data.frame is sorted by “repetition”, then “fold”, then “instance_id”.
If no CV file is available at all, NULL is stored, and a warning is issued, although this should not happen.
parseASScenario(path)
path |
[ |
[ASScenario]. Description object.
## Not run:
sc = parseASScenario("/path/to/scenario")
## End(Not run)
If NAs occur, they are imputed (before aggregation) by base + 0.3 * range. Here, base is the cutoff value for runtime scenarios with a cutoff, or the worst performance for all others.
Stochastic replications are aggregated by the mean value.
plotAlgoCorMatrix( asscenario, measure, order.method = "hclust", hclust.method = "ward.D2", cor.method = "spearman" )
asscenario |
[ |
measure |
[ |
order.method |
[ |
hclust.method |
[ |
cor.method |
[ |
See corrplot.
If NAs occur, they are imputed (before aggregation) by base + 0.3 * range + jitter. Here, base is the cutoff value for runtime scenarios with a cutoff, or the worst performance for all others.
For the CDFs we only show the visible area where successful runs occurred.
Stochastic replications are aggregated by the mean value.
plotAlgoPerfBoxplots( asscenario, measure, impute.zero.vals = FALSE, log = FALSE, impute.failed.runs = TRUE, rm.censored.runs = TRUE )
plotAlgoPerfCDFs( asscenario, measure, impute.zero.vals = FALSE, log = FALSE, rm.censored.runs = TRUE )
plotAlgoPerfDensities( asscenario, measure, impute.failed.runs = TRUE, impute.zero.vals = FALSE, log = FALSE, rm.censored.runs = TRUE )
plotAlgoPerfScatterMatrix( asscenario, measure, impute.zero.vals = FALSE, log = FALSE, rm.censored.runs = TRUE )
asscenario |
[ |
measure |
[ |
impute.zero.vals |
[ |
log |
[ |
impute.failed.runs |
[ |
rm.censored.runs |
[ |
ggplot2 plot object.
You will likely need to install some additional R packages from CRAN for this, or extra Weka learners. The latter can be done via e.g. WPM("install-package", "XMeans").
Feature costs are added for real prognostic models but not for baseline models.
runLlamaModels( asscenarios, feature.steps.list = NULL, baselines = NULL, learners = list(), par.sets = list(), rs.iters = 100L, n.inner.folds = 2L )
asscenarios |
[(list of) |
feature.steps.list |
[ |
baselines |
[ |
learners |
[list of |
par.sets |
[list of |
rs.iters |
[ |
n.inner.folds |
[ |
batchtools registry.
Creates summary data.frame for algorithm performance values across all instances.
summarizeAlgoPerf(asscenario, measure)
asscenario |
[ |
measure |
[ |
[data.frame].
Creates summary data.frame for algorithm runstatus across all instances.
summarizeAlgoRunstatus(asscenario)
asscenario |
[ |
[data.frame].
Creates a data.frame that summarizes the feature steps.
summarizeFeatureSteps(asscenario)
asscenario |
[ |
[data.frame].
Creates summary data.frame for feature values across all instances.
summarizeFeatureValues(asscenario, type)
asscenario |
[ |
type |
[ |
[data.frame].
Creates summary data.table for runLlamaModels experiments.
summarizeLlamaExps( reg, ids = findSubmitted(), fun = function(job, res) { return(list(succ = res$succ, par10 = res$par10, mcp = res$mcp)) }, missing.val = list(succ = 0, par10 = Inf, mcp = Inf) )
reg |
[ |
ids |
[ |
fun |
[ |
missing.val |
[ |
[data.table].
Splits an algorithm selection scenario into description, feature values / runstatus / costs, algorithm performance and cv splits and saves those data sets as single ARFF files in the given directory.
writeASScenario(asscenario, path = asscenario$desc$scenario_id)
asscenario |
[ |
path |
[ |