Title: Machine Learning in R
Description: Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering and general, example-specific cost-sensitive learning. Generic resampling, including cross-validation, bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling. Most operations can be parallelized.
Authors: Bernd Bischl [aut], Michel Lang [aut], Lars Kotthoff [aut], Patrick Schratz [aut], Julia Schiffner [aut], Jakob Richter [aut], Zachary Jones [aut], Giuseppe Casalicchio [aut], Mason Gallo [aut], Jakob Bossek [ctb], Erich Studerus [ctb], Leonard Judt [ctb], Tobias Kuehn [ctb], Pascal Kerschke [ctb], Florian Fendt [ctb], Philipp Probst [ctb], Xudong Sun [ctb], Janek Thomas [ctb], Bruno Vieira [ctb], Laura Beggel [ctb], Quay Au [ctb], Martin Binder [aut, cre], Florian Pfisterer [ctb], Stefan Coors [ctb], Steve Bronder [ctb], Alexander Engelhardt [ctb], Christoph Molnar [ctb], Annette Spooner [ctb]
Maintainer: Martin Binder <[email protected]>
License: BSD_2_clause + file LICENSE
Version: 2.19.2
Built: 2024-12-10 06:53:07 UTC
Source: CRAN
Useful links:
Report bugs at https://github.com/mlr-org/mlr/issues
Adds new measures to an existing ResampleResult.

addRRMeasure(res, measures)

Arguments:
res: (ResampleResult)
measures: (Measure | list of Measure)
Other resample: ResamplePrediction, ResampleResult, getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
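A minimal usage sketch (not part of the original page), assuming the standard resample() workflow; the measures are added to the result after the fact:

rdesc = makeResampleDesc("CV", iters = 2L)
res = resample("classif.rpart", iris.task, rdesc, measures = mmce)
# add accuracy and balanced error rate afterwards
res = addRRMeasure(res, list(acc, ber))
res$aggr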
An aggregation method reduces the performance values of the test sets (and possibly the training sets) to a single value. To see all implemented aggregations, look at aggregations.
The aggregation can access all relevant information of the result after resampling and combine it into a single value, though usually something very simple, like taking the mean of the test-set performances, is done.
Object members:
id (character(1)): Name of the aggregation method.
name (character(1)): Long name of the aggregation method.
properties (character): Properties of the aggregation.
fun (function): Aggregation function.

Implemented aggregations:
test.mean: Mean of performance values on test sets.
test.sd: Standard deviation of performance values on test sets.
test.median: Median of performance values on test sets.
test.min: Minimum of performance values on test sets.
test.max: Maximum of performance values on test sets.
test.sum: Sum of performance values on test sets.
train.mean: Mean of performance values on training sets.
train.sd: Standard deviation of performance values on training sets.
train.median: Median of performance values on training sets.
train.min: Minimum of performance values on training sets.
train.max: Maximum of performance values on training sets.
train.sum: Sum of performance values on training sets.
b632: Aggregation for B632 bootstrap.
b632plus: Aggregation for B632+ bootstrap.
testgroup.mean: Performance values on test sets are grouped according to resampling method. The mean for every group is calculated, then the mean of those means. Mainly used for repeated CV.
testgroup.sd: Similar to testgroup.mean; after the mean for every group is calculated, the standard deviation of those means is obtained. Mainly used for repeated CV.
test.join: Performance measure on joined test sets. This is especially useful for small sample sizes where unbalanced group sizes have a significant impact on the aggregation; especially for cross-validation, test.join might make sense then. For repeated CV, the performance is calculated on each repetition and then aggregated with the arithmetic mean.
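A minimal sketch (added for illustration) of switching a measure's aggregation, assuming setAggregation() and the built-in test.median aggregation:

# use the test-set median instead of the default test.mean
m = setAggregation(mmce, test.median)
rdesc = makeResampleDesc("CV", iters = 3L)
r = resample("classif.rpart", iris.task, rdesc, measures = m)
r$aggr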
Contains the task (agri.task).
See cluster::agriculture.
This function prints the steps selectFeatures took to find its optimal set of features and the reason why it stopped. It can also print information about all calculations done in each intermediate step.
Currently only implemented for sequential feature selection.
analyzeFeatSelResult(res, reduce = TRUE)

Arguments:
res: (FeatSelResult)
reduce: (logical(1))

Value: (invisible(NULL)).
Other featsel: FeatSelControl, getFeatSelResult(), makeFeatSelWrapper(), selectFeatures()
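A minimal sketch (added here for illustration) of a sequential forward search whose steps are then printed; runtime is kept small via holdout resampling:

ctrl = makeFeatSelControlSequential(method = "sfs")
rdesc = makeResampleDesc("Holdout")
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl, show.info = FALSE)
analyzeFeatSelResult(res)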
Converts predictions to a format the ROCR package can handle.

asROCRPrediction(pred)

Arguments:
pred: (Prediction)
Other roc: calculateROCMeasures()
Other predict: getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictThreshold(), setPredictType()
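A minimal sketch (added for illustration), assuming a probability prediction and the ROCR package being installed:

lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
rocr.pred = asROCRPrediction(pred)  # can now be passed to ROCR::performance()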
This function is a very parallel version of benchmark using batchtools. Experiments are created in the provided registry for each combination of learners, tasks and resamplings. The experiments are then stored in a registry and the runs can be started via batchtools::submitJobs. A job is one train/test split of the outer resampling. In case of nested resampling (e.g. with makeTuneWrapper), each job is a full run of inner resampling, which can be parallelized in a second step with ParallelMap.
For details on the usage and supported backends have a look at the batchtools tutorial page: https://github.com/mllg/batchtools.
The general workflow with batchmark looks like this:
1. Create an ExperimentRegistry using batchtools::makeExperimentRegistry.
2. Call batchmark(...), which defines jobs for all learners and tasks in a base::expand.grid fashion.
3. Submit jobs using batchtools::submitJobs.
4. Babysit the computation and wait for all jobs to finish using batchtools::waitForJobs.
5. Call reduceBatchmarkResults() to reduce results into a BenchmarkResult.
If you want to use this with OpenML datasets you can generate tasks from a vector of dataset IDs easily with tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x))).
batchmark(
  learners, tasks, resamplings, measures, keep.pred = TRUE,
  keep.extract = FALSE, models = FALSE, reg = batchtools::getDefaultRegistry()
)

Arguments:
learners: (list of Learner | character)
tasks: (list of Task)
resamplings: (list of ResampleDesc)
measures: (list of Measure)
keep.pred: (logical(1))
keep.extract: (logical(1))
models: (logical(1))
reg: (batchtools::Registry)

Value: (data.table). Generated job ids are stored in the column "job.id".
Other benchmark: BenchmarkResult, benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
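A minimal sketch of the workflow listed above (added for illustration, assuming batchtools is installed; file.dir = NA creates a temporary registry):

library(batchtools)
reg = makeExperimentRegistry(file.dir = NA, seed = 1)
batchmark(
  learners = list(makeLearner("classif.lda"), makeLearner("classif.rpart")),
  tasks = iris.task,
  resamplings = makeResampleDesc("CV", iters = 2L),
  reg = reg
)
submitJobs(reg = reg)
waitForJobs(reg = reg)
bmr = reduceBatchmarkResults(reg = reg)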
Contains the task (bc.task).
See mlbench::BreastCancer.
The column "Id" and all incomplete cases have been removed from the task.
Complete benchmark experiment to compare different learning algorithms across one or more tasks w.r.t. a given resampling strategy. Experiments are paired, meaning the same training/test sets are always used for the different learners. Furthermore, you can of course pass "enhanced" learners via wrappers, e.g., a learner can be automatically tuned using makeTuneWrapper.
benchmark(
  learners, tasks, resamplings, measures, keep.pred = TRUE,
  keep.extract = FALSE, models = FALSE, show.info = getMlrOption("show.info")
)

Arguments:
learners: (list of Learner | character)
tasks: (list of Task)
resamplings: (list of ResampleDesc | ResampleInstance)
measures: (list of Measure)
keep.pred: (logical(1))
keep.extract: (logical(1))
models: (logical(1))
show.info: (logical(1))
Other benchmark: BenchmarkResult, batchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas)
rmat = convertBMRToRankMatrix(bmr)
print(rmat)
plotBMRSummary(bmr)
plotBMRBoxplots(bmr, ber, style = "violin")
plotBMRRanksAsBarChart(bmr, pos = "stack")
friedmanTestBMR(bmr)
friedmanPostHocTestBMR(bmr, p.value = 0.05)
Result of a benchmark experiment conducted by benchmark with the following members:
results: A nested list of resample results, first ordered by task id, then by learner id.
measures: The performance measures used in the benchmark experiment.
learners: The learning algorithms compared in the benchmark experiment.
The print method of this object shows aggregated performance values for all tasks and learners.
It is recommended to retrieve required information via the getBMR* getter functions. You can also convert the object using as.data.frame.
Other benchmark: batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
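A minimal sketch (added for illustration) of inspecting a BenchmarkResult through its getters:

bmr = benchmark(makeLearner("classif.rpart"), iris.task, makeResampleDesc("CV", iters = 2L))
getBMRAggrPerformances(bmr, as.df = TRUE)
head(as.data.frame(bmr))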
Helper functions to deal with mlr caching.
getCacheDir()
deleteCacheDir()

getCacheDir() returns the default mlr cache directory. deleteCacheDir() clears the default mlr cache directory. Custom cache directories must be deleted by hand.
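A minimal sketch (added for illustration):

getCacheDir()      # path of the default cache directory
# deleteCacheDir() # clears the default cache directory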
Calculates the confusion matrix for a (possibly resampled) prediction. Rows indicate true classes, columns predicted classes. The marginal elements count the number of classification errors for the respective row or column, i.e., the number of errors when you condition on the corresponding true (rows) or predicted (columns) class. The bottom right element displays the total number of errors.
A list is returned that contains multiple matrices. If relative = TRUE, three matrices are computed: one with absolute values and two with relative values, normalized by rows and columns respectively. If relative = FALSE, only the absolute value matrix is computed. The print function displays the relative matrices in a compact way so that both row and column marginals can be seen in one matrix.
For details see ConfusionMatrix.
Note that for resampling no further aggregation is currently performed. All predictions on all test sets are joined to a vector yhat, as are all labels joined to a vector y. Then yhat is simply tabulated vs. y, as if both were computed on a single test set. This probably mainly makes sense when cross-validation is used for resampling.
calculateConfusionMatrix(pred, relative = FALSE, sums = FALSE, set = "both")

## S3 method for class 'ConfusionMatrix'
print(x, both = TRUE, digits = 2, ...)

Arguments:
pred: (Prediction)
relative: (logical(1))
sums: (logical(1))
set: (character(1))
x: (ConfusionMatrix)
both: (logical(1))
digits: (integer(1))
...: (any)
Other performance: ConfusionMatrix, calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
# get confusion matrix after simple manual prediction
allinds = 1:150
train = sample(allinds, 75)
test = setdiff(allinds, train)
mod = train("classif.lda", iris.task, subset = train)
pred = predict(mod, iris.task, subset = test)
print(calculateConfusionMatrix(pred))
print(calculateConfusionMatrix(pred, sums = TRUE))
print(calculateConfusionMatrix(pred, relative = TRUE))

# now after cross-validation
r = crossval("classif.lda", iris.task, iters = 2L)
print(calculateConfusionMatrix(r$pred))
Calculate the absolute number of correct/incorrect classifications and the following evaluation measures:
tpr: True positive rate (Sensitivity, Recall)
fpr: False positive rate (Fall-out)
fnr: False negative rate (Miss rate)
tnr: True negative rate (Specificity)
ppv: Positive predictive value (Precision)
for: False omission rate
lrp: Positive likelihood ratio (LR+)
fdr: False discovery rate
npv: Negative predictive value
acc: Accuracy
lrm: Negative likelihood ratio (LR-)
dor: Diagnostic odds ratio
For details on the used measures see measures and also https://en.wikipedia.org/wiki/Receiver_operating_characteristic.
The element for the false omission rate in the resulting object is not called for but fomr, since for should never be used as a variable name in an object.
calculateROCMeasures(pred)

## S3 method for class 'ROCMeasures'
print(x, abbreviations = TRUE, digits = 2, ...)

Arguments:
pred: (Prediction)
x: (ROCMeasures)
abbreviations: (logical(1))
digits: (integer(1))
...: (any)

Value: (ROCMeasures). A list containing two elements: confusion.matrix, which is the 2 times 2 confusion matrix of absolute frequencies, and measures, a list of the above mentioned measures.
Other roc: asROCRPrediction()
Other performance: ConfusionMatrix, calculateConfusionMatrix(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
lrn = makeLearner("classif.rpart", predict.type = "prob") fit = train(lrn, sonar.task) pred = predict(fit, task = sonar.task) calculateROCMeasures(pred)
lrn = makeLearner("classif.rpart", predict.type = "prob") fit = train(lrn, sonar.task) pred = predict(fit, task = sonar.task) calculateROCMeasures(pred)
Convert numeric entries which have large/infinite (absolute) values in a data.frame or task. Only numeric/integer columns are affected.
capLargeValues(
  obj, target = character(0L), cols = NULL, threshold = Inf,
  impute = threshold, what = "abs"
)

Arguments:
obj: (data.frame | Task)
target: (character)
cols: (character)
threshold: (numeric(1))
impute: (numeric(1))
what: (character(1))
Other eda_and_preprocess: createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
capLargeValues(iris, threshold = 5, impute = 5)
Configuration is done by setting custom options.
If you do not set an option here, its current value will be kept.
If you call this function with an empty argument list, everything is set to its defaults.
configureMlr(
  show.info, on.learner.error, on.learner.warning, on.par.without.desc,
  on.par.out.of.bounds, on.measure.not.applicable, show.learner.output,
  on.error.dump
)

Arguments:
show.info: (logical(1))
on.learner.error: (character(1))
on.learner.warning: (character(1))
on.par.without.desc: (character(1))
on.par.out.of.bounds: (character(1))
on.measure.not.applicable: (character(1))
show.learner.output: (logical(1))
on.error.dump: (logical(1))

Value: (invisible(NULL)).
Other configure: getMlrOptions()
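A minimal sketch (added for illustration) of setting and inspecting options:

configureMlr(on.learner.error = "warn", show.info = FALSE)
getMlrOptions()
configureMlr()  # calling with no arguments resets everything to the defaults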
The result of calculateConfusionMatrix.
Object members:
result (matrix): Confusion matrix of absolute values and marginals. Can also contain row and column sums of observations.
task (TaskDesc): Additional information about the task.
sums (logical(1)): Flag if marginal sums of observations are calculated.
relative (logical(1)): Flag if the relative confusion matrices are calculated.
relative.row (matrix): Confusion matrix of relative values and marginals normalized by row.
relative.col (matrix): Confusion matrix of relative values and marginals normalized by column.
relative.error (numeric(1)): Relative error overall.
Other performance: calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
Computes a matrix of all the ranks of different algorithms over different datasets (tasks). Ranks are computed from aggregated measures. Smaller ranks imply better methods, so for measures that are minimized, small ranks imply small scores; for measures that are maximized, small ranks imply large scores.
convertBMRToRankMatrix(bmr, measure = NULL, ties.method = "average", aggregation = "default")

Arguments:
bmr: (BenchmarkResult)
measure: (Measure)
ties.method: (character(1))
aggregation: (character(1))

Value: (matrix) with measure ranks as entries. The matrix has one row for each learner and one column for each task.
Other benchmark: BenchmarkResult, batchmark(), benchmark(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
# see benchmark
We auto-set the target column, drop any column which is called “Id” and convert logicals to factors.
convertMLBenchObjToTask(x, n = 100L, ...)

Arguments:
x: (character(1))
n: (integer(1))
...: (any)
print(convertMLBenchObjToTask("Ionosphere"))
print(convertMLBenchObjToTask("mlbench.spirals", n = 100, sd = 0.1))
Contains the task (costiris.task).
See datasets::iris. The cost matrix was generated artificially following
Tu, H.-H. and Lin, H.-T. (2010), One-sided support vector regression for multiclass cost-sensitive classification. In ICML, J. Fürnkranz and T. Joachims, Eds., Omnipress, 1095–1102.
Replace all factor features with their dummy variables. Internally model.matrix is used. Non factor features will be left untouched and passed to the result.
createDummyFeatures(obj, target = character(0L), method = "1-of-n", cols = NULL)

Arguments:
obj: (data.frame | Task)
target: (character(1))
method: (character(1)). Default is "1-of-n".
cols: (character)

Value: (data.frame | Task). Same type as obj.
Other eda_and_preprocess: capLargeValues(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
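A minimal sketch (added for illustration); the "reference" encoding shown in the second call is assumed to be the alternative to "1-of-n":

df = data.frame(x = 1:4, g = factor(c("a", "b", "a", "c")))
createDummyFeatures(df)                       # one 0/1 column per factor level
createDummyFeatures(df, method = "reference") # drops the first level of each factor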
Visualize partitioning of resample objects with spatial information.
createSpatialResamplingPlots(
  task = NULL, resample = NULL, crs = NULL, datum = 4326, repetitions = 1,
  color.train = "#0072B5", color.test = "#E18727", point.size = 0.5,
  axis.text.size = 14, x.axis.breaks = waiver(), y.axis.breaks = waiver()
)

Arguments:
task: (Task)
resample: (ResampleResult or named list of ResampleResult)
crs: (integer)
datum: (integer)
repetitions: (integer)
color.train: (character)
color.test: (character)
point.size: (numeric)
axis.text.size: (integer)
x.axis.breaks: (numeric)
y.axis.breaks: (numeric)
If a named list is given to resample, names will appear in the title of each fold. If multiple inputs are given to resample, these must be named.
This function makes a hard cut at five columns of the resulting gridded plot. This means if the resample object consists of more than five folds, these folds will be put into a new row.
For file saving, we recommend to use cowplot::save_plot.
When viewing the resulting plot in RStudio, margins may appear to be different than they really are. Make sure to save the file to disk and inspect the image.
When modifying axis breaks, negative values need to be used if the area is located in either the western or southern hemisphere. Use positive values for the northern and eastern hemisphere.

Value: (list of length 2) containing (1) multiple "gg" objects and (2) their corresponding labels.

The crs has to be suitable for the coordinates stored in the Task. For example, if the coordinates are UTM, crs should be set to a UTM projection. Due to the limited axis space in the resulting grid (especially on the x-axis), the data will by default be projected into a lat/lon projection, specifically EPSG 4326. If other projections are desired for the resulting map, please set argument datum accordingly. This argument will be passed onto ggplot2::coord_sf.
Patrick Schratz
Other plot: plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
rdesc = makeResampleDesc("SpRepCV", folds = 5, reps = 4) r = resample(makeLearner("classif.qda"), spatial.task, rdesc) ## ------------------------------------------------------------- ## single unnamed resample input with 5 folds and 2 repetitions ## ------------------------------------------------------------- plots = createSpatialResamplingPlots(spatial.task, r, crs = 32717, repetitions = 2, x.axis.breaks = c(-79.065, -79.085), y.axis.breaks = c(-3.970, -4)) cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2, labels = plots[["Labels"]]) ## -------------------------------------------------------------------------- ## single named resample input with 5 folds and 1 repetition and 32717 datum ## -------------------------------------------------------------------------- plots = createSpatialResamplingPlots(spatial.task, list("Resamp" = r), crs = 32717, datum = 32717, repetitions = 1) cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 1, labels = plots[["Labels"]]) ## ------------------------------------------------------------- ## multiple named resample inputs with 5 folds and 1 repetition ## ------------------------------------------------------------- rdesc1 = makeResampleDesc("SpRepCV", folds = 5, reps = 4) r1 = resample(makeLearner("classif.qda"), spatial.task, rdesc1) rdesc2 = makeResampleDesc("RepCV", folds = 5, reps = 4) r2 = resample(makeLearner("classif.qda"), spatial.task, rdesc2) plots = createSpatialResamplingPlots(spatial.task, list("SpRepCV" = r1, "RepCV" = r2), crs = 32717, repetitions = 1, x.axis.breaks = c(-79.055, -79.085), y.axis.breaks = c(-3.975, -4)) cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2, labels = plots[["Labels"]]) ## ------------------------------------------------------------------------------------- ## Complex arrangements of multiple named resample inputs with 5 folds and 1 repetition ## ------------------------------------------------------------------------------------- p1 = cowplot::plot_grid(plots[["Plots"]][[1]], plots[["Plots"]][[2]], plots[["Plots"]][[3]], ncol = 3, nrow = 1, labels = plots[["Labels"]][1:3], label_size = 18) p12 = cowplot::plot_grid(plots[["Plots"]][[4]], plots[["Plots"]][[5]], ncol = 2, nrow = 1, labels = plots[["Labels"]][4:5], label_size = 18) p2 = cowplot::plot_grid(plots[["Plots"]][[6]], plots[["Plots"]][[7]], plots[["Plots"]][[8]], ncol = 3, nrow = 1, labels = plots[["Labels"]][6:8], label_size = 18) p22 = cowplot::plot_grid(plots[["Plots"]][[9]], plots[["Plots"]][[10]], ncol = 2, nrow = 1, labels = plots[["Labels"]][9:10], label_size = 18) cowplot::plot_grid(p1, p12, p2, p22, ncol = 1)
rdesc = makeResampleDesc("SpRepCV", folds = 5, reps = 4) r = resample(makeLearner("classif.qda"), spatial.task, rdesc) ## ------------------------------------------------------------- ## single unnamed resample input with 5 folds and 2 repetitions ## ------------------------------------------------------------- plots = createSpatialResamplingPlots(spatial.task, r, crs = 32717, repetitions = 2, x.axis.breaks = c(-79.065, -79.085), y.axis.breaks = c(-3.970, -4)) cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2, labels = plots[["Labels"]]) ## -------------------------------------------------------------------------- ## single named resample input with 5 folds and 1 repetition and 32717 datum ## -------------------------------------------------------------------------- plots = createSpatialResamplingPlots(spatial.task, list("Resamp" = r), crs = 32717, datum = 32717, repetitions = 1) cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 1, labels = plots[["Labels"]]) ## ------------------------------------------------------------- ## multiple named resample inputs with 5 folds and 1 repetition ## ------------------------------------------------------------- rdesc1 = makeResampleDesc("SpRepCV", folds = 5, reps = 4) r1 = resample(makeLearner("classif.qda"), spatial.task, rdesc1) rdesc2 = makeResampleDesc("RepCV", folds = 5, reps = 4) r2 = resample(makeLearner("classif.qda"), spatial.task, rdesc2) plots = createSpatialResamplingPlots(spatial.task, list("SpRepCV" = r1, "RepCV" = r2), crs = 32717, repetitions = 1, x.axis.breaks = c(-79.055, -79.085), y.axis.breaks = c(-3.975, -4)) cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2, labels = plots[["Labels"]]) ## ------------------------------------------------------------------------------------- ## Complex arrangements of multiple named resample inputs with 5 folds and 1 repetition ## ------------------------------------------------------------------------------------- p1 = cowplot::plot_grid(plots[["Plots"]][[1]], plots[["Plots"]][[2]], plots[["Plots"]][[3]], ncol = 3, nrow = 1, labels = plots[["Labels"]][1:3], label_size = 18) p12 = cowplot::plot_grid(plots[["Plots"]][[4]], plots[["Plots"]][[5]], ncol = 2, nrow = 1, labels = plots[["Labels"]][4:5], label_size = 18) p2 = cowplot::plot_grid(plots[["Plots"]][[6]], plots[["Plots"]][[7]], plots[["Plots"]][[8]], ncol = 3, nrow = 1, labels = plots[["Labels"]][6:8], label_size = 18) p22 = cowplot::plot_grid(plots[["Plots"]][[9]], plots[["Plots"]][[10]], ncol = 2, nrow = 1, labels = plots[["Labels"]][9:10], label_size = 18) cowplot::plot_grid(p1, p12, p2, p22, ncol = 1)
Takes two bit strings and creates a new one of the same size by selecting the items from the first string or the second, based on a given rate (the probability of choosing an element from the first string).
Arguments:
x: (logical)
y: (logical)
rate: (numeric(1))

Value: (crossover).
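A minimal sketch (added for illustration); crossover() is mainly used internally by the GA-based feature selection, so direct calls are rarely needed:

x = c(TRUE, TRUE, FALSE, FALSE)
y = c(FALSE, FALSE, TRUE, TRUE)
# each position is taken from x with probability rate, otherwise from y
crossover(x, y, rate = 0.5)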
Decrease the observations in a task or a ResampleInstance to a given percentage of observations.
downsample(obj, perc = 1, stratify = FALSE)

Arguments:
obj: (Task | ResampleInstance)
perc: (numeric(1))
stratify: (logical(1))

Value: (data.frame | Task | ResampleInstance). Same type as obj.
Other downsample: makeDownsampleWrapper()
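A minimal sketch (added for illustration):

small.task = downsample(iris.task, perc = 0.5, stratify = TRUE)
getTaskSize(small.task)  # roughly half of the original 150 observations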
Drop some features of task.
dropFeatures(task, features)

Arguments:
task: (Task)
features: (character)

Value: Task.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
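A minimal sketch (added for illustration):

task = dropFeatures(iris.task, c("Sepal.Length", "Sepal.Width"))
getTaskFeatureNames(task)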
Estimates the relative overfitting of a model as the ratio of the difference in test and train performance to the difference of test performance in the no-information case and train performance. In the no-information case the features carry no information with respect to the prediction. This is simulated by permuting features and predictions.
estimateRelativeOverfitting(predish, measures, task, learner = NULL, pred.train = NULL, iter = 1)

Arguments:
predish: (ResampleDesc | ResamplePrediction | Prediction)
measures: (Measure | list of Measure)
task: (Task)
learner: (Learner | character(1))
pred.train: (Prediction)
iter: (integer)
Currently only support for classification and regression tasks is implemented.
(data.frame). Relative overfitting estimate(s), named by measure(s), for each resampling iteration.
Bradley Efron and Robert Tibshirani; Improvements on Cross-Validation: The .632+ Bootstrap Method, Journal of the American Statistical Association, Vol. 92, No. 438. (Jun., 1997), pp. 548-560.
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 2)
estimateRelativeOverfitting(rdesc, acc, task, makeLearner("classif.knn"))
estimateRelativeOverfitting(rdesc, acc, task, makeLearner("classif.lda"))
rpred = resample("classif.knn", task, rdesc)$pred
estimateRelativeOverfitting(rpred, acc, task)
Estimate the residual variance of a regression model on a given task. If a regression learner is provided instead of a model, the model is trained (see train) first.
estimateResidualVariance(x, task, data, target)

Arguments:
x: (Learner or WrappedModel)
task: (RegrTask)
data: (data.frame)
target: (character(1))
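A minimal sketch (added for illustration), assuming the built-in Boston housing task bh.task; the learner is trained internally:

estimateResidualVariance(makeLearner("regr.rpart"), task = bh.task)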
The function extracts features from functional data based on the Bspline fit. For more details refer to FDboost::bsignal().
extractFDABsignal(bsignal.knots = 10L, bsignal.df = 3)

Arguments:
bsignal.knots: (integer(1))
bsignal.df: (numeric(1))

Value: (data.frame).
Other fda_featextractor: extractFDADTWKernel(), extractFDAFPCA(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
The function extracts features from functional data based on the DTW distance with a reference dataframe.
extractFDADTWKernel(ref.method = "random", n.refs = 0.05, refs = NULL, dtwwindow = 0.05)

Arguments:
ref.method: (character(1))
n.refs: (numeric(1))
refs
dtwwindow: (numeric(1))

Value: (data.frame).
Other fda_featextractor: extractFDABsignal(), extractFDAFPCA(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
Extract non-functional features from functional features using various methods. The function extractFDAFeatures performs the extraction for all functional features via the methods specified in feat.methods and transforms all mentioned functional (matrix) features into regular data.frame columns. Additionally, an "extractFDAFeatDesc" object is returned which contains learned coefficients and other helpful data for re-extraction during the predict phase. This can be used with reextractFDAFeatures in order to extract features during the prediction phase.
extractFDAFeatures(obj, target = character(0L), feat.methods = list(), ...)

Arguments:
obj: (Task | data.frame)
target: (character(1))
feat.methods: (named list)
...: (any)
The description object contains these slots:
target (character): See argument.
coln (character): Column names of data.
fd.cols (character): Functional feature names.
extractFDAFeat (list): Contains feature.methods and relevant parameters for re-extraction.

Value: (list) with entries:
data | task (data.frame | Task): Extracted features, same type as obj.
desc (extractFDAFeatDesc): Description object. See description for details.
Other fda: makeExtractFDAFeatMethod(), makeExtractFDAFeatsWrapper()
df = data.frame(x = matrix(rnorm(24), ncol = 8), y = factor(c("a", "a", "b")))
fdf = makeFunctionalData(df, fd.features = list(x1 = 1:4, x2 = 5:8), exclude.cols = "y")
task = makeClassifTask(data = fdf, target = "y")
extracted = extractFDAFeatures(task,
  feat.methods = list("x1" = extractFDAFourier(), "x2" = extractFDAWavelets(filter = "haar")))
print(extracted$task)
reextractFDAFeatures(task, extracted$desc)
The function extracts features from functional data based on the fast fourier transform. For more details refer to stats::fft.
extractFDAFourier(trafo.coeff = "phase")

Arguments:
trafo.coeff: (character(1))

Value: (data.frame).
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFPCA(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
The function extracts the functional principal components from a data.frame containing functional features. Uses stats::prcomp.
extractFDAFPCA(rank. = NULL, center = TRUE, scale. = FALSE)

Arguments:
rank.: (integer(1))
center: (logical(1))
scale.: (logical(1))

Value: (data.frame).
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
The function currently extracts the mean of multiple segments of each curve and stacks them as features. The segment lengths are set in a hierarchical way so the features cover different resolution levels.
extractFDAMultiResFeatures(res.level = 3L, shift = 0.5, seg.lens = NULL)

Arguments:
res.level: (integer(1))
shift: (numeric(1))
seg.lens: (integer)

Value: (data.frame).
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFPCA(), extractFDAFourier(), extractFDATsfeatures(), extractFDAWavelets()
The function extracts features from functional data based on known heuristics. For more details refer to tsfeatures::tsfeatures(). Under the hood this function uses the package tsfeatures. For more information see Hyndman, Wang and Laptev, Large-Scale Unusual Time Series Detection, ICDM 2015.
Note: Currently the following features are computed: "frequency", "stl_features", "entropy", "acf_features", "arch_stat", "crossing_points", "flat_spots", "hurst", "holt_parameters", "lumpiness", "max_kl_shift", "max_var_shift", "max_level_shift", "stability", "nonlinearity".
extractFDATsfeatures(
  scale = TRUE, trim = FALSE, trim_amount = 0.1, parallel = FALSE,
  na.action = na.pass, feats = NULL, ...
)

Arguments:
scale: (logical(1))
trim: (logical(1))
trim_amount: (numeric(1))
parallel: (logical(1))
na.action: (function)
feats: (character)
...: (any)
Hyndman, Wang and Laptev, Large-Scale Unusual Time Series Detection, ICDM 2015.
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFPCA(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDAWavelets()
The function extracts discrete wavelet transform coefficients from the raw functional data. See wavelets::dwt for more information.
extractFDAWavelets(filter = "la8", boundary = "periodic")

Arguments:
filter: (character(1))
boundary: (character(1))

Value: (data.frame).
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFPCA(), extractFDAFourier(), extractFDAMultiResFeatures(), extractFDATsfeatures()
A subclass of WrappedModel. It is created, if you set the respective option in configureMlr, when a model internally crashes during training. The model always predicts NAs.
If the mlr option on.error.dump is TRUE, the FailureModel contains the debug trace of the error. It can be accessed with getFailureModelDump and inspected with debugger.
Its encapsulated learner.model is simply a string: the error message that was generated when the model crashed.
The following code shows how to access the message.
Other debug: ResampleResult, getPredictionDump(), getRRDump()
configureMlr(on.learner.error = "warn")
data = iris
data$newfeat = 1 # will make LDA crash
task = makeClassifTask(data = data, target = "Species")
m = train("classif.lda", task) # LDA crashed, but mlr catches this
print(m)
print(m$learner.model) # the error message
p = predict(m, task) # this will predict NAs
print(p)
print(performance(p))
configureMlr(on.learner.error = "stop")
Feature selection method used by selectFeatures.
The methods used here follow a wrapper approach, described in
Kohavi and John (1997) (see references).
The following optimization algorithms are available:

FeatSelControlExhaustive: Exhaustive search. All feature sets (up to a certain number of features max.features) are searched.

FeatSelControlRandom: Random search. Feature vectors are randomly drawn, up to a certain number of features max.features. A feature is included in the current set with probability prob. So we are basically drawing (0,1)-membership-vectors, where each element is Bernoulli(prob) distributed.

FeatSelControlSequential: Deterministic forward or backward search, that means extending (forward) or shrinking (backward) a feature set. Depending on the given method, different approaches are taken.
sfs: Sequential Forward Search. Starting from an empty model, in each step the feature increasing the performance measure the most is added to the model.
sbs: Sequential Backward Search. Starting from a model with all features, in each step the feature decreasing the performance measure the least is removed from the model.
sffs: Sequential Floating Forward Search. Starting from an empty model, in each step the algorithm chooses the best model from all models with one additional feature and from all models with one feature less.
sfbs: Sequential Floating Backward Search. Similar to sffs but starting with a full model.

FeatSelControlGA: Search via genetic algorithm. The GA is a simple (mu, lambda) or (mu + lambda) algorithm, depending on the comma setting. A comma strategy selects a new population of size mu out of the lambda > mu offspring. A plus strategy uses the joint pool of mu parents and lambda offspring for selecting mu new candidates. Out of those mu features, the new lambda features are generated by randomly choosing pairs of parents. These are crossed over and crossover.rate represents the probability of choosing a feature from the first parent instead of the second parent. The resulting offspring is mutated, i.e., its bits are flipped with probability mutation.rate. If max.features is set, offspring are repeatedly generated until the setting is satisfied.
makeFeatSelControlExhaustive(
  same.resampling.instance = TRUE, maxit = NA_integer_,
  max.features = NA_integer_, tune.threshold = FALSE,
  tune.threshold.args = list(), log.fun = "default"
)

makeFeatSelControlGA(
  same.resampling.instance = TRUE, impute.val = NULL, maxit = NA_integer_,
  max.features = NA_integer_, comma = FALSE, mu = 10L, lambda,
  crossover.rate = 0.5, mutation.rate = 0.05, tune.threshold = FALSE,
  tune.threshold.args = list(), log.fun = "default"
)

makeFeatSelControlRandom(
  same.resampling.instance = TRUE, maxit = 100L, max.features = NA_integer_,
  prob = 0.5, tune.threshold = FALSE, tune.threshold.args = list(),
  log.fun = "default"
)

makeFeatSelControlSequential(
  same.resampling.instance = TRUE, impute.val = NULL, method, alpha = 0.01,
  beta = -0.001, maxit = NA_integer_, max.features = NA_integer_,
  tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default"
)
Arguments:
same.resampling.instance: (logical(1))
maxit: (integer(1))
max.features: (integer(1))
tune.threshold: (logical(1))
tune.threshold.args: (list)
log.fun: (function | character(1))
impute.val: (numeric)
comma: (logical(1))
mu: (integer(1))
lambda: (integer(1))
crossover.rate: (numeric(1))
mutation.rate: (numeric(1))
prob: (numeric(1))
method: (character(1))
alpha: (numeric(1))
beta: (numeric(1))
(FeatSelControl). The specific subclass is one of FeatSelControlExhaustive, FeatSelControlRandom, FeatSelControlSequential, FeatSelControlGA.
Ron Kohavi and George H. John,
Wrappers for feature subset selection, Artificial Intelligence Volume 97, 1997, 273-324.
http://ai.stanford.edu/~ronnyk/wrappersPrint.pdf.
Other featsel: analyzeFeatSelResult(), getFeatSelResult(), makeFeatSelWrapper(), selectFeatures()
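A minimal sketch (added for illustration) of a random search over feature subsets:

ctrl = makeFeatSelControlRandom(maxit = 10L)
rdesc = makeResampleDesc("Holdout")
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl, show.info = FALSE)
res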
Container for results of feature selection. Contains the obtained features, their performance values and the optimization path which led there. You can visualize it using analyzeFeatSelResult.
Object members:
learner (Learner): Learner that was optimized.
control (FeatSelControl): Control object from feature selection.
x (character): Vector of feature names identified as optimal.
y (numeric): Performance values for the optimal x.
threshold (numeric): Vector of finally found and used thresholds if tune.threshold was enabled in FeatSelControl, otherwise not present and hence NULL.
opt.path: Optimization path which led to x.
First, calls generateFilterValuesData. Features are then selected via select and val.
filterFeatures(
  task, method = "FSelectorRcpp_information.gain", fval = NULL, perc = NULL,
  abs = NULL, threshold = NULL, fun = NULL, fun.args = NULL,
  mandatory.feat = NULL, select.method = NULL, base.methods = NULL,
  cache = FALSE, ...
)

Arguments:
task: (Task)
method: (character(1))
fval: (FilterValues)
perc: (numeric(1))
abs: (numeric(1))
threshold: (numeric(1))
fun: (function)
fun.args: (any)
mandatory.feat: (character)
select.method: used if multiple methods are supplied in argument method
base.methods: simple methods used by an ensemble filter
cache: (logical(1) | character(1))
...: (any)

Value: Task.
If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is operating system dependent and can be checked with getCacheDir(). The default cache can be cleared with deleteCacheDir(). Alternatively, a custom directory can be passed to store the cache.
Note that caching is not thread safe. It will work for parallel computation on many systems, but there is no guarantee.
Besides passing (multiple) simple filter methods you can also pass an ensemble filter method (in a list). The ensemble method will use the simple methods to calculate its ranking. See listFilterEnsembleMethods() for available ensemble methods.
Other filter: generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
# simple filter
filterFeatures(iris.task, method = "FSelectorRcpp_gain.ratio", abs = 2)

# ensemble filter
filterFeatures(iris.task, method = "E-min",
  base.methods = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"), abs = 2)
Performs a PMCMRplus::frdAllPairsNemenyiTest for a BenchmarkResult and a selected measure. This means all pairwise comparisons of learners are performed. The null hypothesis of the post hoc test is that each pair of learners is equal. If the null hypothesis of the included ad hoc stats::friedman.test can be rejected, an object of class pairwise.htest is returned. If not, the function returns the corresponding friedman.test.
Note that benchmark results for at least two learners on at least two tasks are required.
friedmanPostHocTestBMR(bmr, measure = NULL, p.value = 0.05, aggregation = "default")

Arguments:
bmr: (BenchmarkResult)
measure: (Measure)
p.value: (numeric(1))
aggregation: (character(1))

Value: (pairwise.htest): See PMCMRplus::frdAllPairsNemenyiTest for details. Additionally two components are added to the list:
f.rejnull (logical(1)): Whether the according friedman.test rejects the null hypothesis at the selected p.value.
crit.difference (list(2)): Minimal difference the mean ranks of two learners need to have in order to be significantly different.
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
# see benchmark
Performs a stats::friedman.test for a selected measure. The null hypothesis is that apart from an effect of the different (Task), the location parameter (aggregated performance measure) is the same for each Learner. Note that benchmark results for at least two learners on at least two tasks are required.
friedmanTestBMR(bmr, measure = NULL, aggregation = "default")

Arguments:
bmr: (BenchmarkResult)
measure: (Measure)
aggregation: (character(1))

Value: (htest): See stats::friedman.test for details.
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
# see benchmark
Contains the task (fuelsubset.task).
2 functional covariates and 1 scalar covariate. You have to predict the heat value of some fuel based on the ultraviolet radiation spectrum and infrared ray radiation and one scalar column called h2o.
The features and grids are scaled in the same way as in FDboost::FDboost.
See Brockhaus, S., Scheipl, F., Hothorn, T., & Greven, S. (2015). The functional linear array model. Statistical Modelling, 15(3), 279–300.
A calibrated classifier is one where the predicted probability of a class closely matches the rate at which that class occurs, e.g. for data points which are assigned a predicted probability of class A of .8, approximately 80 percent of such points should belong to class A if the classifier is well calibrated. This is estimated empirically by grouping data points with similar predicted probabilities for each class, and plotting the rate of each class within each bin against the predicted probability bins.
generateCalibrationData(obj, breaks = "Sturges", groups = NULL, task.id = NULL)

Arguments:
obj: (list of Prediction | list of ResampleResult | BenchmarkResult)
breaks: (character(1) | numeric)
groups: (integer(1))
task.id: (character(1))

Value: CalibrationData. A list containing:
proportion: (data.frame)
data: (data.frame)
task: (TaskDesc)
Vuk, Miha, and Curk, Tomaz. “ROC Curve, Lift Chart, and Calibration Plot.” Metodoloski zvezki. Vol. 3. No. 1 (2006): 89-108.
Other generate_plot_data: generateCritDifferencesData(), generateFeatureImportanceData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData(), plotFilterValues()
Other calibration: plotCalibration()
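A minimal sketch (added for illustration), assuming a cross-validated probability prediction; the result is typically passed on to plotCalibration():

lrn = makeLearner("classif.rpart", predict.type = "prob")
r = resample(lrn, sonar.task, makeResampleDesc("CV", iters = 3L))
cal = generateCalibrationData(r)
plotCalibration(cal)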
Generates data that can be used to plot a critical differences plot. Computes the critical differences according to either the "Bonferroni-Dunn" test or the "Nemenyi" test. "Bonferroni-Dunn" usually yields higher power as it does not compare all algorithms to each other, but all algorithms to a baseline instead.
Learners are drawn on the y-axis according to their average rank.
For test = "nemenyi" a bar is drawn, connecting all groups of not significantly different learners.
For test = "bd" an interval is drawn around the algorithm selected as a baseline. All learners within this interval are not significantly different from the baseline.
Calculation: CD = q_alpha * sqrt(k * (k + 1) / (6 * N)), where q_alpha is based on the studentized range statistic, k is the number of learners and N the number of tasks. See references for details.
generateCritDifferencesData(bmr, measure = NULL, p.value = 0.05, baseline = NULL, test = "bd")

Arguments:
bmr: (BenchmarkResult)
measure: (Measure)
p.value: (numeric(1))
baseline: (character(1))
test: (character(1))

Value: (critDifferencesData). List containing:
data: (data.frame) containing the info for the descriptive part of the plot
friedman.nemenyi.test: (list)
cd.info: (list) containing info on the critical difference and its positioning
baseline: baseline learner
p.value: p.value used for the PMCMRplus::frdAllPairsNemenyiTest and for computation of the critical difference
Other generate_plot_data: generateCalibrationData(), generateFeatureImportanceData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData(), plotFilterValues()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Estimate how important individual features or groups of features are by contrasting prediction performances. For method "permutation.importance" compute the change in performance from permuting the values of a feature (or a group of features) and compare that to the predictions made on the unpermuted data.
generateFeatureImportanceData(
  task, method = "permutation.importance", learner,
  features = getTaskFeatureNames(task), interaction = FALSE, measure,
  contrast = function(x, y) x - y, aggregation = mean, nmc = 50L,
  replace = TRUE, local = FALSE, show.info = FALSE
)

Arguments:
task: (Task)
method: (character(1))
learner: (Learner | character(1))
features: (character)
interaction: (logical(1))
measure: (Measure)
contrast: (function)
aggregation: (function)
nmc: (integer(1))
replace: (logical(1))
local: (logical(1))
show.info: (logical(1))

Value: (FeatureImportance). A named list which contains the computed feature importance and the input arguments.

Object members:
res: (data.frame)
interaction: (logical(1))
measure: (Measure) The measure used to compute performance.
contrast: (function)
aggregation: (function)
replace: (logical(1))
nmc: (integer(1))
local: (logical(1))
Jerome Friedman; Greedy Function Approximation: A Gradient Boosting Machine, Annals of Statistics, Vol. 29, No. 5 (Oct., 2001), pp. 1189-1232.
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData(), plotFilterValues()
lrn = makeLearner("classif.rpart", predict.type = "prob") fit = train(lrn, iris.task) imp = generateFeatureImportanceData(iris.task, "permutation.importance", lrn, "Petal.Width", nmc = 10L, local = TRUE)
lrn = makeLearner("classif.rpart", predict.type = "prob") fit = train(lrn, iris.task) imp = generateFeatureImportanceData(iris.task, "permutation.importance", lrn, "Petal.Width", nmc = 10L, local = TRUE)
Calculates numerical filter values for features. For a list of features, use listFilterMethods.
generateFilterValuesData( task, method = "FSelectorRcpp_information.gain", nselect = getTaskNFeats(task), ..., more.args = list() )
task |
(Task) |
method |
(character | list) |
nselect |
( |
... |
(any) |
more.args |
(named list) |
(FilterValues). A list
containing:
task.desc |
(TaskDesc) |
data |
( |
Besides passing (multiple) simple filter methods you can also pass an
ensemble filter method (in a list). The ensemble method will use the simple
methods to calculate its ranking. See listFilterEnsembleMethods()
for
available ensemble methods.
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Other filter:
filterFeatures()
,
getFilteredFeatures()
,
listFilterEnsembleMethods()
,
listFilterMethods()
,
makeFilter()
,
makeFilterEnsemble()
,
makeFilterWrapper()
,
plotFilterValues()
# two simple filter methods
fval = generateFilterValuesData(iris.task,
  method = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"))
# using ensemble method "E-mean"
fval = generateFilterValuesData(iris.task,
  method = list("E-mean", c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain")))
Generate cleaned hyperparameter effect data from a tuning result or from a nested cross-validation tuning result. The object returned can be used for custom visualization or passed downstream to an out of the box mlr method, plotHyperParsEffect.
generateHyperParsEffectData( tune.result, include.diagnostics = FALSE, trafo = FALSE, partial.dep = FALSE )
tune.result |
(TuneResult | ResampleResult) |
include.diagnostics |
( |
trafo |
( |
partial.dep |
( |
(HyperParsEffectData
)
Object containing the hyperparameter effects dataframe, the tuning
performance measures used, the hyperparameters used, a flag for including
diagnostic info, a flag for whether nested cv was used, a flag for whether
partial dependence should be generated, and the optimization algorithm used.
## Not run:
# 3-fold cross validation
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-4:4)))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 3L)
res = tuneParams("classif.ksvm", task = pid.task, resampling = rdesc,
  par.set = ps, control = ctrl)
data = generateHyperParsEffectData(res)
plt = plotHyperParsEffect(data, x = "C", y = "mmce.test.mean")
plt + ylab("Misclassification Error")

# nested cross validation
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-4:4)))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 3L)
lrn = makeTuneWrapper("classif.ksvm", control = ctrl, resampling = rdesc, par.set = ps)
res = resample(lrn, task = pid.task, resampling = cv2, extract = getTuneResult)
data = generateHyperParsEffectData(res)
plotHyperParsEffect(data, x = "C", y = "mmce.test.mean", plot.type = "line")
## End(Not run)
Observe how the performance changes with an increasing number of observations.
generateLearningCurveData( learners, task, resampling = NULL, percs = seq(0.1, 1, by = 0.1), measures, stratify = FALSE, show.info = getMlrOption("show.info") )
learners |
((list of) Learner) |
task |
(Task) |
resampling |
(ResampleDesc | ResampleInstance) |
percs |
(numeric) |
measures |
((list of) Measure) |
stratify |
( |
show.info |
( |
(LearningCurveData). A list
containing:
The Task
List of Measure
Performance measures
data (data.frame) with columns:
learner
Names of learners.
percentage
Percentages drawn from the training split.
One column for each Measure passed to generateLearningCurveData.
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generatePartialDependenceData()
,
generateThreshVsPerfData()
,
plotFilterValues()
Other learning_curve:
plotLearningCurve()
r = generateLearningCurveData(list("classif.rpart", "classif.knn"),
  task = sonar.task, percs = seq(0.2, 1, by = 0.2),
  measures = list(tp, fp, tn, fn),
  resampling = makeResampleDesc(method = "Subsample", iters = 5),
  show.info = FALSE)
plotLearningCurve(r)
Estimate how the learned prediction function is affected by one or more features. For a learned function f(x) where x is partitioned into x_s and x_c, the partial dependence of f on x_s can be summarized by averaging over x_c and setting x_s to a range of values of interest, estimating E_(x_c)(f(x_s, x_c)). The conditional expectation of f at observation i is estimated similarly. Additionally, partial derivatives of the marginalized function w.r.t. the features can be computed.
This function requires the mmpf
package to be installed. It is currently not on CRAN, but can
be installed through GitHub using devtools::install_github('zmjones/mmpf/pkg')
.
generatePartialDependenceData( obj, input, features = NULL, interaction = FALSE, derivative = FALSE, individual = FALSE, fun = mean, bounds = c(qnorm(0.025), qnorm(0.975)), uniform = TRUE, n = c(10, NA), ... )
obj |
(WrappedModel) |
input |
(data.frame | Task) |
features |
character |
interaction |
( |
derivative |
( |
individual |
( |
fun |
A function which operates on the output on the predictions made on the |
bounds |
( |
uniform |
( |
n |
( |
... |
additional arguments to be passed to |
PartialDependenceData. A named list, which contains the partial dependence, input data, target, features, task description, and other arguments controlling the type of partial dependences made.
Object members:
data |
data.frame |
task.desc |
TaskDesc |
target |
Target feature for regression, target feature levels for classification, survival and event indicator for survival. |
features |
character |
interaction |
( |
derivative |
( |
individual |
( |
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. “Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation.” Journal of Computational and Graphical Statistics. Vol. 24, No. 1 (2015): 44-65.
Friedman, Jerome. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics. Vol. 29. No. 5 (2001): 1189-1232.
Other partial_dependence:
plotPartialDependence()
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generateThreshVsPerfData()
,
plotFilterValues()
lrn = makeLearner("regr.svm")
fit = train(lrn, bh.task)
pd = generatePartialDependenceData(fit, bh.task, "lstat")
plotPartialDependence(pd, data = getTaskData(bh.task))

lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
pd = generatePartialDependenceData(fit, iris.task, "Petal.Width")
plotPartialDependence(pd, data = getTaskData(iris.task))
Generates data on threshold vs. performance(s) for 2-class classification that can be used for plotting.
generateThreshVsPerfData( obj, measures, gridsize = 100L, aggregate = TRUE, task.id = NULL )
obj |
(list of Prediction | list of ResampleResult | BenchmarkResult) |
measures |
(Measure | list of Measure) |
gridsize |
( |
aggregate |
( |
task.id |
( |
(ThreshVsPerfData). A named list containing the measured performance across the threshold grid, the measures, and whether the performance estimates were aggregated (only applicable for (list of) ResampleResults).
Other generate_plot_data:
generateCalibrationData()
,
generateCritDifferencesData()
,
generateFeatureImportanceData()
,
generateFilterValuesData()
,
generateLearningCurveData()
,
generatePartialDependenceData()
,
plotFilterValues()
Other thresh_vs_perf:
plotROCCurves()
,
plotThreshVsPerf()
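For illustration, a minimal usage sketch (not part of the packaged examples):
# threshold vs. performance for a probabilistic classifier on a two-class task
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
tvp = generateThreshVsPerfData(pred, measures = list(fpr, tpr, mmce))
plotThreshVsPerf(tvp)
plotROCCurves(tvp)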
Either a list of lists of “aggr” numeric vectors, as returned by resample, or these objects are rbind-ed with extra columns “task.id” and “learner.id”.
getBMRAggrPerformances( bmr, task.ids = NULL, learner.ids = NULL, as.df = FALSE, drop = FALSE )
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
(list | data.frame). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
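For illustration, a minimal usage sketch (not part of the packaged examples):
# small benchmark, then extract aggregated performances
lrns = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
rdesc = makeResampleDesc("CV", iters = 2L)
bmr = benchmark(lrns, iris.task, rdesc, show.info = FALSE)
getBMRAggrPerformances(bmr, as.df = TRUE)
# restrict to one learner and drop the outer list level
getBMRAggrPerformances(bmr, learner.ids = "classif.rpart", drop = TRUE)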
Returns a nested list of FeatSelResults. The first level of nesting is by data set, the second by learner, the third for the benchmark resampling iterations. If as.df
is TRUE
, a data frame with “task.id”, “learner.id”, the resample iteration and the selected features is returned.
Note that if more than one feature is selected and a data frame is requested, there will be multiple rows for the same dataset-learner-iteration; one for each selected feature.
getBMRFeatSelResults( bmr, task.ids = NULL, learner.ids = NULL, as.df = FALSE, drop = FALSE )
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
(list | data.frame). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Returns a nested list of characters. The first level of nesting is by data set, the second by learner, the third for the benchmark resampling iterations. The list at the lowest level is the list of selected features. If as.df
is TRUE
, a data frame with “task.id”, “learner.id”, the resample iteration and the selected features is returned.
Note that if more than one feature is selected and a data frame is requested, there will be multiple rows for the same dataset-learner-iteration; one for each selected feature.
getBMRFilteredFeatures( bmr, task.ids = NULL, learner.ids = NULL, as.df = FALSE, drop = FALSE )
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
(list | data.frame). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Gets the IDs of the learners used in a benchmark experiment.
getBMRLearnerIds(bmr)
bmr |
(BenchmarkResult) |
(character).
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Gets the learners used in a benchmark experiment.
getBMRLearners(bmr)
bmr |
(BenchmarkResult) |
(list).
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Gets the short names of the learners used in a benchmark experiment.
getBMRLearnerShortNames(bmr)
bmr |
(BenchmarkResult) |
(character).
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Gets the IDs of the measures used in a benchmark experiment.
getBMRMeasureIds(bmr)
bmr |
(BenchmarkResult) |
(list). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Gets the measures used in a benchmark experiment.
getBMRMeasures(bmr)
bmr |
(BenchmarkResult) |
(list). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
A list of lists containing all WrappedModels trained in the benchmark experiment.
If models
is FALSE
in the call to benchmark, the function will return NULL
.
getBMRModels(bmr, task.ids = NULL, learner.ids = NULL, drop = FALSE)
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
drop |
( |
(list).
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Either a list of lists of “measure.test” data.frames, as returned by resample, or these objects are rbind-ed with extra columns “task.id” and “learner.id”.
getBMRPerformances( bmr, task.ids = NULL, learner.ids = NULL, as.df = FALSE, drop = FALSE )
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
(list | data.frame). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Either a list of lists of ResamplePrediction objects, as returned by resample, or these objects are rbind-ed with extra columns “task.id” and “learner.id”.
If predict.type
is “prob”, the probabilities for each class are returned in addition to the response.
If keep.pred
is FALSE
in the call to benchmark, the function will return NULL
.
getBMRPredictions( bmr, task.ids = NULL, learner.ids = NULL, as.df = FALSE, drop = FALSE )
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
(list | data.frame). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
A list containing all TaskDescs for each task contained in the benchmark experiment.
getBMRTaskDescriptions(bmr)
bmr |
(BenchmarkResult) |
(list).
A list containing all TaskDescs for each task contained in the benchmark experiment.
getBMRTaskDescs(bmr)
bmr |
(BenchmarkResult) |
(list).
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskIds()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Gets the task IDs used in a benchmark experiment.
getBMRTaskIds(bmr)
bmr |
(BenchmarkResult) |
(character).
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTuneResults()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Returns a nested list of TuneResults. The first level of nesting is by data set, the second by learner, the third for the benchmark resampling iterations. If as.df
is TRUE
, a data frame with the “task.id”, “learner.id”, the resample iteration, the parameter values and the performances is returned.
getBMRTuneResults( bmr, task.ids = NULL, learner.ids = NULL, as.df = FALSE, drop = FALSE )
bmr |
(BenchmarkResult) |
task.ids |
( |
learner.ids |
( |
as.df |
( |
drop |
( |
(list | data.frame). See above.
Other benchmark:
BenchmarkResult
,
batchmark()
,
benchmark()
,
convertBMRToRankMatrix()
,
friedmanPostHocTestBMR()
,
friedmanTestBMR()
,
generateCritDifferencesData()
,
getBMRAggrPerformances()
,
getBMRFeatSelResults()
,
getBMRFilteredFeatures()
,
getBMRLearnerIds()
,
getBMRLearnerShortNames()
,
getBMRLearners()
,
getBMRMeasureIds()
,
getBMRMeasures()
,
getBMRModels()
,
getBMRPerformances()
,
getBMRPredictions()
,
getBMRTaskDescs()
,
getBMRTaskIds()
,
plotBMRBoxplots()
,
plotBMRRanksAsBarChart()
,
plotBMRSummary()
,
plotCritDifferences()
,
reduceBatchmarkResults()
Constructs a grid of tuning parameters from a learner of the caret
R-package. These values are then converted into a list of non-tunable
parameters (par.vals
) and a tunable
ParamHelpers::ParamSet (par.set
), which can be used by
tuneParams for tuning the learner. Numerical parameters will
either be specified by their lower and upper bounds or they will be
discretized into specific values.
getCaretParamSet(learner, length = 3L, task, discretize = TRUE)
learner |
( |
length |
( |
task |
(Task) |
discretize |
( |
(list(2)
). A list of parameters:
par.vals
contains a list of all constant tuning parameters
par.set
is a ParamHelpers::ParamSet, containing all the configurable
tuning parameters
if (requireNamespace("caret") && requireNamespace("mlbench")) {
  library(caret)
  classifTask = makeClassifTask(data = iris, target = "Species")

  # (1) classification (random forest) with discretized parameters
  getCaretParamSet("rf", length = 9L, task = classifTask, discretize = TRUE)

  # (2) regression (gradient boosting machine) without discretized parameters
  library(mlbench)
  data(BostonHousing)
  regrTask = makeRegrTask(data = BostonHousing, target = "medv")
  getCaretParamSet("gbm", length = 9L, task = regrTask, discretize = FALSE)
}
Gets the class weight parameter of a learner.
getClassWeightParam(learner, lrn.id = NULL)
learner |
(Learner | |
lrn.id |
(character) |
numeric LearnerParam: A numeric parameter object, containing the class weight parameter of the given learner.
Other learner:
LearnerProperties
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
getConfMatrix
is deprecated. Please use calculateConfusionMatrix.
Calculates confusion matrix for (possibly resampled) prediction. Rows indicate true classes, columns predicted classes.
The marginal elements count the number of classification errors for the respective row or column, i.e., the number of errors when you condition on the corresponding true (rows) or predicted (columns) class. The last element in the margin diagonal displays the total number of errors.
Note that for resampling no further aggregation is currently performed. All predictions on all test sets are joined to a vector yhat, as are all labels joined to a vector y. Then yhat is simply tabulated vs y, as if both were computed on a single test set. This probably mainly makes sense when cross-validation is used for resampling.
getConfMatrix(pred, relative = FALSE)
pred |
(Prediction) |
relative |
( |
(matrix). A confusion matrix.
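For illustration, a minimal sketch using the replacement function (not part of the packaged examples):
mod = train(makeLearner("classif.rpart"), iris.task)
pred = predict(mod, task = iris.task)
# absolute counts, then relative frequencies
calculateConfusionMatrix(pred)
calculateConfusionMatrix(pred, relative = TRUE)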
Get the default measure for a task type, task, task description or a learner.
Currently these are:
classif: mmce
regr: mse
cluster: db
surv: cindex
costsen: mcp
multilabel: multilabel.hamloss
getDefaultMeasure(x)
x |
(character(1) | Task | TaskDesc | Learner) |
(Measure).
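For illustration, a minimal usage sketch (not part of the packaged examples):
getDefaultMeasure(iris.task) # mmce for a classification task
getDefaultMeasure("regr") # mse for the regression task type
getDefaultMeasure(makeLearner("regr.lm")) # mse for a regression learner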
Returns the error dump that can be used with debugger()
to evaluate errors.
If configureMlr configuration on.error.dump
is FALSE
, this returns
NULL
.
getFailureModelDump(model)
model |
(WrappedModel) |
(last.dump
).
Such a model is created when one sets the corresponding option in configureMlr.
If no failure occurred, NA
is returned.
For complex wrappers this getter returns the first error message encountered in ANY model that failed.
getFailureModelMsg(model)
model |
(WrappedModel) |
(character(1)
).
Returns the selected feature set and optimization path after training.
getFeatSelResult(object)
object |
(WrappedModel) |
Other featsel:
FeatSelControl
,
analyzeFeatSelResult()
,
makeFeatSelWrapper()
,
selectFeatures()
For some learners it is possible to calculate a feature importance measure.
getFeatureImportance
extracts those values from trained models.
See below for a list of supported learners.
getFeatureImportance(object, ...)
object |
(WrappedModel) |
... |
(any) |
boosting
Measure which accounts for the gain of the Gini index given by a feature
in a tree and the weight of that tree.
cforest
Permutation-based analogue of the 'mean decrease in accuracy' measure in
randomForest. If auc=TRUE
(only for binary classification), area under
the curve is used as measure. The algorithm used for the survival learner
is 'extremely slow and experimental; use at your own risk'. See
party::varimp()
for details and further parameters.
gbm
Estimation of relative influence for each feature. See
gbm::relative.influence()
for details and further parameters.
h2o
Relative feature importances as returned by
h2o::h2o.varimp()
.
randomForest
For type = 2
(the default) the 'MeanDecreaseGini' is measured, which is
based on the Gini impurity index used for the calculation of the nodes.
Alternatively, you can set type
to 1, then the measure is the mean
decrease in accuracy calculated on OOB data. Note that in this case the
learner's parameter importance
needs to be set to be able to compute
feature importance values.
See randomForest::importance()
for details.
RRF
This is identical to randomForest.
ranger
Supports both measures mentioned above for the randomForest
learner. Note that you need to specifically set the learner's parameter
importance
to be able to compute feature importance measures.
See ranger::importance()
and
ranger::ranger()
for details.
rpart
Sum of decrease in impurity for each of the surrogate variables at each
node.
xgboost
The value implies the relative contribution of the corresponding feature
to the model calculated by taking each feature's contribution for each
tree in the model. The exact computation of the importance in xgboost is
undocumented.
(FeatureImportance
) An object containing a data.frame
of the
variable importances and further information.
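For illustration, a minimal usage sketch (not part of the packaged examples):
# rpart reports feature importances out of the box
lrn = makeLearner("classif.rpart")
fit = train(lrn, iris.task)
getFeatureImportance(fit)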
Returns the filtered features.
getFilteredFeatures(model)
model |
(WrappedModel) |
(character).
Other filter:
filterFeatures()
,
generateFilterValuesData()
,
listFilterEnsembleMethods()
,
listFilterMethods()
,
makeFilter()
,
makeFilterEnsemble()
,
makeFilterWrapper()
,
plotFilterValues()
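For illustration, a minimal usage sketch (not part of the packaged examples; it assumes the FSelectorRcpp package is installed):
# train a filter wrapper and inspect which features were kept
lrn = makeFilterWrapper(makeLearner("classif.rpart"),
  fw.method = "FSelectorRcpp_information.gain", fw.abs = 2L)
mod = train(lrn, iris.task)
getFilteredFeatures(mod)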
The parameters “subset”, “features”, and “recode.target” are ignored for the data.frame method.
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")

## S3 method for class 'Task'
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")

## S3 method for class 'data.frame'
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")
object |
(Task/data.frame) |
subset |
(integer | logical | |
features |
(character | integer | logical) |
recode.target |
( |
Returns a data.frame
containing only the functional features.
Deprecated, use getLearnerModel instead.
getHomogeneousEnsembleModels(model, learner.models = FALSE)
model |
Deprecated. |
learner.models |
Deprecated. |
Retrieves the current hyperparameter settings of a learner.
getHyperPars(learner, for.fun = c("train", "predict", "both"))
learner |
(Learner) |
for.fun |
( |
This function only shows hyperparameters that differ from the
learner default (because mlr
changed the default) or if the user set
hyperparameters manually during learner creation. If you want to have an
overview of all available hyperparameters use getParamSet()
.
(list). A named list of values.
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
getHyperPars(makeLearner("classif.ranger"))

## set learner hyperparameter `mtry` manually
getHyperPars(makeLearner("classif.ranger", mtry = 100))
Get the ID of the learner.
getLearnerId(learner)
learner |
(Learner | |
(character(1)
).
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get underlying R model of learner integrated into mlr.
getLearnerModel(model, more.unwrap = FALSE)
model |
(WrappedModel) |
more.unwrap |
( |
(any). A fitted model, depending on the learner / wrapped package. E.g., a model of class rpart::rpart for learner “classif.rpart”.
Get the note for the learner.
getLearnerNote(learner)
learner |
(Learner | |
(character).
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the R packages the learner requires.
getLearnerPackages(learner)
learner |
(Learner | |
(character).
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Alias for getParamSet.
getLearnerParamSet(learner)
learner |
(Learner | |
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Alias for getHyperPars.
getLearnerParVals(learner, for.fun = c("train", "predict", "both"))
learner |
(Learner | |
for.fun |
( |
(list). A named list of values.
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the predict type of the learner.
getLearnerPredictType(learner)
learner |
(Learner | |
(character(1)
).
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
For an ordinary learner, simply its short name is returned. For wrapped learners, the wrapper id is successively attached to the short name of the base learner, e.g., “rf.bagged.imputed”.
getLearnerShortName(learner)
learner |
(Learner | |
(character(1)
).
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerType()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Get the type of the learner.
getLearnerType(learner)
learner |
(Learner | |
(character(1)
).
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getParamSet()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Gets the options for mlr.
getMlrOptions()
(list).
Other configure:
configureMlr()
Measures the quality of each binary label prediction w.r.t. some binary classification performance measure.
getMultilabelBinaryPerformances(pred, measures)
pred |
(Prediction) |
measures |
(Measure | list of Measure) |
(named matrix
). Performance value(s), column names are measure(s), row names are labels.
Other multilabel:
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
# see makeMultilabelBinaryRelevanceWrapper
Get the opt.paths from each tuning step from the outer resampling. After you resampled a tuning wrapper (see makeTuneWrapper) with resample(..., extract = getTuneResult), this helper returns a data.frame with all opt.paths combined by rbind. An additional column iter indicates to which resampling iteration the row belongs.
getNestedTuneResultsOptPathDf(r, trafo = FALSE)
r |
(ResampleResult) |
trafo |
( |
(data.frame). See above.
Other tune:
TuneControl
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
# see example of makeTuneWrapper
After you resampled a tuning wrapper (see makeTuneWrapper)
with resample(..., extract = getTuneResult)
this helper returns a data.frame
with
the best found hyperparameter settings for each resampling iteration.
getNestedTuneResultsX(r)
r |
(ResampleResult) |
(data.frame). One column for each tuned hyperparameter and one row for each outer resampling iteration.
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
# see example of makeTuneWrapper
Learners like randomForest
produce out-of-bag predictions.
getOOBPreds
extracts this information from trained models and builds a
prediction object as provided by predict (with prediction time set to NA).
In the classification case:
What is stored exactly in the (Prediction) object depends
on the predict.type
setting of the Learner.
You can call listLearners(properties = "oobpreds")
to get a list of learners
which provide this.
getOOBPreds(model, task)
model |
(WrappedModel) |
task |
(Task) |
(Prediction).
training.set = sample(1:150, 50)
lrn = makeLearner("classif.ranger", predict.type = "prob", predict.threshold = 0.6)
mod = train(lrn, sonar.task, subset = training.set)
oob = getOOBPreds(mod, sonar.task)
oob
performance(oob, measures = list(auc, mmce))
Returns the ParamHelpers::ParamSet from a Learner.
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
helpLearner()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Returns the error dump that can be used with debugger()
to evaluate errors.
If configureMlr configuration on.error.dump
is FALSE
or if the
prediction did not fail, this returns NULL
.
getPredictionDump(pred)
pred |
(Prediction) |
(last.dump
).
Other debug:
FailureModel
,
ResampleResult
,
getRRDump()
Get probabilities for some classes.
getPredictionProbabilities(pred, cl)
pred |
(Prediction) |
cl |
(character) |
(data.frame) with numerical columns or a numerical vector if length of cl
is 1.
Order of columns is defined by cl
.
Other predict:
asROCRPrediction()
,
getPredictionResponse()
,
getPredictionTaskDesc()
,
predict.WrappedModel()
,
setPredictThreshold()
,
setPredictType()
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task)
# predict probabilities
pred = predict(mod, newdata = iris)
# Get probabilities for all classes
head(getPredictionProbabilities(pred))
# Get probabilities for a subset of classes
head(getPredictionProbabilities(pred, c("setosa", "virginica")))
The following types are returned, depending on task type:
classif | factor |
regr | numeric |
se | numeric |
cluster | integer |
surv | numeric |
multilabel | logical matrix, columns named with labels |
getPredictionResponse(pred)

getPredictionSE(pred)

getPredictionTruth(pred)
pred |
(Prediction) |
See above.
Other predict:
asROCRPrediction()
,
getPredictionProbabilities()
,
getPredictionTaskDesc()
,
predict.WrappedModel()
,
setPredictThreshold()
,
setPredictType()
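For illustration, a minimal usage sketch (not part of the packaged examples):
mod = train(makeLearner("regr.lm"), bh.task)
pred = predict(mod, task = bh.task)
head(getPredictionResponse(pred))
head(getPredictionTruth(pred))
# standard errors require predict.type = "se"
mod.se = train(makeLearner("regr.lm", predict.type = "se"), bh.task)
head(getPredictionSE(predict(mod.se, task = bh.task)))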
See title.
getPredictionTaskDesc(pred)
pred |
(Prediction) |
(TaskDesc).
Other predict:
asROCRPrediction()
,
getPredictionProbabilities()
,
getPredictionResponse()
,
predict.WrappedModel()
,
setPredictThreshold()
,
setPredictType()
Deprecated, use getPredictionProbabilities instead.
getProbabilities(pred, cl)
pred |
Deprecated. |
cl |
Deprecated. |
After you resampled a tuning or feature selection wrapper (see makeTuneWrapper)
with resample(..., extract = getTuneResult)
or resample(..., extract = getFeatSelResult)
this helper returns a list
with
the resampling indices used for the respective method.
getResamplingIndices(object, inner = FALSE)
object |
(ResampleResult) |
inner |
(logical) |
(list). One list for each outer resampling fold.
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
# stupid mini grid
ps = makeParamSet(
  makeDiscreteParam("cp", values = c(0.05, 0.1)),
  makeDiscreteParam("minsplit", values = c(10, 20))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
# nested resampling for evaluation
# we also extract tuned hyper pars in each iteration and by that the resampling indices
r = resample(lrn, task, outer, extract = getTuneResult)
# get tuning indices
getResamplingIndices(r, inner = TRUE)
Returns the error dumps generated during resampling, which can be used with debugger()
to debug errors. These dumps are saved if configureMlr configuration on.error.dump
,
or the corresponding learner config
, is TRUE
.
The returned object is a list with as many entries as the resampling being used has folds. Each of these entries can have a subset of the following slots, depending on which step in the resampling iteration failed: “train” (error during training step), “predict.train” (prediction on training subset), “predict.test” (prediction on test subset).
getRRDump(res)
res |
(ResampleResult) |
list.
Other debug:
FailureModel
,
ResampleResult
,
getPredictionDump()
This function creates a list with two slots train
and test
where
each slot is again a list of Prediction objects for each single
resample iteration.
In case that predict = "train"
was used for the resample description
(see makeResampleDesc), the slot test
will be NULL
and in case that predict = "test"
was used, the slot train
will be
NULL
.
getRRPredictionList(res, ...)
res |
(ResampleResult) |
... |
(any) |
list.
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictions()
,
getRRTaskDesc()
,
getRRTaskDescription()
,
makeResampleDesc()
,
makeResampleInstance()
,
resample()
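For illustration, a minimal usage sketch (not part of the packaged examples):
rdesc = makeResampleDesc("CV", iters = 2L, predict = "both")
res = resample(makeLearner("classif.rpart"), iris.task, rdesc, show.info = FALSE)
preds = getRRPredictionList(res)
# one Prediction object per resampling iteration, for the train and the test split
preds$test[[1]]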
Very simple getter.
getRRPredictions(res)
res |
(ResampleResult) |
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRTaskDesc()
,
getRRTaskDescription()
,
makeResampleDesc()
,
makeResampleInstance()
,
resample()
Get a summarizing task description.
getRRTaskDesc(res)
res |
(ResampleResult) |
(TaskDesc).
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRPredictions()
,
getRRTaskDescription()
,
makeResampleDesc()
,
makeResampleInstance()
,
resample()
Get a summarizing task description.
getRRTaskDescription(res)
res |
(ResampleResult) |
(TaskDesc).
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRPredictions()
,
getRRTaskDesc()
,
makeResampleDesc()
,
makeResampleInstance()
,
resample()
Returns the predictions for each base learner.
getStackedBaseLearnerPredictions(model, newdata = NULL)
model |
(WrappedModel) |
newdata |
(data.frame) |
None.
NB: For multilabel, getTaskTargetNames and getTaskClassLevels actually return the same thing.
getTaskClassLevels(x)
x |
(character).
Other task:
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
Returns “NULL” if the task is not of type “costsens”.
getTaskCosts(task, subset = NULL)
task |
(CostSensTask) |
subset |
(integer | logical | |
(matrix
| NULL
).
Other task:
getTaskClassLevels()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
Useful in trainLearner when you add a learning machine to the package.
getTaskData( task, subset = NULL, features, target.extra = FALSE, recode.target = "no", functionals.as = "dfcols" )
task |
(Task) |
subset |
(integer | logical | |
features |
(character | integer | logical) |
target.extra |
( |
recode.target |
( |
functionals.as |
( |
Either a data.frame or a list with data.frame data
and vector target
.
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
library("mlbench")
data(BreastCancer)
df = BreastCancer
df$Id = NULL
task = makeClassifTask(id = "BreastCancer", data = df, target = "Class",
  positive = "malignant")
head(getTaskData(task))
head(getTaskData(task, features = c("Cell.size", "Cell.shape"), recode.target = "-1+1"))
head(getTaskData(task, subset = 1:100, recode.target = "01"))
See title.
getTaskDesc(x)
x |
(TaskDesc).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
Deprecated, use getTaskDesc instead.
getTaskDescription(x)
x |
Target column name is not included.
getTaskFeatureNames(task)
task |
(Task) |
(character).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
This is usually simply <target> ~ . For multilabel it is <target_1> + ... + <target_k> ~ .
getTaskFormula( x, target = getTaskTargetNames(x), explicit.features = FALSE, env = parent.frame() )
x |
|
target |
( |
explicit.features |
( |
env |
(environment) |
(formula).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
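For illustration, a minimal usage sketch (not part of the packaged examples):
getTaskFormula(iris.task) # Species ~ .
getTaskFormula(bh.task, explicit.features = TRUE)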
See title.
getTaskId(x)
x |
(character(1)
).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
See title.
getTaskNFeats(x)
x |
(integer(1)
).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
See title.
getTaskSize(x)
x |
(integer(1)
).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskTargetNames()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
NB: For multilabel, getTaskTargetNames and getTaskClassLevels actually return the same thing.
getTaskTargetNames(x)
x |
(character).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargets()
,
getTaskType()
,
subsetTask()
Get target data of task.
getTaskTargets(task, recode.target = "no")
task |
(Task) |
recode.target |
( |
A factor
for classification or a numeric
for regression, a data.frame
of logical columns for multilabel.
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskType()
,
subsetTask()
task = makeClassifTask(data = iris, target = "Species")
getTaskTargets(task)
See title.
getTaskType(x)
x |
(character(1)
).
Other task:
getTaskClassLevels()
,
getTaskCosts()
,
getTaskData()
,
getTaskDesc()
,
getTaskFeatureNames()
,
getTaskFormula()
,
getTaskId()
,
getTaskNFeats()
,
getTaskSize()
,
getTaskTargetNames()
,
getTaskTargets()
,
subsetTask()
Returns the optimal hyperparameters and optimization path after training.
getTuneResult(object)
object |
(WrappedModel) |
(TuneResult).
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Returns the opt.path from a (TuneResult) object.
getTuneResultOptPath(tune.result, as.df = TRUE)
tune.result |
(TuneResult) |
as.df |
( |
(ParamHelpers::OptPath) or (data.frame).
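For illustration, a minimal usage sketch (not part of the packaged examples):
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1)))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("Holdout")
res = tuneParams(makeLearner("classif.rpart"), iris.task, rdesc,
  par.set = ps, control = ctrl, show.info = FALSE)
getTuneResultOptPath(res) # as data.frame
getTuneResultOptPath(res, as.df = FALSE) # as OptPath object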
Contains the task (gunpoint.task
).
You have to classify whether a person raises a gun or just an empty hand.
See Ratanamahatana, C. A. & Keogh. E. (2004). Everything you know about Dynamic Time Warping is Wrong. Proceedings of SIAM International Conference on Data Mining (SDM05), 506-510.
See title.
hasFunctionalFeatures(obj)
obj |
( |
(logical(1)
)
Deprecated, use hasLearnerProperties instead.
hasProperties(learner, props)
learner |
Deprecated. |
props |
Deprecated. |
Interactive function that gives the user quick access to the help pages associated with various functions involved in the given learner.
helpLearner(learner)
learner |
(Learner | |
Other learner:
LearnerProperties
,
getClassWeightParam()
,
getHyperPars()
,
getLearnerId()
,
getLearnerNote()
,
getLearnerPackages()
,
getLearnerParVals()
,
getLearnerParamSet()
,
getLearnerPredictType()
,
getLearnerShortName()
,
getLearnerType()
,
getParamSet()
,
helpLearnerParam()
,
makeLearner()
,
makeLearners()
,
removeHyperPars()
,
setHyperPars()
,
setId()
,
setLearnerId()
,
setPredictThreshold()
,
setPredictType()
Other help:
helpLearnerParam()
Print the description of parameters of a given learner. The description is automatically extracted from the help pages of the learner, so it may be incomplete.
helpLearnerParam(learner, param = NULL)
learner |
(Learner | |
param |
( |
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
Other help: helpLearner()
The built-ins are:
imputeConstant(const) for imputation using a constant value,
imputeMedian() for imputation using the median,
imputeMode() for imputation using the mode,
imputeMin(multiplier) for imputing constant values shifted below the minimum using min(x) - multiplier * diff(range(x)),
imputeMax(multiplier) for imputing constant values shifted above the maximum using max(x) + multiplier * diff(range(x)),
imputeNormal(mean, sd) for imputation using normally distributed random values. Mean and standard deviation will be calculated from the data if not provided.
imputeHist(breaks, use.mids) for imputation using random values with probabilities calculated using table or hist.
imputeLearner(learner, features = NULL) for imputations using the response of a classification or regression learner.
imputeConstant(const)
imputeMedian()
imputeMean()
imputeMode()
imputeMin(multiplier = 1)
imputeMax(multiplier = 1)
imputeUniform(min = NA_real_, max = NA_real_)
imputeNormal(mu = NA_real_, sd = NA_real_)
imputeHist(breaks, use.mids = TRUE)
imputeLearner(learner, features = NULL)
const |
(any) |
multiplier |
( |
min |
( |
max |
( |
mu |
( |
sd |
( |
breaks |
( |
use.mids |
( |
learner |
(Learner | |
features |
(character) |
Other impute: impute(), makeImputeMethod(), makeImputeWrapper(), reimpute()
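A short illustrative sketch (added here, not part of the original manual) combining several built-in methods via impute(); it assumes the base R airquality data set, which has missing values in Ozone and Solar.R:
library(mlr)
aq = airquality
# impute integer/numeric NAs with the median and add dummy columns marking imputed cells
imp = impute(aq,
  classes = list(integer = imputeMedian(), numeric = imputeMedian()),
  dummy.cols = c("Ozone", "Solar.R"))
summary(imp$data)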
Allows imputation of missing feature values through various techniques. Note that you can re-impute a data set in the same way the imputation was performed during training. This especially comes in handy during resampling, when one wants to perform the same imputation on the test set as on the training set.
The function impute performs the imputation on a data set and returns, alongside the imputed data set, an "ImputationDesc" object which can contain "learned" coefficients and helpful data. It can then be passed together with a new data set to reimpute.
The imputation techniques can be specified for certain features or for feature classes, see function arguments.
You can either provide an arbitrary object, use a built-in imputation method listed under imputations or create one yourself using makeImputeMethod.
impute( obj, target = character(0L), classes = list(), cols = list(), dummy.classes = character(0L), dummy.cols = character(0L), dummy.type = "factor", force.dummies = FALSE, impute.new.levels = TRUE, recode.factor.levels = TRUE )
obj |
(data.frame | Task) |
target |
(character) |
classes |
(named list) |
cols |
(named list) |
dummy.classes |
(character) |
dummy.cols |
(character) |
dummy.type |
( |
force.dummies |
( |
impute.new.levels |
( |
recode.factor.levels |
( |
The description object contains these slots
target (character): See argument
features (character): Feature names (column names of data
)
classes (character): Feature classes (storage type of data
)
lvls (named list): Mapping of column names of factor features to their levels, including newly created ones during imputation
impute (named list): Mapping of column names to imputation functions
dummies (named list): Mapping of column names to imputation functions
impute.new.levels (logical(1)
): See argument
recode.factor.levels (logical(1)
): See argument
(list)
data (data.frame): Imputed data.
desc (ImputationDesc
): Description object.
Other impute: imputations, makeImputeMethod(), makeImputeWrapper(), reimpute()
df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
imputed = impute(df, target = character(0), cols = list(x = 99, y = imputeMode()))
print(imputed$data)
reimpute(data.frame(x = NA_real_), imputed$desc)
Such a model is created when one sets the corresponding option in configureMlr.
For complex wrappers this getter returns TRUE
if ANY model contained in it failed.
isFailureModel(model)
model |
(WrappedModel) |
(logical(1)
).
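A minimal sketch (added for illustration): with on.learner.error set to "warn" in configureMlr, a failing train() call returns a FailureModel instead of stopping, which this getter detects.
configureMlr(on.learner.error = "warn")
mod = train(makeLearner("classif.lda"), iris.task)
isFailureModel(mod)  # FALSE here, since training succeeded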
Join some existing class levels into new, larger class levels for classification problems.
joinClassLevels(task, new.levels)
task |
(Task) |
new.levels |
( |
Task.
joinClassLevels(iris.task, new.levels = list(foo = c("setosa", "virginica")))
Find all elements in ... which are not missing and call control on them.
learnerArgsToControl(control, ...)
control |
( |
... |
(any) |
Control structure for learner.
Properties can be accessed with getLearnerProperties(learner), which returns a character vector.
The learner properties are defined as follows:
Can numeric, factor or ordered factor features be handled?
Can an arbitrary number of functional features be handled?
Can exactly one functional feature be handled?
Can missing values in features be handled?
Can observations be weighted during fitting?
Only for classif: Can one-class, two-class or multi-class classification problems be handled?
Only for classif: Can class weights be handled?
Only for surv: Can right, left, or interval censored data be handled?
For classif, cluster, multilabel, surv: Can probabilities be predicted?
Only for regr: Can standard errors be predicted?
Only for classif, regr and surv: Can out of bag predictions be extracted from the trained model?
For classif, regr, surv: Does the model support extracting information on feature importance?
getLearnerProperties(learner)
hasLearnerProperties(learner, props)
learner |
(Learner | |
props |
(character) |
getLearnerProperties returns a character vector with learner properties.
hasLearnerProperties returns a logical vector of the same length as props.
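A small usage sketch (added here for illustration):
lrn = makeLearner("classif.rpart")
getLearnerProperties(lrn)
hasLearnerProperties(lrn, c("prob", "missings"))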
Other learner: getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
All supported learners can be found by listLearners or as a table in the tutorial appendix: https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html.
Returns a subset-able dataframe with filter information.
listFilterEnsembleMethods(desc = TRUE)
desc |
( |
(data.frame).
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
Returns a subset-able dataframe with filter information.
listFilterMethods( desc = TRUE, tasks = FALSE, features = FALSE, include.deprecated = FALSE )
desc |
( |
tasks |
( |
features |
( |
include.deprecated |
( |
(data.frame).
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
This is useful for determining which learner properties are available.
listLearnerProperties(type = "any")
type |
( |
(character).
Returns learning algorithms which have specific characteristics, e.g. whether they support missing values, case weights, etc.
Note that the packages of all learners are loaded during the search if you create them, which can take a lot of time. If you do not create them, only the properties of the S3 classes are inspected, which is a lot faster.
Note that for general cost-sensitive learning, mlr currently supports mainly “wrapper” approaches like CostSensWeightedPairsWrapper, which are not listed, as they are not basic R learning algorithms. The same applies for many multilabel methods, see, e.g., makeMultilabelBinaryRelevanceWrapper.
listLearners( obj = NA_character_, properties = character(0L), quiet = TRUE, warn.missing.packages = TRUE, check.packages = FALSE, create = FALSE ) ## Default S3 method: listLearners( obj = NA_character_, properties = character(0L), quiet = TRUE, warn.missing.packages = TRUE, check.packages = FALSE, create = FALSE ) ## S3 method for class 'character' listLearners( obj = NA_character_, properties = character(0L), quiet = TRUE, warn.missing.packages = TRUE, check.packages = FALSE, create = FALSE ) ## S3 method for class 'Task' listLearners( obj = NA_character_, properties = character(0L), quiet = TRUE, warn.missing.packages = TRUE, check.packages = TRUE, create = FALSE )
obj |
( |
properties |
(character) |
quiet |
( |
warn.missing.packages |
( |
check.packages |
( |
create |
( |
(data.frame | list of Learner). Either a descriptive data.frame that allows access to all properties of the learners or a list of created learner objects (named by ids of listed learners).
## Not run:
listLearners("classif", properties = c("multiclass", "prob"))
data = iris
task = makeClassifTask(data = data, target = "Species")
listLearners(task)
## End(Not run)
This is useful for determining which measure properties are available.
listMeasureProperties()
(character).
Returns the matching measures which have specific characteristics, e.g. whether they support classification or regression.
listMeasures(obj, properties = character(0L), create = FALSE) ## Default S3 method: listMeasures(obj, properties = character(0L), create = FALSE) ## S3 method for class 'character' listMeasures(obj, properties = character(0L), create = FALSE) ## S3 method for class 'Task' listMeasures(obj, properties = character(0L), create = FALSE)
obj |
( |
properties |
(character) |
create |
( |
(character | list of Measure). Class names of matching measures or instantiated objects.
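An illustrative sketch of typical calls (added here, not part of the original manual):
listMeasures("classif")   # names of measures applicable to classification
listMeasures(iris.task)   # measures applicable to a concrete task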
Returns a character vector with each of the supported task types in mlr.
listTaskTypes()
(character).
Contains the task (lung.task).
See survival::lung. Incomplete cases have been removed from the task.
This is an advanced feature of mlr. It gives access to some inner workings so the result might not be compatible with everything!
makeAggregation(id, name = id, properties, fun)
id |
( |
name |
( |
properties |
(character)
|
fun |
(
|
(Aggregation).
# computes the interquartile range on all performance values
test.iqr = makeAggregation(
  id = "test.iqr", name = "Test set interquartile range",
  properties = "req.test",
  fun = function(task, perf.test, perf.train, measure, group, pred) IQR(perf.test)
)
Fuses a learner with the bagging method (i.e., similar to what a randomForest does). Creates a learner object, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
Bagging is implemented as follows: For each iteration a random data subset is sampled (with or without replacement) and potentially the number of features is also restricted to a random subset. Note that this is usually handled in a slightly different way in the random forest, where features are sampled at each tree split.
Prediction works as follows: For classification we do majority voting to create a discrete label, and probabilities are predicted by considering the proportions of all predicted labels. For regression, the mean value and the standard deviation across predictions are computed.
Note that the passed base learner must always have predict.type = 'response', while the BaggingWrapper can estimate probabilities and standard errors, so it can be set, e.g., to predict.type = 'prob'. For this reason, when you call setPredictType, the type is only set for the BaggingWrapper, not passed down to the inner learner.
makeBaggingWrapper( learner, bw.iters = 10L, bw.replace = TRUE, bw.size, bw.feats = 1 )
learner |
(Learner | |
bw.iters |
( |
bw.replace |
( |
bw.size |
( |
bw.feats |
( |
Other wrapper: makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
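An illustrative sketch (added here, not from the original manual), using the built-in sonar.task and the rpart learner:
lrn = makeLearner("classif.rpart")
bag.lrn = makeBaggingWrapper(lrn, bw.iters = 10L, bw.replace = TRUE, bw.feats = 0.75)
# the wrapper itself may predict probabilities, even though the base learner stays "response"
bag.lrn = setPredictType(bag.lrn, "prob")
mod = train(bag.lrn, sonar.task)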
Builds regression models that predict for the positive class whether a particular example belongs to it (1) or not (-1).
Probabilities are generated by transforming the predictions with a softmax.
Inspired by WEKA's ClassificationViaRegression (http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/ClassificationViaRegression.html).
makeClassificationViaRegressionWrapper(learner, predict.type = "response")
learner |
(Learner | |
predict.type |
( |
Other wrapper: makeBaggingWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
lrn = makeLearner("regr.rpart") lrn = makeClassificationViaRegressionWrapper(lrn) mod = train(lrn, sonar.task, subset = 1:140) predictions = predict(mod, newdata = getTaskData(sonar.task)[141:208, 1:60])
lrn = makeLearner("regr.rpart") lrn = makeClassificationViaRegressionWrapper(lrn) mod = train(lrn, sonar.task, subset = 1:140) predictions = predict(mod, newdata = getTaskData(sonar.task)[141:208, 1:60])
Create a classification task.
makeClassifTask( id = deparse(substitute(data)), data, target, weights = NULL, blocking = NULL, coordinates = NULL, positive = NA_character_, fixup.data = "warn", check.data = TRUE )
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
positive |
( |
fixup.data |
( |
check.data |
( |
Task CostSensTask ClusterTask MultilabelTask RegrTask SurvTask
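A minimal illustrative example (added here):
task = makeClassifTask(id = "iris-example", data = iris, target = "Species")
print(task)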
Create a cluster task.
makeClusterTask( id = deparse(substitute(data)), data, weights = NULL, blocking = NULL, coordinates = NULL, fixup.data = "warn", check.data = TRUE )
id |
( |
data |
(data.frame) |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
Task ClassifTask CostSensTask MultilabelTask RegrTask SurvTask
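A minimal illustrative example (added here), clustering a few numeric columns of mtcars:
cluster.task = makeClusterTask(data = mtcars[, c("mpg", "hp", "wt")])
print(cluster.task)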
If the training data contains only a single class (or almost only a single class), this wrapper creates a model that always predicts the constant class in the training data. In all other cases, the underlying learner is trained and the resulting model used for predictions.
Probabilities can be predicted and will be 1 or 0 depending on whether the label matches the majority class or not.
makeConstantClassWrapper(learner, frac = 0)
learner |
(Learner | |
frac |
|
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
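An illustrative sketch (added here, not from the original manual):
lrn = makeConstantClassWrapper(makeLearner("classif.rpart"), frac = 0.05)
mod = train(lrn, iris.task)
pred = predict(mod, iris.task)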
Creates a cost measure for non-standard classification error costs.
makeCostMeasure( id = "costs", minimize = TRUE, costs, combine = mean, best = NULL, worst = NULL, name = id, note = "" )
id |
( |
minimize |
( |
costs |
(matrix) |
combine |
( |
best |
( |
worst |
( |
name |
(character) |
note |
(character) |
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
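A hedged sketch (added here) for a binary task with classes "M" and "R", e.g. sonar.task; the asymmetric cost values below are made up for illustration, and the cost matrix is assumed to carry the class labels as row and column names:
costs = matrix(c(0, 5, 1, 0), nrow = 2)
rownames(costs) = colnames(costs) = c("M", "R")
cost.measure = makeCostMeasure(id = "asym.costs", costs = costs, best = 0, worst = 5)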
Creates a wrapper, which can be used like any other learner object. The classification model can easily be accessed via getLearnerModel.
This is a very naive learner, where the costs are transformed into classification labels - the label for each case is the name of class with minimal costs. (If ties occur, the label which is better on average w.r.t. costs over all training data is preferred.) Then the classifier is fitted to that data and subsequently used for prediction.
makeCostSensClassifWrapper(learner)
learner |
(Learner | |
Other costsens: makeCostSensRegrWrapper(), makeCostSensTask(), makeCostSensWeightedPairsWrapper()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Creates a wrapper, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
For each class in the task, an individual regression model is fitted for the costs of that class. During prediction, the class with the lowest predicted costs is selected.
makeCostSensRegrWrapper(learner)
learner |
(Learner | |
Other costsens: makeCostSensClassifWrapper(), makeCostSensTask(), makeCostSensWeightedPairsWrapper()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Create a cost-sensitive classification task.
makeCostSensTask( id = deparse(substitute(data)), data, costs, blocking = NULL, coordinates = NULL, fixup.data = "warn", check.data = TRUE )
id |
( |
data |
(data.frame) |
costs |
(data.frame) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
Task ClassifTask ClusterTask MultilabelTask RegrTask SurvTask
Other costsens: makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeCostSensWeightedPairsWrapper()
Creates a wrapper, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
For each pair of labels, we fit a binary classifier. For each observation we define the label to be the element of the pair with minimal costs. During fitting, we also weight the observation with the absolute difference in costs. Prediction is performed by simple voting.
This approach is sometimes called cost-sensitive one-vs-one (CS-OVO), because it is obviously very similar to the one-vs-one approach where one reduces a normal multi-class problem to multiple binary ones and aggregates by voting.
makeCostSensWeightedPairsWrapper(learner)
learner |
(Learner | |
(Learner).
Lin, HT.: Reduction from Cost-sensitive Multiclass Classification to One-versus-one Binary Classification. In: Proceedings of the Sixth Asian Conference on Machine Learning. JMLR Workshop and Conference Proceedings, vol 39, pp. 371-386. JMLR W&CP (2014). https://proceedings.mlr.press/v39/lin14.pdf
Other costsens: makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeCostSensTask()
Construct your own performance measure, used after resampling. Note that individual training / test set performance values will be set to NA, you only calculate an aggregated value. If you can define a function that makes sense for every single training / test set, implement your own Measure.
makeCustomResampledMeasure( measure.id, aggregation.id, minimize = TRUE, properties = character(0L), fun, extra.args = list(), best = NULL, worst = NULL, measure.name = measure.id, aggregation.name = aggregation.id, note = "" )
measure.id |
( |
aggregation.id |
( |
minimize |
( |
properties |
(character) |
fun |
( |
extra.args |
(list) |
best |
( |
worst |
( |
measure.name |
( |
aggregation.name |
( |
note |
(character) |
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeMeasure(), measures, performance(), setAggregation(), setMeasurePars()
Creates a learner object, which can be used like any other learner object. It will only be trained on a subset of the original data to save computational time.
makeDownsampleWrapper(learner, dw.perc = 1, dw.stratify = FALSE)
learner |
(Learner | |
dw.perc |
( |
dw.stratify |
( |
Other downsample: downsample()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
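An illustrative sketch (added here): train on a random 50% subset of sonar.task in every fit.
lrn = makeDownsampleWrapper(makeLearner("classif.rpart"), dw.perc = 0.5)
mod = train(lrn, sonar.task)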
Fuses a base learner with the dummy feature creator (see createDummyFeatures). Returns a learner which can be used like any other learner.
makeDummyFeaturesWrapper(learner, method = "1-of-n", cols = NULL)
learner |
(Learner | |
method |
(
Default is “1-of-n”. |
cols |
(character) |
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
This can be used to implement custom FDA feature extraction. Takes a learn and a reextract function along with some optional parameters to those as arguments.
makeExtractFDAFeatMethod(learn, reextract, args = list(), par.set = NULL)
learn |
(
|
reextract |
( |
args |
(list) |
par.set |
(ParamSet) |
Other fda: extractFDAFeatures(), makeExtractFDAFeatsWrapper()
Fuses a base learner with an extractFDAFeatures method. Creates a learner object, which can be used like any other learner object. Internally uses extractFDAFeatures before training the learner and reextractFDAFeatures before predicting.
makeExtractFDAFeatsWrapper(learner, feat.methods = list())
learner |
(Learner | |
feat.methods |
(named list) |
Other fda: extractFDAFeatures(), makeExtractFDAFeatMethod()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Fuses a base learner with a search strategy to select variables. Creates a learner object, which can be used like any other learner object, but which internally uses selectFeatures. If the train function is called on it, the search strategy and resampling are invoked to select an optimal set of variables. Finally, a model is fitted on the complete training data with these variables and returned. See selectFeatures for more details.
After training, the optimal features (and other related information) can be retrieved with getFeatSelResult.
makeFeatSelWrapper( learner, resampling, measures, bit.names, bits.to.features, control, show.info = getMlrOption("show.info") )
learner |
(Learner | |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
bit.names |
character |
bits.to.features |
( |
control |
[see FeatSelControl) Control object for search method. Also selects the optimization algorithm for feature selection. |
show.info |
( |
Other featsel: FeatSelControl, analyzeFeatSelResult(), getFeatSelResult(), selectFeatures()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
# nested resampling with feature selection (with a nonsense algorithm for selection)
outer = makeResampleDesc("CV", iters = 2L)
inner = makeResampleDesc("Holdout")
ctrl = makeFeatSelControlRandom(maxit = 1)
lrn = makeFeatSelWrapper("classif.ksvm", resampling = inner, control = ctrl)
# we also extract the selected features for all iterations here
r = resample(lrn, iris.task, outer, extract = getFeatSelResult)
Creates and registers custom feature filters. Implemented filters can be listed with listFilterMethods. Additional documentation for the fun parameter specific to each filter can be found in the description.
makeFilter(name, desc, pkg, supported.tasks, supported.features, fun)
name |
( |
desc |
( |
pkg |
( |
supported.tasks |
(character) |
supported.features |
(character) |
fun |
( |
Object of class “Filter”.
Kira, Kenji and Rendell, Larry (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. AAAI-92 Proceedings.
Kononenko, Igor et al. Overcoming the myopia of inductive learning algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55.
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
Creates and registers custom ensemble feature filters. Implemented ensemble filters can be listed with listFilterEnsembleMethods. Additional documentation for the fun parameter specific to each filter can be found in the description.
makeFilterEnsemble(name, base.methods, desc, fun)
name |
( |
base.methods |
the base filter methods which the ensemble method will use. |
desc |
( |
fun |
( |
Object of class “FilterEnsemble”.
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterWrapper(), plotFilterValues()
Fuses a base learner with a filter method. Creates a learner object, which can be used like any other learner object. Internally uses filterFeatures before every model fit.
makeFilterWrapper( learner, fw.method = "FSelectorRcpp_information.gain", fw.base.methods = NULL, fw.perc = NULL, fw.abs = NULL, fw.threshold = NULL, fw.fun = NULL, fw.fun.args = NULL, fw.mandatory.feat = NULL, cache = FALSE, ... )
learner |
(Learner | |
fw.method |
( |
fw.base.methods |
( |
fw.perc |
( |
fw.abs |
( |
fw.threshold |
( |
fw.fun |
( |
fw.fun.args |
(any) |
fw.mandatory.feat |
(character) |
cache |
( |
... |
(any) |
If ensemble = TRUE, ensemble feature selection using all methods specified in fw.method is performed. At least two methods need to be selected.
After training, the selected features can be retrieved with getFilteredFeatures.
Note that observation weights do not influence the filtering and are simply passed down to the next learner.
If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is operating system dependent and can be checked with getCacheDir(). Alternatively a custom directory can be passed to store the cache. The cache can be cleared with deleteCacheDir(). Caching is disabled by default. Care should be taken when operating on large clusters due to possible write conflicts to disk if multiple workers try to write the same cache at the same time.
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), plotFilterValues()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda")
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeFilterWrapper(lrn, fw.perc = 0.5)
mod = train(lrn, task)
print(getFilteredFeatures(mod))
# now nested resampling, where we extract the features that the filter method selected
r = resample(lrn, task, outer, extract = function(model) {
  getFilteredFeatures(model)
})
print(r$extract)
# usage of an ensemble filter
lrn = makeLearner("classif.lda")
lrn = makeFilterWrapper(lrn, fw.method = "E-Borda",
  fw.base.methods = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"),
  fw.perc = 0.5)
r = resample(lrn, task, outer, extract = function(model) {
  getFilteredFeatures(model)
})
print(r$extract)
# usage of a custom thresholding function
biggest_gap = function(values, diff) {
  gap_size = 0
  gap_location = 0
  for (i in (diff + 1):length(values)) {
    gap = values[[i - diff]] - values[[i]]
    if (gap > gap_size) {
      gap_size = gap
      gap_location = i - 1
    }
  }
  return(gap_location)
}
lrn = makeLearner("classif.lda")
lrn = makeFilterWrapper(lrn, fw.method = "FSelectorRcpp_information.gain",
  fw.fun = biggest_gap, fw.fun.args = list("diff" = 1))
r = resample(lrn, task, outer, extract = function(model) {
  getFilteredFeatures(model)
})
print(r$extract)
Generate a fixed holdout instance for resampling.
makeFixedHoldoutInstance(train.inds, test.inds, size)
train.inds |
(integer) |
test.inds |
(integer) |
size |
( |
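A minimal sketch (added here) for a task with 150 observations, e.g. iris.task:
rin = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)
r = resample("classif.lda", iris.task, rin)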
To work with functional features, those features need to be stored as a matrix column in the data.frame, so mlr can automatically recognize them as functional features. This function allows for an easy conversion from a data.frame with numeric columns to the required format. If the data already contains matrix columns, they are left as-is if not specified otherwise in fd.features. See Examples for the structure of the generated output.
makeFunctionalData(data, fd.features = NULL, exclude.cols = NULL)
data |
(data.frame) |
fd.features |
(list) |
exclude.cols |
(character | integer) |
(data.frame).
# data.frame where columns 1:6 and 8:10 belong to a functional feature
d1 = data.frame(matrix(rnorm(100), nrow = 10), "target" = seq_len(10))
# Transform to functional data
d2 = makeFunctionalData(d1, fd.features = list("fd1" = 1:6, "fd2" = 8:10))
# Create a regression task
makeRegrTask(data = d2, target = "target")
This is a constructor to create your own imputation methods.
makeImputeMethod(learn, impute, args = list())
learn |
( |
impute |
( |
args |
(list) |
Other impute: imputations, impute(), makeImputeWrapper(), reimpute()
Fuses a base learner with an imputation method. Creates a learner object, which can be used like any other learner object. Internally uses impute before training the learner and reimpute before predicting.
makeImputeWrapper( learner, classes = list(), cols = list(), dummy.classes = character(0L), dummy.cols = character(0L), dummy.type = "factor", force.dummies = FALSE, impute.new.levels = TRUE, recode.factor.levels = TRUE )
learner |
(Learner | |
classes |
(named list) |
cols |
(named list) |
dummy.classes |
(character) |
dummy.cols |
(character) |
dummy.type |
( |
force.dummies |
( |
impute.new.levels |
( |
recode.factor.levels |
( |
Other impute: imputations, impute(), makeImputeMethod(), reimpute()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(), makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
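An illustrative sketch (added here): impute numeric features with the median and factor features with the mode before every model fit.
lrn = makeImputeWrapper("classif.rpart",
  classes = list(numeric = imputeMedian(), factor = imputeMode()),
  dummy.classes = "numeric")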
For a classification learner the predict.type can be set to "prob" to predict probabilities and the maximum value selects the label. The threshold used to assign the label can later be changed using the setThreshold function.
To see all possible properties of a learner, go to: LearnerProperties.
makeLearner( cl, id = cl, predict.type = "response", predict.threshold = NULL, fix.factors.prediction = FALSE, ..., par.vals = list(), config = list() )
cl |
( |
id |
( |
predict.type |
( |
predict.threshold |
(numeric) |
fix.factors.prediction |
( |
... |
(any) |
par.vals |
(list) |
config |
(named list) |
(Learner).
par.vals vs. ...: The former aims at specifying default hyperparameter settings from mlr which differ from the actual defaults in the underlying learner. For example, respect.unordered.factors is set to order in mlr while the default in ranger::ranger depends on the argument splitrule. getHyperPars(<learner>) can be used to query hyperparameter defaults that differ from the underlying learner. This function also shows all hyperparameters set by the user during learner creation (if these differ from the learner defaults).
For this learner we added additional uncertainty estimation functionality (predict.type = "se") for the randomForest, which is not provided by the underlying package.
Currently implemented methods are:
If se.method = "jackknife" the standard error of a prediction is estimated by computing the jackknife-after-bootstrap, the mean-squared difference between the prediction made by only using trees which did not contain said observation and the ensemble prediction.
If se.method = "bootstrap" the standard error of a prediction is estimated by bootstrapping the random forest, where the number of bootstrap replicates and the number of trees in the ensemble are controlled by se.boot and se.ntree respectively, and then taking the standard deviation of the bootstrap predictions. The "brute force" bootstrap is executed when ntree = se.ntree, the latter of which controls the number of trees in the individual random forests which are bootstrapped. The "noisy bootstrap" is executed when se.ntree < ntree, which is less computationally expensive. A Monte-Carlo bias correction may make the latter option preferable in many cases. Defaults are se.boot = 50 and se.ntree = 100.
If se.method = "sd", the default, the standard deviation of the predictions across trees is returned as the variance estimate. This can be computed quickly but is also a very naive estimator.
For both "jackknife" and "bootstrap", a Monte-Carlo bias correction is applied and, in the case that this results in a negative variance estimate, the values are truncated at 0.
Note that when using the "jackknife" procedure for se estimation, using a small number of trees can lead to training data observations that are never out-of-bag. The current implementation ignores these observations, but in the original definition, the resulting se estimation would be undefined.
Please note that all of the mentioned se.method variants do not affect the computation of the posterior mean "response" value. This is always the same as from the underlying randomForest.
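A short sketch (added for illustration; assumes the randomForest package and mlr's built-in bh.task):
lrn = makeLearner("regr.randomForest", predict.type = "se", se.method = "jackknife", ntree = 200L)
mod = train(lrn, bh.task)
pred = predict(mod, bh.task)
head(getPredictionSE(pred))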
A very basic baseline method which is useful for model comparisons (if you don't beat this, you very likely have a problem). Does not consider any features of the task and only uses the target feature of the training data to make predictions. Using observation weights is currently not supported.
Methods “mean” and “median” always predict a constant value for each new observation which corresponds to the observed mean or median of the target feature in training data, respectively.
The default method is “mean” which corresponds to the ZeroR algorithm from WEKA.
Method “majority” predicts always the majority class for each new observation. In the case of ties, one randomly sampled, constant class is predicted for all observations in the test set. This method is used as the default. It is very similar to the ZeroR classifier from WEKA. The only difference is that ZeroR always predicts the first class of the tied class values instead of sampling them randomly.
Method “sample-prior” always samples a random class for each individual test observation according to the prior probabilities observed in the training data.
If you opt to predict probabilities, the class probabilities always correspond to the prior probabilities observed in the training data.
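An illustrative baseline sketch (added here), comparing against the featureless learner with 3-fold cross-validation:
base.lrn = makeLearner("classif.featureless", method = "majority", predict.type = "prob")
r = resample(base.lrn, iris.task, cv3)
r$aggr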
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
makeLearner("classif.rpart") makeLearner("classif.lda", predict.type = "prob") lrn = makeLearner("classif.lda", method = "t", nu = 10) getHyperPars(lrn)
makeLearner("classif.rpart") makeLearner("classif.lda", predict.type = "prob") lrn = makeLearner("classif.lda", method = "t", nu = 10) getHyperPars(lrn)
Small helper function that can save some typing when creating multiple learner objects. Calls makeLearner multiple times internally.
makeLearners(cls, ids = NULL, type = NULL, ...)
cls |
(character) |
ids |
(character) |
type |
( |
... |
(any) |
(named list of Learner). Named by ids
.
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
makeLearners(c("rpart", "lda"), type = "classif", predict.type = "prob")
makeLearners(c("rpart", "lda"), type = "classif", predict.type = "prob")
A measure object encapsulates a function to evaluate the performance of a prediction. Information about already implemented measures can be obtained here: measures.
A learner is trained on a training set d1, results in a model m and predicts another set d2 (which may be a different one or the training set) resulting in the prediction. The performance measure can now be defined using all of the information of the original task, the fitted model and the prediction.
makeMeasure( id, minimize, properties = character(0L), fun, extra.args = list(), aggr = test.mean, best = NULL, worst = NULL, name = id, note = "" )
id |
( |
minimize |
( |
properties |
(character) Default is |
fun |
( |
extra.args |
(list) |
aggr |
(Aggregation) |
best |
( |
worst |
( |
name |
(character) |
note |
(character) |
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), measures, performance(), setAggregation(), setMeasurePars()
f = function(task, model, pred, extra.args) {
  sum((pred$data$response - pred$data$truth)^2)
}
makeMeasure(id = "my.sse", minimize = TRUE, properties = c("regr", "response"), fun = f)
Combines multiple base learners by dispatching on the hyperparameter "selected.learner" to a specific model class. This allows tuning not only the model class (SVM, random forest, etc.) but also its hyperparameters in one go. Combine this with tuneParams and makeTuneControlIrace for a very powerful approach, see example below.
The parameter set is the union of all (unique) base learners. In order to avoid name clashes all parameter names are prefixed with the base learner id, i.e. learnerId.parameterName.
The predict.type of the Multiplexer is inherited from the predict.type of the base learners.
The getter getLearnerProperties returns the properties of the selected base learner.
makeModelMultiplexer(base.learners)
base.learners |
([list' of Learner) |
(ModelMultiplexer). A Learner specialized as ModelMultiplexer
.
Note that logging output during tuning is somewhat shortened to make it more readable. I.e., the artificial prefix before parameter names is suppressed.
Other multiplexer: makeModelMultiplexerParamSet()
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
set.seed(123)
library(BBmisc)
bls = list(
  makeLearner("classif.ksvm"),
  makeLearner("classif.randomForest")
)
lrn = makeModelMultiplexer(bls)
# simple way to construct param set for tuning
# parameter names are prefixed automatically and the 'requires'
# element is set, too, to make all parameters subordinate to 'selected.learner'
ps = makeModelMultiplexerParamSet(lrn,
  makeNumericParam("sigma", lower = -10, upper = 10, trafo = function(x) 2^x),
  makeIntegerParam("ntree", lower = 1L, upper = 500L)
)
print(ps)
rdesc = makeResampleDesc("CV", iters = 2L)
# to save some time we use random search. but you probably want something like this:
# ctrl = makeTuneControlIrace(maxExperiments = 500L)
ctrl = makeTuneControlRandom(maxit = 10L)
res = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
df = as.data.frame(res$opt.path)
print(head(df[, -ncol(df)]))
# more unique and reliable way to construct the param set
ps = makeModelMultiplexerParamSet(lrn,
  classif.ksvm = makeParamSet(
    makeNumericParam("sigma", lower = -10, upper = 10, trafo = function(x) 2^x)
  ),
  classif.randomForest = makeParamSet(
    makeIntegerParam("ntree", lower = 1L, upper = 500L)
  )
)
# this is how you would construct the param set manually, works too
ps = makeParamSet(
  makeDiscreteParam("selected.learner", values = extractSubList(bls, "id")),
  makeNumericParam("classif.ksvm.sigma", lower = -10, upper = 10,
    trafo = function(x) 2^x, requires = quote(selected.learner == "classif.ksvm")),
  makeIntegerParam("classif.randomForest.ntree", lower = 1L, upper = 500L,
    requires = quote(selected.learner == "classif.randomForest"))
)
# all three ps-objects are exactly the same internally.
Handy way to create the param set with less typing.
The following is done automatically:
The selected.learner param is created.
Parameter names are prefixed.
The requires field of each param is set. This makes all parameters subordinate to selected.learner.
makeModelMultiplexerParamSet(multiplexer, ..., .check = TRUE)
multiplexer |
(ModelMultiplexer) |
... |
(ParamHelpers::ParamSet | ParamHelpers::Param) |
.check |
(logical) |
Other multiplexer: makeModelMultiplexer()
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
# See makeModelMultiplexer
Fuses a base learner with a multi-class method. Creates a learner object, which can be used like any other learner object. This way learners which can only handle binary classification will be able to handle multi-class problems, too.
We use a multiclass-to-binary reduction principle, where multiple binary problems are created from the multiclass task. How these binary problems are generated is defined by an error-correcting-output-code (ECOC) code book. This also allows the simple and well-known one-vs-one and one-vs-rest approaches. Decoding is currently done via Hamming decoding, see e.g. here https://jmlr.org/papers/volume11/escalera10a/escalera10a.pdf.
Currently, the approach always operates on the discrete predicted labels of the binary base models (instead of their probabilities) and the created wrapper cannot predict posterior probabilities.
makeMulticlassWrapper(learner, mcw.method = "onevsrest")
learner |
(Learner | |
mcw.method |
( |
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
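A minimal usage sketch (the learner choice and task are illustrative; "classif.logreg" is a binary-only learner and iris.task is the built-in three-class task):
lrn = makeLearner("classif.logreg")             # binary-only learner, chosen for illustration
lrn = makeMulticlassWrapper(lrn, mcw.method = "onevsrest")
mod = train(lrn, iris.task)                     # three-class task handled via one-vs-rest
pred = predict(mod, iris.task)
performance(pred, measures = mmce)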
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped binary relevance multilabel learner. The multilabel classification problem is converted into simple binary classifications for each label/target on which the binary learner is applied.
Models can easily be accessed via getLearnerModel.
Note that it does not make sense to set a threshold in the used base learner when you predict probabilities. On the other hand, it can make a lot of sense to call setThreshold on the MultilabelBinaryRelevanceWrapper for each label individually, or to tune these thresholds with tuneThreshold, especially when you face very unbalanced class distributions for each binary label.
makeMultilabelBinaryRelevanceWrapper(learner)
learner |
(Learner | |
Tsoumakas, G., & Katakis, I. (2006) Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece.
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
if (requireNamespace("rpart")) {
  d = getTaskData(yeast.task)
  # drop some labels so example runs faster
  d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
  task = makeMultilabelTask(data = d, target = c("label1", "label2"))
  lrn = makeLearner("classif.rpart")
  lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
  lrn = setPredictType(lrn, "prob")
  # train, predict and evaluate
  mod = train(lrn, task)
  pred = predict(mod, task)
  performance(pred, measures = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
  # the next call basically has the same structure for any multilabel meta wrapper
  getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
  # above works also with predictions from resample!
}
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped classifier chains multilabel learner. CC trains a binary classifier for each label following a given order. In the training phase, the feature space of each classifier is extended with true label information of all previous labels in the chain. During the prediction phase, when true labels are not available, they are replaced by predicted labels.
Models can easily be accessed via getLearnerModel.
makeMultilabelClassifierChainsWrapper(learner, order = NULL)
learner |
(Learner | |
order |
(character) |
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
if (requireNamespace("rpart")) {
  d = getTaskData(yeast.task)
  # drop some labels so example runs faster
  d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
  task = makeMultilabelTask(data = d, target = c("label1", "label2"))
  lrn = makeLearner("classif.rpart")
  lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
  lrn = setPredictType(lrn, "prob")
  # train, predict and evaluate
  mod = train(lrn, task)
  pred = predict(mod, task)
  performance(pred, measures = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
  # the next call basically has the same structure for any multilabel meta wrapper
  getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
  # above works also with predictions from resample!
}
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped DBR multilabel learner. The multilabel classification problem is converted into simple binary classifications for each label/target on which the binary learner is applied. For each target, actual information of all binary labels (except the target variable) is used as additional features. During prediction these labels are obtained by the binary relevance method using the same binary learner.
Models can easily be accessed via getLearnerModel.
makeMultilabelDBRWrapper(learner)
learner |
(Learner | |
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
if (requireNamespace("rpart")) {
  d = getTaskData(yeast.task)
  # drop some labels so example runs faster
  d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
  task = makeMultilabelTask(data = d, target = c("label1", "label2"))
  lrn = makeLearner("classif.rpart")
  lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
  lrn = setPredictType(lrn, "prob")
  # train, predict and evaluate
  mod = train(lrn, task)
  pred = predict(mod, task)
  performance(pred, measures = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
  # the next call basically has the same structure for any multilabel meta wrapper
  getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
  # above works also with predictions from resample!
}
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped nested stacking multilabel learner. Nested stacking trains a binary classifier for each label following a given order. In the training phase, the feature space of each classifier is extended with predicted label information (obtained by cross-validation) of all previous labels in the chain. During the prediction phase, predicted labels are obtained by the classifiers, which have been learned on all training data.
Models can easily be accessed via getLearnerModel.
makeMultilabelNestedStackingWrapper(learner, order = NULL, cv.folds = 2)
learner |
(Learner | |
order |
(character) |
cv.folds |
( |
Montanes, E. et al. (2013), Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelStackingWrapper()
if (requireNamespace("rpart")) {
  d = getTaskData(yeast.task)
  # drop some labels so example runs faster
  d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
  task = makeMultilabelTask(data = d, target = c("label1", "label2"))
  lrn = makeLearner("classif.rpart")
  lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
  lrn = setPredictType(lrn, "prob")
  # train, predict and evaluate
  mod = train(lrn, task)
  pred = predict(mod, task)
  performance(pred, measures = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
  # the next call basically has the same structure for any multilabel meta wrapper
  getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
  # above works also with predictions from resample!
}
Every learner which is implemented in mlr and which supports binary classification can be converted to a wrapped stacking multilabel learner. Stacking trains a binary classifier for each label using predicted label information of all labels (including the target label) as additional features (obtained by cross-validation). During prediction these labels are obtained by the binary relevance method using the same binary learner.
Models can easily be accessed via getLearnerModel.
makeMultilabelStackingWrapper(learner, cv.folds = 2)
learner |
(Learner | |
cv.folds |
( |
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artificial Intelligence Center, University of Oviedo at Gijon, Spain.
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
Other multilabel:
getMultilabelBinaryPerformances()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
if (requireNamespace("rpart")) {
  d = getTaskData(yeast.task)
  # drop some labels so example runs faster
  d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
  task = makeMultilabelTask(data = d, target = c("label1", "label2"))
  lrn = makeLearner("classif.rpart")
  lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
  lrn = setPredictType(lrn, "prob")
  # train, predict and evaluate
  mod = train(lrn, task)
  pred = predict(mod, task)
  performance(pred, measures = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
  # the next call basically has the same structure for any multilabel meta wrapper
  getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
  # above works also with predictions from resample!
}
Create a multilabel task.
makeMultilabelTask( id = deparse(substitute(data)), data, target, weights = NULL, blocking = NULL, coordinates = NULL, fixup.data = "warn", check.data = TRUE )
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
For multilabel classification we assume that the presence of labels is encoded via logical columns in data. The name of the column specifies the name of the label. target is then a character vector that points to these columns.
Task ClassifTask ClusterTask CostSensTask RegrTask SurvTask
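A small sketch of how a multilabel task is constructed; the data and the logical label columns below are made up for illustration:
d = iris[, 1:4]
d$is.setosa = iris$Species == "setosa"        # logical label column
d$is.virginica = iris$Species == "virginica"  # logical label column
task = makeMultilabelTask(id = "multi.iris", data = d,
  target = c("is.setosa", "is.virginica"))
print(task)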
Fuses a classification learner for binary classification with an over-bagging method for imbalancy correction when we have strongly unequal class sizes. Creates a learner object, which can be used like any other learner object. Models can easily be accessed via getLearnerModel.
OverBagging is implemented as follows: For each iteration a random data subset is sampled. Class examples are oversampled with replacement with a given rate. Members of the other class are either simply copied into each bag, or bootstrapped with replacement until we have as many majority class examples as in the original training data. Features are currently not changed or sampled.
Prediction works as follows: For classification we do majority voting to create a discrete label and probabilities are predicted by considering the proportions of all predicted labels.
makeOverBaggingWrapper( learner, obw.iters = 10L, obw.rate = 1, obw.maxcl = "boot", obw.cl = NULL )
learner |
(Learner | |
obw.iters |
( |
obw.rate |
( |
obw.maxcl |
( |
obw.cl |
( |
Other imbalancy:
makeUndersampleWrapper()
,
oversample()
,
smote()
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
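A brief sketch using the built-in binary sonar.task; the parameter values are illustrative only:
lrn = makeLearner("classif.rpart")
lrn = makeOverBaggingWrapper(lrn, obw.iters = 5L, obw.rate = 2)  # illustrative settings
mod = train(lrn, sonar.task)
performance(predict(mod, sonar.task), measures = mmce)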
Fuses a base learner with a preprocessing method. Creates a learner object, which can be used like any other learner object, but which internally preprocesses the data as requested. If the train or predict function is called on data / a task, the preprocessing is always performed automatically.
makePreprocWrapper( learner, train, predict, par.set = makeParamSet(), par.vals = list() )
learner |
(Learner | |
train |
( |
predict |
( |
par.set |
(ParamHelpers::ParamSet) |
par.vals |
(list) |
(Learner).
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
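A sketch of a simple centering/scaling preprocessing wrapper. It assumes the conventions described above: the train function returns a list with the transformed data and a control object, which the predict function reuses to transform new data. The helper function names are made up for illustration:
trainfun = function(data, target, args) {
  # determine the numeric feature columns (everything numeric except the target)
  nums = setdiff(colnames(data)[sapply(data, is.numeric)], target)
  x = as.matrix(data[, nums, drop = FALSE])
  ctrl = list(center = colMeans(x), scale = apply(x, 2, sd))
  data[, nums] = scale(x, center = ctrl$center, scale = ctrl$scale)
  list(data = data, control = ctrl)
}
predictfun = function(data, target, args, control) {
  # apply the transformation learned on the training data
  nums = names(control$center)
  data[, nums] = scale(as.matrix(data[, nums, drop = FALSE]),
    center = control$center, scale = control$scale)
  data
}
lrn = makePreprocWrapper(makeLearner("classif.lda"), train = trainfun, predict = predictfun)
mod = train(lrn, iris.task)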
Fuses a learner with preprocessing methods provided by caret::preProcess. Before training the preprocessing will be performed and the preprocessing model will be stored. Before prediction the preprocessing model will transform the test data according to the trained model.
After being wrapped, the learner will support missing values, although this will only be the case if ppc.knnImpute, ppc.bagImpute or ppc.medianImpute is set to TRUE.
makePreprocWrapperCaret(learner, ...)
learner |
(Learner | |
... |
(any) |
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
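A short sketch; the options are passed on with a ppc. prefix, and the specific settings shown here (PCA preprocessing with a variance threshold) are meant as an illustration:
# PCA-preprocess the features before LDA; option names follow the ppc. prefix convention
lrn = makePreprocWrapperCaret("classif.lda", ppc.pca = TRUE, ppc.thresh = 0.9)
mod = train(lrn, iris.task)
predict(mod, iris.task)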
Create a regression task.
makeRegrTask( id = deparse(substitute(data)), data, target, weights = NULL, blocking = NULL, coordinates = NULL, fixup.data = "warn", check.data = TRUE )
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
Task ClassifTask CostSensTask ClusterTask MultilabelTask SurvTask
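A minimal example using the BostonHousing data from the mlbench package (assuming mlbench is installed):
data(BostonHousing, package = "mlbench")
regr.task = makeRegrTask(id = "bh", data = BostonHousing, target = "medv")
print(regr.task)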
Fuses a base learner with the preprocessing implemented in removeConstantFeatures.
makeRemoveConstantFeaturesWrapper( learner, perc = 0, dont.rm = character(0L), na.ignore = FALSE, wrap.tol = .Machine$double.eps^0.5 )
learner |
(Learner | |
perc |
( |
dont.rm |
(character) |
na.ignore |
( |
wrap.tol |
( |
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
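A small sketch; the constant column added to iris is made up for illustration and is dropped by the wrapper before training:
d = iris
d$const.feat = 1  # artificial constant feature
task = makeClassifTask(data = d, target = "Species")
lrn = makeRemoveConstantFeaturesWrapper(makeLearner("classif.rpart"))
mod = train(lrn, task)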
A description of a resampling algorithm contains all necessary information to create a ResampleInstance, when given the size of the data set.
makeResampleDesc( method, predict = "test", ..., stratify = FALSE, stratify.cols = NULL, fixed = FALSE, blocking.cv = FALSE )
method |
( |
predict |
( |
... |
(any)
|
stratify |
( |
stratify.cols |
(character) |
fixed |
( |
blocking.cv |
( |
Some notes on some special strategies:
Repeated cross-validation: Use "RepCV". Then you have to set the aggregation function for your preferred performance measure to "testgroup.mean" via setAggregation.
B632 bootstrap: Use "Bootstrap" for bootstrap and set predict to "both". Then you have to set the aggregation function for your preferred performance measure to "b632" via setAggregation.
B632+ bootstrap: Use "Bootstrap" for bootstrap and set predict to "both". Then you have to set the aggregation function for your preferred performance measure to "b632plus" via setAggregation.
Object slots:
id (character(1)): Name of resampling strategy.
iters (integer(1)): Number of iterations. Note that this is always the complete number of generated train/test sets, so for a 10-times repeated 5-fold cross-validation it would be 50.
predict (character(1)): See argument.
stratify (logical(1)): See argument.
All further slots: See arguments.
(ResampleDesc).
For common resampling strategies you can save some typing by using the following description objects:
hout: holdout a.k.a. test sample estimation (two-thirds training set, one-third testing set)
cv2: 2-fold cross-validation
cv3: 3-fold cross-validation
cv5: 5-fold cross-validation
cv10: 10-fold cross-validation
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRPredictions()
,
getRRTaskDesc()
,
getRRTaskDescription()
,
makeResampleInstance()
,
resample()
# Bootstrapping
makeResampleDesc("Bootstrap", iters = 10)
makeResampleDesc("Bootstrap", iters = 10, predict = "both")
# Subsampling
makeResampleDesc("Subsample", iters = 10, split = 3 / 4)
makeResampleDesc("Subsample", iters = 10)
# Holdout a.k.a. test sample estimation
makeResampleDesc("Holdout")
This class encapsulates training and test sets generated from the data set for a number of iterations. It mainly stores a set of integer vectors indicating the training and test examples for each iteration.
makeResampleInstance(desc, task, size, ...)
desc |
(ResampleDesc | |
task |
(Task) |
size |
(integer) |
... |
(any) |
Object slots:
desc (ResampleDesc): See argument.
size (integer(1)): See argument.
train.inds (list of integer): List of training indices for all iterations.
test.inds (list of integer): List of test indices for all iterations.
group (factor): Optional grouping of resampling iterations. This encodes whether specific iterations 'belong together' (e.g. repeated CV), and it can later be used to aggregate performance values accordingly. Default is 'factor()'.
Other resample:
ResamplePrediction
,
ResampleResult
,
addRRMeasure()
,
getRRPredictionList()
,
getRRPredictions()
,
getRRTaskDesc()
,
getRRTaskDescription()
,
makeResampleDesc()
,
resample()
rdesc = makeResampleDesc("Bootstrap", iters = 10)
rin = makeResampleInstance(rdesc, task = iris.task)

rdesc = makeResampleDesc("CV", iters = 50)
rin = makeResampleInstance(rdesc, size = nrow(iris))

rin = makeResampleInstance("CV", iters = 10, task = iris.task)
Learner for classification using Generalized Linear Models.
## S3 method for class 'classif.fdausc.glm' makeRLearner()
Learner for kernel Classification.
## S3 method for class 'classif.fdausc.kernel' makeRLearner()
Learner for Nonparametric Supervised Classification.
## S3 method for class 'classif.fdausc.np' makeRLearner()
Creates a learner object, which can be used like any other learner object. Internally uses smote before every model fit.
Note that observation weights do not influence the sampling and are simply passed down to the next learner.
makeSMOTEWrapper( learner, sw.rate = 1, sw.nn = 5L, sw.standardize = TRUE, sw.alt.logic = FALSE )
learner |
(Learner | |
sw.rate |
( |
sw.nn |
( |
sw.standardize |
( |
sw.alt.logic |
( |
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
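A brief sketch on the built-in binary sonar.task; the oversampling rate and number of neighbours are illustrative:
lrn = makeLearner("classif.rpart")
lrn = makeSMOTEWrapper(lrn, sw.rate = 2, sw.nn = 3L)  # illustrative settings
mod = train(lrn, sonar.task)
performance(predict(mod, sonar.task), measures = mmce)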
A stacked learner uses predictions of several base learners and fits a super learner using these predictions as features in order to predict the outcome. The following stacking methods are available:
average: Averaging of base learner predictions without weights.
stack.nocv: Fits the super learner, where in-sample predictions of the base learners are used.
stack.cv: Fits the super learner, where the base learner predictions are computed by cross-validated predictions (the resampling strategy can be set via the resampling argument).
hill.climb: Selects a subset of base learner predictions by a hill climbing algorithm.
compress: Trains a neural network to compress the model from a collection of base learners.
makeStackedLearner( base.learners, super.learner = NULL, predict.type = NULL, method = "stack.nocv", use.feat = FALSE, resampling = NULL, parset = list() )
base.learners |
((list of) Learner) |
super.learner |
(Learner | character(1)) |
predict.type |
(
|
method |
( |
use.feat |
( |
resampling |
(ResampleDesc) |
parset |
the parameters for
|
# Classification
data(iris)
tsk = makeClassifTask(data = iris, target = "Species")
base = c("classif.rpart", "classif.lda", "classif.svm")
lrns = lapply(base, makeLearner)
lrns = lapply(lrns, setPredictType, "prob")
m = makeStackedLearner(base.learners = lrns, predict.type = "prob", method = "hill.climb")
tmp = train(m, tsk)
res = predict(tmp, tsk)

# Regression
data(BostonHousing, package = "mlbench")
tsk = makeRegrTask(data = BostonHousing, target = "medv")
base = c("regr.rpart", "regr.svm")
lrns = lapply(base, makeLearner)
m = makeStackedLearner(base.learners = lrns, predict.type = "response", method = "compress")
tmp = train(m, tsk)
res = predict(tmp, tsk)
Create a survival task.
makeSurvTask( id = deparse(substitute(data)), data, target, weights = NULL, blocking = NULL, coordinates = NULL, fixup.data = "warn", check.data = TRUE )
id |
( |
data |
(data.frame) |
target |
( |
weights |
(numeric) |
blocking |
(factor) |
coordinates |
(data.frame) |
fixup.data |
( |
check.data |
( |
Task ClassifTask ClusterTask CostSensTask MultilabelTask RegrTask
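A sketch using the lung data from the survival package; recoding the status column to a logical event indicator is an assumption of this illustration:
if (requireNamespace("survival")) {
  lung = survival::lung
  lung$status = (lung$status == 2)  # assume the event indicator should be logical / 0-1
  surv.task = makeSurvTask(id = "lung", data = lung, target = c("time", "status"))
  print(surv.task)
}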
CMA Evolution Strategy with method cmaes::cma_es. Can handle numeric(vector) and integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric values are automatically rounded. The sigma variance parameter is initialized to 1/4 of the span of box-constraints per parameter dimension.
makeTuneControlCMAES( same.resampling.instance = TRUE, impute.val = NULL, start = NULL, tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default", final.dw.perc = NULL, budget = NULL, ... )
same.resampling.instance |
( |
impute.val |
(numeric) |
start |
(list) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
... |
(any) |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Completely pre-specify a data.frame of design points to be evaluated during tuning. All kinds of parameter types can be handled.
makeTuneControlDesign( same.resampling.instance = TRUE, impute.val = NULL, design = NULL, tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default" )
same.resampling.instance |
( |
impute.val |
(numeric) |
design |
(data.frame) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
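A sketch of evaluating a hand-picked set of design points for rpart's cp and minsplit; the chosen values are illustrative:
des = data.frame(cp = c(0.05, 0.1, 0.05), minsplit = c(10L, 10L, 20L))  # illustrative design points
ctrl = makeTuneControlDesign(design = des)
ps = makeParamSet(
  makeNumericParam("cp", lower = 0.01, upper = 0.2),
  makeIntegerParam("minsplit", lower = 5L, upper = 50L)
)
rdesc = makeResampleDesc("CV", iters = 2L)
res = tuneParams(makeLearner("classif.rpart"), iris.task, rdesc, par.set = ps, control = ctrl)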
Generalized simulated annealing with method GenSA::GenSA. Can handle numeric(vector) and integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric values are automatically rounded.
makeTuneControlGenSA( same.resampling.instance = TRUE, impute.val = NULL, start = NULL, tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default", final.dw.perc = NULL, budget = NULL, ... )
same.resampling.instance |
( |
impute.val |
(numeric) |
start |
(list) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
... |
(any) |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
A basic grid search can handle all kinds of parameter types. You can either use their correct param type and resolution, or discretize them yourself by always using ParamHelpers::makeDiscreteParam in the par.set passed to tuneParams.
makeTuneControlGrid( same.resampling.instance = TRUE, impute.val = NULL, resolution = 10L, tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default", final.dw.perc = NULL, budget = NULL )
same.resampling.instance |
( |
impute.val |
(numeric) |
resolution |
(integer) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
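A sketch of a small grid for classif.ksvm, mixing a numeric parameter (discretized via resolution) and a discrete one; the bounds and kernel choices are illustrative:
ps = makeParamSet(
  makeNumericParam("C", lower = -2, upper = 2, trafo = function(x) 10^x),
  makeDiscreteParam("kernel", values = c("vanilladot", "rbfdot"))
)
ctrl = makeTuneControlGrid(resolution = 3L)  # 3 grid points per numeric param
rdesc = makeResampleDesc("Holdout")
res = tuneParams(makeLearner("classif.ksvm"), iris.task, rdesc, par.set = ps, control = ctrl)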
Tuning with iterated F-Racing with method irace::irace. All kinds of parameter types can be handled. We return the best of the final elite candidates found by irace in the last race. Its estimated performance is the mean of all evaluations ever done for that candidate. More information on irace can be found in the package vignette: vignette("irace-package", package = "irace").
For resampling you have to pass a ResampleDesc, not a ResampleInstance. The resampling strategy is randomly instantiated n.instances times and these are the instances in the sense of irace (the instances element of tunerConfig in irace::irace). Also note that irace will always store its tuning results in a file on disk; see the package documentation for details on this and how to change the file path.
makeTuneControlIrace( impute.val = NULL, n.instances = 100L, show.irace.output = FALSE, tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default", final.dw.perc = NULL, budget = NULL, ... )
impute.val |
(numeric) |
n.instances |
( |
show.irace.output |
( |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
... |
(any) |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Model-based / Bayesian optimization with the function mlrMBO::mbo from the mlrMBO package. Please refer to https://github.com/mlr-org/mlrMBO for further info.
makeTuneControlMBO( same.resampling.instance = TRUE, impute.val = NULL, learner = NULL, mbo.control = NULL, tune.threshold = FALSE, tune.threshold.args = list(), continue = FALSE, log.fun = "default", final.dw.perc = NULL, budget = NULL, mbo.design = NULL )
same.resampling.instance |
( |
impute.val |
(numeric) |
learner |
(Learner | |
mbo.control |
(mlrMBO::MBOControl | |
tune.threshold |
( |
tune.threshold.args |
(list) |
continue |
( |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
mbo.design |
(data.frame | |
Bernd Bischl, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas and Michel Lang; mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions, Preprint: https://arxiv.org/abs/1703.03373 (2017).
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlRandom()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Random search. All kinds of parameter types can be handled.
makeTuneControlRandom( same.resampling.instance = TRUE, maxit = NULL, tune.threshold = FALSE, tune.threshold.args = list(), log.fun = "default", final.dw.perc = NULL, budget = NULL )
same.resampling.instance |
( |
maxit |
( |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneWrapper()
,
tuneParams()
,
tuneThreshold()
Fuses a base learner with a search strategy to select its hyperparameters. Creates a learner object, which can be used like any other learner object, but which internally uses tuneParams. If the train function is called on it, the search strategy and resampling are invoked to select an optimal set of hyperparameter values. Finally, a model is fitted on the complete training data with these optimal hyperparameters and returned. See tuneParams for more details.
After training, the optimal hyperparameters (and other related information) can be retrieved with getTuneResult.
makeTuneWrapper( learner, resampling, measures, par.set, control, show.info = getMlrOption("show.info") )
learner |
(Learner | |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
par.set |
(ParamHelpers::ParamSet) |
control |
(TuneControl) |
show.info |
( |
Other tune:
TuneControl
,
getNestedTuneResultsOptPathDf()
,
getNestedTuneResultsX()
,
getResamplingIndices()
,
getTuneResult()
,
makeModelMultiplexer()
,
makeModelMultiplexerParamSet()
,
makeTuneControlCMAES()
,
makeTuneControlDesign()
,
makeTuneControlGenSA()
,
makeTuneControlGrid()
,
makeTuneControlIrace()
,
makeTuneControlMBO()
,
makeTuneControlRandom()
,
tuneParams()
,
tuneThreshold()
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeUndersampleWrapper()
,
makeWeightedClassesWrapper()
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
# stupid mini grid
ps = makeParamSet(
  makeDiscreteParam("cp", values = c(0.05, 0.1)),
  makeDiscreteParam("minsplit", values = c(10, 20))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
mod = train(lrn, task)
print(getTuneResult(mod))
# nested resampling for evaluation
# we also extract tuned hyper pars in each iteration
r = resample(lrn, task, outer, extract = getTuneResult)
print(r$extract)
getNestedTuneResultsOptPathDf(r)
getNestedTuneResultsX(r)
Creates a learner object, which can be used like any other learner object. Internally uses oversample or undersample before every model fit.
Note that observation weights do not influence the sampling and are simply passed down to the next learner.
makeUndersampleWrapper(learner, usw.rate = 1, usw.cl = NULL) makeOversampleWrapper(learner, osw.rate = 1, osw.cl = NULL)
learner |
(Learner | |
usw.rate |
( |
usw.cl |
( |
osw.rate |
( |
osw.cl |
( |
Other imbalancy:
makeOverBaggingWrapper()
,
oversample()
,
smote()
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeWeightedClassesWrapper()
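A brief sketch on the built-in sonar.task; the rate is illustrative:
lrn = makeUndersampleWrapper(makeLearner("classif.rpart"), usw.rate = 0.5)  # keep half of the bigger class
mod = train(lrn, sonar.task)
performance(predict(mod, sonar.task), measures = mmce)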
Creates a wrapper, which can be used like any other learner object. Fitting is performed in a weighted fashion where each observation receives a weight, depending on the class it belongs to, see wcw.weight. This might help to mitigate problems caused by imbalanced class distributions.
This weighted fitting can be achieved in two ways:
a) The learner already has a parameter for class weighting, so one weight can directly be defined per class. Example: "classif.ksvm" and parameter class.weights. In this case we don't really do anything fancy. We convert wcw.weight a bit, but basically simply bind its value to the class weighting param. The wrapper in this case simply offers a convenient, consistent fashion for class weighting - and tuning! See example below.
b) The learner does not have a direct parameter to support class weighting, but supports observation weights, so hasLearnerProperties(learner, 'weights') is TRUE. This means that an individual, arbitrary weight can be set per observation during training. We set this weight depending on the class internally in the wrapper. Basically we introduce something like a new "class.weights" parameter for the learner via observation weights.
makeWeightedClassesWrapper(learner, wcw.param = NULL, wcw.weight = 1)
learner |
(Learner | |
wcw.param |
( |
wcw.weight |
(numeric) |
Other wrapper:
makeBaggingWrapper()
,
makeClassificationViaRegressionWrapper()
,
makeConstantClassWrapper()
,
makeCostSensClassifWrapper()
,
makeCostSensRegrWrapper()
,
makeDownsampleWrapper()
,
makeDummyFeaturesWrapper()
,
makeExtractFDAFeatsWrapper()
,
makeFeatSelWrapper()
,
makeFilterWrapper()
,
makeImputeWrapper()
,
makeMulticlassWrapper()
,
makeMultilabelBinaryRelevanceWrapper()
,
makeMultilabelClassifierChainsWrapper()
,
makeMultilabelDBRWrapper()
,
makeMultilabelNestedStackingWrapper()
,
makeMultilabelStackingWrapper()
,
makeOverBaggingWrapper()
,
makePreprocWrapper()
,
makePreprocWrapperCaret()
,
makeRemoveConstantFeaturesWrapper()
,
makeSMOTEWrapper()
,
makeTuneWrapper()
,
makeUndersampleWrapper()
set.seed(123)
# using the direct parameter of the SVM (which is already defined in the learner)
lrn = makeWeightedClassesWrapper("classif.ksvm", wcw.weight = 0.01)
res = holdout(lrn, sonar.task)
print(calculateConfusionMatrix(res$pred))

# using the observation weights of logreg
lrn = makeWeightedClassesWrapper("classif.logreg", wcw.weight = 0.01)
res = holdout(lrn, sonar.task)
print(calculateConfusionMatrix(res$pred))

# tuning the imbalancy param and the SVM param in one go
lrn = makeWeightedClassesWrapper("classif.ksvm", wcw.param = "class.weights")
ps = makeParamSet(
  makeNumericParam("wcw.weight", lower = 1, upper = 10),
  makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneControlRandom(maxit = 3L)
rdesc = makeResampleDesc("CV", iters = 2L, stratify = TRUE)
res = tuneParams(lrn, sonar.task, rdesc, par.set = ps, control = ctrl)
print(res)
# print(res$opt.path)
Result from train.
It internally stores the underlying fitted model, the subset used for training, features used for training, levels of factors in the data set and computation time that was spent for training.
Object members: See arguments.
The constructor makeWrappedModel is mainly for internal use.
makeWrappedModel( learner, learner.model, task.desc, subset, features, factor.levels, time )
learner |
(Learner | |
learner.model |
(any) |
task.desc |
TaskDesc |
subset |
(integer | logical | |
features |
(character) |
factor.levels |
(named list of character) |
time |
( |
Properties can be accessed with getMeasureProperties(measure), which returns a character vector.
The measure properties are defined in Measure.
getMeasureProperties(measure) hasMeasureProperties(measure, props)
measure |
(Measure) |
props |
(character) |
getMeasureProperties returns a character vector with measure properties. hasMeasureProperties returns a logical vector of the same length as props.
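A small sketch; the property name queried here is taken from the properties listed in Measure and is meant as an illustration:
getMeasureProperties(mmce)
hasMeasureProperties(auc, props = "req.prob")  # does auc require predicted probabilities?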
A performance measure is evaluated after a single train/predict step and returns a single number to assess the quality of the prediction (or maybe only the model, think AIC). The measure itself knows whether it wants to be minimized or maximized and for what tasks it is applicable.
All supported measures can be found by listMeasures or as a table in the tutorial appendix: https://mlr.mlr-org.com/articles/tutorial/measures.html.
If you want a measure for a misclassification cost matrix, look at makeCostMeasure. If you want to implement your own measure, look at makeMeasure.
Most measures can directly be accessed via the function named after the scheme measureX (e.g. measureSSE).
For clustering measures, we compact the predicted cluster IDs such that they form a continuous series starting with 1. If this is not the case, some of the measures will generate warnings.
Some measures have parameters. Their defaults are set in the constructor makeMeasure and can be overwritten using setMeasurePars.
measureSSE(truth, response) measureMSE(truth, response) measureRMSE(truth, response) measureMEDSE(truth, response) measureSAE(truth, response) measureMAE(truth, response) measureMEDAE(truth, response) measureRSQ(truth, response) measureEXPVAR(truth, response) measureRRSE(truth, response) measureRAE(truth, response) measureMAPE(truth, response) measureMSLE(truth, response) measureRMSLE(truth, response) measureKendallTau(truth, response) measureSpearmanRho(truth, response) measureMMCE(truth, response) measureACC(truth, response) measureBER(truth, response) measureAUNU(probabilities, truth) measureAUNP(probabilities, truth) measureAU1U(probabilities, truth) measureAU1P(probabilities, truth) measureMulticlassBrier(probabilities, truth) measureLogloss(probabilities, truth) measureSSR(probabilities, truth) measureQSR(probabilities, truth) measureLSR(probabilities, truth) measureKAPPA(truth, response) measureWKAPPA(truth, response) measureAUC(probabilities, truth, negative, positive) measureBrier(probabilities, truth, negative, positive) measureBrierScaled(probabilities, truth, negative, positive) measureBAC(truth, response) measureTP(truth, response, positive) measureTN(truth, response, negative) measureFP(truth, response, positive) measureFN(truth, response, negative) measureTPR(truth, response, positive) measureTNR(truth, response, negative) measureFPR(truth, response, negative, positive) measureFNR(truth, response, negative, positive) measurePPV(truth, response, positive, probabilities = NULL) measureNPV(truth, response, negative) measureFDR(truth, response, positive) measureMCC(truth, response, negative, positive) measureF1(truth, response, positive) measureGMEAN(truth, response, negative, positive) measureGPR(truth, response, positive) measureMultilabelHamloss(truth, response) measureMultilabelSubset01(truth, response) measureMultilabelF1(truth, response) measureMultilabelACC(truth, response) measureMultilabelPPV(truth, response) measureMultilabelTPR(truth, response)
truth |
(factor) |
response |
(factor) |
probabilities |
(numeric | matrix) |
negative |
( |
positive |
( |
He, H. & Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263-1284.
Uno, H. et al. (2011). On the C-statistics for Evaluating Overall Adequacy of Risk Prediction Procedures with Censored Survival Data. Statistics in Medicine, 30(10): 1105-1117. doi:10.1002/sim.4154.
Uno, H. et al. (2007). Evaluating Prediction Rules for t-Year Survivors with Censored Regression Models. Journal of the American Statistical Association, 102(478): 527-537.
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), performance(), setAggregation(), setMeasurePars()
The function automatically combines a list of BenchmarkResult objects into a single BenchmarkResult object, as long as the full crossproduct of all task-learner combinations is available.
mergeBenchmarkResults(bmrs)
bmrs |
(list of BenchmarkResult) |
Note that if you want to merge several BenchmarkResult objects, you must ensure that all possible learner and task combinations are contained in the returned object. Otherwise, you will be notified which task-learner combinations are missing or duplicated.
When merging BenchmarkResult objects with different measures, all missing measures are automatically recomputed.
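A minimal sketch (tasks, learners and resampling chosen purely for illustration): two partial benchmarks whose union covers the full task-learner crossproduct are merged into one BenchmarkResult.
rdesc = makeResampleDesc("CV", iters = 2)
bmr1 = benchmark(makeLearner("classif.rpart"), iris.task, rdesc)
bmr2 = benchmark(makeLearner("classif.lda"), iris.task, rdesc)
merged = mergeBenchmarkResults(list(bmr1, bmr2))
getBMRLearnerIds(merged)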
Merges factor levels that occur only infrequently into combined levels with a higher frequency.
mergeSmallFactorLevels( task, cols = NULL, min.perc = 0.01, new.level = ".merged" )
task |
(Task) |
cols |
(character) Which columns to convert. Default is all factor and character columns. |
min.perc |
( |
new.level |
( |
Task, where merged levels are combined into a new level of name new.level.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
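A minimal sketch with a constructed toy task (data and threshold chosen for illustration): the two rare levels fall below min.perc and are collapsed into ".merged".
df = data.frame(
  x = factor(c(rep("a", 50), rep("b", 48), "c", "d")), # "c" and "d" occur once each
  y = rnorm(100)
)
task = makeRegrTask(data = df, target = "y")
task = mergeSmallFactorLevels(task, cols = "x", min.perc = 0.05)
summarizeLevels(task, cols = "x")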
List of all mlr documentation families with members.
benchmark |
batchmark, reduceBatchmarkResults, benchmark, benchmarkParallel, getBMRTaskIds, getBMRLearners, getBMRLearnerIds, getBMRLearnerShortNames, getBMRMeasures, getBMRMeasureIds, getBMRPredictions, getBMRPerformances, getBMRAggrPerformances, getBMRTuneResults, getBMRFeatSelResults, getBMRFilteredFeatures, getBMRModels, getBMRTaskDescs, convertBMRToRankMatrix, friedmanPostHocTestBMR, friedmanTestBMR, plotBMRBoxplots, plotBMRRanksAsBarChart, generateCritDifferencesData, plotCritDifferences |
calibration |
generateCalibrationData, plotCalibration |
configure |
configureMlr, getMlrOptions |
costsens |
makeCostSensTask, makeCostSensWeightedPairsWrapper |
debug |
predictFailureModel, getPredictionDump, getRRDump, print.ResampleResult |
downsample |
downsample |
eda_and_preprocess |
capLargeValues, createDummyFeatures, dropFeatures, mergeSmallFactorLevels, normalizeFeatures, removeConstantFeatures, summarizeColumns, summarizeLevels |
extractFDAFeatures |
reextractFDAFeatures |
fda_featextractor |
extractFDAFourier, extractFDAWavelets, extractFDAFPCA, extractFDAMultiResFeatures |
fda |
makeExtractFDAFeatMethod, extractFDAFeatures |
featsel |
analyzeFeatSelResult, makeFeatSelControl, getFeatSelResult, selectFeatures |
filter |
filterFeatures, makeFilter, listFilterMethods, getFilteredFeatures, generateFilterValuesData, getFilterValues |
generate_plot_data |
generateFeatureImportanceData, plotFilterValues, generatePartialDependenceData |
help |
helpLearner, helpLearnerParam |
imbalancy |
oversample, smote |
impute |
makeImputeMethod, imputeConstant, impute, reimpute |
learner |
getClassWeightParam, getHyperPars, getParamSet.Learner, getLearnerType, getLearnerId, getLearnerPredictType, getLearnerPackages, getLearnerParamSet, getLearnerParVals, setLearnerId, getLearnerShortName, getLearnerProperties, makeLearner, makeLearners, removeHyperPars, setHyperPars, setId, setPredictThreshold, setPredictType |
learning_curve |
generateLearningCurveData |
multilabel |
getMultilabelBinaryPerformances, makeMultilabelBinaryRelevanceWrapper, makeMultilabelClassifierChainsWrapper, makeMultilabelDBRWrapper, makeMultilabelNestedStackingWrapper, makeMultilabelStackingWrapper |
performance |
calculateConfusionMatrix, calculateROCMeasures, makeCustomResampledMeasure, makeCostMeasure, setMeasurePars, setAggregation, makeMeasure, featperc, performance, estimateRelativeOverfitting |
plot |
createSpatialResamplingPlots, plotLearningCurve, plotPartialDependence, plotBMRSummary, plotResiduals |
predict |
asROCRPrediction, getPredictionProbabilities, getPredictionTaskDesc, getPredictionResponse, predict.WrappedModel |
resample |
makeResampleDesc, makeResampleInstance, makeResamplePrediction, resample, getRRPredictions, getRRTaskDescription, getRRTaskDesc, getRRPredictionList, addRRMeasure |
task |
getTaskDesc, getTaskType, getTaskId, getTaskTargetNames, getTaskClassLevels, getTaskFeatureNames, getTaskNFeats, getTaskSize, getTaskFormula, getTaskTargets, getTaskData, getTaskCosts, subsetTask |
thresh_vs_perf |
generateThreshVsPerfData, plotThreshVsPerf, plotROCCurves |
tune |
getNestedTuneResultsX, getNestedTuneResultsOptPathDf, getResamplingIndices, getTuneResult, makeModelMultiplexerParamSet, makeModelMultiplexer, makeTuneControlCMAES, makeTuneControlDesign, makeTuneControlGenSA, makeTuneControlGrid, makeTuneControlIrace, makeTuneControlMBO, makeTuneControl, makeTuneControlRandom, tuneParams, tuneThreshold |
tune_multicrit |
plotTuneMultiCritResult, makeTuneMultiCritControl, tuneParamsMultiCrit |
wrapper |
makeBaggingWrapper, makeClassificationViaRegressionWrapper, makeConstantClassWrapper, makeCostSensClassifWrapper, makeCostSensRegrWrapper, makeDownsampleWrapper, makeDummyFeaturesWrapper, makeExtractFDAFeatsWrapper, makeFeatSelWrapper, makeFilterWrapper, makeImputeWrapper, makeMulticlassWrapper, makeOverBaggingWrapper, makeUndersampleWrapper, makePreprocWrapperCaret, makePreprocWrapper, makeRemoveConstantFeaturesWrapper, makeSMOTEWrapper, makeTuneWrapper, makeWeightedClassesWrapper |
Contains the task (mtcars.task).
See datasets::mtcars.
Normalize features by different methods. Internally, BBmisc::normalize is used for every feature column. Non-numerical features are left untouched and passed through to the result. Most methods fail for constant features; special behaviour for this case is implemented (see argument on.constant).
normalizeFeatures( obj, target = character(0L), method = "standardize", cols = NULL, range = c(0, 1), on.constant = "quiet" )
obj |
(data.frame | Task) |
target |
( |
method |
( |
cols |
(character) |
range |
( |
on.constant |
( |
data.frame | Task. Same type as obj.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
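A minimal sketch of both call styles, using the bundled iris data purely for illustration.
# standardize all numeric features of a task
task = normalizeFeatures(iris.task, method = "standardize")
summary(getTaskData(task))
# range-normalize a plain data.frame, leaving the target column untouched
df = normalizeFeatures(iris, target = "Species", method = "range", range = c(0, 1))
summary(df)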
Oversampling: for a given class (usually the smaller one), all existing observations are kept and additional observations are added by randomly sampling with replacement from this class.
Undersampling: for a given class (usually the larger one), the number of observations is reduced (downsampled) by randomly sampling without replacement from this class.
oversample(task, rate, cl = NULL) undersample(task, rate, cl = NULL)
task |
(Task) |
rate |
( |
cl |
( |
Task.
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), smote()
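A minimal sketch on an artificially imbalanced two-class task (the subset of iris is chosen purely for illustration).
df = iris[1:60, ]
df$Species = droplevels(df$Species) # 50 setosa vs. 10 versicolor
task = makeClassifTask(data = df, target = "Species")
table(getTaskTargets(task))
task.over = oversample(task, rate = 5)      # inflate the smaller class
task.under = undersample(task, rate = 1 / 5) # shrink the larger class
table(getTaskTargets(task.over))
table(getTaskTargets(task.under))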
mlr supports different methods to activate parallel computing capabilities through the integration of the parallelMap::parallelMap package, which supports all major parallelization backends for R.
You can start parallelization with parallelStart*, where * should be replaced with the chosen backend.
parallelMap::parallelStop is used to stop all parallelization backends.
Parallelization is divided into different levels and will automatically be carried out for the first level that occurs, e.g. if you call resample()
after parallelMap::parallelStart, each resampling iteration is a parallel job and possible underlying calls like parameter tuning won't be parallelized further.
The supported levels of parallelization are:
"mlr.resample"
Each resampling iteration (a train/test step) is a parallel job.
"mlr.benchmark"
Each experiment "run this learner on this data set" is a parallel job.
"mlr.tuneParams"
Each evaluation in hyperparameter space "resample with these parameter settings" is a parallel job. How many of these can be run independently in parallel depends on the tuning algorithm. For grid search or random search there is no limit, but for other tuners it depends on how many points to evaluate are produced in each iteration of the optimization. If a tuner works in a purely sequential fashion, we cannot work magic and the hyperparameter evaluation will also run sequentially. But note that you can still parallelize the underlying resampling.
"mlr.selectFeatures"
Each evaluation in feature space "resample with this feature subset" is a parallel job. The same comments as for "mlr.tuneParams"
apply here.
"mlr.ensemble"
For all ensemble methods, the training and prediction of each individual learner is a parallel job. Supported ensemble methods are the makeBaggingWrapper, makeCostSensRegrWrapper, makeMulticlassWrapper, makeMultilabelBinaryRelevanceWrapper and the makeOverBaggingWrapper.
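A minimal sketch, assuming a local socket backend is available: only the resampling loop is parallelized because the level is restricted to "mlr.resample".
library(parallelMap)
parallelStartSocket(2, level = "mlr.resample")
rdesc = makeResampleDesc("CV", iters = 5)
r = resample("classif.rpart", iris.task, rdesc)
parallelStop()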
Measures the quality of a prediction w.r.t. some performance measure.
performance( pred, measures, task = NULL, model = NULL, feats = NULL, simpleaggr = FALSE )
pred |
(Prediction) |
measures |
(Measure | list of Measure) |
task |
(Task) |
model |
(WrappedModel) |
feats |
(data.frame) |
simpleaggr |
(logical) |
(named numeric). Performance value(s), named by measure(s).
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, setAggregation(), setMeasurePars()
training.set = seq(1, nrow(iris), by = 2)
test.set = seq(2, nrow(iris), by = 2)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda")
mod = train(lrn, task, subset = training.set)
pred = predict(mod, newdata = iris[test.set, ])
performance(pred, measures = mmce)
# Compute multiple performance measures at once
ms = list("mmce" = mmce, "acc" = acc, "timetrain" = timetrain)
performance(pred, measures = ms, task, mod)
Contains the task (phoneme.task).
The task contains a single functional covariate and 5 equally sized classes (aa, ao, dcl, iy, sh).
The aim is to predict the class of the phoneme from the functional covariate.
The dataset is contained in the package fda.usc.
F. Ferraty and P. Vieu (2003) "Curve discrimination: a nonparametric functional approach", Computational Statistics and Data Analysis, 44(1-2), 161-173. F. Ferraty and P. Vieu (2006) Nonparametric functional data analysis, New York: Springer. T. Hastie and R. Tibshirani and J. Friedman (2009) The elements of statistical learning: Data mining, inference and prediction, 2nd edn, New York: Springer.
Contains the task (pid.task).
See mlbench::PimaIndiansDiabetes. Note that this is the uncorrected version from mlbench.
Plots box or violin plots for a selected measure across all iterations of the resampling strategy, faceted by the task.id.
plotBMRBoxplots( bmr, measure = NULL, style = "box", order.lrns = NULL, order.tsks = NULL, pretty.names = TRUE, facet.wrap.nrow = NULL, facet.wrap.ncol = NULL )
bmr |
(BenchmarkResult) |
measure |
(Measure) |
style |
( |
order.lrns |
( |
order.tsks |
( |
pretty.names |
( |
facet.wrap.nrow , facet.wrap.ncol
|
(integer) |
ggplot2 plot object.
Other plot: createSpatialResamplingPlots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
# see benchmark
Plots a bar chart from the ranks of algorithms. Alternatively,
tiles can be plotted for every rank-task combination, see pos
for details. In all plot variants the ranks of the learning algorithms are displayed on
the x-axis. Areas are always colored according to the learner.id
.
plotBMRRanksAsBarChart( bmr, measure = NULL, ties.method = "average", aggregation = "default", pos = "stack", order.lrns = NULL, order.tsks = NULL, pretty.names = TRUE )
bmr |
(BenchmarkResult) |
measure |
(Measure) |
ties.method |
( |
aggregation |
( |
pos |
( |
order.lrns |
( |
order.tsks |
( |
pretty.names |
( |
ggplot2 plot object.
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
# see benchmark
Creates a scatter plot where each line refers to a task; on that line, the aggregated scores of all learners for that task are plotted. Optionally, you can apply a rank transformation or use one of ggplot2's transformations like ggplot2::scale_x_log10.
plotBMRSummary( bmr, measure = NULL, trafo = "none", order.tsks = NULL, pointsize = 4L, jitter = 0.05, pretty.names = TRUE )
bmr |
(BenchmarkResult) |
measure |
(Measure) |
trafo |
( |
order.tsks |
( |
pointsize |
( |
jitter |
( |
pretty.names |
( |
ggplot2 plot object.
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotCritDifferences(), reduceBatchmarkResults()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
# see benchmark
Plots calibration data from generateCalibrationData.
plotCalibration( obj, smooth = FALSE, reference = TRUE, rag = TRUE, facet.wrap.nrow = NULL, facet.wrap.ncol = NULL )
obj |
(CalibrationData) |
smooth |
( |
reference |
( |
rag |
( |
facet.wrap.nrow , facet.wrap.ncol
|
(integer) |
ggplot2 plot object.
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other calibration: generateCalibrationData()
## Not run:
lrns = list(makeLearner("classif.rpart", predict.type = "prob"),
  makeLearner("classif.nnet", predict.type = "prob"))
fit = lapply(lrns, train, task = iris.task)
pred = lapply(fit, predict, task = iris.task)
names(pred) = c("rpart", "nnet")
out = generateCalibrationData(pred, groups = 3)
plotCalibration(out)

fit = lapply(lrns, train, task = sonar.task)
pred = lapply(fit, predict, task = sonar.task)
names(pred) = c("rpart", "nnet")
out = generateCalibrationData(pred)
plotCalibration(out)
## End(Not run)
Plots a critical-differences diagram for all classifiers and a selected measure. If a baseline is selected for the Bonferroni-Dunn test, the critical difference interval will be positioned around the baseline. If not, the best performing algorithm will be chosen as baseline.
The positioning of some descriptive elements can be moved by modifying the generated data.
plotCritDifferences(obj, baseline = NULL, pretty.names = TRUE)
obj |
( |
baseline |
( |
pretty.names |
( |
ggplot2 plot object.
Janez Demsar, Statistical Comparisons of Classifiers over Multiple Data Sets, JMLR, 2006
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), reduceBatchmarkResults()
# see benchmark
Plot filter values using ggplot2.
plotFilterValues( fvalues, sort = "dec", n.show = nrow(fvalues$data), filter = NULL, feat.type.cols = FALSE )
fvalues |
(FilterValues) |
sort |
(
Default is decreasing. |
n.show |
( |
filter |
( |
feat.type.cols |
( |
ggplot2 plot object.
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper()
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFeatureImportanceData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData()
fv = generateFilterValuesData(iris.task, method = "variance") plotFilterValues(fv)
Plot hyperparameter validation path. Automated plotting method for
HyperParsEffectData
object. Useful for determining the importance
or effect of a particular hyperparameter on some performance measure and/or
optimizer.
plotHyperParsEffect( hyperpars.effect.data, x = NULL, y = NULL, z = NULL, plot.type = "scatter", loess.smooth = FALSE, facet = NULL, global.only = TRUE, interpolate = NULL, show.experiments = FALSE, show.interpolated = FALSE, nested.agg = mean, partial.dep.learn = NULL )
hyperpars.effect.data |
( |
x |
( |
y |
( |
z |
( |
plot.type |
( |
loess.smooth |
( |
facet |
( |
global.only |
( |
interpolate |
(Learner | |
show.experiments |
( |
show.interpolated |
( |
nested.agg |
( |
partial.dep.learn |
(Learner | |
ggplot2 plot object.
Any NAs incurred from learning algorithm crashes will be indicated in
the plot (except in the case of partial dependence) and the NA values will be
replaced with the column min/max depending on the optimal values for the
respective measure. Execution time will be replaced with the max.
Interpolation by its nature will result in predicted values for the
performance measure. Use interpolation with caution. If “partial.dep”
is set to TRUE
in generateHyperParsEffectData, only
partial dependence will be plotted.
Since a ggplot2 plot object is returned, the user can change the axis labels and other aspects of the plot using the appropriate ggplot2 syntax.
# see generateHyperParsEffectData
Trains the model for 1 or 2 selected features, then displays it via ggplot2::ggplot. Good for teaching or exploring models.
For classification and clustering, only 2D plots are supported. The data points and the classification are shown; the posterior probabilities can additionally be indicated through colour alpha blending.
For regression, 1D and 2D plots are supported. 1D shows the data, the estimated mean and potentially the estimated standard error. 2D does not show estimated standard error, but only the estimated mean via background color.
The plot title displays the model id, its parameters, the training performance and the cross-validation performance.
plotLearnerPrediction( learner, task, features = NULL, measures, cv = 10L, ..., gridsize, pointsize = 2, prob.alpha = TRUE, se.band = TRUE, err.mark = "train", bg.cols = c("darkblue", "green", "darkred"), err.col = "white", err.size = pointsize, greyscale = FALSE, pretty.names = TRUE )
learner |
(Learner | |
task |
(Task) |
features |
(character) |
measures |
(Measure | list of Measure) |
cv |
( |
... |
(any) |
gridsize |
( |
pointsize |
( |
prob.alpha |
( |
se.band |
( |
err.mark |
( |
bg.cols |
( |
err.col |
( |
err.size |
( |
greyscale |
( |
pretty.names |
( |
The ggplot2 object.
Visualizes data size (percentage used for model) vs. performance measure(s).
plotLearningCurve( obj, facet = "measure", pretty.names = TRUE, facet.wrap.nrow = NULL, facet.wrap.ncol = NULL )
obj |
(LearningCurveData) |
facet |
( |
pretty.names |
( |
facet.wrap.nrow , facet.wrap.ncol
|
(integer) |
ggplot2 plot object.
Other learning_curve: generateLearningCurveData()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotPartialDependence(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Plot a partial dependence from generatePartialDependenceData using ggplot2.
plotPartialDependence( obj, geom = "line", facet = NULL, facet.wrap.nrow = NULL, facet.wrap.ncol = NULL, p = 1, data = NULL )
obj |
PartialDependenceData |
geom |
( |
facet |
( |
facet.wrap.nrow , facet.wrap.ncol
|
(integer) |
p |
( |
data |
(data.frame) |
ggplot2 plot object.
Other partial_dependence: generatePartialDependenceData()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Plots for model diagnostics. Provides scatterplots of true vs. predicted values and histograms of the model's residuals.
plotResiduals( obj, type = "scatterplot", loess.smooth = TRUE, rug = TRUE, pretty.names = TRUE )
obj |
(Prediction | BenchmarkResult) |
type |
Type of plot: “scatterplot” (the default), or “hist” for a histogram (or, in case of classification problems, a barplot) of the residuals. |
loess.smooth |
( |
rug |
( |
pretty.names |
( |
ggplot2 plot object.
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotThreshVsPerf()
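A minimal sketch on the bundled Boston Housing task (bh.task), chosen only for illustration.
mod = train("regr.lm", bh.task)
pred = predict(mod, task = bh.task)
plotResiduals(pred)                 # true vs. predicted scatterplot
plotResiduals(pred, type = "hist")  # histogram of residuals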
Plots a ROC curve from predictions.
plotROCCurves( obj, measures, diagonal = TRUE, pretty.names = TRUE, facet.learner = FALSE )
obj |
(ThreshVsPerfData) |
measures |
(list(2) of Measure) |
diagonal |
( |
pretty.names |
( |
facet.learner |
( |
ggplot2 plot object.
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotResiduals(), plotThreshVsPerf()
Other thresh_vs_perf: generateThreshVsPerfData(), plotThreshVsPerf()
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, sonar.task)
pred = predict(fit, task = sonar.task)
roc = generateThreshVsPerfData(pred, list(fpr, tpr))
plotROCCurves(roc)

r = bootstrapB632plus(lrn, sonar.task, iters = 3)
roc_r = generateThreshVsPerfData(r, list(fpr, tpr), aggregate = FALSE)
plotROCCurves(roc_r)

r2 = crossval(lrn, sonar.task, iters = 3)
roc_l = generateThreshVsPerfData(list(boot = r, cv = r2), list(fpr, tpr), aggregate = FALSE)
plotROCCurves(roc_l)
Plots threshold vs. performance(s) data that has been generated with generateThreshVsPerfData.
plotThreshVsPerf( obj, measures = obj$measures, facet = "measure", mark.th = NA_real_, pretty.names = TRUE, facet.wrap.nrow = NULL, facet.wrap.ncol = NULL )
obj |
(ThreshVsPerfData) |
measures |
(Measure | list of Measure) |
facet |
( |
mark.th |
( |
pretty.names |
( |
facet.wrap.nrow , facet.wrap.ncol
|
(integer) |
ggplot2 plot object.
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(), plotResiduals()
Other thresh_vs_perf: generateThreshVsPerfData(), plotROCCurves()
lrn = makeLearner("classif.rpart", predict.type = "prob") mod = train(lrn, sonar.task) pred = predict(mod, sonar.task) pvs = generateThreshVsPerfData(pred, list(acc, setAggregation(acc, train.mean))) plotThreshVsPerf(pvs)
Visualizes the Pareto front and possibly the dominated points.
plotTuneMultiCritResult( res, path = TRUE, col = NULL, shape = NULL, pointsize = 2, pretty.names = TRUE )
res |
(TuneMultiCritResult) |
path |
( |
col |
( |
shape |
( |
pointsize |
( |
pretty.names |
( |
ggplot2 plot object.
Other tune_multicrit: TuneMultiCritControl, tuneParamsMultiCrit()
# see tuneParamsMultiCrit
Predict the target variable of new data using a fitted model. What is stored exactly in the (Prediction) object depends on the predict.type setting of the Learner. If predict.type was set to “prob”, probability thresholding can be done by calling the setThreshold function on the prediction object.
The row names of the input task or newdata are preserved in the output.
## S3 method for class 'WrappedModel' predict(object, task, newdata, subset = NULL, ...)
object |
(WrappedModel) |
task |
(Task) |
newdata |
(data.frame) |
subset |
(integer | logical | |
... |
(any) |
(Prediction).
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), setPredictThreshold(), setPredictType()
# train and predict
train.set = seq(1, 150, 2)
test.set = seq(2, 150, 2)
model = train("classif.lda", iris.task, subset = train.set)
p = predict(model, newdata = iris, subset = test.set)
print(p)
predict(model, task = iris.task, subset = test.set)

# now predict probabilities instead of class labels
lrn = makeLearner("classif.lda", predict.type = "prob")
model = train(lrn, iris.task, subset = train.set)
p = predict(model, task = iris.task, subset = test.set)
print(p)
getPredictionProbabilities(p)
Mainly for internal use. Predict new data with a fitted model. You have to implement this method if you want to add another learner to this package.
predictLearner(.learner, .model, .newdata, ...)
.learner |
(RLearner) |
.model |
(WrappedModel) |
.newdata |
(data.frame) |
... |
(any) |
Your implementation must adhere to the following:
Predictions for the observations in .newdata
must be made based on the fitted
model (.model$learner.model
).
All parameters in ...
must be passed to the underlying predict function.
For classification: Either a factor with class labels for type “response” or, if the learner supports this, a matrix of class probabilities for type “prob”. In the latter case the columns must be named with the class labels.
For regression: Either a numeric vector for type “response” or, if the learner supports this, a matrix with two columns for type “se”. In the latter case the first column contains the estimated response (mean value) and the second column the estimated standard errors.
For survival: Either a numeric vector with some sort of orderable risk for type “response” or, if supported, a numeric vector with time dependent probabilities for type “prob”.
For clustering: Either an integer with cluster IDs for type “response” or, if supported, a matrix of membership probabilities for type “prob”.
For multilabel: A logical matrix that indicates predicted class labels for type “response” or, if supported, a matrix of class probabilities for type “prob”. The columns must be named with the class labels.
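A hedged sketch of such a method for a hypothetical learner "classif.mymodel"; the structure of the fitted model and of its predict output are assumptions for illustration, not part of mlr.
predictLearner.classif.mymodel = function(.learner, .model, .newdata, ...) {
  # predict with the underlying fitted model, passing all extra arguments on
  p = predict(.model$learner.model, newdata = .newdata, ...)
  if (.learner$predict.type == "response") {
    p$class # factor of predicted class labels
  } else {
    p$prob  # matrix of class probabilities, columns named by class labels
  }
}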
This creates a BenchmarkResult from a batchtools::ExperimentRegistry. To set up the benchmark, have a look at batchmark.
reduceBatchmarkResults( ids = NULL, keep.pred = TRUE, keep.extract = FALSE, show.info = getMlrOption("show.info"), reg = batchtools::getDefaultRegistry() )
ids |
(data.frame or integer) |
keep.pred |
( |
keep.extract |
( |
show.info |
( |
reg |
(batchtools::ExperimentRegistry) |
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences()
This function accepts a data frame or a task and an extractFDAFeatDesc (an FDA feature extraction description) as returned by extractFDAFeatures, and extracts features from previously unseen data.
reextractFDAFeatures(obj, desc, ...)
obj |
(Task | data.frame) |
desc |
( |
... |
(any) |
data.frame or Task containing the extracted features.
This function accepts a data frame or a task and an imputation description as returned by impute to perform the following actions:
Restore dropped columns, setting them to NA
Add dummy variables for columns as specified in impute
Optionally check factors for new levels to treat them as NAs
Reorder factor levels to ensure identical integer representation as before
Impute missing values using previously collected data
reimpute(obj, desc)
obj |
(data.frame | Task) |
desc |
( |
Imputed data.frame or task with imputed data.
Other impute: imputations, impute(), makeImputeMethod(), makeImputeWrapper()
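A minimal sketch (the data split is chosen for illustration): the imputation is learned on the training part and reapplied to new data with reimpute.
df = iris
df[1:10, "Sepal.Length"] = NA
train.df = df[seq(1, 150, 2), ]
test.df = df[seq(2, 150, 2), ]
imp = impute(train.df, target = "Species", cols = list(Sepal.Length = imputeMean()))
train.imp = imp$data                   # imputed training data
test.imp = reimpute(test.df, imp$desc) # reapply the learned imputation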
Constant features can lead to errors in some models and obviously provide no information in the training set that can be learned from. With the argument “perc”, you can also remove features for which less than “perc” percent of the observations differ from the mode value.
removeConstantFeatures( obj, perc = 0, dont.rm = character(0L), na.ignore = FALSE, wrap.tol = .Machine$double.eps^0.5, show.info = getMlrOption("show.info"), ... )
obj |
(data.frame | Task) |
perc |
( |
dont.rm |
(character) |
na.ignore |
( |
wrap.tol |
( |
show.info |
( |
... |
To ensure backward compatibility with old argument |
data.frame | Task. Same type as obj.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), summarizeColumns(), summarizeLevels()
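A minimal sketch with constructed data (column names are illustrative): "a" is constant and "b" is nearly constant.
df = data.frame(
  a = rep(1, 100),      # constant
  b = c(rep(0, 99), 1), # only 1% of observations differ from the mode
  c = rnorm(100)
)
removeConstantFeatures(df)               # drops "a" only
removeConstantFeatures(df, perc = 0.02)  # additionally drops "b"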
Remove settings (previously set through mlr) for some parameters, which means that the default behaviour for these parameters will be used again.
removeHyperPars(learner, ids = character(0L))
learner |
(Learner | |
ids |
(character) |
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
The function resample fits a model specified by Learner on a Task and calculates predictions and performance measures for all training and all test sets specified by either a resampling description (ResampleDesc) or a resampling instance (ResampleInstance).
You are able to return all fitted models (parameter models) or extract specific parts of the models (parameter extract), as returning all of them completely might be memory intensive.
The remaining functions on this page are convenience wrappers for the various existing resampling strategies. Note that if you need to work with precomputed training and test splits (i.e., resampling instances), you have to stick with resample.
resample(learner, task, resampling, measures, weights = NULL, models = FALSE, extract, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
crossval(learner, task, iters = 10L, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
repcv(learner, task, folds = 10L, reps = 10L, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
holdout(learner, task, split = 2/3, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
subsample(learner, task, iters = 30, split = 2/3, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
bootstrapOOB(learner, task, iters = 30, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
bootstrapB632(learner, task, iters = 30, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
bootstrapB632plus(learner, task, iters = 30, stratify = FALSE, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
growingcv(learner, task, horizon = 1, initial.window = 0.5, skip = 0, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
fixedcv(learner, task, horizon = 1L, initial.window = 0.5, skip = 0, measures, models = FALSE, keep.pred = TRUE, ..., show.info = getMlrOption("show.info"))
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleDesc or ResampleInstance) |
measures |
(Measure | list of Measure) |
weights |
(numeric) |
models |
( |
extract |
( |
keep.pred |
( |
... |
(any) |
show.info |
( |
iters |
( |
stratify |
( |
folds |
( |
reps |
( |
split |
( |
horizon |
( |
initial.window |
( |
skip |
( |
If you would like to include results from the training data set, make sure to appropriately adjust the resampling strategy and the aggregation for the measure. See example code below.
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance()
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 2)
r = resample(makeLearner("classif.qda"), task, rdesc)
print(r$aggr)
print(r$measures.test)
print(r$pred)

# include the training set performance as well
rdesc = makeResampleDesc("CV", iters = 2, predict = "both")
r = resample(makeLearner("classif.qda"), task, rdesc,
  measures = list(mmce, setAggregation(mmce, train.mean)))
print(r$aggr)
Contains predictions from resampling, returned (among other things) by function resample. Can basically be used in the same way as Prediction, its super class.
The main differences are: (a) The internal data.frame (member data) contains an additional column iter, specifying the iteration of the resampling strategy, and an additional column set, specifying whether the prediction was from an observation in the “train” or “test” set. (b) The prediction time is a numeric vector; its length equals the number of iterations.
Other resample: ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
A container for resample results.
Resample Result:
A resample result is created by resample and contains the following object members:
task.id (character(1)): Name of the Task.
learner.id (character(1)): Name of the Learner.
measures.test: Gives you access to performance measurements on the individual test sets. Rows correspond to sets in resampling iterations, columns to performance measures.
measures.train: Gives you access to performance measurements on the individual training sets. Rows correspond to sets in resampling iterations, columns to performance measures. Usually not available, only if specifically requested, see general description above.
aggr: Named vector of aggregated performance values. Names are coded like this <measure>.<aggregation>.
err.msgs: Number of rows equals resampling iterations and columns are: iter, train, predict. Stores error messages generated during train or predict, if these were caught via configureMlr.
err.dumps: List with length equal to the number of resampling iterations. Contains lists of dump.frames objects that can be fed to debugger() to inspect error dumps generated on learner errors. One iteration can generate more than one error dump, depending on which of the operations (training, prediction on the training set, or prediction on the test set) fail. Therefore the lists have named slots $train, $predict.train, or $predict.test if relevant. The error dumps are only saved when option on.error.dump is TRUE.
pred: Container for all predictions during resampling.
models: List of fitted models or NULL.
extract: List of extracted parts from fitted models or NULL.
runtime (numeric(1)): Time in seconds it took to execute the resampling.
The print method of this object gives a short overview, including task and learner ids, aggregated measures and runtime for the resampling.
Other resample: ResamplePrediction, addRRMeasure(), getRRPredictionList(), getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(), resample()
Other debug: FailureModel, getPredictionDump(), getRRDump()
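A minimal sketch showing how the members listed above are accessed after a call to resample (task and learner chosen for illustration).
rdesc = makeResampleDesc("CV", iters = 3)
r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, acc))
r$aggr                      # named vector, e.g. mmce.test.mean and acc.test.mean
r$measures.test             # per-iteration test-set performances
head(as.data.frame(r$pred)) # pooled predictions
r$runtime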
Wraps an already implemented learning method from R to make it accessible to mlr. Call this method in your constructor. You have to pass an id (name), the required package(s), a description object for all changeable parameters (you do not have to do this for the learner to work, but it is strongly recommended), and use property tags to define features of the learner.
For a general overview on how to integrate a learning algorithm into mlr's system, please read the section in the online tutorial: https://mlr.mlr-org.com/articles/tutorial/create_learner.html
To see all possible properties of a learner, go to: LearnerProperties.
makeRLearner()
makeRLearnerClassif(cl, package, par.set, par.vals = list(), properties = character(0L), name = cl, short.name = cl, note = "", class.weights.param = NULL, callees = character(0L))
makeRLearnerMultilabel(cl, package, par.set, par.vals = list(), properties = character(0L), name = cl, short.name = cl, note = "", callees = character(0L))
makeRLearnerRegr(cl, package, par.set, par.vals = list(), properties = character(0L), name = cl, short.name = cl, note = "", callees = character(0L))
makeRLearnerSurv(cl, package, par.set, par.vals = list(), properties = character(0L), name = cl, short.name = cl, note = "", callees = character(0L))
makeRLearnerCluster(cl, package, par.set, par.vals = list(), properties = character(0L), name = cl, short.name = cl, note = "", callees = character(0L))
makeRLearnerCostSens(cl, package, par.set, par.vals = list(), properties = character(0L), name = cl, short.name = cl, note = "", callees = character(0L))
cl |
( |
package |
(character) |
par.set |
(ParamHelpers::ParamSet) |
par.vals |
(list) |
properties |
(character) |
name |
( |
short.name |
( |
note |
( |
class.weights.param |
( |
callees |
(character) |
(RLearner). The specific subclass is one of RLearnerClassif, RLearnerCluster, RLearnerMultilabel, RLearnerRegr, RLearnerSurv.
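A hedged sketch of such a constructor for a hypothetical learner "classif.mymodel" wrapping a hypothetical package "mypackage"; the parameter set shown is an assumption for illustration only.
makeRLearner.classif.mymodel = function() {
  makeRLearnerClassif(
    cl = "classif.mymodel",
    package = "mypackage",
    par.set = ParamHelpers::makeParamSet(
      ParamHelpers::makeNumericLearnerParam(id = "lambda", default = 1, lower = 0)
    ),
    properties = c("twoclass", "multiclass", "numerics", "factors", "prob"),
    name = "My Model",
    short.name = "mymodel"
  )
}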
Optimizes the features for a classification or regression problem by choosing a variable selection wrapper approach. Allows for different optimization methods, such as forward search or a genetic algorithm. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at the subclasses of (FeatSelControl).
All algorithms operate on a 0-1-bit encoding of candidate solutions. Per default a single bit corresponds to a single feature, but you are able to change this by using the arguments bit.names and bits.to.features, thus allowing you to switch on whole groups of features with a single bit.
selectFeatures( learner, task, resampling, measures, bit.names, bits.to.features, control, show.info = getMlrOption("show.info") )
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
bit.names |
character |
bits.to.features |
( |
control |
(FeatSelControl) Control object for the search method. Also selects the optimization algorithm for feature selection. |
show.info |
( |
Other featsel: FeatSelControl, analyzeFeatSelResult(), getFeatSelResult(), makeFeatSelWrapper()
rdesc = makeResampleDesc("Holdout") ctrl = makeFeatSelControlSequential(method = "sfs", maxit = NA) res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl) analyzeFeatSelResult(res)
Set how this measure will be aggregated after resampling. To see possible aggregation functions: aggregations.
setAggregation(measure, aggr)
measure |
(Measure) |
aggr |
(Aggregation) |
(Measure) with changed aggregation behaviour.
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setMeasurePars()
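A minimal sketch: report the worst fold instead of the mean across folds (task and learner chosen for illustration).
mmce.max = setAggregation(mmce, test.max)
rdesc = makeResampleDesc("CV", iters = 3)
r = resample("classif.rpart", iris.task, rdesc, measures = list(mmce, mmce.max))
r$aggr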
Set the hyperparameters of a learner object.
setHyperPars(learner, ..., par.vals = list())
learner |
(Learner | |
... |
(any) |
par.vals |
(list) |
If a named (hyper)parameter can't be found for the given learner, the 3 closest (hyper)parameter names will be output in case the user mistyped.
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setId(), setLearnerId(), setPredictThreshold(), setPredictType()
cl1 = makeLearner("classif.ksvm", sigma = 1) cl2 = setHyperPars(cl1, sigma = 10, par.vals = list(C = 2)) print(cl1) # note the now set and altered hyperparameters: print(cl2)
Only exported for internal use.
setHyperPars2(learner, par.vals)
learner |
(Learner) |
par.vals |
(list) |
Deprecated, use setLearnerId instead.
setId(learner, id)
learner |
(Learner | |
id |
( |
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setLearnerId(), setPredictThreshold(), setPredictType()
Set the ID of the learner.
setLearnerId(learner, id)
learner |
(Learner | |
id |
( |
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setPredictThreshold(), setPredictType()
Sets hyperparameters of measures.
setMeasurePars(measure, ..., par.vals = list())
measure |
(Measure) |
... |
(any) |
par.vals |
(list) |
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setAggregation()
See predict.threshold
in makeLearner and setThreshold.
For complex wrappers only the top-level predict.type
is currently set.
setPredictThreshold(learner, predict.threshold)
learner |
(Learner | |
predict.threshold |
(numeric) |
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictType()
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictType()
Possible prediction types are: Classification: labels or class probabilities (including labels). Regression: numeric response or standard errors (including the numeric response). Survival: linear predictor or survival probability.
For complex wrappers the predict type is usually also passed down to the encapsulated learner in a recursive fashion.
setPredictType(learner, predict.type)
learner |
(Learner | |
predict.type |
( |
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), predict.WrappedModel(), setPredictThreshold()
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold()
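A minimal sketch: switch a classifier from label predictions to probability predictions.
lrn = makeLearner("classif.lda")
getLearnerPredictType(lrn)        # "response"
lrn = setPredictType(lrn, "prob")
getLearnerPredictType(lrn)        # "prob"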
Set threshold of prediction object for classification or multilabel classification.
Creates corresponding discrete class response for the newly set threshold.
For binary classification: The positive class is predicted if the probability value exceeds the threshold.
For multiclass: Probabilities are divided by corresponding thresholds and the class with maximum resulting value is selected.
The result of both are equivalent if in the multi-threshold case the values are greater than 0 and sum to 1.
For multilabel classification: A label is predicted (with entry TRUE
) if a probability matrix entry
exceeds the threshold of the corresponding label.
setThreshold(pred, threshold)
pred |
(Prediction) |
threshold |
(numeric) |
(Prediction) with changed threshold and corresponding response.
# create task and train learner (LDA)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task)

# predict probabilities and compute performance
pred = predict(mod, newdata = iris)
performance(pred, measures = mmce)
head(as.data.frame(pred))

# adjust threshold and predict probabilities again
threshold = c(setosa = 0.4, versicolor = 0.3, virginica = 0.3)
pred = setThreshold(pred, threshold = threshold)
performance(pred, measures = mmce)
head(as.data.frame(pred))
Clips aggregation names from a character vector, e.g. 'mmce.test.mean' becomes 'mmce'. Elements that don't contain a measure name are ignored and returned unchanged.
simplifyMeasureNames(xs)
xs |
(character) |
(character).
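A minimal sketch (illustrative only): the aggregation suffix is stripped, while strings without a measure name pass through unchanged.
simplifyMeasureNames(c("mmce.test.mean", "some.custom.column"))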
In each iteration, SMOTE samples one minority-class element x1 and then one of x1's nearest neighbors, x2. The two points are interpolated (convex-combined), resulting in a new virtual data point x3 for the minority class. See the usage sketch at the end of this entry.
The method also handles factor features. The Gower distance is used for the nearest-neighbor calculation, see cluster::daisy. For interpolation, the new factor level of x3 is sampled per feature from the two given levels of x1 and x2.
smote(task, rate, nn = 5L, standardize = TRUE, alt.logic = FALSE)
task |
(Task) |
rate |
( |
nn |
( |
standardize |
( |
alt.logic |
( |
Task.
Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, P. (2000) SMOTE: Synthetic Minority Over-sampling Technique. In International Conference of Knowledge Based Computer Systems, pp. 46-57. National Center for Software Technology, Mumbai, India, Allied Press.
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), oversample()
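A small usage sketch (illustrative only; it artificially imbalances the iris data by keeping only ten versicolor rows and then oversamples them):
df = iris[1:60, ]  # 50 setosa, 10 versicolor
df$Species = droplevels(df$Species)
task = makeClassifTask(data = df, target = "Species")
table(getTaskTargets(task))
task.smoted = smote(task, rate = 3, nn = 5L)
table(getTaskTargets(task.smoted))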
Contains the task (sonar.task). See mlbench::Sonar.
Data set created by Jannes Muenchow, University of Erlangen-Nuremberg, Germany. These data should be cited as Muenchow et al. (2012) (see reference below). This publication also contains additional information on data collection and the geomorphology of the area. The data set provided here is (a subset of) the one from the 'natural' part of the RBSF area and corresponds to landslide distribution in the year 2000.
A data.frame with point samples of landslide and non-landslide locations in a study area in the Andes of southern Ecuador.
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a humidity gradient in the tropical Andes. Geomorphology, 139-140: 271-284.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6): 853-862.
Subsets the data of a task, i.e., restricts it to the given observations and/or features.
subsetTask(task, subset = NULL, features)
task |
(Task) |
subset |
(integer | logical | |
features |
(character | integer | logical) |
(Task). Task with subsetted data.
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(), getTaskType()
task = makeClassifTask(data = iris, target = "Species")
subsetTask(task, subset = 1:100)
Summarizes a data.frame, somewhat differently from R's standard summary() function. The function is mainly useful as a basic EDA tool on data.frames before they are converted to tasks, but it can be used on tasks as well.
Columns can be of type numeric, integer, logical, factor, or character. Characters and logicals will be treated as factors.
summarizeColumns(obj)
obj |
(data.frame | Task) |
(data.frame). With columns:
name | Name of column.
type | Data type of column.
na | Number of NAs in column.
disp | Measure of dispersion, for numerics and integers sd is used, for categorical columns the qualitative variation.
mean | Mean value of column, NA for categorical columns.
median | Median value of column, NA for categorical columns.
mad | MAD of column, NA for categorical columns.
min | Minimal value of column, for categorical columns the size of the smallest category.
max | Maximal value of column, for categorical columns the size of the largest category.
nlevs | For categorical columns, the number of factor levels, NA else.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeLevels()
summarizeColumns(iris)
Summarizes the levels of the factor columns of a data.frame or task by tabulating them. Characters and logicals will be treated as factors.
summarizeLevels(obj, cols = NULL)
obj |
(data.frame | Task) |
cols |
(character) |
(list). Named list of tables.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns()
summarizeLevels(iris)
The task encapsulates the data and specifies - through its subclasses - the type of the task. It also contains a description object detailing further aspects of the data.
Useful operators are the getTask* accessors (e.g. getTaskData, getTaskTargets) and subsetTask.
Object members:
env (environment): Environment where the data for the task are stored. Use getTaskData in order to access it.
weights (numeric): See argument. NULL if not present.
blocking (factor): See argument. NULL if not present.
task.desc (TaskDesc): Encapsulates further information about the task.
Functional data can be added to a task via matrix columns. For more information refer to makeFunctionalData.
id |
( |
data |
(data.frame) |
target |
( |
costs |
(data.frame) |
weights |
(numeric) |
blocking |
(factor) |
positive |
( |
fixup.data |
( |
check.data |
( |
coordinates |
(data.frame) |
Task.
Subclasses: ClassifTask, ClusterTask, CostSensTask, MultilabelTask, RegrTask, SurvTask.
if (requireNamespace("mlbench")) { library(mlbench) data(BostonHousing) data(Ionosphere) makeClassifTask(data = iris, target = "Species") makeRegrTask(data = BostonHousing, target = "medv") # an example of a classification task with more than those standard arguments: blocking = factor(c(rep(1, 51), rep(2, 300))) makeClassifTask(id = "myIonosphere", data = Ionosphere, target = "Class", positive = "good", blocking = blocking) makeClusterTask(data = iris[, -5L]) }
if (requireNamespace("mlbench")) { library(mlbench) data(BostonHousing) data(Ionosphere) makeClassifTask(data = iris, target = "Species") makeRegrTask(data = BostonHousing, target = "medv") # an example of a classification task with more than those standard arguments: blocking = factor(c(rep(1, 51), rep(2, 300))) makeClassifTask(id = "myIonosphere", data = Ionosphere, target = "Class", positive = "good", blocking = blocking) makeClusterTask(data = iris[, -5L]) }
Description object for task, encapsulates basic properties of the task without having to store the complete data set.
Object members:
id (character(1)): Id string of task.
type (character(1)): Type of task, “classif” for classification, “regr” for regression, “surv” for survival, “cluster” for cluster analysis, “costsens” for cost-sensitive classification, and “multilabel” for multilabel classification.
target (character(0) | character(1) | character(2) | character(n.classes)): Name(s) of the target variable(s). For “surv” these are the names of the survival time and event columns, so it has length 2. For “costsens” it has length 0, as there is no target column, but a cost matrix instead. For “multilabel” these are the names of logical columns that indicate whether a class label is present; the number of target variables corresponds to the number of classes.
size (integer(1)): Number of cases in the data set.
n.feat (integer(2)): Number of features, named vector with entries: “numerics”, “factors”, “ordered”, “functionals”.
has.missings (logical(1)): Are missing values present?
has.weights (logical(1)): Are weights specified for each observation?
has.blocking (logical(1)): Is a blocking factor for cases available in the task?
class.levels (character): All possible classes. Only present for “classif”, “costsens”, and “multilabel”.
positive (character(1)): Positive class label for binary classification. Only present for “classif”, NA for multiclass.
negative (character(1)): Negative class label for binary classification. Only present for “classif”, NA for multiclass.
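A minimal access sketch (illustrative only, assuming the member names documented above): inspect a task's description without touching the data.
task = makeClassifTask(data = iris, target = "Species")
td = getTaskDesc(task)
td$type          # "classif"
td$size          # number of observations
td$class.levels  # all class labels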
Given a Task, creates a model for the learning machine which can be used for predictions on new data.
train(learner, task, subset = NULL, weights = NULL)
learner |
(Learner | |
task |
(Task) |
subset |
(integer | logical | |
weights |
(numeric) |
(WrappedModel).
training.set = sample(seq_len(nrow(iris)), nrow(iris) / 2)

## use linear discriminant analysis to classify iris data
task = makeClassifTask(data = iris, target = "Species")
learner = makeLearner("classif.lda", method = "mle")
mod = train(learner, task, subset = training.set)
print(mod)

## use a decision tree (rpart) to classify iris data
task = makeClassifTask(data = iris, target = "Species")
learner = makeLearner("classif.rpart", minsplit = 7, predict.type = "prob")
mod = train(learner, task, subset = training.set)
print(mod)
Mainly for internal use. Trains a wrapped learner on a given training set. You have to implement this method if you want to add another learner to this package.
trainLearner(.learner, .task, .subset, .weights = NULL, ...)
.learner |
(RLearner) |
.task |
(Task) |
.subset |
(integer) |
.weights |
(numeric) |
... |
(any) |
Your implementation must adhere to the following: the model must be fitted on the subset of .task given by .subset, and all parameters in ... must be passed to the underlying training function.
(any). Model of the underlying learner.
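A hedged sketch of such an implementation, assuming a hypothetical learner id "classif.mylda"; registering the learner itself (e.g. via makeRLearnerClassif) is a separate step not shown here.
trainLearner.classif.mylda = function(.learner, .task, .subset, .weights = NULL, ...) {
  # fit only on the rows selected by .subset and forward all hyperparameters in ...
  d = getTaskData(.task, .subset, target.extra = TRUE)
  MASS::lda(x = d$data, grouping = d$target, ...)
}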
General tune control object.
same.resampling.instance |
( |
impute.val |
(numeric) |
start |
(list) |
tune.threshold |
( |
tune.threshold.args |
(list) |
log.fun |
( |
final.dw.perc |
( |
... |
(any) |
Other tune: getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
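A minimal sketch (illustrative only) of constructing two concrete control objects from the family listed above; either would then be passed as the control argument of tuneParams.
ctrl.random = makeTuneControlRandom(maxit = 20L)
ctrl.grid = makeTuneControlGrid(resolution = 5L)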
The following tuners are available:
makeTuneMultiCritControlGrid: Grid search. All kinds of parameter types can be handled. You can either use their correct param type and resolution, or discretize them yourself by always using ParamHelpers::makeDiscreteParam in the par.set passed to tuneParams.
makeTuneMultiCritControlRandom: Random search. All kinds of parameter types can be handled.
makeTuneMultiCritControlNSGA2: Evolutionary method mco::nsga2. Can handle numeric(vector) and integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric values are automatically rounded.
makeTuneMultiCritControlMBO: Model-based / Bayesian optimization. All kinds of parameter types can be handled.
makeTuneMultiCritControlGrid(
  same.resampling.instance = TRUE,
  resolution = 10L,
  log.fun = "default",
  final.dw.perc = NULL,
  budget = NULL
)

makeTuneMultiCritControlMBO(
  n.objectives = mbo.control$n.objectives,
  same.resampling.instance = TRUE,
  impute.val = NULL,
  learner = NULL,
  mbo.control = NULL,
  tune.threshold = FALSE,
  tune.threshold.args = list(),
  continue = FALSE,
  log.fun = "default",
  final.dw.perc = NULL,
  budget = NULL,
  mbo.design = NULL
)

makeTuneMultiCritControlNSGA2(
  same.resampling.instance = TRUE,
  impute.val = NULL,
  log.fun = "default",
  final.dw.perc = NULL,
  budget = NULL,
  ...
)

makeTuneMultiCritControlRandom(
  same.resampling.instance = TRUE,
  maxit = 100L,
  log.fun = "default",
  final.dw.perc = NULL,
  budget = NULL
)
same.resampling.instance |
( |
resolution |
(integer) |
log.fun |
( |
final.dw.perc |
( |
budget |
( |
n.objectives |
( |
impute.val |
(numeric) |
learner |
(Learner | |
mbo.control |
(mlrMBO::MBOControl | |
tune.threshold |
( |
tune.threshold.args |
(list) |
continue |
( |
mbo.design |
(data.frame | |
... |
(any) |
maxit |
( |
(TuneMultiCritControl). The specific subclass is one of TuneMultiCritControlGrid, TuneMultiCritControlRandom, TuneMultiCritControlNSGA2, TuneMultiCritControlMBO.
Other tune_multicrit: plotTuneMultiCritResult(), tuneParamsMultiCrit()
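A minimal sketch (illustrative only): a random-search multi-criteria control with a small budget, to be passed as the control argument of tuneParamsMultiCrit.
ctrl = makeTuneMultiCritControlRandom(maxit = 30L)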
Container for results of hyperparameter tuning. Contains the obtained Pareto set and front and the optimization path which led there.
Object members:
learner: Learner that was optimized.
control: Control object from tuning.
x: List of lists of non-dominated hyperparameter settings in the Pareto set. Note that when you have trafos on some of your params, x will always be on the TRANSFORMED scale, so you can use it directly.
y: Pareto front for x.
threshold: Currently NULL.
opt.path: Optimization path which led to x. Note that when you have trafos on some of your params, the opt.path always contains the UNTRANSFORMED values on the original scale. You can simply call trafoOptPath(opt.path) to transform them, or as.data.frame(trafoOptPath(opt.path)).
ind (integer(n)): Indices of Pareto-optimal params in opt.path.
measures: Performance measures.
Optimizes the hyperparameters of a learner. Allows for different optimization methods, such as grid search, evolutionary strategies, iterated F-race, etc. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at TuneControl.
Multi-criteria tuning can be done with tuneParamsMultiCrit.
tuneParams(
  learner,
  task,
  resampling,
  measures,
  par.set,
  control,
  show.info = getMlrOption("show.info"),
  resample.fun = resample
)
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure | Measure) |
par.set |
(ParamHelpers::ParamSet) |
control |
(TuneControl) |
show.info |
( |
resample.fun |
(closure) |
(TuneResult).
If you would like to include results from the training data set, make sure to appropriately adjust the resampling strategy and the aggregation for the measure. See example code below.
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneThreshold()
set.seed(123)

# a grid search for an SVM (with a tiny number of points...)
# note how easily we can optimize on a log-scale
ps = makeParamSet(
  makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneControlGrid(resolution = 2L)
rdesc = makeResampleDesc("CV", iters = 2L)
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)

# access data for all evaluated points
df = as.data.frame(res$opt.path)
df1 = as.data.frame(res$opt.path, trafo = TRUE)
print(head(df[, -ncol(df)]))
print(head(df1[, -ncol(df)]))

# access data for all evaluated points - alternative
df2 = generateHyperParsEffectData(res)
df3 = generateHyperParsEffectData(res, trafo = TRUE)
print(head(df2$data[, -ncol(df2$data)]))
print(head(df3$data[, -ncol(df3$data)]))

## Not run:
# we optimize the SVM over 3 kernels simultaneously
# note how we use dependent params (requires = ...) and iterated F-racing here
ps = makeParamSet(
  makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
  makeDiscreteParam("kernel", values = c("vanilladot", "polydot", "rbfdot")),
  makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x,
    requires = quote(kernel == "rbfdot")),
  makeIntegerParam("degree", lower = 2L, upper = 5L,
    requires = quote(kernel == "polydot"))
)
print(ps)
ctrl = makeTuneControlIrace(maxExperiments = 5, nbIterations = 1, minNbSurvival = 1)
rdesc = makeResampleDesc("Holdout")
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
df = as.data.frame(res$opt.path)
print(head(df[, -ncol(df)]))

# include the training set performance as well
rdesc = makeResampleDesc("Holdout", predict = "both")
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl,
  measures = list(mmce, setAggregation(mmce, train.mean)))
print(res)
df2 = as.data.frame(res$opt.path)
print(head(df2[, -ncol(df2)]))
## End(Not run)
Optimizes the hyperparameters of a learner in a multi-criteria fashion. Allows for different optimization methods, such as grid search, evolutionary strategies, etc. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at TuneMultiCritControl.
tuneParamsMultiCrit(
  learner,
  task,
  resampling,
  measures,
  par.set,
  control,
  show.info = getMlrOption("show.info"),
  resample.fun = resample
)
learner |
(Learner | |
task |
(Task) |
resampling |
(ResampleInstance | ResampleDesc) |
measures |
(list of Measure) |
par.set |
(ParamHelpers::ParamSet) |
control |
(TuneMultiCritControl) |
show.info |
( |
resample.fun |
(closure) |
Other tune_multicrit: TuneMultiCritControl, plotTuneMultiCritResult()
# multi-criteria optimization of (tpr, fpr) with NSGA-II
lrn = makeLearner("classif.ksvm")
rdesc = makeResampleDesc("Holdout")
ps = makeParamSet(
  makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneMultiCritControlNSGA2(popsize = 4L, generations = 1L)
res = tuneParamsMultiCrit(lrn, sonar.task, rdesc, par.set = ps,
  measures = list(tpr, fpr), control = ctrl)
plotTuneMultiCritResult(res, path = TRUE)
Container for results of hyperparameter tuning. Contains the obtained point in search space, its performance values and the optimization path which led there.
Object members:
learner: Learner that was optimized.
control: Control object from tuning.
x: Named list of hyperparameter values identified as optimal. Note that when you have trafos on some of your params, x will always be on the TRANSFORMED scale, so you can use it directly.
y: Performance values for the optimal x.
threshold: Vector of finally found and used thresholds if tune.threshold was enabled in TuneControl, otherwise not present and hence NULL.
opt.path: Optimization path which led to x. Note that when you have trafos on some of your params, the opt.path always contains the UNTRANSFORMED values on the original scale. You can simply call trafoOptPath(opt.path) to transform them, or as.data.frame(trafoOptPath(opt.path)).
If the mlr option on.error.dump is TRUE, OptPath will have a .dump object in its extra column which contains error dump traces from failed optimization evaluations. It can be accessed by getOptPathEl(opt.path)$extra$.dump.
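A hedged access sketch, where res denotes a result returned by tuneParams:
res$x  # optimal hyperparameters, on the transformed scale
res$y  # performance at the optimum
head(as.data.frame(trafoOptPath(res$opt.path)))  # optimization path on the transformed scale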
Optimizes the threshold of predictions based on probabilities. Works for classification and multilabel tasks. Uses BBmisc::optimizeSubInts for normal binary class problems and GenSA::GenSA for multiclass and multilabel problems.
tuneThreshold(pred, measure, task, model, nsub = 20L, control = list())
pred |
(Prediction) |
measure |
(Measure) |
task |
(Task) |
model |
(WrappedModel) |
nsub |
( |
control |
(list) |
(list). A named list with the following components: th is the optimal threshold, perf the performance value.
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams()
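A minimal usage sketch (illustrative only, reusing the sonar.task example task shipped with mlr):
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
tuneThreshold(pred, measure = mmce)  # returns a list with th (optimal threshold) and perf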
Contains the task (wpbc.task). See TH.data::wpbc. Incomplete cases have been removed from the task.
Contains the task (yeast.task).
https://archive.ics.uci.edu/ml/datasets/Yeast (in long instead of wide format)
Elisseeff, A., & Weston, J. (2001): A kernel method for multi-labelled classification. In Advances in neural information processing systems (pp. 681-687).