Package 'mlr'
June 12, 2024
BugReports https://github.com/mlr-org/mlr/issues
Depends ParamHelpers (>= 1.10), R (>= 3.0.2)
Imports backports (>= 1.1.0), BBmisc (>= 1.11), checkmate (>= 1.8.2),
data.table (>= 1.12.4), ggplot2, methods, parallelMap (>= 1.3),
stats, stringi, survival, utils, XML
Suggests ada, adabag, batchtools, bit64, brnn, bst, C50, care, caret
(>= 6.0-57), class, clue, cluster, ClusterR, clusterSim (>=
0.44-5), cmaes, cowplot, crs, Cubist, deepnet, DiceKriging,
e1071, earth, elasticnet, emoa, evtree, fda.usc, FDboost, FNN,
forecast (>= 8.3), fpc, frbs, FSelector, FSelectorRcpp (>=
0.3.5), gbm, GenSA, ggpubr, glmnet, GPfit, h2o (>= 3.6.0.8),
Hmisc, irace (>= 2.0), kernlab, kknn, klaR, knitr, laGP,
LiblineaR, lintr (>= 1.0.0.9001), MASS, mboost, mco, mda,
memoise, mlbench, mldr, mlrMBO, modeltools, mRMRe, neuralnet,
nnet, numDeriv, pamr, pander, party, pec, penalized (>=
0.9-47), pls, PMCMRplus, praznik (>= 5.0.0), randomForest,
ranger (>= 0.8.0), rappdirs, refund, rex, rFerns, rgenoud,
rmarkdown, Rmpi, ROCR, rotationForest, rpart, RRF, rsm, RSNNS,
rucrdtw, RWeka, sda, sf, smoof, sparseLDA, stepPlr, survAUC,
Contents
mlr-package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
addRRMeasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
aggregations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
agri.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
analyzeFeatSelResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
asROCRPrediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
batchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
bc.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
BenchmarkResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
bh.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
cache_helpers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
calculateConfusionMatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
calculateROCMeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
capLargeValues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
configureMlr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
ConfusionMatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
convertBMRToRankMatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
convertMLBenchObjToTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
costiris.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
createDummyFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
createSpatialResamplingPlots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
downsample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
dropFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
estimateRelativeOverfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
estimateResidualVariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
extractFDABsignal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
extractFDADTWKernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
extractFDAFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
extractFDAFourier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
extractFDAFPCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
extractFDAMultiResFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
extractFDATsfeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
extractFDAWavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
FailureModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
FeatSelControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
FeatSelResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
filterFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
friedmanPostHocTestBMR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
friedmanTestBMR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
fuelsubset.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
generateCalibrationData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
generateCritDifferencesData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
generateFeatureImportanceData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
generateFilterValuesData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
generateHyperParsEffectData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
generateLearningCurveData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
generatePartialDependenceData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
generateThreshVsPerfData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
getBMRAggrPerformances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
getBMRFeatSelResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
getBMRFilteredFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
getBMRLearnerIds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
getBMRLearners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
getBMRLearnerShortNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
getBMRMeasureIds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
getBMRMeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
getBMRModels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
getBMRPerformances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
getBMRPredictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
getBMRTaskDescriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
getBMRTaskDescs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
getBMRTaskIds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
getBMRTuneResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
getCaretParamSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
getClassWeightParam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
getConfMatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
getDefaultMeasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
getFailureModelDump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
getFailureModelMsg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
getFeatSelResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
getFeatureImportance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
getFilteredFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
getFunctionalFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
getHomogeneousEnsembleModels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
getHyperPars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
getLearnerId . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
getLearnerModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
getLearnerNote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
getLearnerPackages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
getLearnerParamSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
getLearnerParVals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
getLearnerPredictType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
getLearnerShortName . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
getLearnerType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
getMlrOptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
getMultilabelBinaryPerformances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
getNestedTuneResultsOptPathDf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
getNestedTuneResultsX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
getOOBPreds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
getParamSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
getPredictionDump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
getPredictionProbabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
getPredictionResponse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
getPredictionTaskDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
getProbabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
getResamplingIndices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
getRRDump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
getRRPredictionList . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
getRRPredictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
getRRTaskDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
getRRTaskDescription . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
getStackedBaseLearnerPredictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
getTaskClassLevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
getTaskCosts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
getTaskData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
getTaskDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
getTaskDescription . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
getTaskFeatureNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
getTaskFormula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
getTaskId . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
getTaskNFeats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
getTaskSize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
getTaskTargetNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
getTaskTargets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
getTaskType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
getTuneResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
getTuneResultOptPath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
gunpoint.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
hasFunctionalFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
hasProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
helpLearner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
helpLearnerParam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
imputations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
impute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
iris.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
isFailureModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
joinClassLevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
learnerArgsToControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
LearnerProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
learners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
listFilterEnsembleMethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
listFilterMethods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
listLearnerProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
listLearners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
listMeasureProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
listMeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
listTaskTypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
lung.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
makeAggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
makeBaggingWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
makeClassificationViaRegressionWrapper . . . . . . . . . . . . . . . . . . . . . . . . . 135
makeClassifTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
makeClusterTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
makeConstantClassWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
makeCostMeasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
makeCostSensClassifWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
makeCostSensRegrWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
makeCostSensTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
makeCostSensWeightedPairsWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
makeCustomResampledMeasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
makeDownsampleWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
makeDummyFeaturesWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
makeExtractFDAFeatMethod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
makeExtractFDAFeatsWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
makeFeatSelWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
makeFilter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
makeFilterEnsemble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
makeFilterWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
makeFixedHoldoutInstance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
makeFunctionalData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
makeImputeMethod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
makeImputeWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
makeLearner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
makeLearners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
makeMeasure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
makeModelMultiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
makeModelMultiplexerParamSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
makeMulticlassWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
makeMultilabelBinaryRelevanceWrapper . . . . . . . . . . . . . . . . . . . . . . . . . 171
makeMultilabelClassifierChainsWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . 172
makeMultilabelDBRWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
makeMultilabelNestedStackingWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . 175
makeMultilabelStackingWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
makeMultilabelTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
makeOverBaggingWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
makePreprocWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
makePreprocWrapperCaret . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
makeRegrTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
makeRemoveConstantFeaturesWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . 184
makeResampleDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
makeResampleInstance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
makeRLearner.classif.fdausc.glm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
makeRLearner.classif.fdausc.kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
makeRLearner.classif.fdausc.np . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
makeSMOTEWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
makeStackedLearner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
makeSurvTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
makeTuneControlCMAES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
makeTuneControlDesign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
makeTuneControlGenSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
makeTuneControlGrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
makeTuneControlIrace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
makeTuneControlMBO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
makeTuneControlRandom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
makeTuneWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
makeUndersampleWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
makeWeightedClassesWrapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
makeWrappedModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
MeasureProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
mergeBenchmarkResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
mergeSmallFactorLevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
mlrFamilies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
mtcars.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
normalizeFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
oversample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
phoneme.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
pid.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
plotBMRBoxplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
plotBMRRanksAsBarChart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
plotBMRSummary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
plotCalibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
plotCritDifferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
plotFilterValues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
plotHyperParsEffect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
plotLearnerPrediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
plotLearningCurve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
plotPartialDependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
plotResiduals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
plotROCCurves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
plotThreshVsPerf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
plotTuneMultiCritResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
predict.WrappedModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
predictLearner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
reduceBatchmarkResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
reextractFDAFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
reimpute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
removeConstantFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
removeHyperPars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
resample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
ResamplePrediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
ResampleResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
RLearner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
selectFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
setAggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
setHyperPars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
setHyperPars2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
setId . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
setLearnerId . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
setMeasurePars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
setPredictThreshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
setPredictType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
setThreshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
simplifyMeasureNames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
smote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
sonar.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
spam.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
spatial.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
subsetTask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
summarizeColumns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
summarizeLevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
TaskDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
train . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
trainLearner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
TuneControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
TuneMultiCritControl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
TuneMultiCritResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
tuneParams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
tuneParamsMultiCrit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
TuneResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
tuneThreshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
wpbc.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
yeast.task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
Index 291
mlr-package

Description
Interface to a large number of classification and regression techniques, including machine-readable
parameter descriptions. There is also an experimental extension for survival analysis, clustering and
general, example-specific cost-sensitive learning. Generic resampling, including cross-validation,
bootstrapping and subsampling. Hyperparameter tuning with modern optimization techniques, for
single- and multi-objective problems. Filter and wrapper methods for feature selection. Extension
of basic learners with additional operations common in machine learning, also allowing for easy
nested resampling. Most operations can be parallelized.
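A minimal sketch of the typical workflow (create a task, construct a learner, train, predict, evaluate); "classif.rpart" is just one of the many integrated learners:

# classify iris with a decision tree
library(mlr)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
mod = train(lrn, task)
pred = predict(mod, task = task)
performance(pred, measures = list(mmce, acc))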
Author(s)
Maintainer: Martin Binder <mlr.developer@mb706.com>
Authors:
• Bernd Bischl <bernd_bischl@gmx.net> (ORCID)
• Michel Lang <michellang@gmail.com> (ORCID)
• Lars Kotthoff <larsko@uwyo.edu>
• Patrick Schratz <patrick.schratz@gmail.com> (ORCID)
• Julia Schiffner <schiffner@math.uni-duesseldorf.de>
• Jakob Richter <code@jakob-r.de>
• Zachary Jones <zmj@zmjones.com>
• Giuseppe Casalicchio <giuseppe.casalicchio@stat.uni-muenchen.de> (ORCID)
• Mason Gallo <masonagallo@gmail.com>
Other contributors:
• Jakob Bossek <jakob.bossek@tu-dortmund.de> (ORCID) [contributor]
• Erich Studerus <erich.studerus@upkbs.ch> (ORCID) [contributor]
• Leonard Judt <leonard.judt@tu-dortmund.de> [contributor]
• Tobias Kuehn <tobi.kuehn@gmx.de> [contributor]
• Pascal Kerschke <kerschke@uni-muenster.de> (ORCID) [contributor]
• Florian Fendt <flo_fendt@gmx.de> [contributor]
• Philipp Probst <philipp_probst@gmx.de> (ORCID) [contributor]
• Xudong Sun <xudong.sun@stat.uni-muenchen.de> (ORCID) [contributor]
• Janek Thomas <janek.thomas@stat.uni-muenchen.de> (ORCID) [contributor]
• Bruno Vieira <bruno.hebling.vieira@usp.br> [contributor]
• Laura Beggel <laura.beggel@web.de> (ORCID) [contributor]
• Quay Au <quay.au@stat.uni-muenchen.de> (ORCID) [contributor]
• Florian Pfisterer <pfistererf@googlemail.com> [contributor]
• Stefan Coors <stefan.coors@gmx.net> [contributor]
• Steve Bronder <sab2287@columbia.edu> [contributor]
• Alexander Engelhardt <alexander.w.engelhardt@gmail.com> [contributor]
• Christoph Molnar <christoph.molnar@stat.uni-muenchen.de> [contributor]
• Annette Spooner <a.spooner@unsw.edu.au> [contributor]
See Also
Useful links:
• https://mlr.mlr-org.com
• https://github.com/mlr-org/mlr
• Report bugs at https://github.com/mlr-org/mlr/issues
addRRMeasure

Description

Computes additional performance measures for an existing ResampleResult without rerunning the
resampling; this requires that the predictions were stored (keep.pred = TRUE).
Usage
addRRMeasure(res, measures)
Arguments
res (ResampleResult)
The result of resample run with keep.pred = TRUE.
measures (Measure | list of Measure)
Performance measure(s) to evaluate. Default is the default measure for the task;
see getDefaultMeasure.
Value
(ResampleResult).
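A small sketch, assuming predictions were stored during resampling:

rdesc = makeResampleDesc("CV", iters = 3)
res = resample("classif.rpart", iris.task, rdesc, measures = mmce,
  keep.pred = TRUE)
# compute further measures from the stored predictions, no retraining needed
res = addRRMeasure(res, list(acc, ber))
res$aggr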
Aggregation

Description
An aggregation method reduces the performance values of the test (and possibly the training sets)
to a single value. To see all possible implemented aggregations look at aggregations.
The aggregation can access all relevant information of the result after resampling and combine
it into a single value, though usually something very simple, like taking the mean of the test-set
performances, is done.
Object members:
id (character(1)) Name of the aggregation method.
name (character(1)) Long name of the aggregation method.
properties (character) Properties of the aggregation.
fun (function(task, perf.test, perf.train, measure, group, pred)) Aggregation function.
See Also
makeAggregation
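As an illustration, a custom aggregation returning the trimmed mean of the test-set performances could be sketched as follows (the properties value "req.test" mirrors the built-in test.* aggregations; see makeAggregation for the authoritative interface):

test.trimmed.mean = makeAggregation(
  id = "test.trimmed.mean",
  name = "Trimmed mean of performance on test sets",
  properties = "req.test",
  fun = function(task, perf.test, perf.train, measure, group, pred) {
    mean(perf.test, trim = 0.1)  # drop the most extreme folds
  }
)
ms = setAggregation(mmce, test.trimmed.mean)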
aggregations

Description
test.mean Mean of performance values on test sets.
test.sd Standard deviation of performance values on test sets.
test.median Median of performance values on test sets.
test.min Minimum of performance values on test sets.
test.max Maximum of performance values on test sets.
test.sum Sum of performance values on test sets.
train.mean Mean of performance values on training sets.
train.sd Standard deviation of performance values on training sets.
train.median Median of performance values on training sets.
train.min Minimum of performance values on training sets.
train.max Maximum of performance values on training sets.
train.sum Sum of performance values on training sets.
b632 Aggregation for B632 bootstrap.
b632plus Aggregation for B632+ bootstrap.
testgroup.mean Performance values on test sets are grouped according to resampling method. The
mean for every group is calculated, then the mean of those means. Mainly used for repeated
CV.
testgroup.sd Similar to testgroup.mean - after the mean for every group is calculated, the standard
deviation of those means is obtained. Mainly used for repeated CV.
test.join Performance measure on joined test sets. This is especially useful for small sample sizes
where unbalanced group sizes have a significant impact on the aggregation; in such cases,
especially for cross-validation, test.join can be preferable. For repeated CV, the performance
is calculated on each repetition and then aggregated with the arithmetic mean.
See Also
Aggregation
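To use one of these instead of a measure's default aggregation, attach it with setAggregation; a short sketch:

# report the median instead of the mean of the per-fold test errors
ms = setAggregation(mmce, test.median)
rdesc = makeResampleDesc("CV", iters = 5)
resample("classif.rpart", iris.task, rdesc, measures = ms)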
agri.task

Description
Contains the task (agri.task).
References
See cluster::agriculture.
analyzeFeatSelResult

Description
This function prints the steps selectFeatures took to find its optimal set of features and the reason
why it stopped. It can also print information about all calculations done in each intermediate step.
Currently only implemented for sequential feature selection.
Usage
analyzeFeatSelResult(res, reduce = TRUE)
Arguments
res (FeatSelResult)
The result of selectFeatures.
reduce (logical(1))
Per iteration: Print only the selected feature (or all features that were evaluated)?
Default is TRUE.
Value
(invisible(NULL)).
See Also
Other featsel: FeatSelControl, getFeatSelResult(), makeFeatSelWrapper(), selectFeatures()
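A short sketch of typical use with sequential forward search:

rdesc = makeResampleDesc("Holdout")
ctrl = makeFeatSelControlSequential(method = "sfs")
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl,
  show.info = FALSE)
analyzeFeatSelResult(res)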
asROCRPrediction

Description
Converts predictions to a format package ROCR can handle.
Usage
asROCRPrediction(pred)
Arguments
pred (Prediction)
Prediction object.
See Also
Other roc: calculateROCMeasures()
Other predict: getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(),
predict.WrappedModel(), setPredictThreshold(), setPredictType()
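For example, a sketch (requires the ROCR package and a learner with predict.type = "prob"):

lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
rocr.pred = asROCRPrediction(pred)
# continue with the standard ROCR workflow
if (requireNamespace("ROCR", quietly = TRUE)) {
  perf = ROCR::performance(rocr.pred, "tpr", "fpr")
  plot(perf)
}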
batchmark

Description
This function is a very parallel version of benchmark using batchtools. Experiments are created in
the provided registry for each combination of learners, tasks and resamplings. The experiments are
then stored in a registry and the runs can be started via batchtools::submitJobs. A job is one train/test
split of the outer resampling. In case of nested resampling (e.g. with makeTuneWrapper), each job
is a full run of inner resampling, which can be parallelized in a second step with parallelMap.
For details on the usage and supported backends have a look at the batchtools tutorial page:
https://github.com/mllg/batchtools.
The general workflow with batchmark looks like this:
1. Create an ExperimentRegistry using batchtools::makeExperimentRegistry.
2. Call batchmark(...), which defines jobs for all learners and tasks in a base::expand.grid
fashion.
3. Submit jobs using batchtools::submitJobs.
4. Babysit the computation, wait for all jobs to finish using batchtools::waitForJobs.
5. Call reduceBatchmarkResults() to reduce results into a BenchmarkResult.
If you want to use this with OpenML datasets you can generate tasks from a vector of dataset IDs
easily with tasks = lapply(data.ids, function(x) convertOMLDataSetToMlr(getOMLDataSet(x))).
Usage
batchmark(
learners,
tasks,
resamplings,
measures,
keep.pred = TRUE,
keep.extract = FALSE,
models = FALSE,
reg = batchtools::getDefaultRegistry()
)
Arguments
learners (list of Learner | character)
Learning algorithms which should be compared, can also be a single learner. If
you pass strings the learners will be created via makeLearner.
tasks (list of Task)
Tasks that learners should be run on.
resamplings ((list of) ResampleDesc)
Resampling strategy for each task. If only one is provided, it will be replicated
to match the number of tasks. If missing, a 10-fold cross-validation is used.
measures (list of Measure)
Performance measures for all tasks. If missing, the default measure of the first
task is used.
keep.pred (logical(1))
Keep the prediction data in the pred slot of the result object. If you do many
experiments (on larger data sets) these objects might unnecessarily increase
object size / memory usage if you do not really need them. The default is set to TRUE.
keep.extract (logical(1))
Keep the extract slot of the result object. When creating a lot of benchmark
results with extensive tuning, the resulting R objects can become very large in
size. That is why the tuning results stored in the extract slot are removed by
default (keep.extract = FALSE). Note that when keep.extract = FALSE you
will not be able to conduct analysis of the tuning results.
models (logical(1))
Should all fitted models be stored in the ResampleResult? Default is FALSE.
reg (batchtools::Registry)
Registry, created by batchtools::makeExperimentRegistry. If not explicitly passed,
uses the last created registry.
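A sketch of the workflow described above, run on the local machine (registry location and backend configuration will differ in practice):

reg = batchtools::makeExperimentRegistry(file.dir = tempfile("mlr-bench"))
batchmark(list("classif.rpart", "classif.lda"), iris.task,
  makeResampleDesc("CV", iters = 3), reg = reg)
batchtools::submitJobs(reg = reg)
batchtools::waitForJobs(reg = reg)
bmr = reduceBatchmarkResults(reg = reg)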
bc.task

Description

Contains the task (bc.task).
References
See mlbench::BreastCancer. The column "Id" and all incomplete cases have been removed from
the task.
Description
Complete benchmark experiment to compare different learning algorithms across one or more tasks
w.r.t. a given resampling strategy. Experiments are paired, meaning always the same training / test
sets are used for the different learners. Furthermore, you can of course pass “enhanced” learners
via wrappers, e.g., a learner can be automatically tuned using makeTuneWrapper.
Usage
benchmark(
learners,
tasks,
resamplings,
measures,
keep.pred = TRUE,
keep.extract = FALSE,
models = FALSE,
show.info = getMlrOption("show.info")
)
Arguments

The arguments are analogous to those of batchmark; additionally, show.info (logical(1)) controls
whether verbose output is printed on the console (the default is taken from getMlrOption("show.info"),
see configureMlr).
Value
BenchmarkResult.
See Also
Other benchmark: BenchmarkResult, batchmark(), convertBMRToRankMatrix(), friedmanPostHocTestBMR(),
friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(),
getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(),
getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
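A small sketch comparing two learners on two of the example tasks:

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
rdesc = makeResampleDesc("CV", iters = 3)
bmr = benchmark(lrns, list(iris.task, sonar.task), rdesc, measures = mmce)
getBMRAggrPerformances(bmr, as.df = TRUE)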
BenchmarkResult

Description
Result of a benchmark experiment conducted by benchmark with the following members:
results (list of ResampleResult): A nested list of resample results, first ordered by task id, then by
learner id.
measures (list of Measure): The performance measures used in the benchmark experiment.
learners (list of Learner): The learning algorithms compared in the benchmark experiment.
The print method of this object shows aggregated performance values for all tasks and learners.
It is recommended to retrieve required information via the getBMR* getter functions. You can also
convert the object using as.data.frame.
bh.task

Description

Contains the task (bh.task).

References
See mlbench::BostonHousing.
cache_helpers

Description

Helper functions for getting and deleting the cache directory used by mlr.
Usage
getCacheDir()
deleteCacheDir()
calculateConfusionMatrix
Confusion matrix.
Description
Calculates the confusion matrix for a (possibly resampled) prediction. Rows indicate true classes,
columns predicted classes. The marginal elements count the number of classification errors for the
respective row or column, i.e., the number of errors when you condition on the corresponding true
(rows) or predicted (columns) class. The bottom right element displays the total number of errors.
A list is returned that contains multiple matrices. If relative = TRUE we compute three matrices:
one with absolute values and two with relative values, normalized by rows and columns respectively.
If FALSE we only compute the absolute-value matrix.
The print function returns the relative matrices in a compact way so that both row and column
marginals can be seen in one matrix. For details see ConfusionMatrix.
Note that for resampling no further aggregation is currently performed. All predictions on all test
sets are joined to a vector yhat, as are all labels joined to a vector y. Then yhat is simply tabulated
vs. y, as if both were computed on a single test set. This probably mainly makes sense when
cross-validation is used for resampling.
Usage
calculateConfusionMatrix(pred, relative = FALSE, sums = FALSE, set = "both")
Arguments
pred (Prediction)
Prediction object.
relative (logical(1))
If TRUE two additional matrices are calculated. One is normalized by rows and
one by columns.
sums (logical(1))
If TRUE add absolute number of observations in each group.
set (character(1))
Specifies which part(s) of the data are used for the calculation. If set equals
train or test, the pred object must be the result of a resampling, otherwise an
error is thrown. Defaults to “both”. Possible values are “train”, “test”, or “both”.
x (ConfusionMatrix)
Object to print.
both (logical(1))
If TRUE both the absolute and relative confusion matrices are printed.
digits (integer(1))
How many numbers after the decimal point should be printed, only relevant for
relative confusion matrices.
... (any)
Currently not used.
Value
(ConfusionMatrix).
Functions
• print(ConfusionMatrix):
See Also
Other performance: ConfusionMatrix, calculateROCMeasures(), estimateRelativeOverfitting(),
makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(),
setAggregation(), setMeasurePars()
Examples
# get confusion matrix after simple manual prediction
allinds = 1:150
train = sample(allinds, 75)
test = setdiff(allinds, train)
mod = train("classif.lda", iris.task, subset = train)
pred = predict(mod, iris.task, subset = test)
print(calculateConfusionMatrix(pred))
print(calculateConfusionMatrix(pred, sums = TRUE))
print(calculateConfusionMatrix(pred, relative = TRUE))
calculateROCMeasures

Description

Calculate the absolute number of correct/incorrect classifications and the following evaluation
measures:
• tpr True positive rate (Sensitivity, Recall)
• fpr False positive rate (Fall-out)
• fnr False negative rate (Miss rate)
• tnr True negative rate (Specificity)
For details on the used measures see measures and also https://en.wikipedia.org/wiki/Receiver_
operating_characteristic.
The element for the false omission rate in the resulting object is not called for but fomr, since for
should never be used as a variable name in an object.
Usage
calculateROCMeasures(pred)
Arguments
pred (Prediction)
Prediction object.
x (ROCMeasures)
Created by calculateROCMeasures.
abbreviations (logical(1))
If TRUE a short paragraph with explanations of the used measures is printed
additionally.
digits (integer(1))
Number of digits the measures are rounded to.
... (any)
Currently not used.
Value
(ROCMeasures). A list containing two elements: confusion.matrix, which is the 2x2 confusion
matrix of absolute frequencies, and measures, a list of the above-mentioned measures.
Functions
• print(ROCMeasures):
See Also
Other roc: asROCRPrediction()
Other performance: ConfusionMatrix, calculateConfusionMatrix(), estimateRelativeOverfitting(),
makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(),
setAggregation(), setMeasurePars()
Examples
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, sonar.task)
pred = predict(fit, task = sonar.task)
calculateROCMeasures(pred)
capLargeValues

Description

Convert numeric entries which have large/infinite (absolute) values in a data.frame or task. Only
numeric/integer columns are affected.
Usage
capLargeValues(
obj,
target = character(0L),
cols = NULL,
threshold = Inf,
impute = threshold,
what = "abs"
)
Arguments
obj (data.frame | Task)
Input data.
target (character)
Name of the column(s) specifying the response. Target columns will not be
capped. Default is character(0).
cols (character)
Which columns to convert. Default is all numeric columns.
threshold (numeric(1))
Threshold for capping. Every entry whose absolute value is equal to or larger
than the threshold is converted. Default is Inf.
impute (numeric(1))
Replacement value for large entries. Large negative entries are converted to
-impute. Default is threshold.
what (character(1))
What kind of entries are affected? “abs” means abs(x) > threshold, “pos”
means abs(x) > threshold && x > 0, “neg” means abs(x) > threshold && x
< 0. Default is “abs”.
Value
(data.frame)
Examples
capLargeValues(iris, threshold = 5, impute = 5)
configureMlr

Description

Configures the behavior of the package. Configuration is done by setting custom options; options
you do not set here keep their current values.
Usage
configureMlr(
show.info,
on.learner.error,
on.learner.warning,
on.par.without.desc,
on.par.out.of.bounds,
on.measure.not.applicable,
show.learner.output,
on.error.dump
)
Arguments
show.info (logical(1))
Some methods of mlr support a show.info argument to enable verbose output
on the console. This option sets the default value for these arguments. Setting
the argument manually in one of these functions will overwrite the default value
for that specific function call. Default is TRUE.
on.learner.error
(character(1))
What should happen if an error in an underlying learning algorithm is caught:
“stop”: R exception is generated.
“warn”: A FailureModel will be created, which predicts only NAs and a warning will be
generated.
“quiet”: Same as “warn” but without the warning.
Default is “stop”.
on.learner.warning
(character(1))
What should happen if a warning in an underlying learning algorithm is generated:
“warn”: The warning is generated as usual.
“quiet”: The warning is suppressed.
Default is “warn”.
on.par.without.desc
(character(1))
What should happen if a parameter of a learner is set to a value, but no parameter
description object exists, indicating a possibly wrong name:
“stop”: R exception is generated.
“warn”: Warning, but parameter is still passed along to learner.
“quiet”: Same as “warn” but without the warning.
Default is “stop”.
on.par.out.of.bounds
(character(1))
What should happen if a parameter of a learner is set to an out of bounds value.
“stop”: R exception is generated.
“warn”: Warning, but parameter is still passed along to learner.
“quiet”: Same as “warn” but without the warning.
Default is “stop”.
on.measure.not.applicable
(logical(1))
What should happen if a measure is not applicable to a learner.
“stop”: R exception is generated.
“warn”: Warning, but value of the measure will be NA.
“quiet”: Same as “warn” but without the warning.
Default is “stop”.
show.learner.output
(logical(1))
Should the output of the learning algorithm during training and prediction be
shown or captured and suppressed? Default is TRUE.
on.error.dump (logical(1))
Specify whether FailureModel models and failed predictions should contain an
error dump that can be used with debugger to inspect an error. This option is
only effective if on.learner.error is “warn” or “quiet”. If it is TRUE, the dump
can be accessed using getFailureModelDump on the FailureModel, getPredictionDump on
the failed prediction, and getRRDump on resample predictions.
Default is FALSE.
Value
(invisible(NULL)).
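For example, a sketch that keeps experiments running when a single learner errors and quietens console output:

configureMlr(on.learner.error = "warn", show.info = FALSE)
getMlrOptions()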
ConfusionMatrix

Description

The result of calculateConfusionMatrix. Object members:
result (matrix) Confusion matrix of absolute values and marginals. Can also contain row and
column sums of observations.
task.desc (TaskDesc) Additional information about the task.
sums (logical(1)) Flag if marginal sums of observations are calculated.
relative (logical(1)) Flag if the relative confusion matrices are calculated.
relative.row (matrix) Confusion matrix of relative values and marginals normalized by row.
relative.col (matrix) Confusion matrix of relative values and marginals normalized by column.
relative.error (numeric(1)) Relative error overall.
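The members can be accessed directly; a sketch:

mod = train("classif.lda", iris.task)
pred = predict(mod, task = iris.task)
cm = calculateConfusionMatrix(pred, relative = TRUE)
cm$result        # absolute confusion matrix with marginals
cm$relative.row  # relative confusion matrix, normalized by row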
convertBMRToRankMatrix
Convert BenchmarkResult to a rank-matrix.
Description
Computes a matrix of all the ranks of different algorithms over different datasets (tasks). Ranks are
computed from aggregated measures. Smaller ranks imply better methods, so for measures that are
minimized, small ranks imply small scores; for measures that are maximized, small ranks imply
large scores.
Usage
convertBMRToRankMatrix(
bmr,
measure = NULL,
ties.method = "average",
aggregation = "default"
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark experiment.
ties.method (character(1))
See base::rank for details.
aggregation (character(1))
“mean” or “default”. See getBMRAggrPerformances for details on “default”.
Value
(matrix) with measure ranks as entries. The matrix has one row for each learner, and one column
for each task.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), friedmanPostHocTestBMR(),
friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(),
getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(),
getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
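A compact sketch:

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
bmr = benchmark(lrns, list(iris.task, sonar.task),
  makeResampleDesc("CV", iters = 3), measures = mmce)
convertBMRToRankMatrix(bmr, mmce)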
convertMLBenchObjToTask
Convert a machine learning benchmark / demo object from package
mlbench to a task.
Description
We auto-set the target column, drop any column which is called “Id” and convert logicals to factors.
Usage
convertMLBenchObjToTask(x, n = 100L, ...)
Arguments
x (character(1))
Name of an mlbench function or dataset.
n (integer(1))
Number of observations for data-simulating functions. Note that for a few mlbench
functions this setting is not exactly respected by mlbench. Default is 100.
... (any)
Passed on to the data-simulating functions.
Examples
print(convertMLBenchObjToTask("Ionosphere"))
print(convertMLBenchObjToTask("mlbench.spirals", n = 100, sd = 0.1))
costiris.task

Description
Contains the task (costiris.task).
References
See datasets::iris. The cost matrix was generated artificially following
Tu, H.-H. and Lin, H.-T. (2010), One-sided support vector regression for multiclass cost-sensitive
classification. In ICML, J. Fürnkranz and T. Joachims, Eds., Omnipress, 1095–1102.
createDummyFeatures

Description

Replace all factor features with their dummy variables. Internally model.matrix is used. Non-factor
features will be left untouched and passed to the result.
Usage
createDummyFeatures(
obj,
target = character(0L),
method = "1-of-n",
cols = NULL
)
Arguments
obj (data.frame | Task)
Input data.
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). Only used when obj is a data.frame, otherwise
ignored. If survival analysis is applicable, these are the names of the survival
time and event columns, so it has length 2. For multilabel classification these
are the names of logical columns that indicate whether a class label is present
and the number of target variables corresponds to the number of classes.
method (character(1))
Available are:
"1-of-n": For n factor levels there will be n dummy variables.
"reference": There will be n-1 dummy variables leaving out the first factor
level of each variable.
Default is “1-of-n”.
cols (character)
Columns to create dummy features for. Default is to use all columns.
Value
data.frame | Task. Same type as obj.
See Also
Other eda_and_preprocess: capLargeValues(), dropFeatures(), mergeSmallFactorLevels(),
normalizeFeatures(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
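For example, a sketch on a small data frame with one factor feature:

df = data.frame(y = rnorm(4), x = factor(c("a", "b", "a", "c")))
createDummyFeatures(df, target = "y")                        # 1-of-n coding
createDummyFeatures(df, target = "y", method = "reference")  # drops level "a"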
createSpatialResamplingPlots
Create (spatial) resampling plot objects.
Description
Visualize partitioning of resample objects with spatial information.
Usage
createSpatialResamplingPlots(
task = NULL,
resample = NULL,
crs = NULL,
datum = 4326,
repetitions = 1,
color.train = "#0072B5",
color.test = "#E18727",
point.size = 0.5,
axis.text.size = 14,
x.axis.breaks = waiver(),
y.axis.breaks = waiver()
)
Arguments
task Task
Task object.
resample ResampleResult or named list with (multiple) ResampleResult
As returned by resample.
crs integer
Coordinate reference system (EPSG code number) for the supplied coordinates
in the Task.
datum integer
Coordinate reference system which should be used in the resulting map.
repetitions integer
Number of repetitions.
color.train character
Color for train set.
color.test character
Color for test set.
point.size integer
Point size.
axis.text.size integer
Font size of axis labels.
x.axis.breaks numeric
Custom x axis breaks
y.axis.breaks numeric
Custom y axis breaks
Details
If a named list is given to resample, names will appear in the title of each fold. If multiple inputs
are given to resample, these must be named.
This function makes a hard cut at five columns in the resulting gridded plot: if the resample object
consists of more than five folds, the remaining folds are placed in a new row.
For file saving, we recommend using cowplot::save_plot.
When viewing the resulting plot in RStudio, margins may appear to be different than they really
are. Make sure to save the file to disk and inspect the image.
When modifying axis breaks, negative values need to be used if the area is located in either the
western or southern hemisphere. Use positive values for the northern and eastern hemisphere.
Value
(list of 2L) containing (1) multiple gg objects and (2) their corresponding labels.
CRS
The crs has to be suitable for the coordinates stored in the Task. For example, if the coordinates
are UTM, crs should be set to a UTM projection. Due to a limited axis space in the resulting grid
(especially on the x-axis), the data will by default be projected into a lat/lon projection, specifically
EPSG 4326. If other projections are desired for the resulting map, please set argument datum
accordingly. This argument will be passed onto ggplot2::coord_sf.
Author(s)
Patrick Schratz
See Also
Other plot: plotBMRBoxplots(), plotBMRRanksAsBarChart(), plotBMRSummary(), plotCalibration(),
plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(),
plotResiduals(), plotThreshVsPerf()
Examples
## -------------------------------------------------------------
## single unnamed resample input with 5 folds and 2 repetitions
## -------------------------------------------------------------
## --------------------------------------------------------------------------
## single named resample input with 5 folds and 1 repetition and 32717 datum
## --------------------------------------------------------------------------
## -------------------------------------------------------------
## multiple named resample inputs with 5 folds and 1 repetition
## -------------------------------------------------------------
plots = createSpatialResamplingPlots(spatial.task,
list("SpRepCV" = r1, "RepCV" = r2), crs = 32717, repetitions = 1,
x.axis.breaks = c(-79.055, -79.085), y.axis.breaks = c(-3.975, -4))
cowplot::plot_grid(plotlist = plots[["Plots"]], ncol = 5, nrow = 2,
labels = plots[["Labels"]])
## -------------------------------------------------------------------------------------
## Complex arrangements of multiple named resample inputs with 5 folds and 1 repetition
## -------------------------------------------------------------------------------------
p1 = cowplot::plot_grid(plots[["Plots"]][[1]], plots[["Plots"]][[2]],
plots[["Plots"]][[3]], ncol = 3, nrow = 1, labels = plots[["Labels"]][1:3],
label_size = 18)
p12 = cowplot::plot_grid(plots[["Plots"]][[4]], plots[["Plots"]][[5]],
ncol = 2, nrow = 1, labels = plots[["Labels"]][4:5], label_size = 18)
p2 = cowplot::plot_grid(plots[["Plots"]][[6]], plots[["Plots"]][[7]],
plots[["Plots"]][[8]], ncol = 3, nrow = 1, labels = plots[["Labels"]][6:8],
label_size = 18)
p22 = cowplot::plot_grid(plots[["Plots"]][[9]], plots[["Plots"]][[10]],
ncol = 2, nrow = 1, labels = plots[["Labels"]][9:10], label_size = 18)
crossover Crossover.
Description
Takes two bit strings and creates a new one of the same size by selecting the items from the first
string or the second, based on a given rate (the probability of choosing an element from the first
string).
Arguments
x (logical)
First parent string.
y (logical)
Second parent string.
rate (numeric(1))
A number representing the probability of selecting an element of the first string.
Default is 0.5.
Value
(crossover).
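No usage is printed for this operator here; purely as an illustration, uniform crossover of two logical strings can be sketched in plain R (uniformCrossover is a hypothetical helper, not the package-internal implementation):

# illustrative sketch only
uniformCrossover = function(x, y, rate = 0.5) {
  take.x = runif(length(x)) < rate  # TRUE: inherit element from first parent
  ifelse(take.x, x, y)
}
uniformCrossover(c(TRUE, TRUE, FALSE, TRUE), c(FALSE, TRUE, TRUE, FALSE))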
downsample

Description
Decrease the observations in a task or a ResampleInstance to a given percentage of observations.
Usage
downsample(obj, perc = 1, stratify = FALSE)
Arguments
obj (Task | ResampleInstance)
Input data or a ResampleInstance.
perc (numeric(1))
Percentage from (0, 1). Default is 1.
stratify (logical(1))
Only for classification: Should the downsampled data be stratified according to
the target classes? Default is FALSE.
Value

(Task | ResampleInstance). Same type as obj.

See Also
makeResampleInstance
Other downsample: makeDownsampleWrapper()
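For example, a sketch:

# keep a stratified 25% of the observations
task.small = downsample(iris.task, perc = 0.25, stratify = TRUE)
getTaskSize(task.small)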
dropFeatures

Description

Drops the specified features from a task.
Usage
dropFeatures(task, features)
Arguments
task (Task)
The task.
features (character)
Features to drop.
Value
Task.
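For example, a sketch:

task2 = dropFeatures(iris.task, c("Sepal.Width", "Petal.Width"))
getTaskFeatureNames(task2)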
estimateRelativeOverfitting
Estimate relative overfitting.
Description
Estimates the relative overfitting of a model as the ratio of the difference in test and train performance
to the difference of test performance in the no-information case and train performance. In
the no-information case the features carry no information with respect to the prediction. This is
simulated by permuting features and predictions.
Usage
estimateRelativeOverfitting(
predish,
measures,
task,
learner = NULL,
pred.train = NULL,
iter = 1
)
Arguments
predish (ResampleDesc | ResamplePrediction | Prediction)
Resampling strategy or resampling prediction or test predictions.
measures (Measure | list of Measure)
Performance measure(s) to evaluate. Default is the default measure for the task;
see getDefaultMeasure.
task (Task)
The task.
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
pred.train (Prediction)
Training predictions. Only needed if test predictions are passed.
iter (integer)
Iteration number. Default 1, usually you don’t need to specify this. Only needed
if test predictions are passed.
Details
Currently only support for classification and regression tasks is implemented.
Value
(data.frame). Relative overfitting estimate(s), named by measure(s), for each resampling iteration.
References
Bradley Efron and Robert Tibshirani; Improvements on Cross-Validation: The .632+ Bootstrap
Method, Journal of the American Statistical Association, Vol. 92, No. 438. (Jun., 1997), pp.
548-560.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(),
makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(),
setAggregation(), setMeasurePars()
Examples
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 2)
estimateRelativeOverfitting(rdesc, acc, task, makeLearner("classif.knn"))
estimateRelativeOverfitting(rdesc, acc, task, makeLearner("classif.lda"))
rpred = resample("classif.knn", task, rdesc)$pred
estimateRelativeOverfitting(rpred, acc, task)
estimateResidualVariance
Estimate the residual variance.
Description
Estimate the residual variance of a regression model on a given task. If a regression learner is
provided instead of a model, the model is trained (see train) first.
Usage
estimateResidualVariance(x, task, data, target)
Arguments
x (Learner or WrappedModel)
Learner or wrapped model.
task (RegrTask)
Regression task. If missing, data and target must be supplied.
data (data.frame)
A data frame containing the features and target variable. If missing, task must
be supplied.
target (character(1))
Name of the target variable. If missing, task must be supplied.
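Examples
A minimal usage sketch (not part of the original manual), fitting a regression tree on the built-in bh.task:
estimateResidualVariance(makeLearner("regr.rpart"), task = bh.task)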
extractFDABsignal B-spline features.
Description
The function extracts features from functional data based on a B-spline fit. For more details refer
to FDboost::bsignal().
Usage
extractFDABsignal(bsignal.knots = 10L, bsignal.df = 3)
Arguments
bsignal.knots (integer(1))
The number of knots for the B-spline.
bsignal.df (numeric(1))
The effective degrees of freedom of the penalized B-spline.
Value
(data.frame).
See Also
Other fda_featextractor: extractFDADTWKernel(), extractFDAFPCA(), extractFDAFourier(),
extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
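A minimal sketch (not part of the original manual); it mirrors the extractFDAFeatures example further below and assumes the FDboost package is installed:
df = data.frame(x = matrix(rnorm(100), ncol = 20), y = factor(c("a", "a", "b", "b", "a")))
fdf = makeFunctionalData(df, fd.features = list(x1 = 1:20), exclude.cols = "y")
tsk = makeClassifTask(data = fdf, target = "y")
ext = extractFDAFeatures(tsk, feat.methods = list("x1" = extractFDABsignal()))
print(ext$task)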
extractFDADTWKernel DTW kernel features.
Description
The function extracts features from functional data based on the DTW distance with a reference
dataframe.
Usage
extractFDADTWKernel(
ref.method = "random",
n.refs = 0.05,
refs = NULL,
dtwwindow = 0.05
)
Arguments
ref.method (character(1))
How should the reference curves be obtained? Method random draws n.refs
random reference curves, while all uses all curves as references. In order to
use user-provided reference curves, this parameter is set to fixed.
n.refs (numeric(1))
Number of reference curves to be drawn (as a fraction of the number of obser-
vations in the training data).
refs (matrix|integer(n))
Integer vector of training set row indices or a matrix of reference curves with
the same length as the functionals in the training data. Overwrites ref.method
and n.refs.
dtwwindow (numeric(1))
Size of the warping window (as a proportion of query length).
Value
(data.frame).
See Also
extractFDAFeatures Extract features from functional data.
Description
Extract non-functional features from functional features using various methods.
Usage
extractFDAFeatures(obj, target = character(0L), feat.methods = list(), ...)
Arguments
obj (Task | data.frame)
Task or data.frame to extract functional features from. Must contain functional
features as matrix columns.
target (character(1))
Task target column. Only necessary for data.frames. Default is character(0).
feat.methods (named list)
List of functional features along with the desired methods for each functional
feature. “all” applies the extractFDAFeatures method to each functional fea-
ture. Names of feat.methods must match column names of functional features.
Available feature extraction methods are available under family fda_featextractor.
Specifying a functional feature multiple times with different extraction methods
allows for the extraction of different features from the same functional. Default
is list() which does nothing.
... (any)
Further hyperparameters passed on to the feat.methods specified above.
Details
The description object contains these slots:
• target (character): See argument.
• coln (character): Column names of data.
• fd.cols (character): Functional feature names.
• extractFDAFeat (list): Contains feature.methods and relevant parameters for reextraction.
Value
(list)
• data | task (data.frame | Task): Extracted features, same type as obj.
• desc (extractFDAFeatDesc): Description object. See description for details.
See Also
Other fda: makeExtractFDAFeatMethod(), makeExtractFDAFeatsWrapper()
Examples
df = data.frame(x = matrix(rnorm(24), ncol = 8), y = factor(c("a", "a", "b")))
fdf = makeFunctionalData(df, fd.features = list(x1 = 1:4, x2 = 5:8), exclude.cols = "y")
task = makeClassifTask(data = fdf, target = "y")
extracted = extractFDAFeatures(task,
feat.methods = list("x1" = extractFDAFourier(), "x2" = extractFDAWavelets(filter = "haar")))
print(extracted$task)
reextractFDAFeatures(task, extracted$desc)
extractFDAFourier Fast Fourier transform features.
Description
The function extracts features from functional data based on the fast Fourier transform. For more
details refer to stats::fft.
Usage
extractFDAFourier(trafo.coeff = "phase")
Arguments
trafo.coeff (character(1))
Specifies which transformation of the complex frequency domain representation
should be calculated as a feature representation. Must be one of “amplitude” or
“phase”. Default is “phase”. The phase shift is returned in degrees, i.e. values lie in
[-180, 180].
Value
(data.frame).
See Also
extractFDAFPCA Extract functional principal component analysis features.
Description
The function extracts the functional principal components from a data.frame containing functional
features. Uses stats::prcomp.
Usage
extractFDAFPCA(rank. = NULL, center = TRUE, scale. = FALSE)
Arguments
rank. (integer(1))
Number of principal components to extract. Default is NULL.
center (logical(1))
Should data be centered before applying PCA?
scale. (logical(1))
Should data be scaled before applying PCA?
Value
(data.frame).
See Also
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFourier(),
extractFDAMultiResFeatures(), extractFDATsfeatures(), extractFDAWavelets()
extractFDAMultiResFeatures
Multiresolution feature extraction.
Description
The function currently extracts the mean of multiple segments of each curve and stacks them as
features. The segment lengths are set in a hierarchical way so the features cover different resolution
levels.
Usage
extractFDAMultiResFeatures(res.level = 3L, shift = 0.5, seg.lens = NULL)
Arguments
res.level (integer(1))
The number of resolution levels in the hierarchy; at each level the segment length is divided by a factor of 2.
shift (numeric(1))
The overlap proportion when sliding the window by one step.
seg.lens (integer)
Curve subsequence lengths. Needs to sum up to the length of the functional.
Value
(data.frame).
See Also
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFPCA(),
extractFDAFourier(), extractFDATsfeatures(), extractFDAWavelets()
extractFDATsfeatures Time-series feature heuristics.
Description
The function extracts features from functional data based on known heuristics. Under the hood it
uses tsfeatures::tsfeatures(); refer to its documentation for more details. For more information
see Hyndman, Wang and Laptev, Large-Scale Unusual Time Series Detection, ICDM 2015.
Note: Currently computes the following features:
"frequency", "stl_features", "entropy", "acf_features", "arch_stat", "crossing_points", "flat_spots",
"hurst", "holt_parameters", "lumpiness", "max_kl_shift", "max_var_shift", "max_level_shift",
"stability", "nonlinearity"
Usage
extractFDATsfeatures(
scale = TRUE,
trim = FALSE,
trim_amount = 0.1,
parallel = FALSE,
na.action = na.pass,
feats = NULL,
...
)
Arguments
scale (logical(1))
If TRUE, time series are scaled to mean 0 and sd 1 before features are computed.
trim (logical(1))
If TRUE, time series are trimmed by trim_amount before features are com-
puted. Values larger than trim_amount in absolute value are set to NA.
trim_amount (numeric(1))
Default level of trimming if trim==TRUE.
parallel (logical(1))
If TRUE, multiple cores (or multiple sessions) will be used. This only speeds
things up when there are a large number of time series.
na.action (function)
A function to handle missing values. Use na.interp to estimate missing values.
feats (character)
A character vector of function names to apply to each time-series in order to
extract features.
Default: the features listed in the Note above.
Value
(data.frame)
References
Hyndman, Wang and Laptev, Large-Scale Unusual Time Series Detection, ICDM 2015.
See Also
Other fda_featextractor: extractFDABsignal(), extractFDADTWKernel(), extractFDAFPCA(),
extractFDAFourier(), extractFDAMultiResFeatures(), extractFDAWavelets()
extractFDAWavelets Discrete Wavelet transform features.
Description
The function extracts discrete wavelet transform coefficients from the raw functional data. See
wavelets::dwt for more information.
Usage
extractFDAWavelets(filter = "la8", boundary = "periodic")
Arguments
filter (character(1))
Specifies which filter should be used. Must be one of d|la|bl|c followed by an
even number for the level of the filter. The level of the filter needs to be smaller
than or equal to the time-series length. For more information and acceptable filters
see help(wt.filter). Defaults to la8.
boundary (character(1))
Boundary to be used. “periodic” assumes circular time series, for “reflection”
the series is extended to twice its length. Default is “periodic”.
Value
(data.frame).
See Also
FailureModel Failure model.
Description
A subclass of WrappedModel. It is created, if you set the respective option in configureMlr, when
a model internally crashes during training. The model always predicts NAs.
If the mlr option on.error.dump is TRUE, the FailureModel contains the debug trace of the error.
It can be accessed with getFailureModelDump and inspected with debugger.
Its encapsulated learner.model is simply a string: the error message that was generated when the
model crashed. The following code shows how to access the message.
See Also
Examples
configureMlr(on.learner.error = "warn")
data = iris
data$newfeat = 1 # will make LDA crash
task = makeClassifTask(data = data, target = "Species")
m = train("classif.lda", task) # LDA crashed, but mlr catches this
print(m)
print(m$learner.model) # the error message
p = predict(m, task) # this will predict NAs
print(p)
print(performance(p))
configureMlr(on.learner.error = "stop")
FeatSelControl Create control structures for feature selection.
Description
Feature selection method used by selectFeatures.
The methods used here follow a wrapper approach, described in Kohavi and John (1997) (see ref-
erences).
The following optimization algorithms are available:
FeatSelControlExhaustive Exhaustive search. All feature sets (up to a certain number of features
max.features) are searched.
FeatSelControlRandom Random search. Feature vectors are randomly drawn, up to a certain
number of features max.features. A feature is included in the current set with probabil-
ity prob. So we are basically drawing (0,1)-membership-vectors, where each element is
Bernoulli(prob) distributed.
FeatSelControlSequential Deterministic forward or backward search. That means extending (for-
ward) or shrinking (backward) a feature set. Depending on the given method different ap-
proaches are taken.
sfs Sequential Forward Search: Starting from an empty model, in each step the feature in-
creasing the performance measure the most is added to the model.
sbs Sequential Backward Search: Starting from a model with all features, in each step the
feature decreasing the performance measure the least is removed from the model.
sffs Sequential Floating Forward Search: Starting from an empty model, in each step the
algorithm chooses the best model from all models with one additional feature and from all
models with one feature less.
sfbs Sequential Floating Backward Search: Similar to sffs but starting with a full model.
FeatSelControlGA Search via genetic algorithm. The GA is a simple (mu, lambda) or (mu +
lambda) algorithm, depending on the comma setting. A comma strategy selects a new pop-
ulation of size mu out of the lambda > mu offspring. A plus strategy uses the joint pool of mu
parents and lambda offspring for selecting mu new candidates. Out of those mu features, the
new lambda features are generated by randomly choosing pairs of parents. These are crossed
over and crossover.rate represents the probability of choosing a feature from the first par-
ent instead of the second parent. The resulting offspring is mutated, i.e., its bits are flipped
with probability mutation.rate. If max.features is set, offspring are repeatedly generated
until the setting is satisfied.
Usage
makeFeatSelControlExhaustive(
same.resampling.instance = TRUE,
maxit = NA_integer_,
max.features = NA_integer_,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
makeFeatSelControlGA(
same.resampling.instance = TRUE,
impute.val = NULL,
maxit = NA_integer_,
max.features = NA_integer_,
comma = FALSE,
mu = 10L,
lambda,
crossover.rate = 0.5,
mutation.rate = 0.05,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
makeFeatSelControlRandom(
same.resampling.instance = TRUE,
maxit = 100L,
max.features = NA_integer_,
prob = 0.5,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
makeFeatSelControlSequential(
same.resampling.instance = TRUE,
impute.val = NULL,
method,
alpha = 0.01,
beta = -0.001,
maxit = NA_integer_,
max.features = NA_integer_,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce vari-
ance? Default is TRUE.
maxit (integer(1))
Maximal number of iterations. Note, that this is usually not equal to the number
of function evaluations.
max.features (integer(1))
Maximal number of features.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each feature set
evaluation, via tuneThreshold? Only works for classification if the predict type
is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design
points, the resulting performances, and the runtime will be reported. If set to
“memory” the memory usage for each evaluation will also be displayed, with
a small increase in run time. Otherwise a function
with arguments learner, resampling, measures, par.set, control, opt.path,
dob, x, y, remove.nas, stage and prev.stage is expected. The default dis-
plays the performance measures, the time needed for evaluating, the currently
used memory and the max memory ever used before (the latter two both taken
from gc). See the implementation for details.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
comma (logical(1))
Parameter of the GA feature selection, indicating whether to use a (mu, lambda)
or (mu + lambda) GA. The default is FALSE.
mu (integer(1))
Parameter of the GA feature selection. Size of the parent population.
lambda (integer(1))
Parameter of the GA feature selection. Size of the children population (should
be smaller or equal to mu).
crossover.rate (numeric(1))
Parameter of the GA feature selection. Probability of choosing a bit from the
first parent within the crossover mutation.
mutation.rate (numeric(1))
Parameter of the GA feature selection. Probability of flipping a feature bit, i.e.
switch between selecting / deselecting a feature.
prob (numeric(1))
Parameter of the random feature selection. Probability of choosing a feature.
method (character(1))
Parameter of the sequential feature selection. A character representing the method.
Possible values are sfs (forward search), sbs (backward search), sffs (floating
forward search) and sfbs (floating backward search).
alpha (numeric(1))
Parameter of the sequential feature selection. Minimal required value of im-
provement difference for a forward / adding step. Default is 0.01.
beta (numeric(1))
Parameter of the sequential feature selection. Minimal required value of im-
provement difference for a backward / removing step. Negative values imply
that you allow a slight decrease for the removal of a feature. Default is -0.001.
Value
(FeatSelControl). The specific subclass is one of FeatSelControlExhaustive, FeatSelControlRandom,
FeatSelControlSequential, FeatSelControlGA.
References
Ron Kohavi and George H. John, Wrappers for feature subset selection, Artificial Intelligence Vol-
ume 97, 1997, 273-324. http://ai.stanford.edu/~ronnyk/wrappersPrint.pdf.
See Also
Other featsel: analyzeFeatSelResult(), getFeatSelResult(), makeFeatSelWrapper(), selectFeatures()
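Examples
A minimal sketch (not part of the original manual), showing how a control object is used with selectFeatures:
ctrl = makeFeatSelControlRandom(maxit = 10L)
rdesc = makeResampleDesc("Holdout")
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl, show.info = FALSE)
analyzeFeatSelResult(res)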
FeatSelResult Result of feature selection.
Description
Container for results of feature selection. Contains the obtained features, their performance values
and the optimization path which led there.
You can visualize it using analyzeFeatSelResult.
Details
Object members:
threshold (numeric) Vector of finally found and used thresholds if tune.threshold was enabled
in FeatSelControl, otherwise not present and hence NULL.
opt.path (ParamHelpers::OptPath) Optimization path which led to x.
filterFeatures Filter features by thresholding filter values.
Description
First, calls generateFilterValuesData. Features are then selected via select and val.
Usage
filterFeatures(
task,
method = "FSelectorRcpp_information.gain",
fval = NULL,
perc = NULL,
abs = NULL,
threshold = NULL,
fun = NULL,
fun.args = NULL,
mandatory.feat = NULL,
select.method = NULL,
base.methods = NULL,
cache = FALSE,
...
)
Arguments
task (Task)
The task.
method (character(1))
See listFilterMethods. Default is “FSelectorRcpp_information.gain”.
fval (FilterValues)
Result of generateFilterValuesData. If you pass this, the filter values in the ob-
ject are used for feature filtering. method and ... are ignored then. Default is
NULL and not used.
perc (numeric(1))
If set, select perc*100 top scoring features. perc = 1 means to select all features.
Mutually exclusive with arguments abs, threshold and fun.
abs (numeric(1))
If set, select abs top scoring features. Mutually exclusive with arguments perc,
threshold and fun.
threshold (numeric(1))
If set, select features whose score exceeds threshold. Mutually exclusive with
arguments perc, abs and fun.
fun (function)
If set, select features via a custom thresholding function, which must return the
number of top scoring features to select. Mutually exclusive with arguments
perc, abs and threshold.
fun.args (any)
Arguments passed to the custom thresholding function.
mandatory.feat (character)
Mandatory features which are always included regardless of their scores.
select.method If multiple methods are supplied in argument method, specify the method that is
used for the final subsetting.
base.methods If method is an ensemble filter, specify the base filter methods which the ensem-
ble method will use.
cache (character(1) | logical)
Whether to use caching during filter value creation. See details.
... (any)
Passed down to selected filter method.
Value
Task.
Caching
If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is
operating system dependent and can be checked with getCacheDir().
The default cache can be cleared with deleteCacheDir(). Alternatively, a custom directory can
be passed to store the cache.
Note that caching is not thread safe. It will work for parallel computation on many systems, but
there is no guarantee.
Besides passing (multiple) simple filter methods you can also pass an ensemble filter method (in a
list). The ensemble method will use the simple methods to calculate its ranking. See listFilterEnsembleMethods()
for available ensemble methods.
See Also
Examples
# simple filter
filterFeatures(iris.task, method = "FSelectorRcpp_gain.ratio", abs = 2)
# ensemble filter
filterFeatures(iris.task, method = "E-min",
base.methods = c("FSelectorRcpp_gain.ratio",
"FSelectorRcpp_information.gain"), abs = 2)
friedmanPostHocTestBMR
Perform a posthoc Friedman-Nemenyi test.
Description
Performs a PMCMRplus::frdAllPairsNemenyiTest for a BenchmarkResult and a selected measure.
This means all pairwise comparisons of learners are performed. The null hypothesis of the
post hoc test is that each pair of learners is equal. If the null hypothesis of the preceding
stats::friedman.test can be rejected, an object of class pairwise.htest is returned. If not, the
function returns the corresponding friedman.test.
Note that benchmark results for at least two learners on at least two tasks are required.
Usage
friedmanPostHocTestBMR(
bmr,
measure = NULL,
p.value = 0.05,
aggregation = "default"
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
p.value (numeric(1))
p-value for the tests. Default: 0.05
aggregation (character(1))
“mean” or “default”. See getBMRAggrPerformances for details on “default”.
Value
(pairwise.htest): See PMCMRplus::frdAllPairsNemenyiTest for details. Additionally two com-
ponents are added to the list:
• f.rejnull (logical(1)):
Whether the according friedman.test rejects the Null hypothesis at the selected p.value
• crit.difference (list(2)):
Minimal difference the mean ranks of two learners need to have in order to be significantly
different
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(), getBMRFeatSelResults(),
getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(),
getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
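A more explicit sketch (not part of the original manual); it needs at least two learners on at least two tasks and assumes the PMCMRplus package is installed:
lrns = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2)
bmr = benchmark(lrns, tasks, rdesc, measures = mmce)
friedmanTestBMR(bmr)
friedmanPostHocTestBMR(bmr, p.value = 0.10)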
friedmanTestBMR Perform overall Friedman test for a BenchmarkResult.
Description
Performs a stats::friedman.test for a selected measure. The null hypothesis is that apart from an
effect of the different (Task), the location parameter (aggregated performance measure) is the same
for each Learner. Note that benchmark results for at least two learners on at least two tasks are
required.
Usage
friedmanTestBMR(bmr, measure = NULL, aggregation = "default")
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
aggregation (character(1))
“mean” or “default”. See getBMRAggrPerformances for details on “default”.
Value
(htest): See stats::friedman.test for details.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(),
plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
fuelsubset.task FuelSubset functional data regression task.
Description
Contains the task (fuelsubset.task). 2 functional covariates and 1 scalar covariate. You have
to predict the heat value of some fuel based on the ultraviolet radiation spectrum and infrared ray
radiation and one scalar column called h2o.
Details
The features and grids are scaled in the same way as in FDboost::FDboost.
References
See Brockhaus, S., Scheipl, F., Hothorn, T., & Greven, S. (2015). The functional linear array model.
Statistical Modelling, 15(3), 279–300.
generateCalibrationData
Generate classifier calibration data.
Description
A calibrated classifier is one where the predicted probability of a class closely matches the rate at
which that class occurs, e.g. for data points which are assigned a predicted probability of class
A of .8, approximately 80 percent of such points should belong to class A if the classifier is well
calibrated. This is estimated empirically by grouping data points with similar predicted probabilities
for each class, and plotting the rate of each class within each bin against the predicted probability
bins.
Usage
generateCalibrationData(obj, breaks = "Sturges", groups = NULL, task.id = NULL)
Arguments
obj (list of Prediction | list of ResampleResult | BenchmarkResult)
Single prediction object, list of them, single resample result, list of them, or a
benchmark result. In case of a list probably produced by different learners you
want to compare, then name the list with the names you want to see in the plots,
probably learner shortnames or ids.
breaks (character(1) | numeric)
If character(1), the algorithm to use in generating probability bins. See hist
for details. If numeric, the cut points for the bins. Default is “Sturges”.
groups (integer(1))
The number of bins to construct. If specified, breaks is ignored. Default is
NULL.
task.id (character(1))
Selected task in BenchmarkResult to do plots for, ignored otherwise. Default is
first task.
Value
CalibrationData. A list containing:
References
Vuk, Miha, and Curk, Tomaz. “ROC Curve, Lift Chart, and Calibration Plot.” Metodoloski zvezki.
Vol. 3. No. 1 (2006): 89-108.
See Also
Other generate_plot_data: generateCritDifferencesData(), generateFeatureImportanceData(),
generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(),
generateThreshVsPerfData(), plotFilterValues()
Other calibration: plotCalibration()
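Examples
A minimal sketch (not part of the original manual), using a probabilistic classifier:
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, sonar.task)
cal = generateCalibrationData(pred)
plotCalibration(cal)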
generateCritDifferencesData
Generate data for critical-differences plot.
Description
Generates data that can be used to plot a critical differences plot. Computes the critical differences
according to either the "Bonferroni-Dunn" test or the "Nemenyi" test.
"Bonferroni-Dunn" usually yields higher power as it does not compare all algorithms to each
other, but all algorithms to a baseline instead.
Learners are drawn on the y-axis according to their average rank.
For test = "nemenyi" a bar is drawn, connecting all groups of not significantly different learners.
For test = "bd" an interval is drawn arround the algorithm selected as a baseline. All learners
within this interval are not signifcantly different from the baseline.
Calculation: CD = q_alpha * sqrt(k(k + 1) / (6 * N)),
where q_alpha is based on the studentized range statistic. See references for details.
Usage
generateCritDifferencesData(
bmr,
measure = NULL,
p.value = 0.05,
baseline = NULL,
test = "bd"
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
p.value (numeric(1))
P-value for the critical difference. Default: 0.05
baseline (character(1))
learner.id of the learner to use as the baseline for test = "bd". Default is NULL.
test (character(1))
Test for which the critical differences are computed, either "bd" (Bonferroni-Dunn) or "nemenyi". Default is "bd".
Value
data (data.frame) containing the info for the descriptive part of the plot
friedman.nemenyi.test
(list) of class pairwise.htest
contains the calculated PMCMRplus::frdAllPairsNemenyiTest
cd.info (list) containing info on the critical difference and its positioning
baseline baseline chosen for plotting
p.value p.value used for the PMCMRplus::frdAllPairsNemenyiTest and for computation
of the critical difference
See Also
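Examples
A minimal sketch (not part of the original manual); as above, a benchmark with at least two learners on at least two tasks is required:
lrns = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
bmr = benchmark(lrns, list(iris.task, sonar.task), makeResampleDesc("CV", iters = 2))
cd = generateCritDifferencesData(bmr, p.value = 0.05, test = "nemenyi")
plotCritDifferences(cd)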
generateFeatureImportanceData
Generate feature importance.
Description
Estimate how important individual features or groups of features are by contrasting prediction
performances. For method “permutation.importance” compute the change in performance from
permuting the values of a feature (or a group of features) and compare that to the predictions made
on the unpermuted data.
Usage
generateFeatureImportanceData(
task,
method = "permutation.importance",
learner,
features = getTaskFeatureNames(task),
interaction = FALSE,
measure,
contrast = function(x, y) x - y,
aggregation = mean,
nmc = 50L,
replace = TRUE,
local = FALSE,
show.info = FALSE
)
Arguments
task (Task)
The task.
method (character(1))
The method used to compute the feature importance. The only method available
is “permutation.importance”. Default is “permutation.importance”.
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
features (character)
The features to compute the importance of. The default is all of the features
contained in the Task.
interaction (logical(1))
Whether to compute the importance of the features argument jointly. For
method = "permutation.importance" this entails permuting the values of all
features together and then contrasting the performance with that of the perfor-
mance without the features being permuted. The default is FALSE.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
contrast (function)
A difference function that takes a numeric vector and returns a numeric vector
of the same length. The default is element-wise difference between the vectors.
aggregation (function)
A function which aggregates the differences. This function must take a numeric
vector and return a numeric vector of length 1. The default is mean.
nmc (integer(1))
The number of Monte-Carlo iterations to use in computing the feature impor-
tance. If nmc == -1 and method = "permutation.importance" then all permu-
tations of the features are used. The default is 50.
replace (logical(1))
Whether or not to sample the feature values with or without replacement. The
default is TRUE.
local (logical(1))
Whether to compute the per-observation importance. The default is FALSE.
show.info (logical(1))
Whether progress output (feature name, time elapsed) should be displayed.
Value
(FeatureImportance). A named list which contains the computed feature importance and the input
arguments.
Object members:
res (data.frame)
Has columns for each feature or combination of features (colon separated) for
which the importance is computed. A row corresponds to the importance of the
feature specified in the column for the target.
interaction (logical(1))
Whether or not the importance of the features was computed jointly rather
than individually.
measure (Measure)
nmc (integer(1))
The number of Monte-Carlo iterations used to compute the feature importance.
When nmc == -1 and method = "permutation.importance" all permutations
are used.
local (logical(1))
Whether observation-specific importance is computed for the features.
References
See Also
Examples
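A minimal sketch (not part of the original manual):
imp = generateFeatureImportanceData(iris.task, "permutation.importance",
"classif.rpart", features = c("Petal.Width", "Petal.Length"), nmc = 10L)
imp$res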
generateFilterValuesData
Calculates feature filter values.
Description
Calculates numerical filter values for features. For a list of features, use listFilterMethods.
Usage
generateFilterValuesData(
task,
method = "FSelectorRcpp_information.gain",
nselect = getTaskNFeats(task),
...,
more.args = list()
)
Arguments
task (Task)
The task.
method (character | list)
Filter method(s). In case of ensemble filters the list notation needs to be used.
See the examples for more information. Default is “FSelectorRcpp_information.gain”.
nselect (integer(1))
Number of scores to request. Scores are calculated for all features by default.
... (any)
Passed down to selected method. Can only be used if method contains one
element.
more.args (named list)
Extra args passed down to filter methods. List elements are named with the filter
method name the args should be passed down to. A more general and flexible
option than .... Default is empty list.
Value
task.desc (TaskDesc)
Task description.
data (data.frame) with columns:
• name(character)
Name of feature.
• type(character)
Feature column type.
• method(numeric)
One column for each method with the feature importance values.
Besides passing (multiple) simple filter methods you can also pass an ensemble filter method (in a
list). The ensemble method will use the simple methods to calculate its ranking. See listFilterEnsembleMethods()
for available ensemble methods.
See Also
Examples
# two simple filter methods
fval = generateFilterValuesData(iris.task,
method = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"))
# using ensemble method "E-mean"
fval = generateFilterValuesData(iris.task,
method = list("E-mean", c("FSelectorRcpp_gain.ratio",
"FSelectorRcpp_information.gain")))
generateHyperParsEffectData
Generate hyperparameter effect data.
Description
Generate cleaned hyperparameter effect data from a tuning result or from a nested cross-validation
tuning result. The object returned can be used for custom visualization or passed downstream to an
out of the box mlr method, plotHyperParsEffect.
Usage
generateHyperParsEffectData(
tune.result,
include.diagnostics = FALSE,
trafo = FALSE,
partial.dep = FALSE
)
Arguments
tune.result (TuneResult | ResampleResult)
Result of tuneParams or of a nested cross-validation run via resample.
include.diagnostics (logical(1))
Should diagnostic info be included in the result? Default is FALSE.
trafo (logical(1))
Should the hyperparameter values be returned on the transformed scale? Default is FALSE.
partial.dep (logical(1))
Should partial dependence be requested based on converting to reg task? This
sets a flag so that we know to use partial dependence downstream. This should
most likely be set to TRUE if 2 or more hyperparameters were tuned simultane-
ously. Partial dependence should always be requested when more than 2 hyper-
parameters were tuned simultaneously. Setting to TRUE will cause plotHyper-
ParsEffect to automatically plot partial dependence when called downstream.
Default is FALSE.
Value
(HyperParsEffectData) Object containing the hyperparameter effects dataframe, the tuning per-
formance measures used, the hyperparameters used, a flag for including diagnostic info, a flag for
whether nested cv was used, a flag for whether partial dependence should be generated, and the
optimization algorithm used.
Examples
## Not run:
# 3-fold cross validation
ps = makeParamSet(makeDiscreteParam("C", values = 2^(-4:4)))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 3L)
res = tuneParams("classif.ksvm", task = pid.task, resampling = rdesc,
par.set = ps, control = ctrl)
data = generateHyperParsEffectData(res)
plt = plotHyperParsEffect(data, x = "C", y = "mmce.test.mean")
plt + ylab("Misclassification Error")
## End(Not run)
generateLearningCurveData
Generates a learning curve.
Description
Observe how the performance changes with an increasing number of observations.
Usage
generateLearningCurveData(
learners,
task,
resampling = NULL,
percs = seq(0.1, 1, by = 0.1),
measures,
stratify = FALSE,
show.info = getMlrOption("show.info")
)
Arguments
learners ((list of) Learner)
Learning algorithms which should be compared.
task (Task)
The task.
resampling (ResampleDesc | ResampleInstance)
Resampling strategy to evaluate the performance measure. If no strategy is given
a default "Holdout" will be performed.
percs (numeric)
Vector of percentages to be drawn from the training split. These values represent
the x-axis. Internally makeDownsampleWrapper is used in combination with
benchmark. Thus for each percentage a different set of observations is drawn
resulting in noisy performance measures as the quality of the sample can differ.
measures ((list of) Measure)
Performance measures to generate learning curves for, representing the y-axis.
stratify (logical(1))
Only for classification: Should the downsampled data be stratified according to
the target classes?
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
Value
(LearningCurveData). A list containing:
• The Task
• List of Measure
Performance measures
• data (data.frame) with columns:
– learner Names of learners.
– percentage Percentages drawn from the training split.
– One column for each Measure passed to generateLearningCurveData.
See Also
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFeatureImportanceData(),
generateFilterValuesData(), generatePartialDependenceData(), generateThreshVsPerfData(),
plotFilterValues()
Other learning_curve: plotLearningCurve()
Examples
r = generateLearningCurveData(list("classif.rpart", "classif.knn"),
task = sonar.task, percs = seq(0.2, 1, by = 0.2),
measures = list(tp, fp, tn, fn),
resampling = makeResampleDesc(method = "Subsample", iters = 5),
show.info = FALSE)
plotLearningCurve(r)
generatePartialDependenceData
Generate partial dependence.
Description
Estimate how the learned prediction function is affected by one or more features. For a learned
function f(x) where x is partitioned into x_s and x_c, the partial dependence of f on x_s can be
summarized by averaging over x_c and setting x_s to a range of values of interest, estimating
E_(x_c)(f(x_s, x_c)). The conditional expectation of f at observation i is estimated similarly. Addi-
tionally, partial derivatives of the marginalized function w.r.t. the features can be computed.
This function requires the mmpf package to be installed. It is currently not on CRAN, but can be
installed through GitHub using devtools::install_github('zmjones/mmpf/pkg').
Usage
generatePartialDependenceData(
obj,
input,
features = NULL,
interaction = FALSE,
derivative = FALSE,
individual = FALSE,
fun = mean,
bounds = c(qnorm(0.025), qnorm(0.975)),
uniform = TRUE,
n = c(10, NA),
...
)
Arguments
obj (WrappedModel)
Result of train.
input (data.frame | Task)
Input data.
features character
A vector of feature names contained in the training data. If not specified all
features in the input will be used.
interaction (logical(1))
Whether the features should be interacted or not. If TRUE then the Cartesian
product of the prediction grid for each feature is taken, and the partial depen-
dence at each unique combination of values of the features is estimated. Note
that if the length of features is greater than two, plotPartialDependence cannot
be used. If FALSE each feature is considered separately. In this case features
can be much longer than two. Default is FALSE.
derivative (logical(1))
Whether or not the partial derivative of the learned function with respect to the
features should be estimated. If TRUE interaction must be FALSE. The partial
derivative of individual observations may be estimated. Note that computation
time increases as the learned prediction function is evaluated at gridsize points
* the number of points required to estimate the partial derivative. Additional
arguments may be passed to numDeriv::grad (for regression or survival tasks) or
numDeriv::jacobian (for classification tasks). Note that functions which are not
smooth may result in estimated derivatives of 0 (for points where the function
does not change within +/- epsilon) or estimates trending towards +/- infinity (at
discontinuities). Default is FALSE.
individual (logical(1))
Whether to plot the individual conditional expectation curves rather than the ag-
gregated curve, i.e., rather than aggregating (using fun) the partial dependences
of features, plot the partial dependences of all observations in data across all
values of the features. The algorithm is developed in Goldstein, Kapelner,
Bleich, and Pitkin (2015). Default is FALSE.
fun function
A function which operates on the output on the predictions made on the input
data. For regression this means a numeric vector, and, e.g., for a multiclass
classification problem, this might instead be probabilities which are returned as
a numeric matrix. This argument can return vectors of arbitrary length; how-
ever, if their length is greater than one, they must be named, e.g., fun = mean
or fun = function(x) c("mean" = mean(x), "variance" = var(x)). The de-
fault is the mean, unless obj is classification with predict.type = "response"
in which case the default is the proportion of observations predicted to be in
each class.
bounds (numeric(2))
The value (lower, upper) the estimated standard error is multiplied by to es-
timate the bound on a confidence region for a partial dependence. Ignored if
predict.type != "se" for the learner. Default is the 2.5 and 97.5 quantiles
(-1.96, 1.96) of the Gaussian distribution.
uniform (logical(1))
Whether or not the prediction grid for the features is a uniform grid of size
n[1] or sampled with replacement from the input. Default is TRUE.
n (integer(2))
The first element of n gives the size of the prediction grid created for each fea-
ture. The second element of n gives the size of the sample to be drawn without
replacement from the input data. Setting n[2] less than the number of rows in
the input will decrease computation time. The default for n[1] is 10, and the
default for n[2] is the number of rows in the input.
... additional arguments to be passed to mmpf’s marginalPrediction.
Value
PartialDependenceData. A named list, which contains the partial dependence, input data, target,
features, task description, and other arguments controlling the type of partial dependences made.
Object members:
data data.frame
Has columns for the prediction: one column for regression and survival analysis,
and a column for class and the predicted probability for classification as well as
a column for each element of features. If individual = TRUE then there is
an additional column idx which gives the index of the data that each prediction
corresponds to.
task.desc TaskDesc
Task description.
target Target feature for regression, target feature levels for classification, survival and
event indicator for survival.
features character
Features argument input.
interaction (logical(1))
Whether or not the features were interacted (i.e. conditioning).
derivative (logical(1))
Whether or not the partial derivative was estimated.
individual (logical(1))
Whether the partial dependences were aggregated or the individual curves are
retained.
References
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. “Peeking inside the black box:
Visualizing statistical learning with plots of individual conditional expectation.” Journal of Com-
putational and Graphical Statistics. Vol. 24, No. 1 (2015): 44-65.
Friedman, Jerome. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals
of Statistics. Vol. 29. No. 5 (2001): 1189-1232.
See Also
Other partial_dependence: plotPartialDependence()
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFeatureImportanceData(),
generateFilterValuesData(), generateLearningCurveData(), generateThreshVsPerfData(),
plotFilterValues()
Examples
lrn = makeLearner("regr.svm")
fit = train(lrn, bh.task)
pd = generatePartialDependenceData(fit, bh.task, "lstat")
plotPartialDependence(pd, data = getTaskData(bh.task))
generateThreshVsPerfData
Generate threshold vs. performance(s) for 2-class classification.
Description
Generates data on threshold vs. performance(s) for 2-class classification that can be used for plot-
ting.
Usage
generateThreshVsPerfData(
obj,
measures,
gridsize = 100L,
aggregate = TRUE,
task.id = NULL
)
Arguments
obj (list of Prediction | list of ResampleResult | BenchmarkResult)
Single prediction object, list of them, single resample result, list of them, or a
benchmark result. In case of a list probably produced by different learners you
want to compare, then name the list with the names you want to see in the plots,
probably learner shortnames or ids.
measures (Measure | list of Measure)
Performance measure(s) to evaluate.
gridsize (integer(1))
Grid resolution for the probability threshold. Default is 100.
aggregate (logical(1))
Whether to aggregate performances across the resampling iterations (only applicable for (list of) ResampleResults). Default is TRUE.
task.id (character(1))
Selected task in BenchmarkResult to do plots for, ignored otherwise. Default is first task.
Value
(ThreshVsPerfData). A named list containing the measured performance across the threshold grid,
the measures, and whether the performance estimates were aggregated (only applicable for (list of)
ResampleResults).
See Also
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFeatureImportanceData(),
generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(),
plotFilterValues()
Other thresh_vs_perf: plotROCCurves(), plotThreshVsPerf()
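Examples
A minimal sketch (not part of the original manual), plotting an ROC curve from the generated data:
lrn = makeLearner("classif.lda", predict.type = "prob")
pred = predict(train(lrn, sonar.task), sonar.task)
d = generateThreshVsPerfData(pred, measures = list(fpr, tpr, mmce))
plotROCCurves(d)
plotThreshVsPerf(d)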
getBMRAggrPerformances
Extract the aggregated performance values from a benchmark result.
Description
Either a list of lists of “aggr” numeric vectors, as returned by resample, or these objects are rbind-ed
with extra columns “task.id” and “learner.id”.
Usage
getBMRAggrPerformances(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
as.df (logical(1))
Return one data.frame as result or a list of lists of objects? Default is FALSE.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is re-
turned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding
learner.
For an experiment with both one task and learner, the whole list structure is re-
moved.
Note that the name of the task/learner will be dropped from the return object.
Value
(list | data.frame). See above.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRFeatSelResults(),
getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(), getBMRLearners(),
getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
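Examples
A minimal sketch (not part of the original manual), also showing the related per-iteration extractor:
lrns = list(makeLearner("classif.rpart"), makeLearner("classif.lda"))
bmr = benchmark(lrns, iris.task, makeResampleDesc("CV", iters = 2))
getBMRAggrPerformances(bmr, as.df = TRUE)
getBMRPerformances(bmr, as.df = TRUE) # one row per resampling iteration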
getBMRFeatSelResults Extract the feature selection results from a benchmark result.
Description
Returns a nested list of FeatSelResults. The first level of nesting is by data set, the second by learner,
the third for the benchmark resampling iterations. If as.df is TRUE, a data frame with “task.id”,
“learner.id”, the resample iteration and the selected features is returned.
Note that if more than one feature is selected and a data frame is requested, there will be multiple
rows for the same dataset-learner-iteration; one for each selected feature.
Usage
getBMRFeatSelResults(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
as.df (logical(1))
Return one data.frame as result or a list of lists of objects? Default is FALSE.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is re-
turned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding
learner.
For an experiment with both one task and learner, the whole list structure is re-
moved.
Note that the name of the task/learner will be dropped from the return object.
Value
See Also
getBMRFilteredFeatures
Extract the filtered features from a benchmark result.
Description
Returns a nested list of characters. The first level of nesting is by data set, the second by learner,
the third for the benchmark resampling iterations. The list at the lowest level is the list of selected
features. If as.df is TRUE, a data frame with “task.id”, “learner.id”, the resample iteration and the
selected features is returned.
Note that if more than one feature is selected and a data frame is requested, there will be multiple
rows for the same dataset-learner-iteration; one for each selected feature.
Usage
getBMRFilteredFeatures(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
as.df (logical(1))
Return one data.frame as result or a list of lists of objects? Default is FALSE.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is re-
turned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding
learner.
For an experiment with both one task and learner, the whole list structure is re-
moved.
Note that the name of the task/learner will be dropped from the return object.
Value
See Also
getBMRLearnerIds Return learner ids used in benchmark.
Description
Usage
getBMRLearnerIds(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
(character).
See Also
getBMRLearners Return learners used in benchmark.
Description
Gets the learners used in a benchmark experiment.
Usage
getBMRLearners(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
(list).
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
getBMRLearnerShortNames
Return learner short.names used in benchmark.
Description
Gets the learner short.names of the learners used in a benchmark experiment.
Usage
getBMRLearnerShortNames(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
(character).
See Also
getBMRMeasureIds Return measures IDs used in benchmark.
Description
Usage
getBMRMeasureIds(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
See Also
getBMRMeasures Return measures used in benchmark.
Description
Usage
getBMRMeasures(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
See Also
getBMRModels Extract all models from benchmark result.
Description
A nested list of all models trained during the benchmark experiment. If models is FALSE in the
call to benchmark, the function will return NULL.
Usage
getBMRModels(bmr, task.ids = NULL, learner.ids = NULL, drop = FALSE)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is re-
turned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding
learner.
For an experiment with both one task and learner, the whole list structure is re-
moved.
Note that the name of the task/learner will be dropped from the return object.
Value
(list).
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRPerformances(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
getBMRPerformances Extract the test performance values from a benchmark result.
Description
Either a list of lists of “measure.test” data.frames, as returned by resample, or these objects are
rbind-ed with extra columns “task.id” and “learner.id”.
Usage
getBMRPerformances(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
as.df (logical(1))
Return one data.frame as result or a list of lists of objects? Default is FALSE.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is re-
turned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding
learner.
For an experiment with both one task and learner, the whole list structure is re-
moved.
Note that the name of the task/learner will be dropped from the return object.
Value
(list | data.frame). See above.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPredictions(),
getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
getBMRPredictions Extract the predictions from a benchmark result.
Description
Either a list of lists of ResamplePrediction objects, as returned by resample, or these objects are
rbind-ed with extra columns “task.id” and “learner.id”.
If predict.type is “prob”, the probabilities for each class are returned in addition to the response.
If keep.pred is FALSE in the call to benchmark, the function will return NULL.
Usage
getBMRPredictions(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
as.df (logical(1))
Return one data.frame as result or a list of lists of objects? Default is FALSE.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is re-
turned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding
learner.
For an experiment with both one task and learner, the whole list structure is re-
moved.
Note that the name of the task/learner will be dropped from the return object.
Value
See Also
getBMRTaskDescriptions
Extract all task descriptions from benchmark result (DEPRECATED).
Description
A list containing all TaskDescs for each task contained in the benchmark experiment.
Usage
getBMRTaskDescriptions(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
(list).
getBMRTaskDescs Extract all task descriptions from benchmark result.
Description
A list containing all TaskDescs for each task contained in the benchmark experiment.
Usage
getBMRTaskDescs(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
(list).
See Also
getBMRTaskIds Return task ids used in benchmark.
Description
Usage
getBMRTaskIds(bmr)
Arguments
bmr (BenchmarkResult)
Benchmark result.
Value
(character).
See Also
getBMRTuneResults Extract the tuning results from a benchmark result.
Description
Returns a nested list of TuneResults. The first level of nesting is by data set, the second by learner,
the third for the benchmark resampling iterations. If as.df is TRUE, a data frame with the “task.id”,
“learner.id”, the resample iteration, the parameter values and the performances is returned.
Usage
getBMRTuneResults(
bmr,
task.ids = NULL,
learner.ids = NULL,
as.df = FALSE,
drop = FALSE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
task.ids (character(1))
Restrict result to certain tasks. Default is all.
learner.ids (character(1))
Restrict result to certain learners. Default is all.
as.df (logical(1))
Return one data.frame as result or a list of lists of objects? Default is FALSE.
drop (logical(1))
If drop is FALSE (the default), a nested list with the following structure is returned:
res[task.ids][learner.ids].
If drop is set to TRUE it is checked if the list structure can be simplified.
If only one learner was passed, a list with entries for each task is returned.
If only one task was passed, the entries are named after the corresponding learner.
For an experiment with both one task and learner, the whole list structure is removed.
Note that the name of the task/learner will be dropped from the return object.
Value
(list | data.frame). See above.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
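Examples
# A minimal usage sketch, not part of the original manual: benchmark a
# tuning wrapper so that tune results are stored, then extract them.
# Assumes the iris.task example task shipped with mlr.
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.1)))
lrn = makeTuneWrapper(makeLearner("classif.rpart"),
  resampling = makeResampleDesc("Holdout"),
  par.set = ps, control = makeTuneControlGrid())
bmr = benchmark(lrn, iris.task, makeResampleDesc("CV", iters = 2),
  keep.extract = TRUE)
getBMRTuneResults(bmr, as.df = TRUE)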
Description
Constructs a grid of tuning parameters from a learner of the caret R-package. These values are then
converted into a list of non-tunable parameters (par.vals) and a tunable ParamHelpers::ParamSet
(par.set), which can be used by tuneParams for tuning the learner. Numerical parameters will
either be specified by their lower and upper bounds or they will be discretized into specific values.
Usage
getCaretParamSet(learner, length = 3L, task, discretize = TRUE)
Arguments
learner (character(1))
The name of the learner from caret (cf. https://topepo.github.io/caret/available-models.html).
Note that the names in caret often differ from the ones in mlr.
length (integer(1))
A length / precision parameter which is used by caret for generating the grid of
tuning parameters. caret generates either as many values per tuning parameter
/ dimension as defined by length or only a single value (in case of non-tunable
par.vals).
task (Task)
Learning task, which might be requested for creating the tuning grid.
discretize (logical(1))
Should the numerical parameters be discretized? Alternatively, they will be defined by their lower and upper bounds. The default is TRUE.
Value
(list(2)). A list of parameters:
• par.vals contains a list of all constant (non-tunable) parameters
• par.set is a ParamHelpers::ParamSet containing all tunable parameters
Examples
if (requireNamespace("caret") && requireNamespace("mlbench")) {
  library(caret)
  classifTask = makeClassifTask(data = iris, target = "Species")
  # classification (random forest) with discretized parameters
  getCaretParamSet("rf", length = 9L, task = classifTask, discretize = TRUE)
}
Description
Gets the class weight parameter of a learner.
Usage
getClassWeightParam(learner, lrn.id = NULL)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
lrn.id (character)
Only used for BaseEnsembles. It is possible that multiple learners in a base
ensemble have a class weight param. Specify the learner from which the class
weight should be extracted.
Value
numeric LearnerParam: A numeric parameter object, containing the class weight parameter of the
given learner.
See Also
Other learner: LearnerProperties, getHyperPars(), getLearnerId(), getLearnerNote(), getLearnerPackages(),
getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(), getLearnerShortName(),
getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(), makeLearner(),
makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(), setPredictThreshold(),
setPredictType()
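Examples
# A sketch, not part of the original manual; it assumes the e1071-based
# learner "classif.svm", which supports class weights.
getClassWeightParam(makeLearner("classif.svm"))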
Description
Calculates confusion matrix for (possibly resampled) prediction. Rows indicate true classes, columns
predicted classes.
The marginal elements count the number of classification errors for the respective row or column, i.e., the number of errors when you condition on the corresponding true (rows) or predicted (columns) class. The last element in the margin diagonal displays the total number of errors.
Note that for resampling no further aggregation is currently performed. All predictions on all test sets are joined into a vector yhat, as are all labels joined into a vector y. Then yhat is simply tabulated vs. y, as if both were computed on a single test set. This probably mainly makes sense when cross-validation is used for resampling.
Usage
getConfMatrix(pred, relative = FALSE)
Arguments
pred (Prediction)
Prediction object.
relative (logical(1))
If TRUE rows are normalized to show relative frequencies. Default is FALSE.
Value
(matrix).
See Also
predict.WrappedModel
Description
Get the default measure for a task type, task, task description or a learner. Currently these are:
classif: mmce
regr: mse
cluster: db
surv: cindex
costsen: mcp
multilabel: multilabel.hamloss
Usage
getDefaultMeasure(x)
Arguments
x (character(1) | Task | TaskDesc | Learner)
Task type, task, task description, learner name, a learner, or a type of learner
(e.g. "classif").
Value
(Measure).
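Examples
# Not part of the original manual: query the default measure by task type
# or by a task; assumes the iris.task example task shipped with mlr.
getDefaultMeasure("classif") # mmce
getDefaultMeasure(iris.task) # mmce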
Description
Returns the error dump that can be used with debugger() to evaluate errors. If configureMlr
configuration on.error.dump is FALSE, this returns NULL.
Usage
getFailureModelDump(model)
Arguments
model (WrappedModel)
The model.
Value
(last.dump).
Description
Such a model is created when one sets the corresponding option in configureMlr. If no failure
occurred, NA is returned.
For complex wrappers this getter returns the first error message encountered in ANY model that
failed.
Usage
getFailureModelMsg(model)
Arguments
model (WrappedModel)
The model.
Value
(character(1)).
getFeatSelResult Returns the selected feature set and optimization path after training.
Description
Returns the selected feature set and optimization path after training.
Usage
getFeatSelResult(object)
Arguments
object (WrappedModel)
Trained Model created with makeFeatSelWrapper.
Value
(FeatSelResult).
See Also
Other featsel: FeatSelControl, analyzeFeatSelResult(), makeFeatSelWrapper(), selectFeatures()
Description
For some learners it is possible to calculate a feature importance measure. getFeatureImportance
extracts those values from trained models. See below for a list of supported learners.
Usage
getFeatureImportance(object, ...)
Arguments
object (WrappedModel)
Wrapped model, result of train().
... (any)
Additional parameters, which are passed to the underlying importance value
generating function.
Details
• boosting
Measure which accounts for the gain of the Gini index given by a feature in a tree and the weight of that tree.
• cforest
Permutation principle of the ’mean decrease in accuracy’ measure in randomForest. If auc=TRUE (only for binary classification), the area under the curve is used as the measure. The algorithm used for the survival learner is ’extremely slow and experimental; use at your own risk’. See party::varimp() for details and further parameters.
• gbm
Estimation of relative influence for each feature. See gbm::relative.influence() for details and further parameters.
• h2o
Relative feature importances as returned by h2o::h2o.varimp().
• randomForest
For type = 2 (the default) the ’MeanDecreaseGini’ is measured, which is based on the Gini impurity index used for the calculation of the nodes. Alternatively, you can set type to 1, then the measure is the mean decrease in accuracy calculated on OOB data. Note that in this case the learner’s parameter importance needs to be set to be able to compute feature importance values. See randomForest::importance() for details.
• RRF
This is identical to randomForest.
• ranger
Supports both measures mentioned above for the randomForest learner. Note that you need to specifically set the learner’s parameter importance to be able to compute feature importance measures. See ranger::importance() and ranger::ranger() for details.
• rpart
Sum of decrease in impurity for each of the surrogate variables at each node.
• xgboost
The value implies the relative contribution of the corresponding feature to the model calculated
by taking each feature’s contribution for each tree in the model. The exact computation of the
importance in xgboost is undocumented.
Value
(FeatureImportance). An object containing a data.frame of the variable importances and further information.
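Examples
# A minimal sketch, not part of the original manual: rpart supports the
# "featimp" property, so importances can be extracted directly.
task = makeClassifTask(data = iris, target = "Species")
mod = train(makeLearner("classif.rpart"), task)
getFeatureImportance(mod)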
Description
Returns the filtered features.
Usage
getFilteredFeatures(model)
Arguments
model (WrappedModel)
Trained Model created with makeFilterWrapper.
Value
(character).
See Also
Description
The parameters “subset”, “features”, and “recode.target” are ignored for the data.frame method.
Usage
getFunctionalFeatures(object, subset = NULL, features, recode.target = "no")
Arguments
object (Task/data.frame)
Object to check on.
subset (integer | logical | NULL)
Selected cases. Either a logical or an index vector. Default is NULL, which means all observations are used.
features (character | integer | logical)
Vector of selected inputs. You can either pass a character vector with the feature
names, a vector of indices, or a logical vector.
In case of an index vector each element denotes the position of the feature name
returned by getTaskFeatureNames.
Note that the target feature is always included in the resulting task, you should
not pass it here. Default is to use all features.
recode.target (character(1))
Should target classes be recoded? Supported are binary and multilabel classification and survival. Possible values for binary classification are “01”, “-1+1” and “drop.levels”. In the two latter cases the target vector is converted into a numeric vector. The positive class is coded as “+1” and the negative class either as “0” or “-1”. “drop.levels” will remove empty factor levels in the target column. In the multilabel case the logical targets can be converted to factors with “multilabel.factor”. For survival, you may choose to recode the survival times to “left”, “right” or “interval2” censored times using “lcens”, “rcens” or “icens”, respectively. See survival::Surv for the format specification. Default for both binary classification and survival is “no” (do nothing).
Value
Returns a data.frame containing only the functional features.
getHomogeneousEnsembleModels
Deprecated, use getLearnerModel instead.
Description
Deprecated, use getLearnerModel instead.
Usage
getHomogeneousEnsembleModels(model, learner.models = FALSE)
Arguments
model Deprecated.
learner.models Deprecated.
Description
Retrieves the current hyperparameter settings of a learner.
Usage
getHyperPars(learner, for.fun = c("train", "predict", "both"))
Arguments
learner (Learner)
The learner.
for.fun (character(1))
Restrict the returned settings to hyperparameters corresponding to when they are used (see ParamHelpers::LearnerParam). Must be a subset of: “train”, “predict” or “both”. Default is c("train", "predict", "both").
Details
This function only shows hyperparameters that differ from the learner default (because mlr changed
the default) or if the user set hyperparameters manually during learner creation. If you want to have
an overview of all available hyperparameters use getParamSet().
Value
(list). A named list of values.
See Also
Examples
getHyperPars(makeLearner("classif.ranger"))
Description
Get the id of the learner.
Usage
getLearnerId(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
(character(1)).
See Also
Description
Get underlying R model of learner integrated into mlr.
Usage
getLearnerModel(model, more.unwrap = FALSE)
Arguments
model (WrappedModel)
The model, returned by e.g., train.
more.unwrap (logical(1))
Some learners are not basic learners from R, but implemented in mlr as meta-techniques. Examples are everything that inherits from HomogeneousEnsemble. In these cases, the learner.model is often a list of mlr WrappedModels. This option allows to strip them further to basic R models. The option is simply ignored for basic learner models. Default is FALSE.
Value
(any). A fitted model, depending on the learner / wrapped package. E.g., a model of class rpart::rpart for learner “classif.rpart”.
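Examples
# Not part of the original manual: extract the fitted rpart object from a
# trained mlr model; assumes the iris.task example task shipped with mlr.
mod = train(makeLearner("classif.rpart"), iris.task)
rpart.mod = getLearnerModel(mod)
class(rpart.mod) # "rpart"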
Description
Get the note for the learner.
Usage
getLearnerNote(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
(character).
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(),
getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Description
Get the R packages the learner requires.
Usage
getLearnerPackages(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
(character).
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(),
getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Description
Alias for getParamSet.
Usage
getLearnerParamSet(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
ParamSet.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerPredictType(),
getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Description
Alias for getHyperPars.
Usage
getLearnerParVals(learner, for.fun = c("train", "predict", "both"))
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
for.fun (character(1))
Restrict the returned settings to hyperparameters corresponding to when they are used (see ParamHelpers::LearnerParam). Must be a subset of: “train”, “predict” or “both”. Default is c("train", "predict", "both").
Value
(list). A named list of values.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParamSet(), getLearnerPredictType(),
getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Description
Get the predict type of the learner.
Usage
getLearnerPredictType(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
(character(1)).
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Description
For an ordinary learner simply its short name is returned. For wrapped learners, the wrapper id is successively attached to the short name of the base learner, e.g., “rf.bagged.imputed”.
Usage
getLearnerShortName(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
(character(1)).
See Also
Description
Get the type of the learner.
Usage
getLearnerType(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
(character(1)).
See Also
Description
Gets the options for mlr.
Usage
getMlrOptions()
Value
(list).
See Also
Other configure: configureMlr()
getMultilabelBinaryPerformances
Retrieve binary classification measures for multilabel classification
predictions.
Description
Measures the quality of each binary label prediction w.r.t. some binary classification performance
measure.
Usage
getMultilabelBinaryPerformances(pred, measures)
Arguments
pred (Prediction)
Multilabel Prediction object.
measures (Measure | list of Measure)
Performance measure(s) to evaluate, must be applicable to binary classification
performance. Default is mmce.
Value
(named matrix). Performance value(s), column names are measure(s), row names are labels.
See Also
Other multilabel: makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(),
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper()
Examples
# see makeMultilabelBinaryRelevanceWrapper
getNestedTuneResultsOptPathDf
Get the opt.paths from each tuning step from the outer resampling.
Description
After you resampled a tuning wrapper (see makeTuneWrapper) with resample(..., extract =
getTuneResult) this helper returns a data.frame with all opt.paths combined by rbind.
An additional column iter indicates to which resampling iteration the row belongs.
Usage
getNestedTuneResultsOptPathDf(r, trafo = FALSE)
Arguments
r (ResampleResult)
The result of resampling of a tuning wrapper.
trafo (logical(1))
Should the units of the hyperparameter path be converted to the transformed
scale? This is only necessary when trafo was used to create the opt.paths.
Note that opt.paths are always stored on the untransformed scale. Default is
FALSE.
Value
(data.frame). See above.
See Also
Other tune: TuneControl, getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(),
makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(),
makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(),
makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Examples
# see example of makeTuneWrapper
Description
After you resampled a tuning wrapper (see makeTuneWrapper) with resample(..., extract =
getTuneResult) this helper returns a data.frame with the best found hyperparameter settings for
each resampling iteration.
Usage
getNestedTuneResultsX(r)
Arguments
r (ResampleResult)
The result of resampling of a tuning wrapper.
Value
(data.frame). One column for each tuned hyperparameter and one row for each outer resampling
iteration.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getResamplingIndices(), getTuneResult(),
makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(),
makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(),
makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Examples
# see example of makeTuneWrapper
Description
Learners like randomForest produce out-of-bag predictions. getOOBPreds extracts this information from trained models and builds a prediction object as provided by predict (with prediction time set to NA). In the classification case, what is stored exactly in the (Prediction) object depends on the predict.type setting of the Learner.
You can call listLearners(properties = "oobpreds") to get a list of learners which provide
this.
Usage
getOOBPreds(model, task)
Arguments
model (WrappedModel)
The model.
task (Task)
The task.
Value
(Prediction).
Examples
training.set = sample(1:150, 50)
lrn = makeLearner("classif.ranger", predict.type = "prob", predict.threshold = 0.6)
mod = train(lrn, sonar.task, subset = training.set)
oob = getOOBPreds(mod, sonar.task)
oob
performance(oob, measures = list(auc, mmce))
Description
Returns the ParamHelpers::ParamSet from a Learner.
Usage
getParamSet(x)
Arguments
x (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
ParamSet.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Description
Returns the error dump that can be used with debugger() to evaluate errors. If configureMlr
configuration on.error.dump is FALSE or if the prediction did not fail, this returns NULL.
Usage
getPredictionDump(pred)
Arguments
pred (Prediction)
Prediction object.
Value
(last.dump).
See Also
Other debug: FailureModel, ResampleResult, getRRDump()
getPredictionProbabilities
Get probabilities for some classes.
Description
Get probabilities for some classes.
Usage
getPredictionProbabilities(pred, cl)
Arguments
pred (Prediction)
Prediction object.
cl (character)
Names of classes. Default is either all classes for multi-class / multilabel problems or the positive class for binary classification.
Value
(data.frame) with numerical columns or a numerical vector if length of cl is 1. Order of columns is
defined by cl.
See Also
Other predict: asROCRPrediction(), getPredictionResponse(), getPredictionTaskDesc(),
predict.WrappedModel(), setPredictThreshold(), setPredictType()
Examples
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.lda", predict.type = "prob")
mod = train(lrn, task)
# predict probabilities
pred = predict(mod, newdata = iris)
# get probabilities for all classes
head(getPredictionProbabilities(pred))
Description
The following types are returned, depending on task type:
classif: factor
regr: numeric
se: numeric
cluster: integer
surv: numeric
multilabel: logical matrix, columns named with labels
Usage
getPredictionResponse(pred)
getPredictionSE(pred)
getPredictionTruth(pred)
Arguments
pred (Prediction)
Prediction object.
Value
See above.
See Also
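Examples
# Not part of the original manual; assumes the iris.task example task
# shipped with mlr.
mod = train(makeLearner("classif.rpart"), iris.task)
pred = predict(mod, task = iris.task)
head(getPredictionResponse(pred))
head(getPredictionTruth(pred))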
Description
Get the task description from a prediction object.
Usage
getPredictionTaskDesc(pred)
Arguments
pred (Prediction)
Prediction object.
Value
(TaskDesc).
See Also
Description
Deprecated, use getPredictionProbabilities instead.
Usage
getProbabilities(pred, cl)
Arguments
pred Deprecated.
cl Deprecated.
getResamplingIndices Get the resampling indices from a tuning or feature selection wrapper.
Description
After you resampled a tuning or feature selection wrapper (see makeTuneWrapper) with resample(...,
extract = getTuneResult) or resample(..., extract = getFeatSelResult) this helper returns
a list with the resampling indices used for the respective method.
Usage
getResamplingIndices(object, inner = FALSE)
Arguments
object (ResampleResult)
The result of resampling of a tuning or feature selection wrapper.
inner (logical)
If TRUE, returns the inner indices of a nested resampling setting.
Value
(list). One list for each outer resampling fold.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(),
makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Examples
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
# stupid mini grid
ps = makeParamSet(
makeDiscreteParam("cp", values = c(0.05, 0.1)),
makeDiscreteParam("minsplit", values = c(10, 20))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")
outer = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
# nested resampling for evaluation
# we also extract tuned hyper pars in each iteration and by that the resampling indices
r = resample(lrn, task, outer, extract = getTuneResult)
# get tuning indices
getResamplingIndices(r, inner = TRUE)
Description
Returns the error dumps generated during resampling, which can be used with debugger() to debug
errors. These dumps are saved if configureMlr configuration on.error.dump, or the corresponding
learner config, is TRUE.
The returned object is a list with as many entries as the resampling being used has folds. Each of
these entries can have a subset of the following slots, depending on which step in the resampling
iteration failed: “train” (error during training step), “predict.train” (prediction on training subset),
“predict.test” (prediction on test subset).
Usage
getRRDump(res)
Arguments
res (ResampleResult)
The result of resample.
Value
list.
See Also
Other debug: FailureModel, ResampleResult, getPredictionDump()
getRRPredictionList Get list of predictions for train and test set of each single resample
iteration.
Description
This function creates a list with two slots train and test where each slot is again a list of Prediction objects for each single resample iteration. In case that predict = "train" was used for the resample description (see makeResampleDesc), the slot test will be NULL and in case that predict = "test" was used, the slot train will be NULL.
Usage
getRRPredictionList(res, ...)
Arguments
res (ResampleResult)
The result of resample run with keep.pred = TRUE.
... (any)
Further options passed to makePrediction.
Value
list.
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictions(),
getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(),
resample()
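Examples
# A sketch, not part of the original manual: resample with predictions on
# both training and test sets, then split them per iteration; assumes the
# iris.task example task shipped with mlr.
rdesc = makeResampleDesc("CV", iters = 2, predict = "both")
r = resample(makeLearner("classif.rpart"), iris.task, rdesc, keep.pred = TRUE)
pl = getRRPredictionList(r)
names(pl) # "train" "test"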
Description
Very simple getter.
Usage
getRRPredictions(res)
Arguments
res (ResampleResult)
The result of resample run with keep.pred = TRUE.
Value
(ResamplePrediction).
See Also
Description
Usage
getRRTaskDesc(res)
Arguments
res (ResampleResult)
The result of resample.
Value
(TaskDesc).
See Also
Description
Get a summarizing task description.
Usage
getRRTaskDescription(res)
Arguments
res (ResampleResult)
The result of resample.
Value
(TaskDesc).
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(),
getRRPredictions(), getRRTaskDesc(), makeResampleDesc(), makeResampleInstance(), resample()
getStackedBaseLearnerPredictions
Returns the predictions for each base learner.
Description
Returns the predictions for each base learner.
Usage
getStackedBaseLearnerPredictions(model, newdata = NULL)
Arguments
model (WrappedModel)
Wrapped model, result of train.
newdata (data.frame)
New observations, for which the predictions using the specified base learners
should be returned. Default is NULL and extracts the base learner predictions
that were made during the training.
Details
None.
getTaskClassLevels Get the class levels for classification and multilabel tasks.
Description
NB: For multilabel, getTaskTargetNames and getTaskClassLevels actually return the same thing.
Usage
getTaskClassLevels(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(character).
See Also
Other task: getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(), getTaskFormula(),
getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(),
getTaskType(), subsetTask()
Description
Returns NULL if the task is not of type “costsens”.
Usage
getTaskCosts(task, subset = NULL)
Arguments
task (CostSensTask)
The task.
subset (integer | logical | NULL)
Selected cases. Either a logical or an index vector. Default is NULL, which means all observations are used.
Value
(matrix | NULL).
See Also
Other task: getTaskClassLevels(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(),
getTaskTargets(), getTaskType(), subsetTask()
Description
Extract data in a task. Useful in trainLearner when you add a learning machine to the package.
Usage
getTaskData(
task,
subset = NULL,
features,
target.extra = FALSE,
recode.target = "no",
functionals.as = "dfcols"
)
Arguments
task (Task)
The task.
subset (integer | logical | NULL)
Selected cases. Either a logical or an index vector. Default is NULL, which means all observations are used.
features (character | integer | logical)
Vector of selected inputs. You can either pass a character vector with the feature
names, a vector of indices, or a logical vector.
In case of an index vector each element denotes the position of the feature name
returned by getTaskFeatureNames.
Note that the target feature is always included in the resulting task, you should
not pass it here. Default is to use all features.
target.extra (logical(1))
Should target vector be returned separately? If not, a single data.frame including
the target columns is returned, otherwise a list with the input data.frame and an
extra vector or data.frame for the targets. Default is FALSE.
recode.target (character(1))
Should target classes be recoded? Supported are binary and multilabel classification and survival. Possible values for binary classification are “01”, “-1+1” and “drop.levels”. In the two latter cases the target vector is converted into a numeric vector. The positive class is coded as “+1” and the negative class either as “0” or “-1”. “drop.levels” will remove empty factor levels in the target column. In the multilabel case the logical targets can be converted to factors with “multilabel.factor”. For survival, you may choose to recode the survival times to “left”, “right” or “interval2” censored times using “lcens”, “rcens” or “icens”, respectively. See survival::Surv for the format specification. Default for both binary classification and survival is “no” (do nothing).
functionals.as (character(1))
How to represent functional features? Option “matrix”: keep them as matrix columns in the data.frame. Option “dfcols”: convert them to individual numeric data.frame columns. Default is “dfcols”.
Value
Either a data.frame or a list with data.frame data and vector target.
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(),
getTaskTargets(), getTaskType(), subsetTask()
Examples
library("mlbench")
data(BreastCancer)
df = BreastCancer
df$Id = NULL
task = makeClassifTask(id = "BreastCancer", data = df, target = "Class", positive = "malignant")
head(getTaskData(task))
head(getTaskData(task, features = c("Cell.size", "Cell.shape"), recode.target = "-1+1"))
head(getTaskData(task, subset = 1:100, recode.target = "01"))
Description
Get a summarizing task description.
Usage
getTaskDesc(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(TaskDesc).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(),
getTaskTargets(), getTaskType(), subsetTask()
Description
Deprecated, use getTaskDesc instead.
Usage
getTaskDescription(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Description
Get the feature names of a task. The target column name is not included.
Usage
getTaskFeatureNames(task)
Arguments
task (Task)
The task.
Value
(character).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFormula(),
getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(),
getTaskType(), subsetTask()
Description
This is usually simply <target> ~ . For multilabel it is <target_1> + ... + <target_k> ~ .
Usage
getTaskFormula(
x,
target = getTaskTargetNames(x),
explicit.features = FALSE,
env = parent.frame()
)
Arguments
x (Task | TaskDesc)
Task or its description object.
target (character(1))
Left hand side of the formula. Default is defined by task x.
explicit.features
(logical(1))
Should the features (right hand side of the formula) be explicitly listed? Default
is FALSE, i.e., they will be represented as ".".
env (environment)
Environment of the formula. Default is parent.frame().
Value
(formula).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(),
getTaskType(), subsetTask()
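Examples
# Not part of the original manual; assumes the iris.task example task
# shipped with mlr.
getTaskFormula(iris.task)
getTaskFormula(iris.task, explicit.features = TRUE)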
Description
Get the id of the task.
Usage
getTaskId(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(character(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(), getTaskTargets(),
getTaskType(), subsetTask()
Description
Get the number of features in the task.
Usage
getTaskNFeats(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(integer(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskSize(), getTaskTargetNames(), getTaskTargets(),
getTaskType(), subsetTask()
Description
Get the number of observations in the task.
Usage
getTaskSize(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(integer(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskTargetNames(), getTaskTargets(),
getTaskType(), subsetTask()
Description
NB: For multilabel, getTaskTargetNames and getTaskClassLevels actually return the same thing.
Usage
getTaskTargetNames(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(character).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargets(), getTaskType(),
subsetTask()
Description
Get target data of task.
Usage
getTaskTargets(task, recode.target = "no")
Arguments
task (Task)
The task.
recode.target (character(1))
Should target classes be recoded? Supported are binary and multilabel classification and survival. Possible values for binary classification are “01”, “-1+1” and “drop.levels”. In the two latter cases the target vector is converted into a numeric vector. The positive class is coded as “+1” and the negative class either as “0” or “-1”. “drop.levels” will remove empty factor levels in the target column. In the multilabel case the logical targets can be converted to factors with “multilabel.factor”. For survival, you may choose to recode the survival times to “left”, “right” or “interval2” censored times using “lcens”, “rcens” or “icens”, respectively. See survival::Surv for the format specification. Default for both binary classification and survival is “no” (do nothing).
Value
A factor for classification, a numeric for regression, and a data.frame of logical columns for multilabel.
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(),
getTaskType(), subsetTask()
Examples
task = makeClassifTask(data = iris, target = "Species")
getTaskTargets(task)
Description
Get the type of the task.
Usage
getTaskType(x)
Arguments
x (Task | TaskDesc)
Task or its description object.
Value
(character(1)).
See Also
Other task: getTaskClassLevels(), getTaskCosts(), getTaskData(), getTaskDesc(), getTaskFeatureNames(),
getTaskFormula(), getTaskId(), getTaskNFeats(), getTaskSize(), getTaskTargetNames(),
getTaskTargets(), subsetTask()
Description
Returns the optimal hyperparameters and optimization path after training.
Usage
getTuneResult(object)
Arguments
object (WrappedModel)
Trained Model created with makeTuneWrapper.
Value
(TuneResult).
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(),
makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Description
Returns the opt.path from a (TuneResult) object.
Usage
getTuneResultOptPath(tune.result, as.df = TRUE)
Arguments
tune.result (TuneResult)
A tuning result of the (tuneParams) function.
as.df (logical(1))
Should the optimization path be returned as a data frame? Default is TRUE.
Value
(ParamHelpers::OptPath) or (data.frame).
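Examples
# A sketch, not part of the original manual: run a small random search and
# inspect its optimization path; assumes the iris.task example task.
ps = makeParamSet(makeNumericParam("cp", lower = 0.001, upper = 0.1))
ctrl = makeTuneControlRandom(maxit = 5L)
res = tuneParams(makeLearner("classif.rpart"), iris.task,
  resampling = makeResampleDesc("Holdout"), par.set = ps, control = ctrl)
head(getTuneResultOptPath(res))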
Description
Contains the task (gunpoint.task). You have to classify whether a person raises a gun or just an empty hand.
References
See Ratanamahatana, C. A. & Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong. Proceedings of SIAM International Conference on Data Mining (SDM05), 506-510.
Description
Check whether the object has functional features.
Usage
hasFunctionalFeatures(obj)
Arguments
obj (Task | TaskDesc | data.frame)
Object to check.
Value
(logical(1))
Description
Deprecated, use hasLearnerProperties instead.
Usage
hasProperties(learner, props)
Arguments
learner Deprecated.
props Deprecated.
Description
Interactive function that gives the user quick access to the help pages associated with various functions involved in the given learner.
Usage
helpLearner(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Other help: helpLearnerParam()
Description
Print the description of parameters of a given learner. The description is automatically extracted
from the help pages of the learner, so it may be incomplete.
Usage
helpLearnerParam(learner, param = NULL)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
param (character | NULL)
Parameter(s) to describe. Defaults to NULL, which prints information on the
documentation status of all parameters.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Other help: helpLearner()
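Examples
# A sketch, not part of the original manual; produces interactive help
# output, hence not run.
## Not run:
helpLearnerParam("classif.rpart", param = "minsplit")
## End(Not run)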
Description
The built-ins are:
Usage
imputeConstant(const)
imputeMedian()
imputeMean()
imputeMode()
imputeMin(multiplier = 1)
imputeMax(multiplier = 1)
imputeUniform(min = NA_real_, max = NA_real_)
imputeNormal(mu = NA_real_, sd = NA_real_)
imputeHist(breaks, use.mids = TRUE)
imputeLearner(learner, features = NULL)
Arguments
const (any)
Constant value used for imputation.
multiplier (numeric(1))
Value the stored minimum or maximum is multiplied with when the imputation is done.
min (numeric(1))
Lower bound for uniform distribution. If NA (default), it will be estimated from
the data.
max (numeric(1))
Upper bound for uniform distribution. If NA (default), it will be estimated from
the data.
mu (numeric(1))
Mean of normal distribution. If missing it will be estimated from the data.
sd (numeric(1))
Standard deviation of normal distribution. If missing it will be estimated from
the data.
breaks (numeric(1))
Number of breaks to use in graphics::hist. If missing, defaults to auto-detection
via “Sturges”.
use.mids (logical(1))
If x is numeric and a histogram is used, impute with bin mids (default) or instead
draw uniformly distributed samples within bin range.
learner (Learner | character(1))
Supervised learner. Its predictions will be used for imputations. If you pass a
string the learner will be created via makeLearner. Note that the target column
is not available for this operation.
features (character)
Features to use in learner for prediction. Default is NULL, which uses all available features except the target column of the original task.
See Also
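Examples
# A minimal sketch, not part of the original manual: built-in methods are
# passed to impute() via its classes or cols arguments.
df = data.frame(x = c(1, 2, NA), y = c(NA, 10, 20))
imputed = impute(df, cols = list(x = imputeMean(), y = imputeMax(multiplier = 2)))
imputed$data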
Description
Allows imputation of missing feature values through various techniques. Note that you have the
possibility to re-impute a data set in the same way as the imputation was performed during training.
This especially comes in handy during resampling when one wants to perform the same imputation
on the test set as on the training set.
The function impute performs the imputation on a data set and returns, along with the imputed data set, an “ImputationDesc” object which can contain “learned” coefficients and helpful data. It can then be passed together with a new data set to reimpute.
The imputation techniques can be specified for certain features or for feature classes, see function
arguments.
You can either provide an arbitrary object, use a built-in imputation method listed under imputations
or create one yourself using makeImputeMethod.
Usage
impute(
obj,
target = character(0L),
classes = list(),
cols = list(),
dummy.classes = character(0L),
dummy.cols = character(0L),
dummy.type = "factor",
force.dummies = FALSE,
impute.new.levels = TRUE,
recode.factor.levels = TRUE
)
Arguments
obj (data.frame | Task)
Input data.
target (character)
Name of the column(s) specifying the response. Default is character(0).
classes (named list)
Named list containing imputation techniques for classes of columns. E.g. list(numeric
= imputeMedian()).
cols (named list)
Named list containing names of imputation methods to impute missing values
in the data column referenced by the list element’s name. Overrules imputation
set via classes.
dummy.classes (character)
Classes of columns to create dummy columns for. Default is character(0).
dummy.cols (character)
Column names to create dummy columns (containing binary missing indicator)
for. Default is character(0).
dummy.type (character(1))
How dummy columns are encoded. Either as 0/1 with type “numeric” or as
“factor”. Default is “factor”.
force.dummies (logical(1))
Force dummy creation even if the respective data column does not contain any
NAs. Note that (a) most learners will complain about constant columns created
this way but (b) your feature set might be stochastic if you turn this off. Default
is FALSE.
impute.new.levels
(logical(1))
If new, unencountered factor levels occur during reimputation, should these be handled as NAs and then be imputed in the same way? Default is TRUE.
recode.factor.levels
(logical(1))
Recode factor levels after reimputation, so they match the respective element of lvls (in the description object) and therefore match the levels of the feature factor in the training data after imputation? Default is TRUE.
Details
The description object contains these slots
• target (character): See argument
• features (character): Feature names (column names of data)
• classes (character): Feature classes (storage type of data)
• lvls (named list): Mapping of column names of factor features to their levels, including newly
created ones during imputation
• impute (named list): Mapping of column names to imputation functions
• dummies (named list): Mapping of column names to imputation functions
• impute.new.levels (logical(1)): See argument
• recode.factor.levels (logical(1)): See argument
Value
(list)
• data (data.frame): Imputed data.
• desc (ImputationDesc): Description object.
See Also
Other impute: imputations, makeImputeMethod(), makeImputeWrapper(), reimpute()
Examples
df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
imputed = impute(df, target = character(0), cols = list(x = 99, y = imputeMode()))
print(imputed$data)
reimpute(data.frame(x = NA_real_), imputed$desc)
Description
Contains the task (iris.task).
References
See datasets::iris.
Description
Such a model is created when one sets the corresponding option in configureMlr.
For complex wrappers this getter returns TRUE if ANY model contained in it failed.
Usage
isFailureModel(model)
Arguments
model (WrappedModel)
The model.
Value
(logical(1)).
joinClassLevels Join some existing class levels to new, larger class levels for classification problems.
Description
Join some existing class levels to new, larger class levels for classification problems.
Usage
joinClassLevels(task, new.levels)
Arguments
task (Task)
The task.
new.levels (list of character)
Element names specify the new class levels to create, while the corresponding
element character vector specifies the existing class levels which will be joined
to the new one.
Value
Task.
Examples
joinClassLevels(iris.task, new.levels = list(foo = c("setosa", "virginica")))
Description
Find all elements in ... which are not missing and call control on them.
Usage
learnerArgsToControl(control, ...)
Arguments
control (function)
Function that creates control structure.
... (any)
Arguments for control structure function.
Value
Control structure for learner.
Description
Properties can be accessed with getLearnerProperties(learner), which returns a character vector.
The learner properties are defined as follows:
numerics, factors, ordered Can numeric, factor or ordered factor features be handled?
functionals Can an arbitrary number of functional features be handled?
single.functional Can exactly one functional feature be handled?
missings Can missing values in features be handled?
weights Can observations be weighted during fitting?
oneclass, twoclass, multiclass Only for classif: Can one-class, two-class or multi-class classification problems be handled?
class.weights Only for classif: Can class weights be handled?
rcens, lcens, icens Only for surv: Can right, left, or interval censored data be handled?
prob For classif, cluster, multilabel, surv: Can probabilities be predicted?
se Only for regr: Can standard errors be predicted?
oobpreds Only for classif, regr and surv: Can out-of-bag predictions be extracted from the trained model?
featimp For classif, regr, surv: Does the model support extracting information on feature importance?
Usage
getLearnerProperties(learner)
hasLearnerProperties(learner, props)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
props (character)
Vector of properties to query.
Value
getLearnerProperties returns a character vector with learner properties. hasLearnerProperties
returns a logical vector of the same length as props.
See Also
Other learner: getClassWeightParam(), getHyperPars(), getLearnerId(), getLearnerNote(),
getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(), getLearnerPredictType(),
getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(), helpLearnerParam(),
makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
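Examples
# Not part of the original manual.
lrn = makeLearner("classif.rpart")
getLearnerProperties(lrn)
hasLearnerProperties(lrn, c("prob", "missings"))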
Description
All supported learners can be found by listLearners or as a table in the tutorial appendix: https://mlr.mlr-org.com/articles/tutorial/integrated_learners.html.
listFilterEnsembleMethods
List ensemble filter methods.
Description
Returns a subsettable data.frame with filter information.
Usage
listFilterEnsembleMethods(desc = TRUE)
Arguments
desc (logical(1))
Provide more detailed information about filters. Default is TRUE.
Value
(data.frame).
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterMethods(),
makeFilter(), makeFilterEnsemble(), makeFilterWrapper(), plotFilterValues()
Description
Returns a subsettable data.frame with filter information.
Usage
listFilterMethods(
desc = TRUE,
tasks = FALSE,
features = FALSE,
include.deprecated = FALSE
)
Arguments
desc (logical(1))
Provide more detailed information about filters. Default is TRUE.
tasks (logical(1))
Provide information on supported tasks. Default is FALSE.
features (logical(1))
Provide information on supported features. Default is FALSE.
include.deprecated
(logical(1))
Should deprecated filter methods be included in the list? Default is FALSE.
Value
(data.frame).
See Also
Description
This is useful for determining which learner properties are available.
Usage
listLearnerProperties(type = "any")
Arguments
type (character(1))
Only return properties for a specified task type. Default is “any”.
Value
(character).
Description
Returns learning algorithms which have specific characteristics, e.g. whether they support missing values, case weights, etc.
Note that the packages of all learners are loaded during the search if you create them, which can take some time. If you do not create them, only the properties of the S3 classes are inspected, which is a lot faster.
Note that for general cost-sensitive learning, mlr currently supports mainly “wrapper” approaches like CostSensWeightedPairsWrapper, which are not listed, as they are not basic R learning algorithms. The same applies for many multilabel methods, see, e.g., makeMultilabelBinaryRelevanceWrapper.
Usage
listLearners(
obj = NA_character_,
properties = character(0L),
quiet = TRUE,
warn.missing.packages = TRUE,
check.packages = FALSE,
create = FALSE
)
130 listLearners
## Default S3 method:
listLearners(
obj = NA_character_,
properties = character(0L),
quiet = TRUE,
warn.missing.packages = TRUE,
check.packages = FALSE,
create = FALSE
)
Arguments
obj (character(1) | Task)
Either a task or, as character(1), the type of the task; in the latter case one of: “classif”, “regr”, “surv”, “costsens”, “cluster”, “multilabel”. Default is NA, matching all types.
properties (character)
Set of required properties to filter for. Default is character(0).
quiet (logical(1))
Construct learners quietly to check their properties, showing no package startup messages. Turn off if you suspect errors. Default is TRUE.
warn.missing.packages
(logical(1))
If some learner cannot be constructed because its package is missing, should a
warning be shown? Default is TRUE.
check.packages (logical(1))
Check if required packages are installed. Calls find.package(). If create is TRUE, this is done implicitly and the value of this parameter is ignored. If create is FALSE and check.packages is TRUE, the returned table only contains learners whose dependencies are installed. If check.packages is set to FALSE, learners that cannot actually be constructed because of missing packages may be returned. Default is FALSE.
create (logical(1))
Instantiate objects (or return info table)? Packages are loaded if and only if this
option is TRUE. Default is FALSE.
Value
(data.frame | list of Learner). Either a descriptive data.frame that allows access to all properties of the learners or a list of created learner objects (named by the ids of the listed learners).
Examples
## Not run:
listLearners("classif", properties = c("multiclass", "prob"))
data = iris
task = makeClassifTask(data = data, target = "Species")
listLearners(task)
## End(Not run)
Description
List the supported measure properties.
Usage
listMeasureProperties()
Value
(character).
Description
Returns the matching measures which have specific characteristics, e.g. whether they support classification or regression.
Usage
listMeasures(obj, properties = character(0L), create = FALSE)
## Default S3 method:
listMeasures(obj, properties = character(0L), create = FALSE)
Arguments
obj (character(1) | Task)
Either a task or, as character(1), the type of the task; in the latter case one of: “classif”, “regr”, “surv”, “costsens”, “cluster”, “multilabel”. Default is NA, matching all types.
properties (character)
Set of required properties to filter for. See Measure for some standardized prop-
erties. Default is character(0).
create (logical(1))
Instantiate objects (or return strings)? Default is FALSE.
Value
(character | list of Measure). Class names of matching measures or instantiated objects.
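Examples
# Not part of the original manual; assumes the iris.task example task
# shipped with mlr.
listMeasures("classif")
listMeasures(iris.task)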
Description
Returns a character vector with each of the supported task types in mlr.
Usage
listTaskTypes()
Value
(character).
Description
Contains the task (lung.task).
References
See survival::lung. Incomplete cases have been removed from the task.
Description
This is an advanced feature of mlr. It gives access to some inner workings so the result might not
be compatible with everything!
Usage
makeAggregation(id, name = id, properties, fun)
Arguments
id (character(1))
Name of the aggregation method (preferably the same name as the generated
function).
name (character(1))
Long name of the aggregation method. Default is id.
properties (character)
Set of aggregation properties.
req.train Are prediction or train sets required to calculate the aggregation?
req.test Are prediction or test sets required to calculate the aggregation?
fun (function(task, perf.test, perf.train, measure, group, pred))
Calculates the aggregated performance. In most cases you will only need the
performances perf.test and optionally perf.train on the test and training
data sets.
Value
(Aggregation).
See Also
aggregations, setAggregation
Examples
# computes the interquartile range on all performance values
test.iqr = makeAggregation(
id = "test.iqr", name = "Test set interquartile range",
properties = "req.test",
fun = function(task, perf.test, perf.train, measure, group, pred) IQR(perf.test)
)
Description
Fuses a learner with the bagging method (i.e., similar to what a randomForest does). Creates a
learner object, which can be used like any other learner object. Models can easily be accessed via
getLearnerModel.
Bagging is implemented as follows: For each iteration a random data subset is sampled (with or without replacement) and potentially the number of features is also restricted to a random subset. (Note that this is usually handled in a slightly different way in the random forest, where features are sampled at each tree split.)
Prediction works as follows: For classification we do majority voting to create a discrete label and probabilities are predicted by considering the proportions of all predicted labels. For regression the mean value and the standard deviation across predictions are computed.
Note that the passed base learner must always have predict.type = 'response', while the BaggingWrapper can estimate probabilities and standard errors, so it can be set, e.g., to predict.type = 'prob'. For this reason, when you call setPredictType, the type is only set for the BaggingWrapper, not passed down to the inner learner.
Usage
makeBaggingWrapper(
learner,
bw.iters = 10L,
bw.replace = TRUE,
bw.size,
bw.feats = 1
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
bw.iters (integer(1))
Iterations = number of fitted models in bagging. Default is 10.
bw.replace (logical(1))
Sample bags with replacement (bootstrapping)? Default is TRUE.
bw.size (numeric(1))
Percentage size of sampled bags. Default is 1 for bootstrapping and 0.632 for subsampling.
bw.feats (numeric(1))
Percentage size of randomly selected features in bags. Default is 1. At least one feature will always be selected.
Value
Learner.
See Also
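Examples
# A minimal sketch, not part of the original manual: bag an rpart learner
# and switch the wrapper (not the base learner) to probability prediction;
# assumes the sonar.task example task shipped with mlr.
lrn = makeLearner("classif.rpart")
bag.lrn = makeBaggingWrapper(lrn, bw.iters = 10L, bw.replace = TRUE)
bag.lrn = setPredictType(bag.lrn, "prob")
mod = train(bag.lrn, sonar.task)
pred = predict(mod, task = sonar.task)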
makeClassificationViaRegressionWrapper
Classification via regression wrapper.
Description
Builds regression models that predict for the positive class whether a particular example belongs to
it (1) or not (-1).
Probabilities are generated by transforming the predictions with a softmax.
Inspired by WEKA’s ClassificationViaRegression (http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/ClassificationViaRegression.html).
Usage
Arguments
Value
Learner.
See Also
Examples
lrn = makeLearner("regr.rpart")
lrn = makeClassificationViaRegressionWrapper(lrn)
mod = train(lrn, sonar.task, subset = 1:140)
predictions = predict(mod, newdata = getTaskData(sonar.task)[141:208, 1:60])
Description
Create a classification task.
Usage
makeClassifTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
positive = NA_character_,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). For survival analysis these are the names of the survival time and event columns, so it has length 2. For multilabel classification it contains the names of the logical columns that encode whether a label is present or not and its length corresponds to the number of classes.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. Cannot
be set for cost-sensitive learning. Default is NULL which means no (= equal)
weights.
blocking (factor)
An optional factor of the same length as the number of observations. Observations with the same blocking level “belong together”. Specifically, they are either put all in the training or the test set during a resampling iteration. Default is NULL which means no blocking.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
positive (character(1))
Positive class for binary classification (otherwise ignored and set to NA). Default
is the first factor level of the target attribute.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means
removing empty factor levels for the columns. Possible choices are: “no” =
Don’t do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent.
Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
See Also
Task CostSensTask ClusterTask MultilabelTask RegrTask SurvTask
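Examples
# Not part of the original manual.
task = makeClassifTask(id = "iris-demo", data = iris, target = "Species")
task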
Description
Create a cluster task.
Usage
makeClusterTask(
id = deparse(substitute(data)),
data,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. Cannot
be set for cost-sensitive learning. Default is NULL which means no (= equal)
weights.
blocking (factor)
An optional factor of the same length as the number of observations. Obser-
vations with the same blocking level “belong together”. Specifically, they are
either put all in the training or the test set during a resampling iteration. Default
is NULL which means no blocking.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means
removing empty factor levels for the columns. Possible choices are: “no” =
Don’t do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent.
Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
See Also
Task ClassifTask CostSensTask MultilabelTask RegrTask SurvTask
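For illustration, a minimal sketch using the mtcars data shipped with R (all columns are numeric features; cluster tasks have no target):
task = makeClusterTask(data = mtcars)
print(task)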
makeConstantClassWrapper
Wraps a classification learner to support problems where the class
label is (almost) constant.
Description
If the training data contains only a single class (or almost only a single class), this wrapper creates a
model that always predicts the constant class in the training data. In all other cases, the underlying
learner is trained and the resulting model used for predictions.
Probabilities can be predicted and will be 1 or 0 depending on whether the label matches the major-
ity class or not.
Usage
makeConstantClassWrapper(learner, frac = 0)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
frac numeric(1)
The fraction of labels in [0, 1) that can be different from the majority label.
Default is 0, which means that constant labels are only predicted if there is
exactly one label in the data.
Value
Learner.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeCostSensClassifWrappe
makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrap
makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(),
makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrap
makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(),
makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(),
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
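For illustration, a minimal sketch (assuming the rpart package) that falls back to a constant prediction whenever at most 5% of the training labels deviate from the majority label:
lrn = makeConstantClassWrapper("classif.rpart", frac = 0.05)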
makeCostMeasure
Description
Creates a cost measure for non-standard classification error costs.
Usage
makeCostMeasure(
id = "costs",
minimize = TRUE,
costs,
combine = mean,
best = NULL,
worst = NULL,
name = id,
note = ""
)
Arguments
id (character(1))
Name of measure. Default is “costs”.
minimize (logical(1))
Should the measure be minimized? Otherwise you are effectively specifying a
benefits matrix. Default is TRUE.
costs (matrix)
Matrix of misclassification costs. Rows and columns have to be named with
class labels, order does not matter. Rows indicate true classes, columns pre-
dicted classes.
combine (function)
How to combine costs over all cases for a SINGLE test set? Note this is not the same as the aggregate argument in makeMeasure. You can set this as well via setAggregation, as for any measure. Default is mean.
best (numeric(1))
Best obtainable value for measure. Default is -Inf or Inf, depending on minimize.
worst (numeric(1))
Worst obtainable value for measure. Default is Inf or -Inf, depending on
minimize.
name (character)
Name of the measure. Default is id.
note (character)
Description and additional notes for the measure. Default is “”.
Value
Measure.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(),
estimateRelativeOverfitting(), makeCustomResampledMeasure(), makeMeasure(), measures,
performance(), setAggregation(), setMeasurePars()
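For illustration, a minimal sketch of an asymmetric two-class cost matrix (rows are true classes, columns predicted classes; the class names "a" and "b" are chosen for the example):
costs = matrix(c(0, 2, 1, 0), nrow = 2,
  dimnames = list(c("a", "b"), c("a", "b")))
m = makeCostMeasure(costs = costs, minimize = TRUE)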
makeCostSensClassifWrapper
Wraps a classification learner for use in cost-sensitive learning.
Description
Creates a wrapper, which can be used like any other learner object. The classification model can
easily be accessed via getLearnerModel.
This is a very naive learner, where the costs are transformed into classification labels - the label
for each case is the name of class with minimal costs. (If ties occur, the label which is better on
average w.r.t. costs over all training data is preferred.) Then the classifier is fitted to that data and
subsequently used for prediction.
Usage
makeCostSensClassifWrapper(learner)
Arguments
learner (Learner | character(1))
The classification learner. If you pass a string the learner will be created via
makeLearner.
Value
Learner.
See Also
Other costsens: makeCostSensRegrWrapper(), makeCostSensTask(), makeCostSensWeightedPairsWrapper()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensRegrWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrap
makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(), makeMulticlassWrapper(),
makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrap
makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(), makeOverBaggingWrapper(),
makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWrapper(),
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
makeCostSensRegrWrapper
Wraps a regression learner for use in cost-sensitive learning.
Description
Creates a wrapper, which can be used like any other learner object. Models can easily be accessed
via getLearnerModel.
For each class in the task, an individual regression model is fitted for the costs of that class. During
prediction, the class with the lowest predicted costs is selected.
Usage
makeCostSensRegrWrapper(learner)
Arguments
learner (Learner | character(1))
The regression learner. If you pass a string the learner will be created via make-
Learner.
Value
Learner.
See Also
Other costsens: makeCostSensClassifWrapper(), makeCostSensTask(), makeCostSensWeightedPairsWrapper()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeDownsampleWrapper(), makeDummyFeaturesWrapper(),
makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(),
makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper
makeCostSensTask
Description
Create a cost-sensitive classification task.
Usage
makeCostSensTask(
id = deparse(substitute(data)),
data,
costs,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
costs (data.frame)
A numeric matrix or data frame containing the costs of misclassification. We
assume the general case of observation specific costs. This means we have n
rows, corresponding to the observations, in the same order as data. The columns
correspond to classes and their names are the class labels (if unnamed we use
y1 to yk as labels). Each entry (i,j) of the matrix specifies the cost of predicting
class j for observation i.
blocking (factor)
An optional factor of the same length as the number of observations. Obser-
vations with the same blocking level “belong together”. Specifically, they are
either put all in the training or the test set during a resampling iteration. Default
is NULL which means no blocking.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means
removing empty factor levels for the columns. Possible choices are: “no” =
Don’t do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent.
Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
See Also
Task ClassifTask ClusterTask MultilabelTask RegrTask SurvTask
Other costsens: makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeCostSensWeightedPairsWrapper
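For illustration, a minimal sketch with observation-specific costs for the three iris classes (the cost values are randomly generated for the example; note that data contains only the features):
costs = matrix(runif(nrow(iris) * 3), ncol = 3)
colnames(costs) = levels(iris$Species)
task = makeCostSensTask(data = iris[, 1:4], costs = costs)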
makeCostSensWeightedPairsWrapper
Wraps a classifier for cost-sensitive learning to produce a weighted
pairs model.
Description
Creates a wrapper, which can be used like any other learner object. Models can easily be accessed
via getLearnerModel.
For each pair of labels, we fit a binary classifier. For each observation we define the label to be
the element of the pair with minimal costs. During fitting, we also weight the observation with the
absolute difference in costs. Prediction is performed by simple voting.
This approach is sometimes called cost-sensitive one-vs-one (CS-OVO), because it is obviously
very similar to the one-vs-one approach where one reduces a normal multi-class problem to multiple
binary ones and aggregates by voting.
Usage
makeCostSensWeightedPairsWrapper(learner)
Arguments
learner (Learner | character(1))
The classification learner. If you pass a string the learner will be created via
makeLearner.
Value
(Learner).
References
Lin, HT.: Reduction from Cost-sensitive Multiclass Classification to One-versus-one Binary Clas-
sification. In: Proceedings of the Sixth Asian Conference on Machine Learning. JMLR Workshop
and Conference Proceedings, vol 39, pp. 371-386. JMLR W&CP (2014). https://proceedings.
mlr.press/v39/lin14.pdf
See Also
Other costsens: makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeCostSensTask()
makeCustomResampledMeasure
Construct your own resampled performance measure.
Description
Construct your own performance measure, used after resampling. Note that individual training / test set performance values will be set to NA; you only calculate an aggregated value. If you can define a function that makes sense for every single training / test set, implement your own Measure.
Usage
makeCustomResampledMeasure(
measure.id,
aggregation.id,
minimize = TRUE,
properties = character(0L),
fun,
extra.args = list(),
best = NULL,
worst = NULL,
measure.name = measure.id,
aggregation.name = aggregation.id,
note = ""
)
Arguments
measure.id (character(1))
Short name of measure.
aggregation.id (character(1))
Short name of aggregation.
minimize (logical(1))
Should the measure be minimized? Default is TRUE.
properties (character)
Set of measure properties. For a list of values see Measure. Default is character(0).
Value
Measure.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(),
estimateRelativeOverfitting(), makeCostMeasure(), makeMeasure(), measures, performance(),
setAggregation(), setMeasurePars()
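For illustration, a rough sketch of an aggregated misclassification rate; the exact signature of fun is assumed here to follow the (task, group, pred, extra.args) pattern:
f = function(task, group, pred, extra.args) {
  # mean misclassification over all resampled predictions (sketch)
  mean(pred$data$response != pred$data$truth)
}
m = makeCustomResampledMeasure(measure.id = "mcr", aggregation.id = "overall",
  minimize = TRUE, fun = f)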
makeDownsampleWrapper
Description
Creates a learner object, which can be used like any other learner object. It will only be trained on
a subset of the original data to save computational time.
Usage
makeDownsampleWrapper(learner, dw.perc = 1, dw.stratify = FALSE)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
dw.perc (numeric(1))
See downsample. Default is 1.
dw.stratify (logical(1))
See downsample. Default is FALSE.
Value
Learner.
See Also
Other downsample: downsample()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDummyFeaturesWrapper(),
makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(),
makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
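For illustration, a minimal sketch (assuming the rpart package) that trains on a stratified 50% subsample of the data:
lrn = makeDownsampleWrapper("classif.rpart", dw.perc = 0.5, dw.stratify = TRUE)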
makeDummyFeaturesWrapper
Fuse learner with dummy feature creator.
Description
Fuses a base learner with the dummy feature creator (see createDummyFeatures). Returns a learner
which can be used like any other learner.
Usage
makeDummyFeaturesWrapper(learner, method = "1-of-n", cols = NULL)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
method (character(1))
Available are:
"1-of-n": For n factor levels there will be n dummy variables.
"reference": There will be n-1 dummy variables leaving out the first factor
level of each variable.
Default is “1-of-n”.
cols (character)
Columns to create dummy features for. Default is to use all columns.
Value
Learner.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(),
makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
makeExtractFDAFeatMethod
Constructor for FDA feature extraction methods.
Description
This can be used to implement custom FDA feature extraction. Takes learn and reextract functions, along with optional parameters for them, as arguments.
Usage
makeExtractFDAFeatMethod(learn, reextract, args = list(), par.set = NULL)
Arguments
learn (function(data, target, col, ...))
Function to learn and extract information on functional column col. Arguments
are:
• data (data.frame)
Data.frame containing matrices with one row per observation of a single functional or time series and one column per measurement time point. All entries need to be numeric.
• target (character(1))
Name of the target variable. Default: “NULL”. The variable is only set to
be consistent with the API.
• col (character(1) | numeric(1))
column names or indices, the extraction should be performed on. The func-
tion has to return a named list of values.
See Also
makeExtractFDAFeatsWrapper
Fuse learner with an extractFDAFeatures method.
Description
Fuses a base learner with an extractFDAFeatures method. Creates a learner object, which can be
used like any other learner object. Internally uses extractFDAFeatures before training the learner
and reextractFDAFeatures before predicting.
Usage
Arguments
Value
Learner.
See Also
Other fda: extractFDAFeatures(), makeExtractFDAFeatMethod()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeFeatSelWrapper(), makeFilterWrapper(), makeImputeWrapper(),
makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
makeFeatSelWrapper
Description
Fuses a base learner with a search strategy to select variables. Creates a learner object, which can
be used like any other learner object, but which internally uses selectFeatures. If the train function
is called on it, the search strategy and resampling are invoked to select an optimal set of variables.
Finally, a model is fitted on the complete training data with these variables and returned. See
selectFeatures for more details.
After training, the optimal features (and other related information) can be retrieved with getFeat-
SelResult.
Usage
makeFeatSelWrapper(
learner,
resampling,
measures,
bit.names,
bits.to.features,
control,
show.info = getMlrOption("show.info")
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
resampling (ResampleInstance | ResampleDesc)
Resampling strategy for feature selection. If you pass a description, it is instan-
tiated once at the beginning by default, so all points are evaluated on the same
training/test sets. If you want to change that behavior, look at FeatSelControl.
Value
Learner.
See Also
Examples
# nested resampling with feature selection (with a nonsense algorithm for selection)
outer = makeResampleDesc("CV", iters = 2L)
inner = makeResampleDesc("Holdout")
ctrl = makeFeatSelControlRandom(maxit = 1)
lrn = makeFeatSelWrapper("classif.ksvm", resampling = inner, control = ctrl)
# we also extract the selected features for all iteration here
r = resample(lrn, iris.task, outer, extract = getFeatSelResult)
makeFilter
Description
Creates and registers custom feature filters. Implemented filters can be listed with listFilterMeth-
ods. Additional documentation for the fun parameter specific to each filter can be found in the
description.
Usage
makeFilter(name, desc, pkg, supported.tasks, supported.features, fun)
Arguments
name (character(1))
Identifier for the filter.
desc (character(1))
Short description of the filter.
pkg (character(1))
Source package where the filter is implemented.
supported.tasks
(character)
Task types supported.
supported.features
(character)
Feature types supported.
fun (function(task, nselect, ...))
Function which takes a task and returns a named numeric vector of scores, one score for each feature of task. Higher scores mean higher importance of the feature. At least nselect features must be calculated; the remaining may be set to NA or omitted, and thus will not be selected. The original order will be restored if necessary.
Value
Object of class “Filter”.
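For illustration, a rough sketch of registering a custom filter that scores numeric features by absolute Pearson correlation with a numeric target (the filter name and the empty pkg value are chosen for the example):
makeFilter(
  name = "abs.pearson",
  desc = "Absolute Pearson correlation between feature and target",
  pkg = "",
  supported.tasks = "regr",
  supported.features = "numerics",
  fun = function(task, nselect, ...) {
    d = getTaskData(task, target.extra = TRUE)
    # one score per feature; higher means more important
    vapply(d$data, function(x) abs(cor(x, d$target)), numeric(1))
  }
)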
References
Kira, Kenji and Rendell, Larry (1992). The Feature Selection Problem: Traditional Methods and a
New Algorithm. AAAI-92 Proceedings.
Kononenko, Igor et al. Overcoming the myopia of inductive learning algorithms with RELIEFF
(1997), Applied Intelligence, 7(1), p39-55.
See Also
makeFilterEnsemble
Description
Creates and registers custom ensemble feature filters. Implemented ensemble filters can be listed
with listFilterEnsembleMethods. Additional documentation for the fun parameter specific to each
filter can be found in the description.
Usage
Arguments
name (character(1))
Identifier for the filter.
base.methods The base filter methods which the ensemble method will use.
desc (character(1))
Short description of the filter.
fun (function(task, nselect, ...))
Function which takes a task and returns a named numeric vector of scores, one score for each feature of task. Higher scores mean higher importance of the feature. At least nselect features must be calculated; the remaining may be set to NA or omitted, and thus will not be selected. The original order will be restored if necessary.
Value
See Also
makeFilterWrapper
Description
Fuses a base learner with a filter method. Creates a learner object, which can be used like any other
learner object. Internally uses filterFeatures before every model fit.
Usage
makeFilterWrapper(
learner,
fw.method = "FSelectorRcpp_information.gain",
fw.base.methods = NULL,
fw.perc = NULL,
fw.abs = NULL,
fw.threshold = NULL,
fw.fun = NULL,
fw.fun.args = NULL,
fw.mandatory.feat = NULL,
cache = FALSE,
...
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
fw.method (character(1))
Filter method. See listFilterMethods. Default is “FSelectorRcpp_information.gain”.
fw.base.methods
(character(1))
Simple filter methods for ensemble filters. See listFilterMethods. Can only be
used in combination with ensemble filters. See listFilterEnsembleMethods.
fw.perc (numeric(1))
If set, select fw.perc*100 top scoring features. Mutually exclusive with arguments fw.abs, fw.threshold and fw.fun.
fw.abs (numeric(1))
If set, select fw.abs top scoring features. Mutually exclusive with arguments
fw.perc, fw.threshold and fw.fun.
fw.threshold (numeric(1))
If set, select features whose score exceeds fw.threshold. Mutually exclusive
with arguments fw.perc, fw.abs and fw.fun.
fw.fun (function)
If set, select features via a custom thresholding function, which must return the number of features to select. Mutually exclusive with arguments fw.perc, fw.abs and fw.threshold.
Details
If ensemble = TRUE, ensemble feature selection using all methods specified in fw.method is per-
formed. At least two methods need to be selected.
After training, the selected features can be retrieved with getFilteredFeatures.
Note that observation weights do not influence the filtering and are simply passed down to the next
learner.
Value
Learner.
Caching
If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is
operating system dependent and can be checked with getCacheDir(). Alternatively a custom
directory can be passed to store the cache. The cache can be cleared with deleteCacheDir().
Caching is disabled by default. Care should be taken when operating on large clusters due to
possible write conflicts to disk if multiple workers try to write the same cache at the same time.
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethod
listFilterMethods(), makeFilter(), makeFilterEnsemble(), plotFilterValues()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeImputeWrapper(),
makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Examples
lrn = makeLearner("classif.lda")
lrn = makeFilterWrapper(lrn, fw.method = "FSelectorRcpp_information.gain",
fw.fun = biggest_gap, fw.fun.args = list("diff" = 1))
r = resample(lrn, task, outer, extract = function(model) {
getFilteredFeatures(model)
})
print(r$extract)
makeFixedHoldoutInstance
Generate a fixed holdout instance for resampling.
Description
Generate a fixed holdout instance for resampling.
Usage
makeFixedHoldoutInstance(train.inds, test.inds, size)
Arguments
train.inds (integer)
Indices for training set.
test.inds (integer)
Indices for test set.
size (integer(1))
Size of the data set to resample. The function needs to know the largest possible
index of the whole data set.
Value
(ResampleInstance).
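For illustration, a minimal sketch (assuming the rpart package) fixing the first 100 of the 150 iris observations as the training set; iris.task is shipped with mlr:
rin = makeFixedHoldoutInstance(train.inds = 1:100, test.inds = 101:150, size = 150)
r = resample("classif.rpart", iris.task, rin)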
makeFunctionalData
Description
To work with functional features, those features need to be stored as a matrix column in the
data.frame, so mlr can automatically recognize them as functional features. This function allows
for an easy conversion from a data.frame with numeric columns to the required format. If the data
already contains matrix columns, they are left as-is if not specified otherwise in fd.features. See
Examples for the structure of the generated output.
Usage
makeFunctionalData(data, fd.features = NULL, exclude.cols = NULL)
Arguments
data (data.frame)
A data.frame that contains the functional features as numeric columns.
fd.features (list)
Named list containing integer column indices or character column names.
Each element defines a functional feature, in the given order of the indices or
column names. The name of the list element defines the name of the functional
feature. All selected columns have to correspond to numeric data.frame entries.
The default is NULL, which means all numeric features are considered to be a
single functional “fd1”.
exclude.cols (character | integer)
Column names or indices to exclude from conversion to functionals, even if they are included in fd.features. Default is not to exclude anything.
Value
(data.frame).
Examples
# data.frame where columns 1:6 and 8:10 belong to a functional feature
d1 = data.frame(matrix(rnorm(100), nrow = 10), "target" = seq_len(10))
# Transform to functional data
d2 = makeFunctionalData(d1, fd.features = list("fd1" = 1:6, "fd2" = 8:10))
# Create a regression task
makeRegrTask(data = d2, target = "target")
makeImputeMethod
Description
This is a constructor to create your own imputation methods.
Usage
makeImputeMethod(learn, impute, args = list())
Arguments
learn (function(data, target, col, ...))
Function to learn and extract information on column col out of data frame data.
Argument target specifies the target column of the learning task. The function
has to return a named list of values.
impute (function(data, target, col, ...))
Function to impute missing values in col using information returned by learn
on the same column. All list elements of the return value of learn are passed to this function via ....
args (list)
Named list of arguments to pass to learn via ....
See Also
Other impute: imputations, impute(), makeImputeWrapper(), reimpute()
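For illustration, a rough sketch of a constant-value imputation learned from the training column; the helper name is made up for the example, and the impute function is assumed to return the completed column:
imputeMaxDouble = makeImputeMethod(
  learn = function(data, target, col) {
    list(const = 2 * max(data[[col]], na.rm = TRUE))
  },
  impute = function(data, target, col, const) {
    x = data[[col]]
    x[is.na(x)] = const  # fill missing entries with the learned constant
    x
  }
)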
makeImputeWrapper
Description
Fuses a base learner with an imputation method. Creates a learner object, which can be used like
any other learner object. Internally uses impute before training the learner and reimpute before
predicting.
Usage
makeImputeWrapper(
learner,
classes = list(),
cols = list(),
dummy.classes = character(0L),
dummy.cols = character(0L),
dummy.type = "factor",
force.dummies = FALSE,
impute.new.levels = TRUE,
recode.factor.levels = TRUE
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
classes (named list)
Named list containing imputation techniques for classes of columns. E.g. list(numeric
= imputeMedian()).
cols (named list)
Named list containing names of imputation methods to impute missing values
in the data column referenced by the list element’s name. Overrules imputation
set via classes.
dummy.classes (character)
Classes of columns to create dummy columns for. Default is character(0).
dummy.cols (character)
Column names to create dummy columns (containing binary missing indicator)
for. Default is character(0).
dummy.type (character(1))
How dummy columns are encoded. Either as 0/1 with type “numeric” or as
“factor”. Default is “factor”.
force.dummies (logical(1))
Force dummy creation even if the respective data column does not contain any
NAs. Note that (a) most learners will complain about constant columns created
this way but (b) your feature set might be stochastic if you turn this off. Default
is FALSE.
impute.new.levels
(logical(1))
If new, unencountered factor levels occur during reimputation, should these be handled as NAs and then be imputed the same way? Default is TRUE.
recode.factor.levels
(logical(1))
Recode factor levels after reimputation, so they match the respective element of lvls (in the description object) and therefore match the levels of the feature factor in the training data after imputation? Default is TRUE.
Value
Learner.
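For illustration, a minimal sketch (assuming the rpart package) imputing numeric columns by their median and factor columns by their mode, with missing-indicator dummies for numeric columns:
lrn = makeImputeWrapper("classif.rpart",
  classes = list(numeric = imputeMedian(), factor = imputeMode()),
  dummy.classes = "numeric")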
See Also
makeLearner
Description
For a classification learner the predict.type can be set to “prob” to predict probabilities and the
maximum value selects the label. The threshold used to assign the label can later be changed using
the setThreshold function.
To see all possible properties of a learner, go to: LearnerProperties.
Usage
makeLearner(
cl,
id = cl,
predict.type = "response",
predict.threshold = NULL,
fix.factors.prediction = FALSE,
...,
par.vals = list(),
config = list()
)
Arguments
cl (character(1))
Class of learner. By convention, all classification learners start with “classif.”, all regression learners with “regr.”, all survival learners with “surv.”, all clustering learners with “cluster.”, and all multilabel classification learners with “multilabel.”. A list of all integrated learners is available on the learners help page.
id (character(1))
Id string for object. Used to display object. Default is cl.
predict.type (character(1))
Classification: “response” (= labels) or “prob” (= probabilities and labels by
selecting the ones with maximal probability). Regression: “response” (= mean
response) or “se” (= standard errors and mean response). Survival: “response”
(= some sort of orderable risk) or “prob” (= time dependent probabilities). Clus-
tering: “response” (= cluster IDs) or “prob” (= fuzzy cluster membership prob-
abilities), Multilabel: “response” (= logical matrix indicating the predicted class
labels) or “prob” (= probabilities and corresponding logical matrix indicating
class labels). Default is “response”.
predict.threshold
(numeric)
Threshold to produce class labels. Has to be a named vector, where names corre-
spond to class labels. Only for binary classification it can be a single numerical
threshold for the positive class. See setThreshold for details on how it is applied.
Default is NULL which means 0.5 / an equal threshold for each class.
fix.factors.prediction
(logical(1))
In some cases, problems occur in underlying learners for factor features during
prediction. If the new features have LESS factor levels than during training
(a strict subset), the learner might produce an error like “type of predictors in
new data do not match that of the training data”. In this case one can repair
this problem by setting this option to TRUE. We will simply add the factor levels missing from the test feature (but present in training) to that feature.
Default is FALSE.
... (any)
Optional named (hyper)parameters. If you want to set specific hyperparameters
for a learner during model creation, these should go here. You can get a list
of available hyperparameters using getParamSet(<learner>). Alternatively
hyperparameters can be given using the par.vals argument but ... should be
preferred!
par.vals (list)
Optional list of named (hyper)parameters. The arguments in ... take prece-
dence over values in this list. We strongly encourage you to use ... for passing
hyperparameters.
config (named list)
Named list of config option to overwrite global settings set via configureMlr for
this specific learner.
Value
(Learner).
regr.randomForest
For this learner we added additional uncertainty estimation functionality (predict.type = "se")
for the randomForest, which is not provided by the underlying package.
Currently implemented methods, selected via the se.method hyperparameter, are “jackknife” and “bootstrap”.
For both, a Monte-Carlo bias correction is applied and, in the case that this results in a negative variance estimate, the values are truncated at 0.
Note that when using the “jackknife” procedure for se estimation, using a small number of trees can
lead to training data observations that are never out-of-bag. The current implementation ignores
these observations, but in the original definition, the resulting se estimation would be undefined.
Please note that all of the mentioned se.method variants do not affect the computation of the pos-
terior mean “response” value. This is always the same as from the underlying randomForest.
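For illustration, a sketch requesting jackknife standard errors (assuming the randomForest package; se.method is passed as a hyperparameter):
lrn = makeLearner("regr.randomForest", predict.type = "se", se.method = "jackknife")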
regr.featureless
A very basic baseline method which is useful for model comparisons (if you don’t beat this, you very
likely have a problem). Does not consider any features of the task and only uses the target feature
of the training data to make predictions. Using observation weights is currently not supported.
Methods “mean” and “median” always predict a constant value for each new observation which
corresponds to the observed mean or median of the target feature in training data, respectively.
The default method is “mean” which corresponds to the ZeroR algorithm from WEKA.
classif.featureless
Method “majority” always predicts the majority class for each new observation. In the case of ties,
one randomly sampled, constant class is predicted for all observations in the test set. This method
is used as the default. It is very similar to the ZeroR classifier from WEKA. The only difference is
that ZeroR always predicts the first class of the tied class values instead of sampling them randomly.
Method “sample-prior” always samples a random class for each individual test observation accord-
ing to the prior probabilities observed in the training data.
If you opt to predict probabilities, the class probabilities always correspond to the prior probabilities
observed in the training data.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(),
helpLearnerParam(), makeLearners(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Examples
makeLearner("classif.rpart")
makeLearner("classif.lda", predict.type = "prob")
lrn = makeLearner("classif.lda", method = "t", nu = 10)
getHyperPars(lrn)
makeLearners
Description
Small helper function that can save some typing when creating multiple learner objects. Calls makeLearner multiple times internally.
Usage
makeLearners(cls, ids = NULL, type = NULL, ...)
Arguments
cls (character)
Classes of learners.
ids (character)
Id strings. Must be unique. Default is cls.
type (character(1))
Shortcut to prepend type string to cls so one can set cls = "rpart". Default is
NULL, i.e., this is not used.
... (any)
Optional named (hyper)parameters. If you want to set specific hyperparameters
for a learner during model creation, these should go here. You can get a list
of available hyperparameters using getParamSet(<learner>). Alternatively
hyperparameters can be given using the par.vals argument but ... should be
preferred!
Value
(named list of Learner). Named by ids.
See Also
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(),
helpLearnerParam(), makeLearner(), removeHyperPars(), setHyperPars(), setId(), setLearnerId(),
setPredictThreshold(), setPredictType()
Examples
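# illustrative sketch: create two classification learners in one call,
# using the type shortcut so cls can omit the "classif." prefix
makeLearners(c("rpart", "lda"), type = "classif", predict.type = "prob")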
makeMeasure
Description
A measure object encapsulates a function to evaluate the performance of a prediction. Information
about already implemented measures can be obtained here: measures.
A learner is trained on a training set d1, results in a model m and predicts another set d2 (which
may be a different one or the training set) resulting in the prediction. The performance measure can
now be defined using all of the information of the original task, the fitted model and the prediction.
Usage
makeMeasure(
id,
minimize,
properties = character(0L),
fun,
extra.args = list(),
aggr = test.mean,
best = NULL,
worst = NULL,
name = id,
note = ""
)
Arguments
id (character(1))
Name of measure.
minimize (logical(1))
Should the measure be minimized? Default is TRUE.
properties (character)
Set of measure properties. Some standard property names include:
• classif: Is the measure applicable for classification?
• classif.multi: Is the measure applicable for multi-class classification?
• multilabel: Is the measure applicable for multilabel classification?
• regr: Is the measure applicable for regression?
• surv: Is the measure applicable for survival?
• cluster: Is the measure applicable for cluster?
• costsens: Is the measure applicable for cost-sensitive learning?
• req.pred: Is prediction object required in calculation? Usually the case.
• req.truth: Is truth column required in calculation? Usually the case.
• req.task: Is task object required in calculation? Usually not the case.
• req.model: Is model object required in calculation? Usually not the case.
• req.feats: Are feature values required in calculation? Usually not the case.
• req.prob: Are predicted probabilities required in calculation? Usually not the case; an example would be AUC.
Default is character(0).
Value
Measure.
See Also
Examples
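# illustrative sketch of a custom sum-of-squared-errors measure for regression;
# fun's signature is assumed here to be (task, model, pred, feats, extra.args)
f = function(task, model, pred, feats, extra.args) {
  sum((pred$data$response - pred$data$truth)^2)
}
makeMeasure(id = "my.sse", minimize = TRUE,
  properties = c("regr", "response"), fun = f)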
makeModelMultiplexer
Create model multiplexer for model selection to tune over multiple possible models.
Description
Usage
makeModelMultiplexer(base.learners)
Arguments
Value
Note
Note that logging output during tuning is somewhat shortened to make it more readable. I.e., the
artificial prefix before parameter names is suppressed.
See Also
Examples
set.seed(123)
library(BBmisc)
bls = list(
makeLearner("classif.ksvm"),
makeLearner("classif.randomForest")
)
lrn = makeModelMultiplexer(bls)
# simple way to construct param set for tuning
# parameter names are prefixed automatically and the 'requires'
# element is set, too, to make all parameters subordinate to 'selected.learner'
ps = makeModelMultiplexerParamSet(lrn,
makeNumericParam("sigma", lower = -10, upper = 10, trafo = function(x) 2^x),
makeIntegerParam("ntree", lower = 1L, upper = 500L)
)
print(ps)
rdesc = makeResampleDesc("CV", iters = 2L)
# to save some time we use random search. but you probably want something like this:
# ctrl = makeTuneControlIrace(maxExperiments = 500L)
ctrl = makeTuneControlRandom(maxit = 10L)
res = tuneParams(lrn, iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
df = as.data.frame(res$opt.path)
print(head(df[, -ncol(df)]))
# this is how you would construct the param set manually, works too
ps = makeParamSet(
makeDiscreteParam("selected.learner", values = extractSubList(bls, "id")),
makeNumericParam("classif.ksvm.sigma", lower = -10, upper = 10, trafo = function(x) 2^x,
requires = quote(selected.learner == "classif.ksvm")),
makeIntegerParam("classif.randomForest.ntree", lower = 1L, upper = 500L,
requires = quote(selected.learner == "classif.randomForest"))
)
makeModelMultiplexerParamSet
Creates a parameter set for model multiplexer tuning.
Description
Handy way to create the param set with less typing.
The following is done automatically:
• The selected.learner param is created
• Parameter names are prefixed.
• The requires field of each param is set. This makes all parameters subordinate to selected.learner
Usage
makeModelMultiplexerParamSet(multiplexer, ..., .check = TRUE)
Arguments
multiplexer (ModelMultiplexer)
The multiplexer learner.
... (ParamHelpers::ParamSet | ParamHelpers::Param)
(a) First option: Named param sets. Names must correspond to base learners.
You only need to enter the parameters you want to tune without reference to the
selected.learner field in any way.
(b) Second option. Just the params you would enter in the param sets. Even
shorter to create. Only works when it can be uniquely identified to which learner
each of your passed parameters belongs.
.check (logical)
Check that for each param in ... one param is found in the base learners. Default is TRUE.
Value
ParamSet.
See Also
Other multiplexer: makeModelMultiplexer()
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeTuneControlCMAES(),
makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
Examples
# See makeModelMultiplexer
makeMulticlassWrapper
Description
Fuses a base learner with a multi-class method. Creates a learner object, which can be used like any
other learner object. This way learners which can only handle binary classification will be able to
handle multi-class problems, too.
We use a multiclass-to-binary reduction principle, where multiple binary problems are created from the multiclass task. How these binary problems are generated is defined by an error-correcting-output-code (ECOC) code book. This also allows the simple and well-known one-vs-one and one-vs-rest approaches. Decoding is currently done via Hamming decoding, see e.g. https://jmlr.org/papers/volume11/escalera10a/escalera10a.pdf.
Currently, the approach always operates on the discrete predicted labels of the binary base models
(instead of their probabilities) and the created wrapper cannot predict posterior probabilities.
Usage
makeMulticlassWrapper(learner, mcw.method = "onevsrest")
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
mcw.method (character(1) | function)
“onevsone” or “onevsrest”. You can also pass a function, with signature function(task)
and which returns a ECOC codematrix with entries +1,-1,0. Columns define new
binary problems, rows correspond to classes (rows must be named). 0 means
class is not included in binary problem. Default is “onevsrest”.
Value
Learner.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMultilabelBinaryRelevanceWrapper(), makeMultilabelClassifierChainsWrapper(),
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
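For illustration, a sketch passing a custom ECOC code-matrix function equivalent to one-vs-rest (assuming the rpart package; getTaskClassLevels is used to name the rows):
cm = function(task) {
  lvls = getTaskClassLevels(task)
  m = diag(2, length(lvls)) - 1  # +1 on the diagonal, -1 elsewhere: one-vs-rest
  rownames(m) = lvls
  m
}
lrn = makeMulticlassWrapper("classif.rpart", mcw.method = cm)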
makeMultilabelBinaryRelevanceWrapper
Use binary relevance method to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be con-
verted to a wrapped binary relevance multilabel learner. The multilabel classification problem is
converted into simple binary classifications for each label/target on which the binary learner is ap-
plied.
Models can easily be accessed via getLearnerModel.
Note that it does not make sense to set a threshold in the used base learner when you predict probabilities. On the other hand, it can make a lot of sense to call setThreshold on the MultilabelBinaryRelevanceWrapper for each label individually, or to tune these thresholds with tuneThreshold, especially when you face very unbalanced class distributions for each binary label.
Usage
makeMultilabelBinaryRelevanceWrapper(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
Learner.
References
Tsoumakas, G., & Katakis, I. (2006) Multi-label classification: An overview. Dept. of Informatics,
Aristotle University of Thessaloniki, Greece.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelClassifierChainsWrapper(),
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Other multilabel: getMultilabelBinaryPerformances(), makeMultilabelClassifierChainsWrapper(),
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelBinaryRelevanceWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
makeMultilabelClassifierChainsWrapper
Use classifier chains method (CC) to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be con-
verted to a wrapped classifier chains multilabel learner. CC trains a binary classifier for each label
following a given order. In training phase, the feature space of each classifier is extended with true
label information of all previous labels in the chain. During the prediction phase, when true labels
are not available, they are replaced by predicted labels.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelClassifierChainsWrapper(learner, order = NULL)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
order (character)
Specifies the chain order using the names of the target labels. E.g. for m target
labels, this must be a character vector of length m that contains a permutation
of the target label names. Default is NULL which uses a random ordering of the
target label names.
Value
Learner.
References
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artifi-
cial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper(),
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Other multilabel: getMultilabelBinaryPerformances(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStackingWrapper()
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelClassifierChainsWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
makeMultilabelDBRWrapper
Use dependent binary relevance method (DBR) to create a multilabel
learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be con-
verted to a wrapped DBR multilabel learner. The multilabel classification problem is converted
into simple binary classifications for each label/target on which the binary learner is applied. For
each target, actual information of all binary labels (except the target variable) is used as additional
features. During prediction, these labels are obtained by the binary relevance method using the same binary learner.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelDBRWrapper(learner)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
Value
Learner.
References
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artifi-
cial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStacking
makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(), makeRemoveConstantFeaturesWr
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
Other multilabel: getMultilabelBinaryPerformances(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelNestedStackingWrapper(), makeMultilabelStacking
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelDBRWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
makeMultilabelNestedStackingWrapper
Use nested stacking method to create a multilabel learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be con-
verted to a wrapped nested stacking multilabel learner. Nested stacking trains a binary classifier for
each label following a given order. In training phase, the feature space of each classifier is extended
with predicted label information (by cross validation) of all previous labels in the chain. During
the prediction phase, predicted labels are obtained by the classifiers, which have been learned on all
training data.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelNestedStackingWrapper(learner, order = NULL, cv.folds = 2)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
order (character)
Specifies the chain order using the names of the target labels. E.g. for m target
labels, this must be a character vector of length m that contains a permutation
of the target label names. Default is NULL which uses a random ordering of the
target label names.
cv.folds (integer(1))
The number of folds for the inner cross validation method to predict labels for
the augmented feature space. Default is 2.
Value
Learner.
References
Montanes, E. et al. (2013), Dependent binary relevance models for multi-label classification Artifi-
cial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelNestedStackingWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
makeMultilabelStackingWrapper
Use stacking method (stacked generalization) to create a multilabel
learner.
Description
Every learner which is implemented in mlr and which supports binary classification can be con-
verted to a wrapped stacking multilabel learner. Stacking trains a binary classifier for each label
using predicted label information of all labels (including the target label) as additional features (by
cross validation). During prediction, these labels are obtained by the binary relevance method using the same binary learner.
Models can easily be accessed via getLearnerModel.
Usage
makeMultilabelStackingWrapper(learner, cv.folds = 2)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
cv.folds (integer(1))
The number of folds for the inner cross validation method to predict labels for
the augmented feature space. Default is 2.
Value
Learner.
References
Montanes, E. et al. (2013) Dependent binary relevance models for multi-label classification Artifi-
cial Intelligence Center, University of Oviedo at Gijon, Spain.
See Also
Examples
if (requireNamespace("rpart")) {
d = getTaskData(yeast.task)
# drop some labels so example runs faster
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
lrn = makeLearner("classif.rpart")
lrn = makeMultilabelStackingWrapper(lrn)
lrn = setPredictType(lrn, "prob")
# train, predict and evaluate
mod = train(lrn, task)
pred = predict(mod, task)
performance(pred, measure = list(multilabel.hamloss, multilabel.subset01, multilabel.f1))
# the next call basically has the same structure for any multilabel meta wrapper
getMultilabelBinaryPerformances(pred, measures = list(mmce, auc))
# above works also with predictions from resample!
}
makeMultilabelTask
Description
Create a multilabel task.
Usage
makeMultilabelTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). For survival analysis these are the names of
the survival time and event columns, so it has length 2. For multilabel classifi-
cation it contains the names of the logical columns that encode whether a label
is present or not and its length corresponds to the number of classes.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. Cannot
be set for cost-sensitive learning. Default is NULL which means no (= equal)
weights.
blocking (factor)
An optional factor of the same length as the number of observations. Obser-
vations with the same blocking level “belong together”. Specifically, they are
either put all in the training or the test set during a resampling iteration. Default
is NULL which means no blocking.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means
removing empty factor levels for the columns. Possible choices are: “no” =
Don’t do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent.
Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
Details
For multilabel classification we assume that the presence of labels is encoded via logical columns
in data. The name of the column specifies the name of the label. target is then a char vector that
points to these columns.
Note
For multilabel classification we assume that the presence of labels is encoded via logical columns
in data. The name of the column specifies the name of the label. target is then a char vector that
points to these columns.
See Also
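Examples
An illustrative sketch (not part of the original manual); it reuses the yeast.task data shipped with mlr, whose label columns are logical:
d = getTaskData(yeast.task)
# keep a small subset: two logical label columns plus three features
d = d[seq(1, nrow(d), by = 20), c(1:2, 15:17)]
task = makeMultilabelTask(data = d, target = c("label1", "label2"))
print(task)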
makeOverBaggingWrapper
Fuse learner with the bagging technique and oversampling for imbalance correction.
Description
Fuses a classification learner for binary classification with an over-bagging method for imbalance
correction when class sizes are strongly unequal. Creates a learner object, which can be used
like any other learner object. Models can easily be accessed via getLearnerModel.
OverBagging is implemented as follows: for each iteration a random data subset is sampled. Examples
of the smaller class are oversampled with replacement with a given rate. Members of the other class are
either simply copied into each bag, or bootstrapped with replacement until there are as many majority class
examples as in the original training data. Features are currently not changed or sampled.
Prediction works as follows: For classification we do majority voting to create a discrete label and
probabilities are predicted by considering the proportions of all predicted labels.
Usage
makeOverBaggingWrapper(
learner,
obw.iters = 10L,
obw.rate = 1,
obw.maxcl = "boot",
obw.cl = NULL
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
obw.iters (integer(1))
Number of fitted models in bagging. Default is 10.
obw.rate (numeric(1))
Factor to oversample the smaller class. Must be between 1 and Inf, where 1 means no oversampling and 2 would mean doubling the class size. Default is 1.
obw.maxcl (character(1))
How should the larger class be handled? “all” copies all majority class examples into each bag, “boot” bootstraps them with replacement. Default is “boot”.
obw.cl (character(1))
Which class should be oversampled. Default is NULL, which means the smaller one.
Value
Learner.
See Also
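Examples
An illustrative sketch (assuming the rpart package is installed); sonar.task ships with mlr:
lrn = makeLearner("classif.rpart")
obw.lrn = makeOverBaggingWrapper(lrn, obw.rate = 5, obw.iters = 3)
mod = train(obw.lrn, sonar.task)
pred = predict(mod, sonar.task)
performance(pred)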
makePreprocWrapper
Fuse learner with preprocessing.
Description
Fuses a base learner with a preprocessing method. Creates a learner object, which can be used like
any other learner object, but which internally preprocesses the data as requested. If the train or
predict function is called on data / a task, the preprocessing is always performed automatically.
Usage
makePreprocWrapper(
learner,
train,
predict,
par.set = makeParamSet(),
par.vals = list()
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
train (function(data, target, args))
Function to preprocess the data before training. target is a string and denotes
the target variable in data. args is a list of further arguments and parameters to
influence the preprocessing. Must return a list(data, control), where data
is the preprocessed data and control stores all information necessary to do the
preprocessing before predictions.
predict (function(data, target, args, control))
Function to preprocess the data before prediction. target is a string and denotes
the target variable in data. args are the args that were passed to train. control
is the object you returned in train. Must return the processed data.
par.set (ParamHelpers::ParamSet)
Parameter set of ParamHelpers::LearnerParam objects to describe the parame-
ters in args. Default is empty set.
par.vals (list)
Named list of default values for params in args respectively par.set. Default
is empty list.
Value
(Learner).
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(),
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(),
makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapperCaret(),
makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(),
makeWeightedClassesWrapper()
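Examples
A hand-written preprocessing sketch (illustrative; assumes the MASS package for classif.lda): the train function centers and scales all numeric features and stores the scaling constants, and the predict function reapplies them to new data.
trainfun = function(data, target, args) {
  # numeric feature columns, excluding the target
  cns = colnames(data)
  nums = setdiff(cns[sapply(data, is.numeric)], target)
  x = as.matrix(data[, nums, drop = FALSE])
  # compute and store center/scale so predict can reuse them
  ctr = colMeans(x)
  scl = apply(x, 2, sd)
  x = scale(x, center = ctr, scale = scl)
  data = cbind(as.data.frame(x), data[, setdiff(cns, nums), drop = FALSE])
  list(data = data, control = list(center = ctr, scale = scl))
}
predictfun = function(data, target, args, control) {
  # reapply the stored centering/scaling to the prediction data
  nums = names(control$center)
  x = scale(as.matrix(data[, nums, drop = FALSE]),
    center = control$center, scale = control$scale)
  cbind(as.data.frame(x), data[, setdiff(colnames(data), nums), drop = FALSE])
}
lrn = makePreprocWrapper("classif.lda", train = trainfun, predict = predictfun)
mod = train(lrn, iris.task)
predict(mod, iris.task)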
makePreprocWrapperCaret
Fuse learner with preprocessing.
Description
Fuses a learner with preprocessing methods provided by caret::preProcess. Before training, the
preprocessing is performed and the preprocessing model is stored. Before prediction, the preprocessing
model transforms the test data according to the trained model.
After being wrapped the learner will support missing values, although only if ppc.knnImpute,
ppc.bagImpute or ppc.medianImpute is set to TRUE.
Usage
makePreprocWrapperCaret(learner, ...)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
... (any)
Parameters for caret::preProcess, passed with the prefix “ppc.” added to their names (e.g. ppc.pca, ppc.thresh).
Value
Learner.
See Also
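Examples
A brief sketch (illustrative; assumes the caret and MASS packages are installed); ppc.pca and ppc.thresh are passed on to caret::preProcess as pca and thresh:
lrn = makePreprocWrapperCaret("classif.lda", ppc.pca = TRUE, ppc.thresh = 0.9)
mod = train(lrn, iris.task)
predict(mod, iris.task)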
makeRegrTask
Create a regression task.
Description
Create a regression task.
Usage
makeRegrTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). For survival analysis these are the names of
the survival time and event columns, so it has length 2. For multilabel classifi-
cation it contains the names of the logical columns that encode whether a label
is present or not and its length corresponds to the number of classes.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. Cannot
be set for cost-sensitive learning. Default is NULL which means no (= equal)
weights.
blocking (factor)
An optional factor of the same length as the number of observations. Obser-
vations with the same blocking level “belong together”. Specifically, they are
either put all in the training or the test set during a resampling iteration. Default
is NULL which means no blocking.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means
removing empty factor levels for the columns. Possible choices are: “no” =
Don’t do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent.
Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
See Also
Task ClassifTask CostSensTask ClusterTask MultilabelTask SurvTask
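Examples
An illustrative sketch (assuming the mlbench package, as in other examples in this manual):
data(BostonHousing, package = "mlbench")
task = makeRegrTask(data = BostonHousing, target = "medv")
print(task)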
makeRemoveConstantFeaturesWrapper
Fuse learner with removal of constant features preprocessing.
Description
Fuses a base learner with the preprocessing implemented in removeConstantFeatures.
Usage
makeRemoveConstantFeaturesWrapper(
learner,
perc = 0,
dont.rm = character(0L),
na.ignore = FALSE,
wrap.tol = .Machine$double.eps^0.5
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
perc (numeric(1))
The percentage of feature values, in [0, 1), that must differ from the mode value.
Default is 0, which means only constant features with exactly one observed level
are removed.
dont.rm (character)
Names of the columns which must not be deleted. Default is no columns.
na.ignore (logical(1))
Should NAs be ignored in the percentage calculation? (Or should they be treated
as a single, extra level in the percentage calculation?) Note that if the feature
has only missing values, it is always removed. Default is FALSE.
wrap.tol (numeric(1))
Numerical tolerance to treat two numbers as equal. Variables stored as double
will get rounded accordingly before computing the mode. Default is sqrt(.Machine$double.eps).
Value
Learner.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(),
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(),
makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(),
makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper(), makeWeightedClassesWrapper()
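Examples
A small sketch (illustrative; assumes the rpart package): a constant column is added to iris and dropped by the wrapper before the learner sees it.
d = iris
d$const = 1 # constant feature the wrapper will remove
task = makeClassifTask(data = d, target = "Species")
lrn = makeRemoveConstantFeaturesWrapper("classif.rpart")
mod = train(lrn, task)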
makeResampleDesc
Create a description object for a resampling strategy.
Description
A description of a resampling algorithm contains all necessary information to create a ResampleIn-
stance, when given the size of the data set.
Usage
makeResampleDesc(
method,
predict = "test",
...,
stratify = FALSE,
stratify.cols = NULL,
fixed = FALSE,
blocking.cv = FALSE
)
Arguments
method (character(1))
“CV” for cross-validation, “LOO” for leave-one-out, “RepCV” for repeated
cross-validation, “Bootstrap” for out-of-bag bootstrap, “Subsample” for sub-
sampling, “Holdout” for holdout, “GrowingWindowCV” for growing window
cross-validation, “FixedWindowCV” for fixed window cross validation.
predict (character(1))
What to predict during resampling: “train”, “test” or “both” sets. Default is
“test”.
... (any)
Further parameters for strategies.
blocking.cv (logical(1))
Should “blocking” be used in CV? Default is FALSE. This is different to fixed
= TRUE and cannot be combined. Please check the mlr online tutorial for more
details.
Details
Some notes on some special strategies:
Repeated cross-validation Use “RepCV”. Then you have to set the aggregation function for your
preferred performance measure to “testgroup.mean” via setAggregation.
B632 bootstrap Use “Bootstrap” for bootstrap and set predict to “both”. Then you have to set the
aggregation function for your preferred performance measure to “b632” via setAggregation.
B632+ bootstrap Use “Bootstrap” for bootstrap and set predict to “both”. Then you have to set the
aggregation function for your preferred performance measure to “b632plus” via setAggrega-
tion.
Fixed Holdout set Use makeFixedHoldoutInstance.
Value
(ResampleDesc).
Predefined description objects for convenience:
hout holdout a.k.a. test sample estimation (two-thirds training set, one-third testing set)
cv2 2-fold cross-validation
cv3 3-fold cross-validation
cv5 5-fold cross-validation
cv10 10-fold cross-validation
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(),
getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleInstance(),
resample()
Examples
# Bootstrapping
makeResampleDesc("Bootstrap", iters = 10)
makeResampleDesc("Bootstrap", iters = 10, predict = "both")
# Subsampling
makeResampleDesc("Subsample", iters = 10, split = 3 / 4)
makeResampleDesc("Subsample", iters = 10)
makeResampleInstance
Instantiates a resampling strategy object.
Description
This class encapsulates training and test sets generated from the data set for a number of iterations.
It mainly stores a set of integer vectors indicating the training and test examples for each iteration.
Usage
makeResampleInstance(desc, task, size, ...)
Arguments
desc (ResampleDesc | character(1))
Resampling description object or name of resampling strategy. In the latter case
makeResampleDesc will be called internally on the string.
task (Task)
Data of task to resample from. Prefer to pass this instead of size.
size (integer)
Size of the data set to resample. Can be used instead of task.
... (any)
Passed down to makeResampleDesc in case you passed a string in desc. Other-
wise ignored.
Details
Object slots:
desc (ResampleDesc) See argument.
size (integer(1)) See argument.
train.inds (list of integer) List of training indices for all iterations.
test.inds (list of integer) List of test indices for all iterations.
group (factor) Optional grouping of resampling iterations. This encodes whether specific itera-
tions ’belong together’ (e.g. repeated CV), and it can later be used to aggregate performance
values accordingly. Default is ’factor()’.
Value
(ResampleInstance).
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(),
getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), resample()
Examples
rdesc = makeResampleDesc("Bootstrap", iters = 10)
rin = makeResampleInstance(rdesc, task = iris.task)
makeRLearner.classif.fdausc.glm
Classification of functional data by Generalized Linear Models.
Description
Learner for classification using Generalized Linear Models.
Usage
## S3 method for class 'classif.fdausc.glm'
makeRLearner()
makeRLearner.classif.fdausc.kernel
Learner for kernel classification for functional data.
Description
Learner for kernel Classification.
Usage
## S3 method for class 'classif.fdausc.kernel'
makeRLearner()
makeRLearner.classif.fdausc.np
Learner for nonparametric classification for functional data.
Description
Learner for Nonparametric Supervised Classification.
Usage
## S3 method for class 'classif.fdausc.np'
makeRLearner()
makeSMOTEWrapper
Fuse learner with SMOTE oversampling for imbalance correction in binary classification.
Description
Creates a learner object, which can be used like any other learner object. Internally uses smote
before every model fit.
Note that observation weights do not influence the sampling and are simply passed down to the next
learner.
Usage
makeSMOTEWrapper(
learner,
sw.rate = 1,
sw.nn = 5L,
sw.standardize = TRUE,
sw.alt.logic = FALSE
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
sw.rate (numeric(1))
Factor to oversample the smaller class. Must be between 1 and Inf, where 1
means no oversampling and 2 would mean doubling the class size. Default is 1.
sw.nn (integer(1))
Number of nearest neighbors to consider. Default is 5.
sw.standardize (logical(1))
Standardize input variables before calculating the nearest neighbors for data sets
with numeric input variables only. For mixed variables (numeric and factor) the
Gower distance is used and variables are standardized anyway. Default is TRUE.
sw.alt.logic (logical(1))
Use an alternative logic for selection of minority class observations. Instead
of sampling a minority class element AND one of its nearest neighbors, each
minority class element is taken multiple times (depending on rate) for the in-
terpolation and only the corresponding nearest neighbor is sampled. Default is
FALSE.
Value
Learner.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(),
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(),
makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(),
makeRemoveConstantFeaturesWrapper(), makeTuneWrapper(), makeUndersampleWrapper(),
makeWeightedClassesWrapper()
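Examples
An illustrative sketch (assuming the rpart package is installed); sonar.task is a binary task shipped with mlr:
lrn = makeSMOTEWrapper("classif.rpart", sw.rate = 2)
mod = train(lrn, sonar.task)
performance(predict(mod, sonar.task))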
makeStackedLearner
Create a stacked learner object.
Description
A stacked learner uses predictions of several base learners and fits a super learner using these predictions as features in order to predict the outcome. The following stacking methods are available:
• average
Averaging of base learner predictions without weights.
• stack.nocv
Fits the super learner, where in-sample predictions of the base learners are used.
• stack.cv
Fits the super learner, where the base learner predictions are computed by cross-validated
predictions (the resampling strategy can be set via the resampling argument).
• hill.climb
Select a subset of base learner predictions by a hill climbing algorithm.
• compress
Train a neural network to compress the model from a collection of base learners.
Usage
makeStackedLearner(
base.learners,
super.learner = NULL,
predict.type = NULL,
method = "stack.nocv",
use.feat = FALSE,
resampling = NULL,
parset = list()
)
Arguments
resampling (ResampleDesc)
Resampling strategy for method = 'stack.cv'. Currently only CV is allowed
for resampling. The default NULL uses 5-fold CV.
parset (list) Parameters for the hill.climb method, including
• replace
Whether a base learner can be selected more than once.
• init
Number of best models being included before the selection algorithm.
• bagprob
The proportion of models being considered in one round of selection.
• bagtime
The number of rounds of the bagging selection.
• metric
The result evaluation metric function taking two parameters pred and true,
the smaller the score the better.
and parameters for the compress method, including
• k
the size multiplier of the generated data
• prob
the probability to exchange values
• s
the standard deviation of each numerical feature
Examples
# Classification
data(iris)
tsk = makeClassifTask(data = iris, target = "Species")
base = c("classif.rpart", "classif.lda", "classif.svm")
lrns = lapply(base, makeLearner)
lrns = lapply(lrns, setPredictType, "prob")
m = makeStackedLearner(base.learners = lrns,
predict.type = "prob", method = "hill.climb")
tmp = train(m, tsk)
res = predict(tmp, tsk)
# Regression
data(BostonHousing, package = "mlbench")
tsk = makeRegrTask(data = BostonHousing, target = "medv")
base = c("regr.rpart", "regr.svm")
lrns = lapply(base, makeLearner)
m = makeStackedLearner(base.learners = lrns,
predict.type = "response", method = "compress")
tmp = train(m, tsk)
res = predict(tmp, tsk)
makeSurvTask
Create a survival task.
Description
Create a survival task.
Usage
makeSurvTask(
id = deparse(substitute(data)),
data,
target,
weights = NULL,
blocking = NULL,
coordinates = NULL,
fixup.data = "warn",
check.data = TRUE
)
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). For survival analysis these are the names of
the survival time and event columns, so it has length 2. For multilabel classifi-
cation it contains the names of the logical columns that encode whether a label
is present or not and its length corresponds to the number of classes.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. Cannot
be set for cost-sensitive learning. Default is NULL which means no (= equal)
weights.
blocking (factor)
An optional factor of the same length as the number of observations. Obser-
vations with the same blocking level “belong together”. Specifically, they are
either put all in the training or the test set during a resampling iteration. Default
is NULL which means no blocking.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means
removing empty factor levels for the columns. Possible choices are: “no” =
Don’t do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent.
Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
See Also
Task ClassifTask ClusterTask CostSensTask MultilabelTask RegrTask
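Examples
An illustrative sketch (assuming the survival package); the event column must be a 0/1 or logical indicator:
data(lung, package = "survival")
lung$status = lung$status - 1 # convert 1/2 coding to a 0/1 event indicator
task = makeSurvTask(data = lung, target = c("time", "status"))
print(task)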
makeTuneControlCMAES
Create control object for hyperparameter tuning with CMAES.
Description
CMA Evolution Strategy with method cmaes::cma_es. Can handle numeric(vector) and integer(vector)
hyperparameters, but no dependencies. For integers the internally proposed numeric values are
automatically rounded. The sigma variance parameter is initialized to 1/4 of the span of box
constraints per parameter dimension.
Usage
makeTuneControlCMAES(
same.resampling.instance = TRUE,
impute.val = NULL,
start = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce variance? Default is TRUE.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
start (list)
Named list of initial parameter values.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparameter evaluation, via tuneThreshold? Only works for classification if the predict type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design
points, the resulting performances, and the runtime will be reported. If set to
“memory”, the memory usage for each evaluation will also be displayed, with a
small increase in run time. Otherwise a function with arguments learner,
resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and
prev.stage is expected. The default displays the performance measures, the time
needed for evaluating, the currently used memory and the max memory ever used
before (the latter two both taken from gc). See the implementation for details.
final.dw.perc (boolean)
If a Learner wrapped by a makeDownsampleWrapper is used, you can define
the value of dw.perc which is used to train the Learner with the final parameter
setting found by the tuning. Default is NULL which will not change anything.
budget (integer(1))
Maximum budget for tuning. This value restricts the number of function evaluations. The budget corresponds to the product of the number of generations (maxit) and the number of offspring per generation (lambda).
... (any)
Further control parameters passed to the control arguments of cmaes::cma_es
or GenSA::GenSA, as well as towards the tunerConfig argument of irace::irace.
Value
(TuneControlCMAES)
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
makeTuneControlDesign Create control object for hyperparameter tuning with predefined design.
Description
Completely pre-specify a data.frame of design points to be evaluated during tuning. All kinds of
parameter types can be handled.
Usage
makeTuneControlDesign(
same.resampling.instance = TRUE,
impute.val = NULL,
design = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default"
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce variance? Default is TRUE.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
design (data.frame)
data.frame containing the different parameter settings to be evaluated. The
columns have to be named according to the ParamSet which will be used in tuning.
Value
(TuneControlDesign)
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
makeTuneControlGenSA
Create control object for hyperparameter tuning with GenSA.
Description
Generalized simulated annealing with method GenSA::GenSA. Can handle numeric(vector) and
integer(vector) hyperparameters, but no dependencies. For integers the internally proposed numeric
values are automatically rounded.
Usage
makeTuneControlGenSA(
same.resampling.instance = TRUE,
impute.val = NULL,
start = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce variance? Default is TRUE.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
start (list)
Named list of initial parameter values.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparameter evaluation, via tuneThreshold? Only works for classification if the predict type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design
points, the resulting performances, and the runtime will be reported. If set to
“memory”, the memory usage for each evaluation will also be displayed, with a
small increase in run time. Otherwise a function with arguments learner,
resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and
prev.stage is expected. The default displays the performance measures, the time
needed for evaluating, the currently used memory and the max memory ever used
before (the latter two both taken from gc). See the implementation for details.
final.dw.perc (boolean)
If a Learner wrapped by a makeDownsampleWrapper is used, you can define
the value of dw.perc which is used to train the Learner with the final parameter
setting found by the tuning. Default is NULL which will not change anything.
budget (integer(1))
Maximum budget for tuning. This value restricts the number of function evaluations. GenSA::GenSA defines the budget via the argument max.call. However, one should note that this algorithm does not stop its local search before its end. This behavior might lead to an extension of the defined budget and will result in a warning.
... (any)
Further control parameters passed to the control arguments of cmaes::cma_es
or GenSA::GenSA, as well as towards the tunerConfig argument of irace::irace.
Value
(TuneControlGenSA).
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
makeTuneControlGrid Create control object for hyperparameter tuning with grid search.
Description
A basic grid search can handle all kinds of parameter types. You can either use their correct param
type and resolution, or discretize them yourself by always using ParamHelpers::makeDiscreteParam
in the par.set passed to tuneParams.
Usage
makeTuneControlGrid(
same.resampling.instance = TRUE,
impute.val = NULL,
resolution = 10L,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce variance? Default is TRUE.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
resolution (integer)
Resolution of the grid for each numeric/integer parameter in par.set. For vector
parameters, it is the resolution per dimension. Either pass one resolution for all
parameters, or a named vector. See ParamHelpers::generateGridDesign. Default is 10.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparameter evaluation, via tuneThreshold? Only works for classification if the predict type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design
points, the resulting performances, and the runtime will be reported. If set to
“memory”, the memory usage for each evaluation will also be displayed, with a
small increase in run time. Otherwise a function with arguments learner,
resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and
prev.stage is expected. The default displays the performance measures, the time
needed for evaluating, the currently used memory and the max memory ever used
before (the latter two both taken from gc). See the implementation for details.
final.dw.perc (boolean)
If a Learner wrapped by a makeDownsampleWrapper is used, you can define
the value of dw.perc which is used to train the Learner with the final parameter
setting found by the tuning. Default is NULL which will not change anything.
budget (integer(1))
Maximum budget for tuning. This value restricts the number of function evaluations. If set, must equal the size of the grid.
Value
(TuneControlGrid)
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
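Examples
A compact grid search sketch (illustrative; assumes the rpart package):
ps = makeParamSet(
  makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1))
)
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 2)
res = tuneParams("classif.rpart", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)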
makeTuneControlIrace
Create control object for hyperparameter tuning with Irace.
Description
Tuning with iterated F-Racing with method irace::irace. All kinds of parameter types can be
handled. We return the best of the final elite candidates found by irace in the last race. Its estimated
performance is the mean of all evaluations ever done for that candidate. More information on irace
can be found in the package vignette: vignette("irace-package", package = "irace").
For resampling you have to pass a ResampleDesc, not a ResampleInstance. The resampling strategy
is randomly instantiated n.instances times and these are the instances in the sense of irace
(instances element of tunerConfig in irace::irace). Also note that irace will always store its tuning
results in a file on disk, see the package documentation for details on this and how to change
the file path.
Usage
makeTuneControlIrace(
impute.val = NULL,
n.instances = 100L,
show.irace.output = FALSE,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
Arguments
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-criteria
optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
n.instances (integer(1))
Number of random resampling instances for irace, see details. Default is 100.
show.irace.output
(logical(1))
Show console output of irace while tuning? Default is FALSE.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparameter evaluation, via tuneThreshold? Only works for classification if the predict type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design
points, the resulting performances, and the runtime will be reported. If set to
“memory”, the memory usage for each evaluation will also be displayed, with a
small increase in run time. Otherwise a function with arguments learner,
resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and
prev.stage is expected. The default displays the performance measures, the time
needed for evaluating, the currently used memory and the max memory ever used
before (the latter two both taken from gc). See the implementation for details.
final.dw.perc (boolean)
If a Learner wrapped by a makeDownsampleWrapper is used, you can define
the value of dw.perc which is used to train the Learner with the final parameter
setting found by the tuning. Default is NULL which will not change anything.
budget (integer(1))
Maximum budget for tuning. This value restricts the number of function evaluations. It is passed to maxExperiments.
... (any)
Further control parameters passed to the control arguments of cmaes::cma_es
or GenSA::GenSA, as well as towards the tunerConfig argument of irace::irace.
Value
(TuneControlIrace)
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
makeTuneControlMBO
Create control object for hyperparameter tuning with MBO.
Description
Model-based / Bayesian optimization with the function mlrMBO::mbo from the mlrMBO package.
Please refer to https://github.com/mlr-org/mlrMBO for further info.
Usage
makeTuneControlMBO(
same.resampling.instance = TRUE,
impute.val = NULL,
learner = NULL,
mbo.control = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
continue = FALSE,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
mbo.design = NULL
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce variance? Default is TRUE.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
learner (Learner | NULL)
The surrogate learner: A regression learner to model performance landscape.
For the default, NULL, mlrMBO will automatically create a suitable learner
based on the rules described in mlrMBO::makeMBOLearner.
Value
(TuneControlMBO)
References
Bernd Bischl, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas and Michel Lang; mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions, Preprint: https://arxiv.org/abs/1703.03373 (2017).
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(),
makeTuneControlIrace(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
makeTuneControlRandom Create control object for hyperparameter tuning with random search.
Description
Random search. All kinds of parameter types can be handled.
Usage
makeTuneControlRandom(
same.resampling.instance = TRUE,
maxit = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce vari-
ance? Default is TRUE.
maxit (integer(1) | NULL)
Number of iterations for random search. Default is 100.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparameter evaluation, via tuneThreshold? Only works for classification if the predict type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design
points, the resulting performances, and the runtime will be reported. If set to
“memory”, the memory usage for each evaluation will also be displayed, with a
small increase in run time. Otherwise a function with arguments learner,
resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and
prev.stage is expected. The default displays the performance measures, the time
needed for evaluating, the currently used memory and the max memory ever used
before (the latter two both taken from gc). See the implementation for details.
Value
(TuneControlRandom)
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(),
makeTuneControlIrace(), makeTuneControlMBO(), makeTuneWrapper(), tuneParams(), tuneThreshold()
makeTuneWrapper
Fuse learner with tuning.
Description
Fuses a base learner with a search strategy to select its hyperparameters. Creates a learner object,
which can be used like any other learner object, but which internally uses tuneParams. If the train
function is called on it, the search strategy and resampling are invoked to select an optimal set of
hyperparameter values. Finally, a model is fitted on the complete training data with these optimal
hyperparameters and returned. See tuneParams for more details.
After training, the optimal hyperparameters (and other related information) can be retrieved with
getTuneResult.
Usage
makeTuneWrapper(
learner,
resampling,
measures,
par.set,
control,
show.info = getMlrOption("show.info")
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
resampling (ResampleInstance | ResampleDesc)
Resampling strategy to evaluate points in hyperparameter space. If you pass a
description, it is instantiated once at the beginning by default, so all points are
evaluated on the same training/test sets. If you want to change that behavior,
look at TuneControl.
measures (list of Measure | Measure)
Performance measures to evaluate. The first measure, aggregated by the first
aggregation function, is optimized; the others are simply evaluated. Default is
the default measure for the task, see getDefaultMeasure.
par.set (ParamHelpers::ParamSet)
Collection of parameters and their constraints for optimization. Dependent
parameters with a requires field must use quote and not expression to define it.
control (TuneControl)
Control object for search method. Also selects the optimization algorithm for
tuning.
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
Value
Learner.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(),
makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), tuneParams(),
tuneThreshold()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(),
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(),
makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(),
makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeUndersampleWrapper(),
makeWeightedClassesWrapper()
Examples
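# An illustrative sketch (assumes the rpart package is installed); the
# grid and the inner resampling are kept deliberately tiny.
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.1)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")
lrn = makeTuneWrapper("classif.rpart", resampling = inner, par.set = ps, control = ctrl)
mod = train(lrn, iris.task)
getTuneResult(mod)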
makeUndersampleWrapper
Fuse learner with simple over/undersampling for imbalance correction in binary classification.
Description
Creates a learner object, which can be used like any other learner object. Internally uses oversample
or undersample before every model fit.
Note that observation weights do not influence the sampling and are simply passed down to the next
learner.
Usage
makeOversampleWrapper(learner, osw.rate = 1, osw.cl = NULL)
makeUndersampleWrapper(learner, usw.rate = 1, usw.cl = NULL)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
usw.rate (numeric(1))
Factor to downsample a class. Must be between 0 and 1, where 1 means no
downsampling, 0.5 implies reduction to 50 percent and 0 would imply reduction
to 0 observations. Default is 1.
usw.cl (character(1))
Class that should be undersampled. Default is NULL, which means the larger
one.
osw.rate (numeric(1))
Factor to oversample a class. Must be between 1 and Inf, where 1 means no
oversampling and 2 would mean doubling the class size. Default is 1.
osw.cl (character(1))
Class that should be oversampled. Default is NULL, which means the smaller
one.
Value
Learner.
See Also
Other imbalancy: makeOverBaggingWrapper(), oversample(), smote()
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(),
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(),
makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(),
makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeWeightedClassesWrapper()
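Examples
An illustrative sketch (assuming the rpart package): undersample the larger class to 50 percent before each fit.
lrn = makeUndersampleWrapper("classif.rpart", usw.rate = 0.5)
mod = train(lrn, sonar.task)
performance(predict(mod, sonar.task))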
makeWeightedClassesWrapper
Wraps a classifier for weighted fitting where each class receives a weight.
Description
Creates a wrapper, which can be used like any other learner object.
Fitting is performed in a weighted fashion where each observation receives a weight, depending on
the class it belongs to, see wcw.weight. This might help to mitigate problems caused by imbalanced
class distributions.
This weighted fitting can be achieved in two ways:
a) The learner already has a parameter for class weighting, so one weight can directly be defined
per class. Example: “classif.ksvm” and parameter class.weights. In this case we don’t really
do anything fancy. We convert wcw.weight a bit, but basically simply bind its value to the class
weighting param. The wrapper in this case simply offers a convenient, consistent fashion for class
weighting - and tuning! See example below.
b) The learner does not have a direct parameter to support class weighting, but supports observation
weights, so hasLearnerProperties(learner, 'weights') is TRUE. This means that an individ-
ual, arbitrary weight can be set per observation during training. We set this weight depending on
the class internally in the wrapper. Basically we introduce something like a new “class.weights”
parameter for the learner via observation weights.
Usage
makeWeightedClassesWrapper(learner, wcw.param = NULL, wcw.weight = 1)
Arguments
learner (Learner | character(1))
The classification learner. If you pass a string the learner will be created via
makeLearner.
wcw.param (character(1))
Name of already existing learner parameter, which allows class weighting. The
default (wcw.param = NULL) will use the parameter defined in the learner (class.weights.param).
During training, the parameter must accept a named vector of class weights,
where length equals the number of classes.
wcw.weight (numeric)
Weight for each class. Must be a vector with as many elements as there are
classes in the task, in the same order as the class levels in
getTaskDesc(task)$class.levels. For convenience, you may pass a single
number in case of binary classification, which is then taken as the weight of the
positive class, while the negative class receives a weight of 1. Default is 1.
Value
Learner.
See Also
Other wrapper: makeBaggingWrapper(), makeClassificationViaRegressionWrapper(), makeConstantClassWrapper(),
makeCostSensClassifWrapper(), makeCostSensRegrWrapper(), makeDownsampleWrapper(),
makeDummyFeaturesWrapper(), makeExtractFDAFeatsWrapper(), makeFeatSelWrapper(), makeFilterWrapper(),
makeImputeWrapper(), makeMulticlassWrapper(), makeMultilabelBinaryRelevanceWrapper(),
makeMultilabelClassifierChainsWrapper(), makeMultilabelDBRWrapper(), makeMultilabelNestedStackingWrapper(),
makeMultilabelStackingWrapper(), makeOverBaggingWrapper(), makePreprocWrapper(), makePreprocWrapperCaret(),
makeRemoveConstantFeaturesWrapper(), makeSMOTEWrapper(), makeTuneWrapper(), makeUndersampleWrapper()
Examples
set.seed(123)
# using the direct parameter of the SVM (which is already defined in the learner)
lrn = makeWeightedClassesWrapper("classif.ksvm", wcw.weight = 0.01)
res = holdout(lrn, sonar.task)
print(calculateConfusionMatrix(res$pred))
makeWrappedModel
Induced model of learner.
Description
Result from train.
It internally stores the underlying fitted model, the subset used for training, features used for training, levels of factors in the data set and computation time that was spent for training.
Object members: See arguments.
The constructor makeWrappedModel is mainly for internal use.
Usage
makeWrappedModel(
learner,
learner.model,
task.desc,
subset,
features,
factor.levels,
time
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
learner.model (any)
Underlying model.
task.desc TaskDesc
Task description object.
Value
WrappedModel.
MeasureProperties
Query properties of measures.
Description
Properties can be accessed with getMeasureProperties(measure), which returns a character vector.
The measure properties are defined in Measure.
Usage
getMeasureProperties(measure)
hasMeasureProperties(measure, props)
Arguments
measure (Measure)
Performance measure. Default is the first measure used in the benchmark experiment.
props (character)
Vector of properties to query.
Value
getMeasureProperties returns a character vector with measure properties. hasMeasureProperties
returns a logical vector of the same length as props.
measures
Performance measures.
Description
A performance measure is evaluated after a single train/predict step and returns a single number to
assess the quality of the prediction (or maybe only the model, think AIC). The measure itself knows
whether it wants to be minimized or maximized and for what tasks it is applicable.
All supported measures can be found by listMeasures or as a table in the tutorial appendix: https://mlr.mlr-org.com/articles/tutorial/measures.html.
If you want a measure for a misclassification cost matrix, look at makeCostMeasure. If you want to
implement your own measure, look at makeMeasure.
Most measures can directly be accessed via the function named after the scheme measureX (e.g.
measureSSE).
For clustering measures, we compact the predicted cluster IDs such that they form a continuous
series starting with 1. If this is not the case, some of the measures will generate warnings.
Some measures have parameters. Their defaults are set in the constructor makeMeasure and can be
overwritten using setMeasurePars.
Usage
measureSSE(truth, response)
measureMSE(truth, response)
measureRMSE(truth, response)
measureMEDSE(truth, response)
measureSAE(truth, response)
measureMAE(truth, response)
measureMEDAE(truth, response)
measureRSQ(truth, response)
measureEXPVAR(truth, response)
measureRRSE(truth, response)
measureRAE(truth, response)
measureMAPE(truth, response)
measureMSLE(truth, response)
measureRMSLE(truth, response)
measureKendallTau(truth, response)
measureSpearmanRho(truth, response)
measureMMCE(truth, response)
measureACC(truth, response)
measureBER(truth, response)
measureAUNU(probabilities, truth)
measureAUNP(probabilities, truth)
measureAU1U(probabilities, truth)
measureAU1P(probabilities, truth)
measureMulticlassBrier(probabilities, truth)
measureLogloss(probabilities, truth)
measureSSR(probabilities, truth)
measureQSR(probabilities, truth)
measureLSR(probabilities, truth)
measureKAPPA(truth, response)
measureWKAPPA(truth, response)
measureBAC(truth, response)
measureMultilabelHamloss(truth, response)
measureMultilabelSubset01(truth, response)
measureMultilabelF1(truth, response)
measureMultilabelACC(truth, response)
measureMultilabelPPV(truth, response)
measureMultilabelTPR(truth, response)
Arguments
truth (factor)
Vector of the true class.
response (factor)
Vector of the predicted class.
probabilities (numeric | matrix)
a) For purely binary classification measures: The predicted probabilities for the
positive class as a numeric vector. b) For multiclass classification measures:
The predicted probabilities for all classes, always as a numeric matrix, where
columns are named with class labels.
References
He, H. & Garcia, E. A. (2009) Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263-1284.
H. Uno et al. (2011) On the C-statistics for Evaluating Overall Adequacy of Risk Prediction Procedures with Censored Survival Data. Statistics in Medicine, 30(10):1105-1117. doi:10.1002/sim.4154.
H. Uno et al. (2007) Evaluating Prediction Rules for T-Year Survivors with Censored Regression Models. Journal of the American Statistical Association, 102(478):527-537.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(),
estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(),
performance(), setAggregation(), setMeasurePars()
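Examples
A quick sketch (added for illustration) of calling regression measure functions directly on vectors:
truth = c(2.5, 0.5, 2.0)
response = c(3.0, 0.0, 2.0)
measureSSE(truth, response)
measureRMSE(truth, response)
measureMAE(truth, response)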
mergeBenchmarkResults
Merge different BenchmarkResult objects.
Description
The function automatically combines a list of BenchmarkResult objects into a single BenchmarkResult object, as long as the full crossproduct of all task-learner combinations is available.
Usage
mergeBenchmarkResults(bmrs)
Arguments
bmrs (list of BenchmarkResult)
BenchmarkResult objects that should be merged.
Details
Note that if you want to merge several BenchmarkResult objects, you must ensure that all possible
learner and task combinations will be contained in the returned object. Otherwise, the user will be
notified which task-learner combinations are missing or duplicated.
When merging BenchmarkResult objects with different measures, all missing measures will automatically be recomputed.
Value
BenchmarkResult
mergeSmallFactorLevels
Merges small levels of factors into new level.
Description
Merges factor levels that occur only infrequently into combined levels with a higher frequency.
Usage
mergeSmallFactorLevels(
task,
cols = NULL,
min.perc = 0.01,
new.level = ".merged"
)
Arguments
task (Task)
The task.
cols (character) Which columns to convert. Default is all factor and character columns.
min.perc (numeric(1))
The smallest levels of a factor are merged until their combined proportion w.r.t.
the length of the factor exceeds min.perc. Must be between 0 and 1. Default is
0.01.
new.level (character(1))
New name of merged level. Default is “.merged”.
Value
Task, where merged levels are combined into a new level of name new.level.
See Also
mlrFamilies
mlr documentation families.
Description
List of all mlr documentation families with members.
Arguments
benchmark batchmark, reduceBatchmarkResults, benchmark, benchmarkParallel, getBMRTaskIds, getBMRLearners, getBMRLearnerIds, getBMRLearnerShortNames, getBMRMeasures, getBMRMeasureIds, getBMRPredictions, getBMRPerformances, getBMRAggrPerformances, getBMRTuneResults, getBMRFeatSelResults, getBMRFilteredFeatures, getBMRModels, getBMRTaskDescs, convertBMRToRankMatrix, friedmanPostHocTestBMR, friedmanTestBMR, plotBMRBoxplots, plotBMRRanksAsBarChart, generateCritDifferencesData, plotCritDifferences
calibration generateCalibrationData, plotCalibration
configure configureMlr, getMlrOptions
costsens makeCostSensTask, makeCostSensWeightedPairsWrapper
debug predictFailureModel, getPredictionDump, getRRDump, print.ResampleResult
downsample downsample
eda_and_preprocess capLargeValues, createDummyFeatures, dropFeatures, mergeSmallFactorLevels, normalizeFeatures, removeConstantFeatures, summarizeColumns, summarizeLevels
extractFDAFeatures reextractFDAFeatures
fda_featextractor extractFDAFourier, extractFDAWavelets, extractFDAFPCA, extractFDAMultiResFeatures
fda makeExtractFDAFeatMethod, extractFDAFeatures
featsel analyzeFeatSelResult, makeFeatSelControl, getFeatSelResult, selectFeatures
filter filterFeatures, makeFilter, listFilterMethods, getFilteredFeatures, generateFilterValuesData, getFilterValues
generate_plot_data generateFeatureImportanceData, plotFilterValues, generatePartialDependenceData
help helpLearner, helpLearnerParam
imbalancy oversample, smote
impute makeImputeMethod, imputeConstant, impute, reimpute
Description
Contains the task (mtcars.task).
References
See datasets::mtcars.
Description
Normalize features by different methods. Internally, BBmisc::normalize is used for every feature column. Non-numerical features are left untouched and passed through to the result. Most methods fail for constant features, so special behaviour for this case is implemented.
Usage
normalizeFeatures(
obj,
target = character(0L),
method = "standardize",
cols = NULL,
range = c(0, 1),
on.constant = "quiet"
)
Arguments
obj (data.frame | Task)
Input data.
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). Only used when obj is a data.frame, otherwise
ignored. If survival analysis is applicable, these are the names of the survival
time and event columns, so it has length 2. For multilabel classification these
are the names of logical columns that indicate whether a class label is present
and the number of target variables corresponds to the number of classes.
method (character(1))
Normalizing method. Available are:
“center”: Subtract mean.
“scale”: Divide by standard deviation.
“standardize”: Center and scale.
“range”: Scale to a given range.
cols (character)
Columns to normalize. Default is to use all numeric columns.
range (numeric(2))
Range for method “range”. Default is c(0,1).
on.constant (character(1))
How should constant vectors be treated? Only used if method != “center”, since that method does not fail for constant vectors. Possible actions are:
“quiet”: Depending on the method, treat them quietly:
“scale”: No division by standard deviation is done; input values will be returned untouched.
“standardize”: Only the mean is subtracted, no division is done.
“range”: All values are mapped to the mean of the given range.
“warn”: Same behaviour as “quiet”, but print a warning message.
“stop”: Stop with an error.
Value
data.frame | Task. Same type as obj.
See Also
BBmisc::normalize
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), removeConstantFeatures(), summarizeColumns(), summarizeLevels()
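Examples
# A minimal sketch (not part of the original manual): standardize all numeric
# features of iris, leaving the target column untouched.
normed = normalizeFeatures(iris, target = "Species", method = "standardize")
summary(normed$Sepal.Length)
# scale the features of a task to [0, 1]
normalizeFeatures(iris.task, method = "range", range = c(0, 1))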
Description
Oversampling: For a given class (usually the smaller one) all existing observations are taken and
copied and extra observations are added by randomly sampling with replacement from this class.
Undersampling: For a given class (usually the larger one) the number of observations is reduced
(downsampled) by randomly sampling without replacement from this class.
Usage
oversample(task, rate, cl = NULL)
Arguments
task (Task)
The task.
rate (numeric(1))
Factor to upsample or downsample a class. For undersampling: must be between 0 and 1, where 1 means no downsampling, 0.5 implies reduction to 50 percent, and 0 would imply reduction to 0 observations. For oversampling: must be between 1 and Inf, where 1 means no oversampling and 2 would mean doubling the class size.
cl (character(1))
Which class should be over- or undersampled. If NULL, oversample will select
the smaller and undersample the larger class.
Value
Task.
See Also
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), smote()
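Examples
# A minimal sketch (not part of the original manual); sonar.task is roughly
# balanced, so this is for illustration only.
task.over = oversample(sonar.task, rate = 2)     # grow the smaller class
task.under = undersample(sonar.task, rate = 0.5) # shrink the larger class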
Description
mlr supports different methods to activate parallel computing capabilities through the integration
of the parallelMap::parallelMap package, which supports all major parallelization backends for R.
You can start parallelization with parallelStart*, where * should be replaced with the chosen
backend. parallelMap::parallelStop is used to stop all parallelization backends.
Parallelization is divided into different levels and will automatically be carried out for the first level
that occurs, e.g. if you call resample() after parallelMap::parallelStart, each resampling iteration
is a parallel job and possible underlying calls like parameter tuning won’t be parallelized further.
The supported levels of parallelization are: “mlr.benchmark”, “mlr.resample”, “mlr.selectFeatures”, “mlr.tuneParams”, and “mlr.ensemble”.
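Examples
# A minimal sketch (not part of the original manual): parallelize the
# resampling level on a local socket cluster.
parallelMap::parallelStartSocket(2, level = "mlr.resample")
r = resample("classif.rpart", iris.task, makeResampleDesc("CV", iters = 4))
parallelMap::parallelStop()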
Description
Measures the quality of a prediction with respect to some performance measure(s).
Usage
performance(
pred,
measures,
task = NULL,
model = NULL,
feats = NULL,
simpleaggr = FALSE
)
Arguments
pred (Prediction)
Prediction object.
measures (Measure | list of Measure)
Performance measure(s) to evaluate. Default is the default measure for the task,
see here getDefaultMeasure.
task (Task)
Learning task, might be requested by performance measure, usually not needed
except for clustering or survival.
model (WrappedModel)
Model built on training data, might be requested by performance measure, usu-
ally not needed except for survival.
feats (data.frame)
Features of predicted data, usually not needed except for clustering. If the pre-
diction was generated from a task, you can also pass this instead and the fea-
tures are extracted from it.
simpleaggr (logical)
If TRUE, aggregation of ResamplePrediction objects is skipped. This is used
internally for threshold tuning. Default is FALSE.
Value
(named numeric). Performance value(s), named by the measure(s).
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(),
estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(),
measures, setAggregation(), setMeasurePars()
Examples
training.set = seq(1, nrow(iris), by = 2)
test.set = seq(2, nrow(iris), by = 2)
model = train("classif.lda", iris.task, subset = training.set)
pred = predict(model, task = iris.task, subset = test.set)
performance(pred, measures = mmce)
Description
Contains the task (phoneme.task). The task contains a single functional covariate and 5 classes of equal size (aa, ao, dcl, iy, sh). The aim is to predict the class of the phoneme from the functional covariate. The dataset is contained in the package fda.usc.
References
F. Ferraty and P. Vieu (2003). Curve discrimination: a nonparametric functional approach. Computational Statistics and Data Analysis, 44(1-2), 161-173.
F. Ferraty and P. Vieu (2006). Nonparametric functional data analysis. New York: Springer.
T. Hastie, R. Tibshirani and J. Friedman (2009). The elements of statistical learning: Data mining, inference and prediction, 2nd edn. New York: Springer.
Description
Contains the task (pid.task).
References
See mlbench::PimaIndiansDiabetes. Note that this is the uncorrected version from mlbench.
Description
Plots box or violin plots for a selected measure across all iterations of the resampling strategy,
faceted by the task.id.
Usage
plotBMRBoxplots(
bmr,
measure = NULL,
style = "box",
order.lrns = NULL,
order.tsks = NULL,
pretty.names = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
style (character(1))
Type of plot, can be “box” for a boxplot or “violin” for a violin plot. Default is
“box”.
order.lrns (character(n.learners))
Character vector with learner.ids in new order.
order.tsks (character(n.tasks))
Character vector with task.ids in new order.
pretty.names (logical(1))
Whether to use the Measure name and the Learner short name instead of the id.
Default is TRUE.
facet.wrap.nrow, facet.wrap.ncol
(integer)
Number of rows and columns for facetting. Default for both is NULL. In this case
ggplot’s facet_wrap will choose the layout itself.
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRRanksAsBarChart(), plotBMRSummary(),
plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(),
plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
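# A more concrete sketch (not part of the original manual):
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
bmr = benchmark(lrns, iris.task, makeResampleDesc("CV", iters = 3))
plotBMRBoxplots(bmr, measure = mmce)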
plotBMRRanksAsBarChart
Create a bar chart for ranks in a BenchmarkResult.
Description
Plots a bar chart from the ranks of algorithms. Alternatively, tiles can be plotted for every rank-
task combination, see pos for details. In all plot variants the ranks of the learning algorithms are
displayed on the x-axis. Areas are always colored according to the learner.id.
Usage
plotBMRRanksAsBarChart(
bmr,
measure = NULL,
ties.method = "average",
aggregation = "default",
pos = "stack",
order.lrns = NULL,
order.tsks = NULL,
pretty.names = TRUE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
ties.method (character(1))
See rank for details.
aggregation (character(1))
“mean” or “default”. See getBMRAggrPerformances for details on “default”.
pos (character(1))
Optionally set how the bars are positioned in ggplot2. Ranks are plotted on the x-axis. “tile” plots a heat map with task as the y-axis, which allows identification of the performance in a specific task. “stack” plots a stacked bar plot, which allows comparison of learners within and across ranks. “dodge” plots a bar plot with bars next to each other instead of stacked bars.
order.lrns (character(n.learners))
Character vector with learner.ids in new order.
order.tsks (character(n.tasks))
Character vector with task.ids in new order.
pretty.names (logical(1))
Whether to use the short name of the learner instead of its ID in labels. Defaults
to TRUE.
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRSummary(), plotCalibration(),
plotCritDifferences(), plotLearningCurve(), plotPartialDependence(), plotROCCurves(),
plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(),
plotBMRSummary(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
Description
Creates a scatter plot in which each line refers to a task. On that line, the aggregated scores for all learners on that task are plotted. Optionally, you can apply a rank transformation or one of ggplot2’s transformations like ggplot2::scale_x_log10.
Usage
plotBMRSummary(
bmr,
measure = NULL,
trafo = "none",
order.tsks = NULL,
pointsize = 4L,
jitter = 0.05,
pretty.names = TRUE
)
Arguments
bmr (BenchmarkResult)
Benchmark result.
measure (Measure)
Performance measure. Default is the first measure used in the benchmark exper-
iment.
trafo (character(1))
Currently either “none” or “rank”, the latter performing a rank transformation (with average handling of ties) of the scores per task. NB: You can always add ggplot2::scale_x_log10 to the result to put scores on a log scale. Default is “none”.
order.tsks (character(n.tasks))
Character vector with task.ids in new order.
pointsize (numeric(1))
Point size for ggplot2 ggplot2::geom_point for data points. Default is 4.
jitter (numeric(1))
Small vertical jitter to deal with overplotting in case of equal scores. Default is
0.05.
pretty.names (logical(1))
Whether to use the short name of the learner instead of its ID in labels. Defaults
to TRUE.
Value
ggplot2 plot object.
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(),
plotBMRRanksAsBarChart(), plotCritDifferences(), reduceBatchmarkResults()
Examples
# see benchmark
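# A more concrete sketch (not part of the original manual), using a rank
# transformation of the scores:
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
bmr = benchmark(lrns, iris.task, makeResampleDesc("CV", iters = 3))
plotBMRSummary(bmr, trafo = "rank")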
Description
Plots calibration data from generateCalibrationData.
Usage
plotCalibration(
obj,
smooth = FALSE,
reference = TRUE,
rag = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
obj (CalibrationData)
Result of generateCalibrationData.
smooth (logical(1))
Whether to use a loess smoother. Default is FALSE.
reference (logical(1))
Whether to plot a reference line showing perfect calibration. Default is TRUE.
rag (logical(1))
Whether to include a rag plot: a rug plot on the top pertaining to positive cases and on the bottom pertaining to negative cases. Default is TRUE.
facet.wrap.nrow, facet.wrap.ncol
(integer)
Number of rows and columns for facetting. Default for both is NULL. In this case
ggplot’s facet_wrap will choose the layout itself.
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(),
plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other calibration: generateCalibrationData()
Examples
## Not run:
lrns = list(makeLearner("classif.rpart", predict.type = "prob"),
makeLearner("classif.nnet", predict.type = "prob"))
fit = lapply(lrns, train, task = iris.task)
pred = lapply(fit, predict, task = iris.task)
names(pred) = c("rpart", "nnet")
out = generateCalibrationData(pred, groups = 3)
plotCalibration(out)
## End(Not run)
Description
Plots a critical-differences diagram for all classifiers and a selected measure. If a baseline is selected
for the Bonferroni-Dunn test, the critical difference interval will be positioned around the baseline.
If not, the best performing algorithm will be chosen as baseline.
The positioning of some descriptive elements can be moved by modifying the generated data.
Usage
plotCritDifferences(obj, baseline = NULL, pretty.names = TRUE)
Arguments
obj (critDifferencesData)
Result of generateCritDifferencesData().
baseline (character(1))
learner.id of the learner to be used as the baseline for the Bonferroni-Dunn test.
pretty.names (logical(1))
Whether to use the short name of the learner instead of its ID in labels. Defaults to TRUE.
Value
ggplot2 plot object.
References
Janez Demsar, Statistical Comparisons of Classifiers over Multiple Data Sets, JMLR, 2006
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCalibration(), plotLearningCurve(), plotPartialDependence(),
plotROCCurves(), plotResiduals(), plotThreshVsPerf()
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(),
plotBMRRanksAsBarChart(), plotBMRSummary(), reduceBatchmarkResults()
Examples
# see benchmark
Description
Plot filter values using ggplot2.
Usage
plotFilterValues(
fvalues,
sort = "dec",
n.show = nrow(fvalues$data),
filter = NULL,
feat.type.cols = FALSE
)
Arguments
fvalues (FilterValues)
Filter values.
sort (character(1))
Available options are:
• “dec” -> decreasing
• “inc” -> increasing
• “none” -> no sorting
Default is “dec”.
n.show (integer(1))
Number of features (maximal) to show. Default is to plot all features.
filter (character(1)) In case fvalues contains multiple filter methods, which method
should be plotted?
feat.type.cols (logical(1))
Whether to color different feature types (e.g. numeric | factor). Default is to use
no colors (feat.type.cols = FALSE).
Value
ggplot2 plot object.
See Also
Other filter: filterFeatures(), generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilter(), makeFilterEnsemble(), makeFilterWrapper()
Other generate_plot_data: generateCalibrationData(), generateCritDifferencesData(), generateFeatureImportanceData(), generateFilterValuesData(), generateLearningCurveData(), generatePartialDependenceData(), generateThreshVsPerfData()
Examples
fv = generateFilterValuesData(iris.task, method = "variance")
plotFilterValues(fv)
Description
Plot hyperparameter validation path. Automated plotting method for HyperParsEffectData ob-
ject. Useful for determining the importance or effect of a particular hyperparameter on some per-
formance measure and/or optimizer.
Usage
plotHyperParsEffect(
hyperpars.effect.data,
x = NULL,
y = NULL,
z = NULL,
plot.type = "scatter",
loess.smooth = FALSE,
facet = NULL,
global.only = TRUE,
interpolate = NULL,
show.experiments = FALSE,
show.interpolated = FALSE,
nested.agg = mean,
partial.dep.learn = NULL
)
Arguments
hyperpars.effect.data
(HyperParsEffectData)
Result of generateHyperParsEffectData
x (character(1))
Specify what should be plotted on the x axis. Must be a column from HyperParsEffectData$data.
For partial dependence, this is assumed to be a hyperparameter.
y (character(1))
Specify what should be plotted on the y axis. Must be a column from HyperParsEffectData$data.
z (character(1))
Specify what should be used as the extra axis for a particular geom. This could
be for the fill on a heatmap or color aesthetic for a line. Must be a column from
HyperParsEffectData$data. Default is NULL.
plot.type (character(1))
Specify the type of plot: “scatter” for a scatterplot, “heatmap” for a heatmap,
“line” for a scatterplot with a connecting line, or “contour” for a contour plot
layered ontop of a heatmap. Default is “scatter”.
loess.smooth (logical(1))
If TRUE, will add a loess smoothing line to plots where possible. Note that this is probably only useful when plot.type is set to either “scatter” or “line”. Not used with partial dependence. Default is FALSE.
facet (character(1))
Specify what should be used as the facet axis for a particular geom. When using
nested cross validation, set this to “nested_cv_run” to obtain a facet for each
outer loop. Must be a column from HyperParsEffectData$data. Please note
that facetting is not supported with partial dependence plots! Default is NULL.
global.only (logical(1))
If TRUE, will only plot the current global optima when setting x = "iteration" and y as a performance measure. Default is TRUE.
Value
ggplot2 plot object.
Note
Any NAs incurred from learning algorithm crashes will be indicated in the plot (except in the case
of partial dependence) and the NA values will be replaced with the column min/max depending
on the optimal values for the respective measure. Execution time will be replaced with the max.
Interpolation by its nature will result in predicted values for the performance measure. Use inter-
polation with caution. If “partial.dep” is set to TRUE in generateHyperParsEffectData, only partial
dependence will be plotted.
Since a ggplot2 plot object is returned, the user can change the axis labels and other aspects of the
plot using the appropriate ggplot2 syntax.
Examples
# see generateHyperParsEffectData
Description
Trains the model for 1 or 2 selected features, then displays the result via ggplot2::ggplot. Good for teaching or exploring models.
For classification and clustering, only 2D plots are supported. The data points, the classification, and potentially, through alpha blending of the colors, the posterior probabilities are shown.
For regression, 1D and 2D plots are supported. 1D shows the data, the estimated mean and potentially the estimated standard error. 2D does not show the estimated standard error, but only the estimated mean via the background color.
The plot title displays the model id, its parameters, the training performance and the cross-validation performance.
Usage
plotLearnerPrediction(
learner,
task,
features = NULL,
measures,
cv = 10L,
...,
gridsize,
pointsize = 2,
prob.alpha = TRUE,
se.band = TRUE,
err.mark = "train",
bg.cols = c("darkblue", "green", "darkred"),
err.col = "white",
err.size = pointsize,
greyscale = FALSE,
pretty.names = TRUE
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
task (Task)
The task.
features (character)
Selected features for model. By default the first 2 features are used.
measures (Measure | list of Measure)
Performance measure(s) to evaluate. Default is the default measure for the task,
see here getDefaultMeasure.
cv (integer(1))
Do cross-validation and display in plot title? Number of folds. 0 means no CV.
Default is 10.
... (any)
Parameters for learner.
gridsize (integer(1))
Grid resolution per axis for background predictions. Default is 500 for 1D and
100 for 2D.
pointsize (numeric(1))
Pointsize for ggplot2 ggplot2::geom_point for data points. Default is 2.
prob.alpha (logical(1))
For classification: Set alpha value of background to probability for predicted
class? Allows visualization of “confidence” for prediction. If not, only a con-
stant color is displayed in the background for the predicted label. Default is
TRUE.
se.band (logical(1))
For regression in 1D: Show band for standard error estimation? Default is TRUE.
err.mark (character(1)): For classification: Either mark error of the model on the train-
ing data (“train”) or during cross-validation (“cv”) or not at all with “none”.
Default is “train”.
bg.cols (character(3))
Background colors for classification and regression. Sorted from low, medium to high. Default is c("darkblue", "green", "darkred").
err.col (character(1))
For classification: Color of misclassified data points. Default is “white”.
err.size (integer(1))
For classification: Size of misclassified data points. Default is pointsize.
greyscale (logical(1))
Should the plot be greyscale completely? Default is FALSE.
pretty.names (logical(1))
Whether to use the short name of the learner instead of its ID in labels. Defaults
to TRUE.
Value
The ggplot2 object.
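Examples
# A minimal sketch (not part of the original manual): decision regions of a
# tree on two iris features.
plotLearnerPrediction("classif.rpart", iris.task,
  features = c("Petal.Length", "Petal.Width"))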
Description
Visualizes data size (percentage used for model) vs. performance measure(s).
Usage
plotLearningCurve(
obj,
facet = "measure",
pretty.names = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
obj (LearningCurveData)
Result of generateLearningCurveData, with class LearningCurveData.
facet (character(1))
Selects “measure” or “learner” to be the facetting variable. The variable mapped
to facet must have more than one unique value, otherwise it will be ignored.
The variable not chosen is mapped to color if it has more than one unique value.
The default is “measure”.
pretty.names (logical(1))
Whether to use the Measure name instead of the id in the plot. Default is TRUE.
facet.wrap.nrow, facet.wrap.ncol
(integer)
Number of rows and columns for facetting. Default for both is NULL. In this case
ggplot’s facet_wrap will choose the layout itself.
Value
ggplot2 plot object.
See Also
Other learning_curve: generateLearningCurveData()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotPartialDependence(),
plotROCCurves(), plotResiduals(), plotThreshVsPerf()
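Examples
# A minimal sketch (not part of the original manual); assumes the default
# resampling of generateLearningCurveData.
lc = generateLearningCurveData(c("classif.rpart", "classif.lda"), iris.task,
  percs = seq(0.2, 1, by = 0.2), measures = acc)
plotLearningCurve(lc)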
Description
Plot a partial dependence from generatePartialDependenceData using ggplot2.
Usage
plotPartialDependence(
obj,
geom = "line",
facet = NULL,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL,
p = 1,
data = NULL
)
Arguments
obj PartialDependenceData
Generated by generatePartialDependenceData.
geom (character(1))
The type of geom to use to display the data. Can be “line” or “tile”. For tiling at least two features must be used with interaction = TRUE in the call to generatePartialDependenceData. This may be used in conjunction with the facet argument if three features are specified in the call to generatePartialDependenceData. Default is “line”.
facet (character(1))
The name of a feature to be used for facetting. This feature must have been
an element of the features argument to generatePartialDependenceData and is
only applicable when said argument had length greater than 1. The feature must
be a factor or an integer. If generatePartialDependenceData is called with the
interaction argument FALSE (the default) with argument features of length
greater than one, then facet is ignored and each feature is plotted in its own
facet. Default is NULL.
facet.wrap.nrow, facet.wrap.ncol
(integer)
Number of rows and columns for facetting. Default for both is NULL. In this case
ggplot’s facet_wrap will choose the layout itself.
p (numeric(1))
If individual = TRUE, the rows of the output are sampled without replacement to make the display more readable: each row is kept with probability p. Default is 1.
data (data.frame)
Data points to plot. Usually the training data. For survival and binary classifica-
tion tasks a rug plot wherein ticks represent failures or instances of the positive
class are shown. For regression tasks points are shown. For multiclass clas-
sification tasks ticks are shown and colored according to their class. Both the
features and the target must be included. Default is NULL.
Value
ggplot2 plot object.
See Also
Other partial_dependence: generatePartialDependenceData()
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotROCCurves(),
plotResiduals(), plotThreshVsPerf()
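Examples
# A minimal sketch (not part of the original manual): partial dependence of
# the Boston housing target on "lstat".
fit = train("regr.rpart", bh.task)
pd = generatePartialDependenceData(fit, bh.task, "lstat")
plotPartialDependence(pd)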
Description
Plots for model diagnostics. Provides scatterplots of true vs. predicted values and histograms of the
model’s residuals.
Usage
plotResiduals(
obj,
type = "scatterplot",
loess.smooth = TRUE,
rug = TRUE,
pretty.names = TRUE
)
Arguments
obj (Prediction | BenchmarkResult)
Input data.
type (character(1))
Type of plot. Can be “scatterplot” (the default) or “hist” for a histogram (or, in case of classification problems, a barplot) of the residuals.
loess.smooth (logical(1))
Should a loess smoother be added to the plot? Defaults to TRUE. Only applicable
for regression tasks and if type is set to scatterplot.
rug (logical(1))
Should marginal distributions be added to the plot? Defaults to TRUE. Only
applicable for regression tasks and if type is set to scatterplot.
pretty.names (logical(1))
Whether to use the short name of the learner instead of its ID in labels. Defaults
to TRUE.
Only applicable if a BenchmarkResult is passed to obj in the function call, ig-
nored otherwise.
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(),
plotROCCurves(), plotThreshVsPerf()
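Examples
# A minimal sketch (not part of the original manual): residuals of a linear
# model on the Boston housing task.
mod = train("regr.lm", bh.task)
pred = predict(mod, task = bh.task)
plotResiduals(pred)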
Description
Plots a ROC curve from predictions.
Usage
plotROCCurves(
obj,
measures,
diagonal = TRUE,
pretty.names = TRUE,
facet.learner = FALSE
)
Arguments
obj (ThreshVsPerfData)
Result of generateThreshVsPerfData.
measures (list(2) of Measure)
Default is the first 2 measures passed to generateThreshVsPerfData.
diagonal (logical(1))
Whether to plot a dashed diagonal line. Default is TRUE.
pretty.names (logical(1))
Whether to use the Measure name instead of the id in the plot. Default is TRUE.
facet.learner (logical(1))
Whether to use facetting or different colors to compare multiple learners. Default is FALSE.
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence(),
plotResiduals(), plotThreshVsPerf()
Other thresh_vs_perf: generateThreshVsPerfData(), plotThreshVsPerf()
Examples
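# A minimal sketch (not part of the original manual): ROC curve for a
# probabilistic classifier on the binary sonar.task.
lrn = makeLearner("classif.lda", predict.type = "prob")
fit = train(lrn, sonar.task)
pred = predict(fit, task = sonar.task)
roc = generateThreshVsPerfData(pred, measures = list(fpr, tpr))
plotROCCurves(roc)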
plotThreshVsPerf
Plot threshold vs. performance(s) for 2-class classification using ggplot2.
Description
Plots threshold vs. performance(s) data that has been generated with generateThreshVsPerfData.
Usage
plotThreshVsPerf(
obj,
measures = obj$measures,
facet = "measure",
mark.th = NA_real_,
pretty.names = TRUE,
facet.wrap.nrow = NULL,
facet.wrap.ncol = NULL
)
Arguments
obj (ThreshVsPerfData)
Result of generateThreshVsPerfData.
measures (Measure | list of Measure)
Performance measure(s) to plot. Must be a subset of those used in gener-
ateThreshVsPerfData. Default is all the measures stored in obj generated by
generateThreshVsPerfData.
facet (character(1))
Selects “measure” or “learner” to be the facetting variable. The variable mapped
to facet must have more than one unique value, otherwise it will be ignored.
The variable not chosen is mapped to color if it has more than one unique value.
The default is “measure”.
mark.th (numeric(1))
Mark a given threshold with a vertical line? Default is NA, which means no threshold is marked.
pretty.names (logical(1))
Whether to use the Measure name instead of the id in the plot. Default is TRUE.
facet.wrap.nrow, facet.wrap.ncol
(integer)
Number of rows and columns for facetting. Default for both is NULL. In this case
ggplot’s facet_wrap will choose the layout itself.
Value
ggplot2 plot object.
See Also
Other plot: createSpatialResamplingPlots(), plotBMRBoxplots(), plotBMRRanksAsBarChart(),
plotBMRSummary(), plotCalibration(), plotCritDifferences(), plotLearningCurve(), plotPartialDependence()
plotROCCurves(), plotResiduals()
Other thresh_vs_perf: generateThreshVsPerfData(), plotROCCurves()
Examples
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, sonar.task)
pvs = generateThreshVsPerfData(pred, list(acc, setAggregation(acc, train.mean)))
plotThreshVsPerf(pvs)
plotTuneMultiCritResult
Plots multi-criteria results after tuning using ggplot2.
Description
Visualizes the Pareto front and possibly the dominated points.
Usage
plotTuneMultiCritResult(
res,
path = TRUE,
col = NULL,
shape = NULL,
pointsize = 2,
pretty.names = TRUE
)
Arguments
res (TuneMultiCritResult)
Result of tuneParamsMultiCrit.
path (logical(1))
Visualize all evaluated points (or only the non-dominated Pareto front)? For the full path, the size of the points on the front is slightly increased. Default is TRUE.
col (character(1))
Which column of res$opt.path should be mapped to ggplot2 color? Default
is NULL, which means none.
shape (character(1))
Which column of res$opt.path should be mapped to ggplot2 shape? Default
is NULL, which means none.
pointsize (numeric(1))
Point size for ggplot2 ggplot2::geom_point for data points. Default is 2.
pretty.names (logical(1))
Whether to use the ID of the measures instead of their name in labels. Defaults
to TRUE.
Value
ggplot2 plot object.
See Also
Other tune_multicrit: TuneMultiCritControl, tuneParamsMultiCrit()
Examples
# see tuneParamsMultiCrit
Description
Predict the target variable of new data using a fitted model. What is stored exactly in the (Prediction)
object depends on the predict.type setting of the Learner. If predict.type was set to “prob”
probability thresholding can be done calling the setThreshold function on the prediction object.
The row names of the input task or newdata are preserved in the output.
Usage
## S3 method for class 'WrappedModel'
predict(object, task = NULL, newdata = NULL, subset = NULL, ...)
Arguments
object (WrappedModel)
Wrapped model, result of train.
task (Task)
The task. If this is passed, data from this task is predicted.
newdata (data.frame)
New observations which should be predicted. Pass this alternatively instead of
task.
subset (integer | logical | NULL)
Selected cases. Either a logical or an index vector. Default is NULL, which means all observations are used.
... (any)
Currently ignored.
Value
(Prediction).
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(), getPredictionTaskDesc(), setPredictThreshold(), setPredictType()
Examples
# train and predict
train.set = seq(1, 150, 2)
test.set = seq(2, 150, 2)
model = train("classif.lda", iris.task, subset = train.set)
p = predict(model, newdata = iris, subset = test.set)
print(p)
predict(model, task = iris.task, subset = test.set)
Description
Mainly for internal use. Predict new data with a fitted model. You have to implement this method
if you want to add another learner to this package.
Usage
predictLearner(.learner, .model, .newdata, ...)
Arguments
.learner (RLearner)
Wrapped learner.
.model (WrappedModel)
Model produced by training.
.newdata (data.frame)
New data to predict. Does not include target column.
... (any)
Additional parameters, which need to be passed to the underlying predict func-
tion.
Details
Your implementation must adhere to the following: Predictions for the observations in .newdata
must be made based on the fitted model (.model$learner.model). All parameters in ... must be
passed to the underlying predict function.
Value
• For classification: Either a factor with class labels for type “response” or, if the learner sup-
ports this, a matrix of class probabilities for type “prob”. In the latter case the columns must
be named with the class labels.
• For regression: Either a numeric vector for type “response” or, if the learner supports this, a
matrix with two columns for type “se”. In the latter case the first column contains the estimated
response (mean value) and the second column the estimated standard errors.
• For survival: Either a numeric vector with some sort of orderable risk for type “response” or,
if supported, a numeric vector with time dependent probabilities for type “prob”.
• For clustering: Either an integer with cluster IDs for type “response” or, if supported, a matrix
of membership probabilities for type “prob”.
• For multilabel: A logical matrix that indicates predicted class labels for type “response” or, if
supported, a matrix of class probabilities for type “prob”. The columns must be named with
the class labels.
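Examples
# A hedged sketch (not part of the original manual): the S3 method for a
# hypothetical learner "classif.mylda" wrapping MASS::lda. It must return
# class labels or a probability matrix, depending on .learner$predict.type.
predictLearner.classif.mylda = function(.learner, .model, .newdata, ...) {
  p = predict(.model$learner.model, newdata = .newdata, ...)
  if (.learner$predict.type == "response") p$class else p$posterior
}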
reduceBatchmarkResults
Reduce results of a batch-distributed benchmark.
Description
This creates a BenchmarkResult from a batchtools::ExperimentRegistry. To setup the benchmark
have a look at batchmark.
Usage
reduceBatchmarkResults(
ids = NULL,
keep.pred = TRUE,
keep.extract = FALSE,
show.info = getMlrOption("show.info"),
reg = batchtools::getDefaultRegistry()
)
Arguments
ids (data.frame or integer)
A base::data.frame (or data.table::data.table) with a column named “job.id”. Alternatively, you may also pass a vector of integerish job ids. If not set, defaults to all successfully terminated jobs (the return value of batchtools::findDone).
keep.pred (logical(1))
Keep the prediction data in the pred slot of the result object. If you do many ex-
periments (on larger data sets) these objects might unnecessarily increase object
size / mem usage, if you do not really need them. The default is set to TRUE.
keep.extract (logical(1))
Keep the extract slot of the result object. When creating a lot of benchmark results with extensive tuning, the resulting R objects can become very large, which is why the tuning results stored in the extract slot are removed by default (keep.extract = FALSE). Note that with keep.extract = FALSE you will not be able to conduct analyses of the tuning results.
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
reg (batchtools::ExperimentRegistry)
Registry, created by batchtools::makeExperimentRegistry. If not explicitly passed,
uses the last created registry.
Value
(BenchmarkResult).
See Also
Other benchmark: BenchmarkResult, batchmark(), benchmark(), convertBMRToRankMatrix(),
friedmanPostHocTestBMR(), friedmanTestBMR(), generateCritDifferencesData(), getBMRAggrPerformances(),
getBMRFeatSelResults(), getBMRFilteredFeatures(), getBMRLearnerIds(), getBMRLearnerShortNames(),
getBMRLearners(), getBMRMeasureIds(), getBMRMeasures(), getBMRModels(), getBMRPerformances(),
getBMRPredictions(), getBMRTaskDescs(), getBMRTaskIds(), getBMRTuneResults(), plotBMRBoxplots(),
plotBMRRanksAsBarChart(), plotBMRSummary(), plotCritDifferences()
Description
This function accepts a data frame or a task and an extractFDAFeatDesc (an FDA feature extraction description) as returned by extractFDAFeatures, and extracts features from previously unseen data.
Usage
reextractFDAFeatures(obj, desc, ...)
Arguments
obj (Task | data.frame)
Task or data.frame to extract functional features from. Must contain functional
features as matrix columns.
desc (extractFDAFeatDesc)
FDAFeature extraction description as returned by extractFDAFeatures
... (any)
Further args passed on to methods.
Value
(Task | data.frame). Same type as obj, with the functional features replaced by the extracted ones.
Description
This function accepts a data frame or a task and an imputation description as returned by impute, and re-applies the previously learned imputation to the (new) data.
Usage
reimpute(obj, desc)
Arguments
obj (data.frame | Task)
Input data.
desc (ImputationDesc)
Imputation description as returned by impute.
Value
(data.frame | Task). Same type as obj.
See Also
Other impute: impute(), imputeConstant(), makeImputeMethod()
removeConstantFeatures
Remove constant features from a data set.
Description
Constant features can lead to errors in some models and obviously provide no information in the
training set that can be learned from. With the argument “perc”, there is a possibility to also remove
features for which less than “perc” percent of the observations differ from the mode value.
Usage
removeConstantFeatures(
obj,
perc = 0,
dont.rm = character(0L),
na.ignore = FALSE,
wrap.tol = .Machine$double.eps^0.5,
show.info = getMlrOption("show.info"),
...
)
Arguments
obj (data.frame | Task)
Input data.
perc (numeric(1))
The percentage of feature values in [0, 1) that must differ from the mode value. Default is 0, which means only constant features with exactly one observed level are removed.
dont.rm (character)
Names of the columns which must not be deleted. Default is no columns.
na.ignore (logical(1))
Should NAs be ignored in the percentage calculation? (Or should they be treated
as a single, extra level in the percentage calculation?) Note that if the feature
has only missing values, it is always removed. Default is FALSE.
wrap.tol (numeric(1))
Numerical tolerance to treat two numbers as equal. Variables stored as double will get rounded accordingly before computing the mode. Default is sqrt(.Machine$double.eps).
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
... To ensure backward compatibility with the old argument tol.
Value
data.frame | Task. Same type as obj.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), summarizeColumns(), summarizeLevels()
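Examples
# A minimal sketch (not part of the original manual): the constant column
# "a" is dropped.
df = data.frame(a = 1, b = rnorm(10), y = runif(10))
removeConstantFeatures(df)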
Description
Remove settings (previously set through mlr) for some parameters. This means that the default behavior for those parameters will be used instead.
Usage
removeHyperPars(learner, ids = character(0L))
Arguments
learner (Learner)
The learner.
ids (character)
Parameter names whose settings should be removed.
Value
Learner.
See Also
Description
The function resample fits a model specified by a Learner on a Task and calculates predictions and performance measures for all training and all test sets specified by either a resampling description (ResampleDesc) or a resampling instance (ResampleInstance).
You are able to return all fitted models (parameter models) or extract specific parts of the models (parameter extract), as returning all of them completely might be memory intensive.
The remaining functions on this page are convenience wrappers for the various existing resampling strategies. Note that if you need to work with precomputed training and test splits (i.e., resampling instances), you have to stick with resample.
Usage
resample(
learner,
task,
resampling,
measures,
weights = NULL,
models = FALSE,
extract,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
crossval(
learner,
task,
iters = 10L,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
repcv(
learner,
task,
folds = 10L,
reps = 10L,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
holdout(
learner,
task,
split = 2/3,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
subsample(
learner,
task,
iters = 30,
split = 2/3,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
bootstrapOOB(
learner,
task,
iters = 30,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
bootstrapB632(
learner,
task,
iters = 30,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
bootstrapB632plus(
learner,
task,
iters = 30,
stratify = FALSE,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
growingcv(
learner,
task,
horizon = 1,
initial.window = 0.5,
skip = 0,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
fixedcv(
learner,
task,
horizon = 1L,
initial.window = 0.5,
skip = 0,
measures,
models = FALSE,
keep.pred = TRUE,
...,
show.info = getMlrOption("show.info")
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
task (Task)
The task.
resampling (ResampleDesc or ResampleInstance)
Resampling strategy. If a description is passed, it is instantiated automatically.
measures (Measure | list of Measure)
Performance measure(s) to evaluate. Default is the default measure for the task,
see here getDefaultMeasure.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. If given,
must be of same length as observations in task and in corresponding order. Over-
writes weights specified in the task. By default NULL which means no weights
are used unless specified in the task.
models (logical(1))
Should all fitted models be returned? Default is FALSE.
extract (function)
Function used to extract information from a fitted model during resampling. Is
applied to every WrappedModel resulting from calls to train during resampling.
Default is to extract nothing.
keep.pred (logical(1))
Keep the prediction data in the pred slot of the result object. If you do many ex-
periments (on larger data sets) these objects might unnecessarily increase object
size / mem usage, if you do not really need them. The default is set to TRUE.
... (any)
Further hyperparameters passed to learner.
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
iters (integer(1))
See ResampleDesc.
stratify (logical(1))
See ResampleDesc.
folds (integer(1))
See ResampleDesc.
reps (integer(1))
See ResampleDesc.
split (numeric(1))
See ResampleDesc.
horizon (numeric(1))
See ResampleDesc.
initial.window (numeric(1))
See ResampleDesc.
skip (integer(1))
See ResampleDesc.
Value
(ResampleResult).
Note
If you would like to include results from the training data set, make sure to appropriately adjust the
resampling strategy and the aggregation for the measure. See example code below.
See Also
Other resample: ResamplePrediction, ResampleResult, addRRMeasure(), getRRPredictionList(),
getRRPredictions(), getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance()
Examples
task = makeClassifTask(data = iris, target = "Species")
rdesc = makeResampleDesc("CV", iters = 2)
r = resample(makeLearner("classif.qda"), task, rdesc)
print(r$aggr)
print(r$measures.test)
print(r$pred)
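# As referenced in the Note above, a sketch (not part of the original manual)
# that also evaluates on the training sets: predict on "both" sets and add a
# train.mean aggregation of the measure.
rdesc2 = makeResampleDesc("CV", iters = 2, predict = "both")
r2 = resample("classif.rpart", task, rdesc2,
  measures = list(mmce, setAggregation(mmce, train.mean)))
print(r2$aggr)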
Description
Contains predictions from resampling, returned (among other stuff) by function resample. Can basically be used in the same way as Prediction, its super class. The main differences are: (a) The internal data.frame (member data) contains an additional column iter, specifying the iteration of the resampling strategy, and an additional column set, specifying whether the prediction was from an observation in the “train” or “test” set. (b) The prediction time is a numeric vector whose length equals the number of iterations.
See Also
Other resample: ResampleResult, addRRMeasure(), getRRPredictionList(), getRRPredictions(),
getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(),
resample()
Description
A container for resample results.
Details
Resample Result:
A resample result is created by resample and contains the following object members:
task.id (character(1)): Name of the Task.
learner.id (character(1)): Name of the Learner.
measures.test (data.frame): Gives you access to performance measurements on the individual test
sets. Rows correspond to sets in resampling iterations, columns to performance measures.
measures.train (data.frame): Gives you access to performance measurements on the individual
training sets. Rows correspond to sets in resampling iterations, columns to performance mea-
sures. Usually not available, only if specifically requested, see general description above.
aggr (numeric): Named vector of aggregated performance values. Names are coded like this
<measure>.<aggregation>.
err.msgs (data.frame): Number of rows equals resampling iterations and columns are: iter,
train, predict. Stores error messages generated during train or predict, if these were caught
via configureMlr.
err.dumps (list of list of dump.frames): List with length equal to number of resampling itera-
tions. Contains lists of dump.frames objects that can be fed to debugger() to inspect error
dumps generated on learner errors. One iteration can generate more than one error dump de-
pending on which of training, prediction on training set, or prediction on test set, operations
fail. Therefore the lists have named slots $train, $predict.train, or $predict.test if
relevant. The error dumps are only saved when option on.error.dump is TRUE.
pred (ResamplePrediction): Container for all predictions during resampling.
models (list of WrappedModel): List of fitted models or NULL.
extract (list): List of extracted parts from fitted models or NULL.
runtime (numeric(1)): Time in seconds it took to execute the resampling.
The print method of this object gives a short overview, including task and learner ids, aggregated
measures and runtime for the resampling.
See Also
Other resample: ResamplePrediction, addRRMeasure(), getRRPredictionList(), getRRPredictions(),
getRRTaskDesc(), getRRTaskDescription(), makeResampleDesc(), makeResampleInstance(),
resample()
Other debug: FailureModel, getPredictionDump(), getRRDump()
Description
Wraps an already implemented learning method from R to make it accessible to mlr. Call this
method in your constructor. You have to pass an id (name), the required package(s), a description
object for all changeable parameters (you do not have to do this for the learner to work, but it is
strongly recommended), and use property tags to define features of the learner.
For a general overview on how to integrate a learning algorithm into mlr’s system, please read the section in the online tutorial: https://mlr.mlr-org.com/articles/tutorial/create_learner.html
To see all possible properties of a learner, go to: LearnerProperties.
Usage
makeRLearner()
makeRLearnerClassif(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
class.weights.param = NULL,
callees = character(0L)
)
makeRLearnerMultilabel(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerRegr(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerSurv(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerCluster(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
makeRLearnerCostSens(
cl,
package,
par.set,
par.vals = list(),
properties = character(0L),
name = cl,
short.name = cl,
note = "",
callees = character(0L)
)
Arguments
cl (character(1))
Class of learner. By convention, all classification learners start with “classif.”, all regression learners with “regr.”, all survival learners with “surv.”, all clustering learners with “cluster.”, and all multilabel classification learners with “multilabel.”. A list of all integrated learners is available on the learners help page.
package (character)
Package(s) to load for the implementation of the learner.
par.set (ParamHelpers::ParamSet)
Parameter set of (hyper)parameters and their constraints. Dependent parameters
with a requires field must use quote and not expression to define it.
par.vals (list)
Always set hyperparameters to these values when the object is constructed. Use-
ful when default values are missing in the underlying function. The values can
later be overwritten when the user sets hyperparameters. Default is empty list.
properties (character)
Set of learner properties. See above. Default is character(0).
name (character(1))
Meaningful name for learner. Default is id.
short.name (character(1))
Short name for learner. Should only be a few characters so it can be used in
plots and tables. Default is id.
note (character(1))
Additional notes regarding the learner and its integration in mlr. Default is “”.
class.weights.param
(character(1))
Name of the parameter, which can be used for providing class weights.
callees (character)
Character vector naming all functions of the learner’s package being called
which have a relevant R help page. Default is character(0).
Value
(RLearner). The specific subclass is one of RLearnerClassif, RLearnerCluster, RLearnerMultilabel,
RLearnerRegr, RLearnerSurv.
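Examples
# A hedged sketch (not part of the original manual): a hypothetical
# constructor for an LDA-based classifier, following the conventions above.
makeRLearner.classif.mylda = function() {
  makeRLearnerClassif(
    cl = "classif.mylda",
    package = "MASS",
    par.set = makeParamSet(
      makeDiscreteLearnerParam(id = "method", default = "moment",
        values = c("moment", "mle"))
    ),
    properties = c("twoclass", "multiclass", "numerics", "factors", "prob"),
    name = "My Linear Discriminant Analysis",
    short.name = "mylda"
  )
}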
Description
Optimizes the features for a classification or regression problem by choosing a variable selection
wrapper approach. Allows for different optimization methods, such as forward search or a genetic
algorithm. You can select such an algorithm (and its settings) by passing a corresponding control
object. For a complete list of implemented algorithms look at the subclasses of (FeatSelControl).
All algorithms operate on a 0-1-bit encoding of candidate solutions. Per default a single bit corresponds to a single feature, but you are able to change this by using the arguments bit.names and bits.to.features, thus allowing you to switch on whole groups of features with a single bit.
Usage
selectFeatures(
learner,
task,
resampling,
measures,
bit.names,
bits.to.features,
control,
show.info = getMlrOption("show.info")
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
task (Task)
The task.
resampling (ResampleInstance | ResampleDesc)
Resampling strategy for feature selection. If you pass a description, it is instan-
tiated once at the beginning by default, so all points are evaluated on the same
training/test sets. If you want to change that behavior, look at FeatSelControl.
measures (list of Measure | Measure)
Performance measures to evaluate. The first measure, aggregated by the first
aggregation function is optimized, others are simply evaluated. Default is the
default measure for the task, see here getDefaultMeasure.
bit.names (character)
Names of bits encoding the solutions. Also defines the total number of bits in the encoding. Per default these are the feature names of the task. Has to be used together with bits.to.features.
bits.to.features
(function(x, task))
Function which transforms an integer-0-1 vector into a character vector of se-
lected features. Per default a value of 1 in the ith bit selects the ith feature to be
in the candidate solution. The vector x will correspond to the bit.names and
has to be of the same length.
control (FeatSelControl)
Control object for the search method. Also selects the optimization algorithm for feature selection.
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
Value
(FeatSelResult).
See Also
Other featsel: FeatSelControl, analyzeFeatSelResult(), getFeatSelResult(), makeFeatSelWrapper()
Examples
rdesc = makeResampleDesc("Holdout")
ctrl = makeFeatSelControlSequential(method = "sfs", maxit = NA)
res = selectFeatures("classif.rpart", iris.task, rdesc, control = ctrl)
analyzeFeatSelResult(res)
Description
Set how this measure will be aggregated after resampling. To see possible aggregation functions:
aggregations.
Usage
setAggregation(measure, aggr)
Arguments
measure (Measure)
Performance measure.
aggr (Aggregation)
Aggregation function.
Value
(Measure) with changed aggregation behaviour.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(), estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(), measures, performance(), setMeasurePars()
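Examples
# A minimal sketch (not part of the original manual): aggregate mmce over
# test sets by the median instead of the mean.
mmce.med = setAggregation(mmce, test.median)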
Description
Set the hyperparameters of a learner object.
Usage
setHyperPars(learner, ..., par.vals = list())
Arguments
learner (Learner)
The learner.
... (any)
Named (hyper)parameters with new settings. Alternatively these can be passed using the par.vals argument.
par.vals (list)
Optional list of named (hyper)parameter settings. The arguments in ... take precedence over values in this list.
Value
Learner.
Note
If a named (hyper)parameter can’t be found for the given learner, the 3 closest (hyper)parameter
names will be output in case the user mistyped.
See Also
Examples
cl1 = makeLearner("classif.ksvm", sigma = 1)
cl2 = setHyperPars(cl1, sigma = 10, par.vals = list(C = 2))
print(cl1)
# note the now set and altered hyperparameters:
print(cl2)
Description
Only exported for internal use.
Usage
setHyperPars2(learner, par.vals)
Arguments
learner (Learner)
The learner.
par.vals (list)
List of named (hyper)parameter settings.
Description
Deprecated, use setLearnerId instead.
Usage
setId(learner, id)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
id (character(1))
New id for learner.
Value
Learner.
See Also
Description
Set the ID of a learner object.
Usage
setLearnerId(learner, id)
Arguments
learner (Learner)
The learner.
id (character(1))
New id for the learner.
Value
Learner.
See Also
Description
Sets hyperparameters of measures.
Usage
setMeasurePars(measure, ..., par.vals = list())
Arguments
measure (Measure)
Performance measure.
... (any)
Named (hyper)parameters with new settings. Alternatively these can be passed
using the par.vals argument.
par.vals (list)
Optional list of named (hyper)parameter settings. The arguments in ... take
precedence over values in this list.
Value
Measure.
See Also
Other performance: ConfusionMatrix, calculateConfusionMatrix(), calculateROCMeasures(),
estimateRelativeOverfitting(), makeCostMeasure(), makeCustomResampledMeasure(), makeMeasure(),
measures, performance(), setAggregation()
Description
See predict.threshold in makeLearner and setThreshold.
For complex wrappers only the top-level predict.threshold is currently set.
Usage
setPredictThreshold(learner, predict.threshold)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
predict.threshold
(numeric)
Threshold to produce class labels. Has to be a named vector, where names corre-
spond to class labels. Only for binary classification it can be a single numerical
threshold for the positive class. See setThreshold for details on how it is applied. Default is NULL, which means 0.5, i.e. an equal threshold for each class.
Value
Learner.
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(),
getPredictionTaskDesc(), predict.WrappedModel(), setPredictType()
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(),
helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(),
setId(), setLearnerId(), setPredictType()
Description
Possible prediction types are: Classification: labels or class probabilities (including labels). Regression: numeric response or standard errors (including the numeric response). Survival: linear predictor or survival probability.
For complex wrappers the predict type is usually also passed down the encapsulated learner in a
recursive fashion.
Usage
setPredictType(learner, predict.type)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
predict.type (character(1))
Classification: “response” or “prob”. Regression: “response” or “se”. Survival:
“response” (linear predictor) or “prob”. Clustering: “response” or “prob”. De-
fault is “response”.
Value
Learner.
See Also
Other predict: asROCRPrediction(), getPredictionProbabilities(), getPredictionResponse(),
getPredictionTaskDesc(), predict.WrappedModel(), setPredictThreshold()
Other learner: LearnerProperties, getClassWeightParam(), getHyperPars(), getLearnerId(),
getLearnerNote(), getLearnerPackages(), getLearnerParVals(), getLearnerParamSet(),
getLearnerPredictType(), getLearnerShortName(), getLearnerType(), getParamSet(), helpLearner(),
helpLearnerParam(), makeLearner(), makeLearners(), removeHyperPars(), setHyperPars(),
setId(), setLearnerId(), setPredictThreshold()
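Examples
# A minimal sketch (not part of the original manual):
lrn = setPredictType(makeLearner("classif.rpart"), "prob")
getLearnerPredictType(lrn)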
setThreshold

Description
Set the threshold of a prediction object for classification or multilabel classification. Creates the corresponding discrete class response for the newly set threshold. For binary classification: the positive class is predicted if the probability value exceeds the threshold. For multiclass: probabilities are divided by the corresponding thresholds and the class with the maximum resulting value is selected. The results of both are equivalent if, in the multi-threshold case, the values are greater than 0 and sum to 1. For multilabel classification: a label is predicted (with entry TRUE) if a probability matrix entry exceeds the threshold of the corresponding label.
Usage
setThreshold(pred, threshold)
Arguments
pred (Prediction)
Prediction object.
threshold (numeric)
Threshold to produce class labels. Has to be a named vector whose names correspond to the class labels. Only for binary classification may it be a single numeric threshold for the positive class.
Value
(Prediction) with changed threshold and corresponding response.
See Also
predict.WrappedModel
Examples
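# a brief sketch (assumes the rpart package; the threshold values are arbitrary)
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, iris.task)
pred = predict(mod, task = iris.task)
# one threshold per class; probabilities are divided by these values
pred2 = setThreshold(pred, c(setosa = 0.4, versicolor = 0.3, virginica = 0.3))
performance(pred2, measures = mmce)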
simplifyMeasureNames

Description
Clips aggregation names from a character vector, e.g. 'mmce.test.mean' becomes 'mmce'. Elements that don't contain a measure name are ignored and returned unchanged.
Usage
simplifyMeasureNames(xs)
Arguments
xs (character)
Character vector that (possibly) contains aggregated measure names.
Value
(character).
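For example:
simplifyMeasureNames(c("mmce.test.mean", "mmce", "mysummary"))
# returns: "mmce" "mmce" "mysummary" (the last element contains no measure name)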
smote

Description
In each iteration, samples one minority class element x1, then one of x1's nearest neighbors, x2. Both points are then interpolated / convex-combined, resulting in a new virtual data point x3 for the minority class.
The method handles factor features, too. The Gower distance is used for the nearest neighbor calculation, see cluster::daisy. For interpolation, the new factor level for x3 is sampled from the two given levels of x1 and x2 per feature.
Usage
smote(task, rate, nn = 5L, standardize = TRUE, alt.logic = FALSE)
Arguments
task (Task)
The task.
rate (numeric(1))
Factor to upsample the smaller class. Must be between 1 and Inf, where 1
means no oversampling and 2 would mean doubling the class size.
nn (integer(1))
Number of nearest neighbors to consider. Default is 5.
standardize (logical(1))
Standardize input variables before calculating the nearest neighbors, for data sets with numeric input variables only. For mixed variables (numeric and factor) the Gower distance is used and variables are standardized anyway. Default is TRUE.
alt.logic (logical(1))
Use an alternative logic for selection of minority class observations. Instead
of sampling a minority class element AND one of its nearest neighbors, each
minority class element is taken multiple times (depending on rate) for the in-
terpolation and only the corresponding nearest neighbor is sampled. Default is
FALSE.
Value
Task.
References
Chawla, N., Bowyer, K., Hall, L., & Kegelmeyer, P. (2000). SMOTE: Synthetic Minority Over-sampling Technique. In International Conference on Knowledge Based Computer Systems, pp. 46-57. National Center for Software Technology, Mumbai, India, Allied Press.
See Also
Other imbalancy: makeOverBaggingWrapper(), makeUndersampleWrapper(), oversample()
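A brief usage sketch; the iris subset is just a quick way to obtain an imbalanced binary task.
# binary task with 50 setosa vs. 20 versicolor observations
task = makeClassifTask(data = droplevels(iris[1:70, ]), target = "Species")
table(getTaskTargets(task))
# roughly double the minority class with synthetic observations
task.smoted = smote(task, rate = 2, nn = 3)
table(getTaskTargets(task.smoted))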
sonar.task

Description
Contains the task (sonar.task).
References
See mlbench::Sonar.
spam.task

Description
Contains the task (spam.task).
References
See kernlab::spam.
spatial.task

Description
Data set created by Jannes Muenchow, University of Erlangen-Nuremberg, Germany. These data should be cited as Muenchow et al. (2012) (see reference below). This publication also contains additional information on data collection and the geomorphology of the area. The data set provided here is (a subset of) the one from the 'natural' part of the RBSF area and corresponds to landslide distribution in the year 2000.
Format
a data.frame with point samples of landslide and non-landslide locations in a study area in the
Andes of southern Ecuador.
References
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a
humidity gradient in the tropical Andes. Geomorphology, 139-140: 271-284.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and eval-
uation. Natural Hazards and Earth System Sciences, 5(6): 853-862.
subsetTask

Description
Subset data in task.
Usage
subsetTask(task, subset = NULL, features = getTaskFeatureNames(task))
Arguments
task (Task)
The task.
subset (integer | logical | NULL)
Selected cases. Either a logical or an index vector. Default is NULL, which means all observations are used.
features (character | integer | logical)
Vector of selected inputs. You can either pass a character vector with the feature
names, a vector of indices, or a logical vector.
In case of an index vector each element denotes the position of the feature name
returned by getTaskFeatureNames.
Note that the target feature is always included in the resulting task; you should not pass it here. Default is to use all features.
Value
(Task). The task restricted to the selected observations and features.
Examples
task = makeClassifTask(data = iris, target = "Species")
subsetTask(task, subset = 1:100)
summarizeColumns

Description
Summarizes a data.frame, somewhat differently from R's standard summary function. The function is mainly useful as a basic EDA tool on data.frames before they are converted to tasks, but it can be used on tasks as well.
Columns can be of type numeric, integer, logical, factor, or character. Characters and logicals are treated as factors.
Usage
summarizeColumns(obj)
Arguments
obj (data.frame | Task)
Input data.
Value
(data.frame). With columns:
name Name of column.
type Data type of column.
na Number of NAs in column.
disp Measure of dispersion. For numerics and integers the standard deviation is used; for categorical columns, the qualitative variation.
mean Mean value of column, NA for categorical columns.
median Median value of column, NA for categorical columns.
mad MAD of column, NA for categorical columns.
min Minimal value of column, for categorical columns the size of the smallest category.
max Maximal value of column, for categorical columns the size of the largest category.
nlevs For categorical columns, the number of factor levels, NA else.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeLevels()
Examples
summarizeColumns(iris)
summarizeLevels

Description
Summarizes the factor levels of a data.frame or task by tabling them. Characters and logicals are treated as factors.
Usage
summarizeLevels(obj, cols = NULL)
Arguments
obj (data.frame | Task)
Input data.
cols (character)
Restrict result to columns in cols. Default is all factor, character and logical
columns of obj.
Value
(list). Named list of tables.
See Also
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), removeConstantFeatures(), summarizeColumns()
Examples
summarizeLevels(iris)
Task

Description
The task encapsulates the data and specifies, through its subclasses, the type of the task. It also contains a description object detailing further aspects of the data.
Useful operators are:
• getTaskFormula,
• getTaskFeatureNames,
• getTaskData,
• getTaskTargets, and
• subsetTask.
Object members:
env (environment) Environment where data for the task are stored. Use getTaskData in order to
access it.
weights (numeric) See argument. NULL if not present.
blocking (factor) See argument. NULL if not present.
task.desc (TaskDesc) Encapsulates further information about the task.
Functional data can be added to a task via matrix columns. For more information refer to makeFunctionalData.
Arguments
id (character(1))
Id string for object. Default is the name of the R variable passed to data.
data (data.frame)
A data frame containing the features and target variable(s).
target (character(1) | character(2) | character(n.classes))
Name(s) of the target variable(s). For survival analysis these are the names of
the survival time and event columns, so it has length 2. For multilabel classifi-
cation it contains the names of the logical columns that encode whether a label
is present or not and its length corresponds to the number of classes.
costs (data.frame)
A numeric matrix or data frame containing the costs of misclassification. We
assume the general case of observation specific costs. This means we have n
rows, corresponding to the observations, in the same order as data. The columns
correspond to classes and their names are the class labels (if unnamed we use
y1 to yk as labels). Each entry (i,j) of the matrix specifies the cost of predicting
class j for observation i.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. Cannot
be set for cost-sensitive learning. Default is NULL which means no (= equal)
weights.
blocking (factor)
An optional factor of the same length as the number of observations. Obser-
vations with the same blocking level “belong together”. Specifically, they are
either put all in the training or the test set during a resampling iteration. Default
is NULL which means no blocking.
positive (character(1))
Positive class for binary classification (otherwise ignored and set to NA). Default
is the first factor level of the target attribute.
fixup.data (character(1))
Should some basic cleaning up of data be performed? Currently this means removing empty factor levels for the columns. Possible choices are: “no” = don't do it, “warn” = do it but warn about it, “quiet” = do it but keep silent. Default is “warn”.
check.data (logical(1))
Should sanity of data be checked initially at task creation? You should have
good reasons to turn this off (one might be speed). Default is TRUE.
coordinates (data.frame)
Coordinates of a spatial data set that will be used for spatial partitioning of the
data in a spatial cross-validation resampling setting. Coordinates have to be
numeric values. Provided data.frame needs to have the same number of rows as
data and consist of at least two dimensions.
Value
Task.
Examples
if (requireNamespace("mlbench")) {
  library(mlbench)
  data(BostonHousing)
  data(Ionosphere)
  makeClassifTask(data = iris, target = "Species")
  makeRegrTask(data = BostonHousing, target = "medv")
  makeClassifTask(id = "Ionosphere", data = Ionosphere, target = "Class")
}
TaskDesc

Description
Description object for a task, encapsulates basic properties of the task without having to store the complete data set.
Details
Object members:
id (character(1)) Id string of task.
type (character(1)) Type of task: “classif”, “regr”, “surv”, “cluster”, “costsens” or “multilabel”.
target (character) Name(s) of the target variable(s); empty for cluster analysis.
size (integer(1)) Number of cases in the data.
n.feat (integer) Number of features, a named vector split by feature type.
has.missings (logical(1)) Are missing values present?
has.weights (logical(1)) Are case weights specified for each observation?
has.blocking (logical(1)) Is a blocking factor for cases available in the task?
class.levels (character) All possible class levels; only present for classification tasks.
positive (character(1)), negative (character(1)) Positive and negative class labels for binary classification.
train

Description
Given a Task, creates a model for the learning machine which can be used for predictions on new data.
Usage
train(learner, task, subset = NULL, weights = NULL)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
task (Task)
The task.
subset (integer | logical | NULL)
Selected cases. Either a logical or an index vector. Default is NULL, which means all observations are used.
weights (numeric)
Optional, non-negative case weight vector to be used during fitting. If given, must be of the same length as subset and in corresponding order. Default is NULL, which means no weights are used unless specified in the task.
Value
(WrappedModel).
See Also
predict.WrappedModel
Examples
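# a minimal sketch (assumes the rpart package)
task = makeClassifTask(data = iris, target = "Species")
lrn = makeLearner("classif.rpart")
# train on a random half of the observations
training.set = sample(seq_len(nrow(iris)), size = nrow(iris) / 2)
mod = train(lrn, task, subset = training.set)
print(mod)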
trainLearner

Description
Mainly for internal use. Trains a wrapped learner on a given training set. You have to implement this method if you want to add another learner to this package.
Usage
trainLearner(.learner, .task, .subset, .weights = NULL, ...)
Arguments
.learner (RLearner)
Wrapped learner.
.task (Task)
Task to train learner on.
.subset (integer)
Subset of cases for training set, index the task with this. You probably want to
use getTaskData for this purpose.
.weights (numeric)
Weights for each observation.
... (any)
Additional (hyper)parameters, which need to be passed to the underlying train
function.
Details
Your implementation must adhere to the following: The model must be fitted on the subset of .task
given by .subset. All parameters in ... must be passed to the underlying training function.
Value
(any). Model of the underlying learner.
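A minimal sketch of such a method for a hypothetical learner "classif.mylda" wrapping MASS::lda (in a real extension the learner would additionally be registered via makeRLearnerClassif):
trainLearner.classif.mylda = function(.learner, .task, .subset, .weights = NULL, ...) {
  # fit only on the requested subset and forward all hyperparameters in ...
  d = getTaskData(.task, .subset, target.extra = TRUE)
  MASS::lda(x = d$data, grouping = d$target, ...)
}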
TuneControl

Description
General tune control object.
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce vari-
ance? Default is TRUE.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this value is fed back to the tuner, so the tuning algorithm does not abort. Imputation is only active if on.learner.error is configured not to stop in configureMlr. The value is not stored in the optimization path; an NA and a corresponding error message are logged instead. Note that this value is later multiplied by -1 for maximization measures internally, so you need to enter a larger positive value for maximization here as well. Default is the worst obtainable value of the performance measure you optimize for when you aggregate by mean value, or Inf otherwise. For multi-criteria optimization pass a vector of imputation values, one for each of your measures, in the same order as your measures.
start (list)
Named list of initial parameter values.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparam-
eter evaluation, via tuneThreshold? Only works for classification if the predict
type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design points, the resulting performances, and the runtime will be reported. If set to “memory”, the memory usage for each evaluation will also be displayed, with a small increase in run time. Otherwise a function with arguments learner, resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and prev.stage is expected. The default displays the performance measures, the time needed for evaluating, the currently used memory and the max memory ever used before (the latter two both taken from gc). See the implementation for details.
final.dw.perc (numeric(1))
If a Learner wrapped by a makeDownsampleWrapper is used, you can define
the value of dw.perc which is used to train the Learner with the final parameter
setting found by the tuning. Default is NULL which will not change anything.
... (any)
Further control parameters passed to the control arguments of cmaes::cma_es
or GenSA::GenSA, as well as towards the tunerConfig argument of irace::irace.
See Also
Other tune: getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(),
getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(),
makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(),
makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneParams(), tuneThreshold()
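The control object itself is created with one of the maker functions listed above, e.g. a plain random search (a minimal sketch):
# 100 random evaluations, with threshold tuning after each one
ctrl = makeTuneControlRandom(maxit = 100L, tune.threshold = TRUE)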
TuneMultiCritControl

Description
The following tuners are available:
makeTuneMultiCritControlGrid Grid search. All kinds of parameter types can be handled. You can either use their correct param type and resolution, or discretize them yourself by always using ParamHelpers::makeDiscreteParam in the par.set passed to tuneParams.
makeTuneMultiCritControlRandom Random search. All kinds of parameter types can be handled.
makeTuneMultiCritControlNSGA2 Evolutionary multi-criteria optimization via mco::nsga2.
makeTuneMultiCritControlMBO Model-based / Bayesian multi-criteria optimization via mlrMBO.
Usage
makeTuneMultiCritControlGrid(
same.resampling.instance = TRUE,
resolution = 10L,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
makeTuneMultiCritControlMBO(
n.objectives = mbo.control$n.objectives,
same.resampling.instance = TRUE,
impute.val = NULL,
learner = NULL,
mbo.control = NULL,
tune.threshold = FALSE,
tune.threshold.args = list(),
continue = FALSE,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
mbo.design = NULL
)
makeTuneMultiCritControlNSGA2(
same.resampling.instance = TRUE,
impute.val = NULL,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL,
...
)
makeTuneMultiCritControlRandom(
same.resampling.instance = TRUE,
maxit = 100L,
log.fun = "default",
final.dw.perc = NULL,
budget = NULL
)
Arguments
same.resampling.instance
(logical(1))
Should the same resampling instance be used for all evaluations to reduce vari-
ance? Default is TRUE.
resolution (integer)
Resolution of the grid for each numeric/integer parameter in par.set. For vec-
tor parameters, it is the resolution per dimension. Either pass one resolution
for all parameters, or a named vector. See ParamHelpers::generateGridDesign.
Default is 10.
log.fun (function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design points, the resulting performances, and the runtime will be reported. If set to “memory”, the memory usage for each evaluation will also be displayed, with a small increase in run time. Otherwise a function with arguments learner, resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and prev.stage is expected. The default displays the performance measures, the time needed for evaluating, the currently used memory and the max memory ever used before (the latter two both taken from gc). See the implementation for details.
final.dw.perc (numeric(1))
If a Learner wrapped by a makeDownsampleWrapper is used, you can define
the value of dw.perc which is used to train the Learner with the final parameter
setting found by the tuning. Default is NULL which will not change anything.
budget (integer(1))
Maximum budget for tuning. This value restricts the number of function evalua-
tions. In case of makeTuneMultiCritControlGrid this number must be identi-
cal to the size of the grid. For makeTuneMultiCritControlRandom the budget
equals the number of iterations (maxit) performed by the random search algo-
rithm. In case of makeTuneMultiCritControlNSGA2 the budget corresponds
to the product of the maximum number of generations (max(generations))
+ 1 (for the initial population) and the size of the population (popsize). For
makeTuneMultiCritControlMBO the budget equals the number of objective
function evaluations, i.e. the number of MBO iterations + the size of the ini-
tial design. If not NULL, this will overwrite existing stopping conditions in
mbo.control.
n.objectives (integer(1))
Number of objectives, i.e. number of Measures to optimize.
impute.val (numeric)
If something goes wrong during optimization (e.g. the learner crashes), this
value is fed back to the tuner, so the tuning algorithm does not abort. Imputation
is only active if on.learner.error is configured not to stop in configureMlr. It
is not stored in the optimization path, an NA and a corresponding error message
are logged instead. Note that this value is later multiplied by -1 for maximization
measures internally, so you need to enter a larger positive value for maximization
here as well. Default is the worst obtainable value of the performance measure
you optimize for when you aggregate by mean value, or Inf instead. For multi-
criteria optimization pass a vector of imputation values, one for each of your
measures, in the same order as your measures.
learner (Learner | NULL)
The surrogate learner: A regression learner to model performance landscape.
For the default, NULL, mlrMBO will automatically create a suitable learner
based on the rules described in mlrMBO::makeMBOLearner.
mbo.control (mlrMBO::MBOControl | NULL)
Control object for model-based optimization tuning. For the default, NULL, the
control object will be created with all the defaults as described in mlrMBO::makeMBOControl.
tune.threshold (logical(1))
Should the threshold be tuned for the measure at hand, after each hyperparam-
eter evaluation, via tuneThreshold? Only works for classification if the predict
type is “prob”. Default is FALSE.
tune.threshold.args
(list)
Further arguments for threshold tuning that are passed down to tuneThreshold.
Default is none.
continue (logical(1))
Resume calculation from previous run using mlrMBO::mboContinue? Requires
“save.file.path” to be set. Note that the ParamHelpers::OptPath in the mlrMBO::OptResult
will only include the evaluations after the continuation. The complete OptPath
will be found in the slot $mbo.result$opt.path.
mbo.design (data.frame | NULL)
Initial design as data frame. If the parameters have corresponding trafo func-
tions, the design must not be transformed before it is passed! For the default,
NULL, a default design is created like described in mlrMBO::mbo.
... (any)
Further control parameters passed to the control arguments of cmaes::cma_es
or GenSA::GenSA, as well as towards the tunerConfig argument of irace::irace.
maxit (integer(1))
Number of iterations for random search. Default is 100.
Value
(TuneMultiCritControl). The specific subclass is one of TuneMultiCritControlGrid, TuneMultiCritControlRandom, TuneMultiCritControlNSGA2, TuneMultiCritControlMBO.
See Also
Other tune_multicrit: plotTuneMultiCritResult(), tuneParamsMultiCrit()
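For example, a budget-limited random search control (a minimal sketch):
ctrl = makeTuneMultiCritControlRandom(maxit = 30L)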
TuneMultiCritResult

Description
Container for results of hyperparameter tuning. Contains the obtained Pareto set and front and the optimization path which led there.
Object members:
learner (Learner) Learner that was optimized.
control (TuneControl) Control object from tuning.
x (list) List of lists of non-dominated hyperparameter settings in the Pareto set. Note that when you have trafos on some of your params, x will always be on the TRANSFORMED scale, so you can use it directly.
y (matrix) Pareto front for x.
threshold Currently NULL.
opt.path (ParamHelpers::OptPath) Optimization path which led to x. Note that when you have trafos on some of your params, the opt.path always contains the UNTRANSFORMED values on the original scale. You can simply call trafoOptPath(opt.path) to transform them, or as.data.frame(trafoOptPath(opt.path)).
ind (integer(n)) Indices of Pareto-optimal params in opt.path.
measures ((list of) Measure) Performance measures.
tuneParams

Description
Optimizes the hyperparameters of a learner. Allows for different optimization methods, such as grid search, evolutionary strategies, iterated F-race, etc. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at TuneControl.
Multi-criteria tuning can be done with tuneParamsMultiCrit.
Usage
tuneParams(
learner,
task,
resampling,
measures,
par.set,
control,
show.info = getMlrOption("show.info"),
resample.fun = resample
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
task (Task)
The task.
resampling (ResampleInstance | ResampleDesc)
Resampling strategy to evaluate points in hyperparameter space. If you pass a
description, it is instantiated once at the beginning by default, so all points are
evaluated on the same training/test sets. If you want to change that behavior,
look at TuneControl.
measures (list of Measure | Measure)
Performance measures to evaluate. The first measure, aggregated by the first aggregation function, is optimized; the others are simply evaluated. Default is the default measure for the task, see getDefaultMeasure.
par.set (ParamHelpers::ParamSet)
Collection of parameters and their constraints for optimization. Dependent pa-
rameters with a requires field must use quote and not expression to define
it.
control (TuneControl)
Control object for search method. Also selects the optimization algorithm for
tuning.
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
resample.fun (closure)
The function to use for resampling. Defaults to resample. If a user-given func-
tion is to be used instead, it should take the arguments “learner”, “task”, “re-
sampling”, “measures”, and “show.info”; see resample. Within this function, it
is easiest to call resample and possibly modify the result. However, it is pos-
sible to return a list with only the following essential slots: the “aggr” slot for
general tuning, additionally the “pred” slot if threshold tuning is performed (see
TuneControl), and the “err.msgs” and “err.dumps” slots for error reporting. This
parameter must be the default when mbo tuning is performed.
Value
(TuneResult).
Note
If you would like to include results from the training data set, make sure to appropriately adjust the
resampling strategy and the aggregation for the measure. See example code below.
See Also
generateHyperParsEffectData
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(), getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(), makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(), makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(), tuneThreshold()
Examples
set.seed(123)
# a grid search for an SVM (with a tiny number of points...)
# note how easily we can optimize on a log-scale
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneControlGrid(resolution = 2L)
rdesc = makeResampleDesc("CV", iters = 2L)
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
# access data for all evaluated points
df = as.data.frame(res$opt.path)
df1 = as.data.frame(res$opt.path, trafo = TRUE)
print(head(df[, -ncol(df)]))
print(head(df1[, -ncol(df)]))
# access data for all evaluated points - alternative
df2 = generateHyperParsEffectData(res)
df3 = generateHyperParsEffectData(res, trafo = TRUE)
print(head(df2$data[, -ncol(df2$data)]))
print(head(df3$data[, -ncol(df3$data)]))
## Not run:
# we optimize the SVM over 3 kernels simultaneously
# note how we use dependent params (requires = ...) and iterated F-racing here
ps = makeParamSet(
makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
makeDiscreteParam("kernel", values = c("vanilladot", "polydot", "rbfdot")),
makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x,
requires = quote(kernel == "rbfdot")),
makeIntegerParam("degree", lower = 2L, upper = 5L,
requires = quote(kernel == "polydot"))
)
print(ps)
ctrl = makeTuneControlIrace(maxExperiments = 5, nbIterations = 1, minNbSurvival = 1)
rdesc = makeResampleDesc("Holdout")
res = tuneParams("classif.ksvm", iris.task, rdesc, par.set = ps, control = ctrl)
print(res)
df = as.data.frame(res$opt.path)
print(head(df[, -ncol(df)]))
## End(Not run)
tuneParamsMultiCrit

Description
Optimizes the hyperparameters of a learner in a multi-criteria fashion. Allows for different optimization methods, such as grid search, evolutionary strategies, etc. You can select such an algorithm (and its settings) by passing a corresponding control object. For a complete list of implemented algorithms look at TuneMultiCritControl.
Usage
tuneParamsMultiCrit(
learner,
task,
resampling,
measures,
par.set,
control,
show.info = getMlrOption("show.info"),
resample.fun = resample
)
Arguments
learner (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
task (Task)
The task.
resampling (ResampleInstance | ResampleDesc)
Resampling strategy to evaluate points in hyperparameter space. If you pass a
description, it is instantiated once at the beginning by default, so all points are
evaluated on the same training/test sets. If you want to change that behavior,
look at TuneMultiCritControl.
measures (list of Measure)
Performance measures to optimize simultaneously.
par.set (ParamHelpers::ParamSet)
Collection of parameters and their constraints for optimization. Dependent pa-
rameters with a requires field must use quote and not expression to define
it.
control (TuneMultiCritControl)
Control object for search method. Also selects the optimization algorithm for
tuning.
show.info (logical(1))
Print verbose output on console? Default is set via configureMlr.
resample.fun (closure)
The function to use for resampling. Defaults to resample and should take the
same arguments as, and return the same result type as, resample.
Value
(TuneMultiCritResult).
See Also
Other tune_multicrit: TuneMultiCritControl, plotTuneMultiCritResult()
Examples
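# a brief sketch, modeled on the tuneParams() example (assumes the kernlab
# package for classif.ksvm; the budget is kept tiny to keep the runtime low)
ps = makeParamSet(
  makeNumericParam("C", lower = -12, upper = 12, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -12, upper = 12, trafo = function(x) 2^x)
)
ctrl = makeTuneMultiCritControlRandom(maxit = 5L)
rdesc = makeResampleDesc("Holdout")
# minimize the false positive and false negative rates simultaneously
res = tuneParamsMultiCrit("classif.ksvm", task = sonar.task, resampling = rdesc,
  par.set = ps, measures = list(fpr, fnr), control = ctrl)
print(res)
plotTuneMultiCritResult(res)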
TuneResult

Description
Container for results of hyperparameter tuning. Contains the obtained point in search space, its performance values and the optimization path which led there.
Object members:
learner (Learner) Learner that was optimized.
control (TuneControl) Control object from tuning.
x (list) Named list of hyperparameter values identified as optimal. Note that when you have trafos on some of your params, x will always be on the TRANSFORMED scale, so you can use it directly.
y (numeric) Performance values for optimal x.
threshold (numeric) Vector of finally found and used thresholds if tune.threshold was enabled in TuneControl, otherwise NULL.
opt.path (ParamHelpers::OptPath) Optimization path which led to x. Note that when you have trafos on some of your params, the opt.path always contains the UNTRANSFORMED values on the original scale. You can simply call trafoOptPath(opt.path) to transform them.
tuneThreshold

Description
Optimizes the threshold of predictions based on probabilities. Works for classification and multilabel tasks. Uses BBmisc::optimizeSubInts for normal binary class problems and GenSA::GenSA for multiclass and multilabel problems.
Usage
tuneThreshold(pred, measure, task, model, nsub = 20L, control = list())
Arguments
pred (Prediction)
Prediction object.
measure (Measure)
Performance measure to optimize. Default is the default measure for the task.
task (Task)
Learning task. Rarely needed, only when required for the performance measure.
model (WrappedModel)
Fitted model. Rarely needed, only when required for the performance measure.
nsub (integer(1))
Passed to BBmisc::optimizeSubInts for two-class problems. Default is 20.
control (list)
Control object for GenSA::GenSA when used. Default is empty list.
Value
(list). A named list with the following components: th is the optimal threshold, perf the performance value.
See Also
Other tune: TuneControl, getNestedTuneResultsOptPathDf(), getNestedTuneResultsX(),
getResamplingIndices(), getTuneResult(), makeModelMultiplexer(), makeModelMultiplexerParamSet(),
makeTuneControlCMAES(), makeTuneControlDesign(), makeTuneControlGenSA(), makeTuneControlGrid(),
makeTuneControlIrace(), makeTuneControlMBO(), makeTuneControlRandom(), makeTuneWrapper(),
tuneParams()
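A brief usage sketch (probabilities are required, hence predict.type = "prob"; assumes the rpart package):
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, sonar.task)
pred = predict(mod, task = sonar.task)
# search for the threshold minimizing the misclassification rate
tuneThreshold(pred, measure = mmce)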
wpbc.task

Description
Contains the task (wpbc.task).
References
See TH.data::wpbc. Incomplete cases have been removed from the task.
yeast.task

Description
Contains the task (yeast.task).
Source
https://archive.ics.uci.edu/ml/datasets/Yeast (In long instead of wide format)
References
Elisseeff, A., & Weston, J. (2001): A kernel method for multi-labelled classification. In Advances
in neural information processing systems (pp. 681-687).