Title: | Random Ferns Classifier |
---|---|
Description: | Provides the random ferns classifier by Ozuysal, Calonder, Lepetit and Fua (2009) <doi:10.1109/TPAMI.2009.23>, modified for generic and multi-label classification and featuring OOB error approximation and importance measure as introduced in Kursa (2014) <doi:10.18637/jss.v061.i10>. |
Authors: | Miron Bartosz Kursa [aut, cre] |
Maintainer: | Miron Bartosz Kursa <[email protected]> |
License: | GPL (>= 2) |
Version: | 5.0.0 |
Built: | 2024-11-19 06:43:33 UTC |
Source: | CRAN |
This function combines two compatible models (same decision, same training data structure and same depth) into a single ensemble. It can be used to distribute model training, perform it on batches of data, save checkpoints or precisely investigate its course.
    ## S3 method for class 'rFerns'
    merge(
      x,
      y,
      dropModel = FALSE,
      ignoreObjectConsistency = FALSE,
      trueY = NULL,
      ...
    )
x |
Object of a class |
y |
Object of a class |
dropModel |
If |
ignoreObjectConsistency |
If |
trueY |
Copy of the training decision, used to re-construct OOB error and confusion matrix.
Can be omitted, in which case the OOB error and confusion matrix will not be available; ignored when |
... |
Ignored, for S3 generic/method consistency. |
An object of class rFerns, which is a list with the following components:
model |
The merged model in case both |
oobErr |
OOB approximation of accuracy, if it can be computed.
Namely, when |
importance |
The merged importance scores in case both |
oobScores |
OOB scores, if they can be computed; namely if both models had them calculated and |
oobPreds |
A vector of OOB predictions of class for each object in the training set, if it can be computed. |
oobConfusionMatrix |
OOB confusion matrix, if it can be computed.
Namely, when |
timeTaken |
Time used to train the model, calculated as a sum of training times of |
parameters |
Numerical vector of three elements: |
classLabels |
Copy of |
isStruct |
Copy of the train set structure. |
merged |
Set to |
If different training object sets were used to build the merged models, merged importance is still calculated, but mileage may vary; for substantially different sets it may become biased. You have been warned.
Shadow importance is only merged when both models have shadow importance and the same consistentSeed value; otherwise shadow importance would be biased down.
The order of objects in x and y is not important; the only exception is merging with NULL, in which case x must be an rFerns object for R to use the proper merge method.
    set.seed(77)
    #Fetch Iris data
    data(iris)
    #Build models
    rFerns(Species~.,data=iris)->modelA
    rFerns(Species~.,data=iris)->modelB
    modelAB<-merge(modelA,modelB)
    print(modelA)
    print(modelAB)
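The batch-training use case described above can be sketched by folding a list of models with merge. This is a minimal illustration, not a recommended workflow: the number of parts (3) and the ferns = 200 setting are arbitrary choices made for the example, and it assumes that an already merged model can itself be merged again.

```r
library(rFerns)

set.seed(77)
data(iris)

# Train three small models on the same data; in a real setting each call
# could run in a separate process or on a separate machine.
# (3 parts and ferns = 200 are arbitrary values chosen for this sketch.)
parts <- lapply(1:3, function(i) rFerns(Species ~ ., data = iris, ferns = 200))

# Fold the list into one ensemble. Passing trueY (a copy of the training
# decision) lets merge re-construct the OOB error and confusion matrix.
combined <- Reduce(function(a, b) merge(a, b, trueY = iris$Species), parts)

print(combined)
combined$oobErr
```

Without trueY the merge still succeeds, but the OOB error and confusion matrix are not carried over to the result.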
Proof-of-concept ensemble of rFerns models, built to stabilise and improve selection based on shadow importance.
It employs a super-ensemble of small rFerns forests, one per iteration (see the iterations argument), each built on a subspace of attributes whose number is given by the size argument. The subspace is selected randomly, but with a higher selection probability for attributes claimed important by previous sub-models.
The final selection is the group of attributes which hold a substantial weight at the end of the procedure.
    naiveWrapper(
      x,
      y,
      iterations = 1000,
      depth = 5,
      ferns = 100,
      size = 30,
      lambda = 5,
      threads = 0,
      saveHistory = FALSE
    )
x |
Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns.
Factors must have less than 31 levels. No |
y |
A decision vector. Must be a factor of the same length as |
iterations |
Number of iterations, i.e., the number of sub-models built. |
depth |
The depth of the ferns; must be in 1–16 range. Note that time and memory requirements scale with |
ferns |
Number of ferns to be built in each sub-model. This should be a small number, around 3-5 times |
size |
Number of attributes considered by each sub-model. |
lambda |
Lambda parameter driving the re-weighting step of the method. |
threads |
Number of parallel threads, copied to the underlying |
saveHistory |
Should the weight history be stored? |
An object of class naiveWrapper, which is a list with the following components:
found |
Names of all selected attributes. |
weights |
Vector of weights indicating the confidence that a given feature is relevant. |
timeTaken |
Time of computation. |
weightHistory |
History of weights over all iterations, present if |
params |
Copies of algorithm parameters, |
Kursa MB (2017). Efficient all relevant feature selection with random ferns. In: Kryszkiewicz M., Appice A., Slezak D., Rybinski H., Skowron A., Ras Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham.
    set.seed(77)
    #Fetch Iris data
    data(iris)
    #Extend with random noise
    noisyIris<-cbind(iris[,-5],apply(iris[,-5],2,sample))
    names(noisyIris)[5:8]<-sprintf("Nonsense%d",1:4)
    #Execute selection
    naiveWrapper(noisyIris,iris$Species,iterations=50,ferns=20,size=8)
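The returned list can also be stored and inspected component by component. The sketch below reuses the noisy Iris setup from the example and only touches the found, weights and timeTaken components listed above; the variable name selection is chosen for illustration.

```r
library(rFerns)

set.seed(77)
data(iris)

# Same noisy data as in the example: four real attributes plus four
# shuffled copies that carry no information about the decision.
noisyIris <- cbind(iris[, -5], apply(iris[, -5], 2, sample))
names(noisyIris)[5:8] <- sprintf("Nonsense%d", 1:4)

selection <- naiveWrapper(noisyIris, iris$Species,
                          iterations = 50, ferns = 20, size = 8)

# Names of the selected attributes
selection$found

# Final weights, largest first
sort(selection$weights, decreasing = TRUE)

# Time of computation
selection$timeTaken
```

Since the procedure is randomised, the weights and the selected set may change with the seed and with the iterations, ferns and size settings.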
This function predicts classes of new objects with a given rFerns object.
    ## S3 method for class 'rFerns'
    predict(object, x, scores = FALSE, ...)
object |
Object of a class |
x |
Data frame containing attributes; its column names must correspond to those of the training set (although their order is not important) and it must not introduce new factor levels. If this argument is not given, OOB predictions on the training set will be returned. |
scores |
If |
... |
Additional parameters. |
Predictions.
If scores is FALSE, a factor vector (for many-class classification) or a logical data.frame (for multi-label classification) with predictions; otherwise a data.frame with the scores of each class.
    set.seed(77)
    #Fetch Iris data
    data(iris)
    #Split into tRain and tEst set
    iris[c(TRUE,FALSE),]->irisR
    iris[c(FALSE,TRUE),]->irisE
    #Build model
    rFerns(Species~.,data=irisR)->model
    print(model)
    #Test
    predict(model,irisE)->p
    print(table(
     Predictions=p,
     True=irisE[["Species"]]))
    err<-mean(p!=irisE[["Species"]])
    print(paste("Test error",err,sep=" "))
    #Show first OOB scores
    head(predict(model,scores=TRUE))
This function builds a random ferns model on the given training data.
    rFerns(x, ...)

    ## S3 method for class 'formula'
    rFerns(formula, data = .GlobalEnv, ...)

    ## S3 method for class 'matrix'
    rFerns(x, y, ...)

    ## Default S3 method:
    rFerns(
      x,
      y,
      depth = 5,
      ferns = 1000,
      importance = "none",
      saveForest = TRUE,
      consistentSeed = NULL,
      threads = 0,
      ...
    )
x |
Data frame containing attributes; must have unique names and contain only numeric, integer or (ordered) factor columns.
Factors must have less than 31 levels. No |
... |
For formula and matrix methods, a place to state parameters to be passed to default method.
For the print method, arguments to be passed to |
formula |
alternatively, formula describing model to be analysed. |
data |
in which to interpret formula. |
y |
A decision vector. Must be a factor of the same length as |
depth |
The depth of the ferns; must be in 1–16 range. Note that time and memory requirements scale with |
ferns |
Number of ferns to be built. |
importance |
Set to calculate attribute importance measure (VIM);
|
saveForest |
Should the model be saved? It must be |
consistentSeed |
PRNG seed used for shadow importance only.
Must be either a 2-element integer vector or |
threads |
Number of OpenMP threads to use. The default value of |
An object of class rFerns, which is a list with the following components:
model |
The built model; |
oobErr |
OOB approximation of accuracy. Ignores never-OOB-tested objects (see oobScores element). |
importance |
The importance scores or |
oobScores |
A matrix of OOB scores of each class for each object in training set.
Rows correspond to classes in the same order as in |
oobPreds |
A vector of OOB predictions of class for each object in training set. Never-OOB-tested objects (see above) have predictions equal to |
oobConfusionMatrix |
Confusion matrix built from |
timeTaken |
Time used to train the model (smaller than wall time because data preparation and model final touches are excluded; however it includes the time needed to compute importance, if it applies).
An object of |
parameters |
Numerical vector of three elements: |
classLabels |
Copy of |
consistentSeed |
Consistent seed used; only present for |
isStruct |
Copy of the train set structure, required internally by predict method. |
The unused levels of the decision will be removed; on the other hand unused levels of categorical attributes will be preserved, so that they could be present in the data later predicted with the model. The levels of ordered factors in training and predicted data must be identical.
Do not use the formula interface for data with a large number of attributes; the overhead from handling the formula may be significant.
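As a minimal sketch of avoiding the formula interface, the attributes and the decision can be passed directly to the default method. Iris is used here only for brevity (it is far too small for the overhead to matter), and the names predictors and decision as well as the ferns = 500 setting are illustrative choices.

```r
library(rFerns)

set.seed(77)
data(iris)

# Split the data into an attribute data frame and a factor decision
predictors <- iris[, -5]
decision <- iris$Species

# Default method: no formula is parsed, x and y are given explicitly
model <- rFerns(predictors, decision, ferns = 500)

print(model)

# OOB summaries stored in the returned list
model$oobErr
model$oobConfusionMatrix
```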
Ozuysal M, Calonder M, Lepetit V & Fua P. (2009). Fast Keypoint Recognition using Random Ferns, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448-461.
Kursa MB (2014). rFerns: An Implementation of the Random Ferns Method for General-Purpose Machine Learning, Journal of Statistical Software, 61(10), 1-13.
    set.seed(77)
    #Fetch Iris data
    data(iris)
    #Build model
    rFerns(Species~.,data=iris)
    ##Importance
    rFerns(Species~.,data=iris,importance="shadow")->model
    print(model$imp)