Title: | Branch-Exclusive Splits Trees |
---|---|
Description: | Decision tree algorithm with a major feature added: it allows users to define an ordering on the partitioning process, resulting in Branch-Exclusive Splits Trees (BEST). Cedric Beaulac and Jeffrey S. Rosenthal (2019) <arXiv:1804.10168>. |
Authors: | Beaulac Cedric [aut, cre] |
Maintainer: | Beaulac Cedric <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.2 |
Built: | 2024-10-31 06:31:06 UTC |
Source: | CRAN |
Computes the proportion of matching terms in two vectors of the same length. Used to compute the accuracy of predictions on a test set.
Acc(Vec1, Vec2)
Vec1 | A vector of labels
Vec2 | Another vector of labels
Percentage of identical labels (accuracy)
Vec1 <- c(1,1,2,3,1)
Vec2 <- c(1,2,2,3,1)
Acc(Vec1,Vec2)
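A minimal sketch of the computation Acc presumably performs, assuming a straightforward element-wise comparison (the helper name `my_acc` is illustrative, not part of the package):

```r
# Hypothetical re-implementation of Acc: proportion of positions
# where the two label vectors agree.
my_acc <- function(Vec1, Vec2) {
  stopifnot(length(Vec1) == length(Vec2))  # vectors must be the same length
  mean(Vec1 == Vec2)
}

my_acc(c(1,1,2,3,1), c(1,2,2,3,1))  # 4 of 5 labels match: 0.8
```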
Performs Bootstrap Aggregating of BEST trees
BaggedBEST(Data, VA, NoT = 50, Size = 50)
Data | A data set (data frame): can take both numerical and categorical predictors. The last column of the data set must be the response variable (categorical variables only)
VA | Variable availability structure
NoT | Number of trees in the bag
Size | Minimal number of observations within a leaf needed for partitioning (default is 50)
A list of BEST Objects
n <- 500
Data <- BESTree::Data[1:n,]
d <- ncol(Data)-1
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
NoT <- 10
Fit <- BESTree::BaggedBEST(Data,VA,NoT,Size)
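Bootstrap aggregating combines the individual trees' predictions, typically by majority vote. A small self-contained sketch of that voting step (the helper name and data are illustrative, not the package's internals):

```r
# Majority vote across trees: each column of `votes` holds one tree's
# predicted labels for the same set of observations.
majority_vote <- function(votes) {
  apply(votes, 1, function(row) names(which.max(table(row))))
}

# Three observations classified by three trees.
votes <- cbind(c("a","b","b"), c("a","a","b"), c("b","a","b"))
majority_vote(votes)  # one aggregated label per observation: "a" "a" "b"
```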
Main function of the package. It produces Classification Trees with Branch-Exclusive variables.
BEST(Data, Size, VA)
Data | A data set (data frame): can take both numerical and categorical predictors. The last column of the data set must be the response variable (categorical variables only)
Size | Minimal number of observations within a leaf needed for partitioning
VA | Variable availability structure
A BEST object, which is a list containing the resulting tree, the row numbers for each region, and the split points
n <- 1000
Data <- BESTree::Data[1:n,]
d <- ncol(Data)-1
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
Fit <- BESTree::BEST(Data,Size,VA)
Generates a random forest of BEST trees
BESTForest(Data, VA, NoT = 50, Size = 50)
Data | A data set (data frame): can take both numerical and categorical predictors. The last column of the data set must be the response variable (categorical variables only)
VA | Variable availability structure
NoT | Number of trees in the bag
Size | Minimal number of observations within a leaf needed for partitioning (default is 50)
A list of BEST Objects (Random Forest)
n <- 500
Data <- BESTree::Data[1:n,]
d <- ncol(Data)-1
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
NoT <- 10
Fit <- BESTree::BESTForest(Data,VA,NoT,Size)
Data generated according to a decision tree for simulation purposes
Data
A data frame with 10000 rows and 5 variables:
Binary predictor
Binary predictor
Continuous predictor between 0 and 1
Continuous predictor between 0 and 1
The response variable
...
Data generated according to a decision tree for simulation purposes
Fit
A typical list produced by the BEST function:
Tree structure indicating splitting variables, impurity of the region and split variable
List of splitting values
Observation numbers in the respective regions
...
Quickly builds the available-variable list necessary for BEST. This list details which variables are available for the partitioning, and which variables are gating variables.
ForgeVA(d, GV, BEV, Thresh = 0.5, Direc = 0)
d | Number of predictors
GV | Gating variables
BEV | Branch-exclusive variables
Thresh | Threshold for gates
Direc | Direction of gates (1 means the variable becomes available if the gate is bigger than Thresh)
The list containing the Variable Availability structure
# This function can be used to set up the variable availability structure.
# Suppose we want to fit a regular decision tree on a data set containing d predictors
d <- 10
VA <- ForgeVA(d,1,0,0,0)
# Suppose now that predictor x5 is a binary gating variable for x4
# such that x4 is available if x5 = 1
GV <- 5 # The gating variable
BEV <- 4 # The Branch-Exclusive variable
Tresh = 0.5 # Value between 0 and 1
Direc = 1 # X4 is available if X5 is bigger than Tresh
VA <- ForgeVA(d,GV,BEV,Tresh,Direc)
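The gating rule that ForgeVA encodes can be illustrated with a tiny hypothetical helper (for intuition only; not how the package stores the structure): a branch-exclusive variable becomes available in a region only on the matching side of the gating variable's threshold.

```r
# Illustrative gating check: with direc = 1 the branch-exclusive
# variable is released when the gate value exceeds the threshold;
# with any other direction it is released on the opposite side.
gate_open <- function(gate_value, thresh = 0.5, direc = 1) {
  if (direc == 1) gate_value > thresh else gate_value <= thresh
}

gate_open(1, 0.5, 1)  # TRUE: x4 becomes available when x5 = 1
gate_open(0, 0.5, 1)  # FALSE: x4 stays unavailable when x5 = 0
```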
Emits predictions from a forest of BESTs
FPredict(M, LFit)
M | A matrix of new observations, one row per observation
LFit | A list of BEST objects (usually produced by RBEST or BESTForest)
A vector of predictions
n <- 500
Data <- BESTree::Data[1:n,]
d <- ncol(Data)-1
NewPoints <- BESTree::Data[(n+1):(n+11),1:d]
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
NoT <- 10
Fit <- BESTree::BaggedBEST(Data,VA,NoT,Size)
Predictions <- BESTree::FPredict(NewPoints,Fit)
Classify a set of new observation points
MPredict(M, Fit)
M | A matrix of new observations, one row per observation
Fit | A BEST object
The predicted class
n <- 500
Data <- BESTree::Data[1:n,]
d <- ncol(Data)-1
NewPoints <- BESTree::Data[(n+1):(n+11),1:d]
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
Fit <- BESTree::BEST(Data,Size,VA)
Predictions <- BESTree::MPredict(NewPoints,Fit)
Classify a new observation point
Predict(Point, Fit)
Point | A new observation
Fit | A BEST object
The predicted class
n <- 500
Data <- BESTree::Data[1:n,]
NewPoint <- BESTree::Data[n+1,]
d <- ncol(Data)-1
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
Fit <- BESTree::BEST(Data,Size,VA)
BESTree::Predict(NewPoint[1:d],Fit)
Uses a Validation Set to select the best trees within the list of pruned trees.
TreePruning(Fit, VSet)
Fit | A BEST object
VSet | A validation set (can also be used in a CV loop)
The shallowest tree among the trees with highest accuracy. This replaces the first element in the BEST object list.
nv <- 50
ValData <- BESTree::Data[(1000+1):(1000+nv),]
Fit <- BESTree::Fit
Fit[[1]] <- BESTree::TreePruning(Fit,ValData)
Produces a variable importance analysis using the mean decrease in node impurity
VI(Forest)
Forest | A list of BEST objects (usually produced by RBEST or BESTForest)
A vector of importance (size d)
n <- 500
Data <- BESTree::Data[1:n,]
d <- ncol(Data)-1
VA <- ForgeVA(d,1,0,0,0)
Size <- 50
NoT <- 10
Fit <- BESTree::BaggedBEST(Data,VA,NoT,Size)
VI <- BESTree::VI(Fit)
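Mean decrease in node impurity averages, over all trees in the forest, the total impurity reduction credited to each predictor. A toy sketch of that aggregation step (the data and names are illustrative, not the package's internals):

```r
# Each row is one tree; each column is the total impurity decrease
# credited to one predictor in that tree.
decreases <- rbind(c(0.40, 0.10, 0.00),
                   c(0.30, 0.20, 0.05))

# Importance = mean decrease per predictor across the forest.
importance <- colMeans(decreases)
importance  # 0.35 0.15 0.025: predictor 1 is the most important
```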