Title: | Logit Leaf Model Classifier for Binary Classification |
---|---|
Description: | Fits the Logit Leaf Model, makes predictions, and visualizes the output (De Caigny et al., 2018; <DOI:10.1016/j.ejor.2018.02.009>). |
Authors: | Arno De Caigny [aut, cre], Kristof Coussement [aut], Koen W. De Bock [aut] |
Maintainer: | Arno De Caigny <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2024-10-31 06:55:49 UTC |
Source: | CRAN |
This function creates the logit leaf model. It takes a dataframe with numeric values as input and a corresponding vector of dependent values. The decision tree parameters, the confidence threshold for pruning and the minimum number of observations per leaf, can be set.
llm(X, Y, threshold_pruning = 0.25, nbr_obs_leaf = 100)
X | Dataframe containing numerical independent variables. |
Y | Numerical vector of the dependent variable. Currently only binary classification is supported. |
threshold_pruning | Confidence threshold for pruning. Default 0.25. |
nbr_obs_leaf | The minimum number of observations in a leaf node. Default 100. |
An object of class logitleafmodel, which is a list with the following components:
Segment Rules | The decision rules that define the segments. Use table.llm.html to visualize. |
Coefficients | The segment-specific logistic regression coefficients. Use table.llm.html to visualize. |
Full decision tree for segmentation | The raw decision tree. Use table.llm.html to visualize. |
Observations per segment | The number of observations in each segment. |
Incidence of dependent per segment | The incidence of the dependent variable in each segment. |
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
predict.llm, table.llm.html, llm.cv
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
In v-fold cross-validation, the data are divided into v subsets of approximately equal size. Subsequently, one of the v data parts is excluded while the remainder of the data is used to create a logitleafmodel object. Predictions are generated for the excluded data part. The process is repeated v times.
llm.cv(X, Y, cv, threshold_pruning = 0.25, nbr_obs_leaf = 100)
X | Dataframe containing numerical independent variables. |
Y | Numerical vector of the dependent variable. Currently only binary classification is supported. |
cv | An integer specifying the number of folds in the cross-validation. |
threshold_pruning | Confidence threshold for pruning. Default 0.25. |
nbr_obs_leaf | The minimum number of observations in a leaf node. Default 100. |
An object of class llm.cv, which is a list with the following components:
foldpred | A data frame with, per fold, the predicted class membership probabilities for the left-out observations. |
pred | A data frame with the predicted class membership probabilities. |
foldclass | A data frame with, per fold, the predicted classes for the left-out observations. |
class | A data frame with the predicted classes. |
conf | The confusion matrix which compares the real versus the predicted class memberships, based on the class object. |
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
predict.llm, table.llm.html, llm
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Create the LLM with 5-cv
Pima.llm <- llm.cv(X = PimaIndiansDiabetes[, -c(9)], Y = PimaIndiansDiabetes$diabetes,
                   cv = 5, threshold_pruning = 0.25, nbr_obs_leaf = 100)
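As a hedged follow-up to the example above, the overall cross-validated accuracy can be derived from the conf component (the confusion matrix of real versus predicted classes) using base R only:

## Overall cross-validated accuracy from the confusion matrix
## (assumes Pima.llm$conf is a square table of real vs. predicted classes)
conf_mat <- as.matrix(Pima.llm$conf)
sum(diag(conf_mat)) / sum(conf_mat)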
This function creates a prediction for an object of class logitleafmodel. It assumes a dataframe with numeric values as input and an object of class logitleafmodel, which is the result of the llm function. Currently only binary classification is supported.
## S3 method for class 'llm'
predict(object, X, ...)
object | An object of class logitleafmodel, as created by the function llm. |
X | Dataframe containing numerical independent variables. |
... | Further arguments passed to or from other methods. |
Returns a dataframe containing a probability for every instance, based on the LLM model. Optionally, row numbers can be added.
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
## Use the model on the test dataset to make a prediction
PimaPrediction <- predict.llm(object = Pima.llm, X = Pimatest[, -c(9)])
## Optionally add the dependent to calculate performance statistics such as AUC
# PimaPrediction <- cbind(PimaPrediction, "diabetes" = Pimatest[, "diabetes"])
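A minimal sketch of the AUC calculation mentioned in the comment above, assuming the pROC package is available and that the predicted probability of the positive class sits in the first column of PimaPrediction (check colnames(PimaPrediction) for the actual column name):

if (requireNamespace("pROC", quietly = TRUE)) {
  ## AUC of the predictions on the test set (column index is an assumption)
  roc_obj <- pROC::roc(response = Pimatest$diabetes, predictor = PimaPrediction[[1]])
  pROC::auc(roc_obj)
}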
This function generates HTML code for a visualization of the logit leaf model based on the variable importance per variable category.
table.cat.llm.html(
  object,
  category_var_df,
  headertext = "The Logit Leaf Model",
  footertext = "A table footer comment",
  roundingnumbers = 2,
  methodvarimp = "Coef"
)
object | An object of class logitleafmodel, as created by the function llm. |
category_var_df | Dataframe containing a column called "iv" with the independent variables and a column called "cat" with the variable category name associated with every iv. |
headertext | Allows providing the table with a custom header. |
footertext | Allows providing the table with a custom footer. |
roundingnumbers | An integer stating the number of decimals in the visualization. |
methodvarimp | Determines the method used to calculate the variable importance. There are four options: 1/ variable coefficient ('Coef'), 2/ standardized beta ('Beta'), 3/ Wald statistic ('Wald'), 4/ likelihood ratio test ('LRT'). |
Generates HTML code for a visualization.
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
## Define the variable categories (note: the categories are only created for demonstration)
var_cat_df <- as.data.frame(cbind(names(PimaTrain[, -c(9)]),
                                  c("cat_a", "cat_a", "cat_a", "cat_a",
                                    "cat_b", "cat_b", "cat_b", "cat_b")),
                            stringsAsFactors = FALSE)
names(var_cat_df) <- c("iv", "cat")
## Save the output of the model to a html file
Pima.Viz <- table.cat.llm.html(object = Pima.llm, category_var_df = var_cat_df,
                               headertext = "This is an example of the LLM model",
                               footertext = "Enjoy the package!")
## Optionally write it to your working directory
# write(Pima.Viz, "Visualization_LLM_on_PimaIndiansDiabetes.html")
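A variant of the call above, using the documented 'Wald' option for methodvarimp to rank variables by the Wald statistic instead of the raw coefficients:

## Same visualization, ranking variables by the Wald statistic
Pima.Viz.Wald <- table.cat.llm.html(object = Pima.llm, category_var_df = var_cat_df,
                                    headertext = "LLM with Wald-based variable importance",
                                    footertext = "Enjoy the package!",
                                    methodvarimp = "Wald")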
This function generates HTML code for a visualization of the logit leaf model.
table.llm.html(
  object,
  headertext = "The Logit Leaf Model",
  footertext = "A table footer comment",
  roundingnumbers = 2
)
object | An object of class logitleafmodel, as created by the function llm. |
headertext | Allows providing the table with a custom header. |
footertext | Allows providing the table with a custom footer. |
roundingnumbers | An integer stating the number of decimals in the visualization. |
Generates HTML code for a visualization.
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
## Save the output of the model to a html file
Pima.Viz <- table.llm.html(object = Pima.llm,
                           headertext = "This is an example of the LLM model",
                           footertext = "Enjoy the package!")
## Optionally write it to your working directory
# write(Pima.Viz, "Visualization_LLM_on_PimaIndiansDiabetes.html")
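A small sketch of how the generated HTML can be viewed without touching the working directory (base R only; the file name below is illustrative):

## Write the HTML to a temporary file and open it in the default browser
html_file <- file.path(tempdir(), "llm_visualization.html")
write(Pima.Viz, html_file)
# browseURL(html_file)  # interactive use only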