Title: | Logit Leaf Model Classifier for Binary Classification |
---|---|
Description: | Fits the Logit Leaf Model, makes predictions, and visualizes the output (De Caigny et al., 2018; <DOI:10.1016/j.ejor.2018.02.009>). |
Authors: | Arno De Caigny [aut, cre], Kristof Coussement [aut], Koen W. De Bock [aut] |
Maintainer: | Arno De Caigny <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2024-10-31 06:55:49 UTC |
Source: | CRAN |
This function creates the logit leaf model. It takes a dataframe with numeric values as input and a corresponding vector of dependent values. The decision tree parameters, the confidence threshold for pruning and the minimum number of observations per leaf, can be set.
llm(X, Y, threshold_pruning = 0.25, nbr_obs_leaf = 100)
X | Dataframe containing numerical independent variables. |
Y | Numerical vector of the dependent variable. Currently only binary classification is supported. |
threshold_pruning | Confidence threshold for pruning. Default 0.25. |
nbr_obs_leaf | The minimum number of observations in a leaf node. Default 100. |
An object of class logitleafmodel, which is a list with the following components:
Segment Rules | The decision rules that define the segments. Use table.llm.html to visualize. |
Coefficients | The segment-specific logistic regression coefficients. Use table.llm.html to visualize. |
Full decision tree for segmentation | The raw decision tree. Use table.llm.html to visualize. |
Observations per segment | The number of observations in each segment. |
Incidence of dependent per segment | The incidence of the dependent variable in each segment. |
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
predict.llm, table.llm.html, llm.cv
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
In v-fold cross-validation, the data are divided into v subsets of approximately equal size. Subsequently, one of the v data parts is excluded while the remainder of the data is used to create a logitleafmodel object. Predictions are generated for the excluded data part. The process is repeated v times.
llm.cv(X, Y, cv, threshold_pruning = 0.25, nbr_obs_leaf = 100)
X | Dataframe containing numerical independent variables. |
Y | Numerical vector of the dependent variable. Currently only binary classification is supported. |
cv | An integer specifying the number of folds in the cross-validation. |
threshold_pruning | Confidence threshold for pruning. Default 0.25. |
nbr_obs_leaf | The minimum number of observations in a leaf node. Default 100. |
An object of class llm.cv, which is a list with the following components:
foldpred | A data frame with, per fold, the predicted class membership probabilities for the left-out observations. |
pred | A data frame with the predicted class membership probabilities. |
foldclass | A data frame with, per fold, the predicted classes for the left-out observations. |
class | A data frame with the predicted classes. |
conf | The confusion matrix which compares the real versus the predicted class memberships, based on the class object. |
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
predict.llm, table.llm.html, llm
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Create the LLM with 5-cv
Pima.llm <- llm.cv(X = PimaIndiansDiabetes[, -c(9)], Y = PimaIndiansDiabetes$diabetes,
                   cv = 5, threshold_pruning = 0.25, nbr_obs_leaf = 100)
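As a hedged follow-up to the example above, the overall cross-validated accuracy can be derived from the conf component (the confusion matrix of real versus predicted classes) using base R only:

## Overall cross-validated accuracy from the confusion matrix
## (assumes Pima.llm$conf is a square table of real vs. predicted classes)
conf_mat <- as.matrix(Pima.llm$conf)
sum(diag(conf_mat)) / sum(conf_mat)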
This function creates a prediction for an object of class logitleafmodel. It assumes a dataframe with numeric values as input and an object of class logitleafmodel, which is the result of the llm function. Currently only binary classification is supported.
## S3 method for class 'llm'
predict(object, X, ...)
object | An object of class logitleafmodel, as created by the function llm. |
X | Dataframe containing numerical independent variables. |
... | Further arguments passed to or from other methods. |
Returns a dataframe containing a probability for every instance, based on the LLM model. Optionally, row numbers can be added.
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
## Use the model on the test dataset to make a prediction
PimaPrediction <- predict.llm(object = Pima.llm, X = Pimatest[, -c(9)])
## Optionally add the dependent to calculate performance statistics such as AUC
# PimaPrediction <- cbind(PimaPrediction, "diabetes" = Pimatest[, "diabetes"])
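A minimal sketch of the AUC calculation mentioned in the comment above, assuming the pROC package is available and that the predicted probability of the positive class sits in the first column of PimaPrediction (check colnames(PimaPrediction) for the actual column name):

if (requireNamespace("pROC", quietly = TRUE)) {
  ## AUC of the predictions on the test set (column index is an assumption)
  roc_obj <- pROC::roc(response = Pimatest$diabetes, predictor = PimaPrediction[[1]])
  pROC::auc(roc_obj)
}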
This function generates HTML code for a visualization of the logit leaf model based on the variable importance per variable category.
table.cat.llm.html(
  object,
  category_var_df,
  headertext = "The Logit Leaf Model",
  footertext = "A table footer comment",
  roundingnumbers = 2,
  methodvarimp = "Coef"
)
object | An object of class logitleafmodel, as created by the function llm. |
category_var_df | Dataframe containing a column called "iv" with the independent variables and a column called "cat" with the variable category name associated with every iv. |
headertext | Allows providing the table with a custom header. |
footertext | Allows providing the table with a custom footer. |
roundingnumbers | An integer stating the number of decimals in the visualization. |
methodvarimp | Determines the method used to calculate the variable importance. There are four options: 1/ variable coefficient ('Coef'), 2/ standardized beta ('Beta'), 3/ Wald statistic ('Wald'), 4/ likelihood ratio test ('LRT'). |
Generates HTML code for a visualization.
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
## Define the variable categories (note: the categories are only created for demonstration)
var_cat_df <- as.data.frame(cbind(names(PimaTrain[, -c(9)]),
                                  c("cat_a", "cat_a", "cat_a", "cat_a",
                                    "cat_b", "cat_b", "cat_b", "cat_b")),
                            stringsAsFactors = FALSE)
names(var_cat_df) <- c("iv", "cat")
## Save the output of the model to a html file
Pima.Viz <- table.cat.llm.html(object = Pima.llm, category_var_df = var_cat_df,
                               headertext = "This is an example of the LLM model",
                               footertext = "Enjoy the package!")
## Optionally write it to your working directory
# write(Pima.Viz, "Visualization_LLM_on_PimaIndiansDiabetes.html")
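A variant of the call above, using the documented 'Wald' option for methodvarimp to rank variables by the Wald statistic instead of the raw coefficients:

## Same visualization, ranking variables by the Wald statistic
Pima.Viz.Wald <- table.cat.llm.html(object = Pima.llm, category_var_df = var_cat_df,
                                    headertext = "LLM with Wald-based variable importance",
                                    footertext = "Enjoy the package!",
                                    methodvarimp = "Wald")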
This function generates HTML code for a visualization of the logit leaf model.
table.llm.html(
  object,
  headertext = "The Logit Leaf Model",
  footertext = "A table footer comment",
  roundingnumbers = 2
)
object | An object of class logitleafmodel, as created by the function llm. |
headertext | Allows providing the table with a custom header. |
footertext | Allows providing the table with a custom footer. |
roundingnumbers | An integer stating the number of decimals in the visualization. |
Generates HTML code for a visualization.
Arno De Caigny, [email protected], Kristof Coussement, [email protected] and Koen W. De Bock, [email protected]
Arno De Caigny, Kristof Coussement, Koen W. De Bock, A New Hybrid Classification Algorithm for Customer Churn Prediction Based on Logistic Regression and Decision Trees, European Journal of Operational Research (2018), doi: 10.1016/j.ejor.2018.02.009.
## Load PimaIndiansDiabetes dataset from mlbench package
if (requireNamespace("mlbench", quietly = TRUE)) {
  library("mlbench")
}
data("PimaIndiansDiabetes")
## Split in training and test (2/3 - 1/3)
idtrain <- c(sample(1:768, 512))
PimaTrain <- PimaIndiansDiabetes[idtrain, ]
Pimatest <- PimaIndiansDiabetes[-idtrain, ]
## Create the LLM
Pima.llm <- llm(X = PimaTrain[, -c(9)], Y = PimaTrain$diabetes,
                threshold_pruning = 0.25, nbr_obs_leaf = 100)
## Save the output of the model to a html file
Pima.Viz <- table.llm.html(object = Pima.llm,
                           headertext = "This is an example of the LLM model",
                           footertext = "Enjoy the package!")
## Optionally write it to your working directory
# write(Pima.Viz, "Visualization_LLM_on_PimaIndiansDiabetes.html")
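A small sketch of how the generated HTML can be viewed without touching the working directory (base R only; the file name below is illustrative):

## Write the HTML to a temporary file and open it in the default browser
html_file <- file.path(tempdir(), "llm_visualization.html")
write(Pima.Viz, html_file)
# browseURL(html_file)  # interactive use only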