| Title: | A Smaller Version of LIME |
|---|---|
| Description: | The existing implementation of 'lime' can be quite limiting for understanding the underlying components that make Local Interpretable Model-agnostic Explanations (LIME) work. 'kumquat' is a simpler implementation of 'lime' that is easier to understand and more transparent about the pieces that come together to make LIME work. For more details on LIME, see Ribeiro, Singh, and Guestrin (2016) <doi:10.1145/2939672.2939778>. |
| Authors: | Janith Wanniarachchi [aut, cre, cph] |
| Maintainer: | Janith Wanniarachchi <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-06-22 19:36:08 UTC |
| Source: | https://github.com/cran/kumquat |
A demonstration dataset containing two numeric predictors and a binary class variable. The class is set to A when x < (-0.5) & y < (-0.3), or when x >= -0.5 & x < 0.4 & y < 0.1 + 0.8 * x, or when x >= 0.4, and to B elsewhere.
data(d_multi)
A tibble with 5,000 rows and 3 variables:
x: Numeric independent variable generated from a uniform distribution on [-1, 1]
y: Numeric independent variable generated from a uniform distribution on [-1, 1]
class: Binary categorical variable with two levels: "A" and "B"
Simple two-dimensional dataset with a complex decision boundary.
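The boundary rule described above can be tried out on simulated data. The following is a minimal sketch in base R, not the script used to build d_multi: the seed, the object name d_sim, and the column name class are assumptions made for illustration, and the rule is taken literally as written (every point with x >= 0.4 is labelled A).

```r
# Simulate a dataset with the same piecewise rule as described for d_multi.
# Assumption: the seed and the name `d_sim` are illustrative only.
set.seed(42)
n <- 5000
x <- runif(n, -1, 1)
y <- runif(n, -1, 1)

# Class A regions, read literally from the description
is_a <- (x < -0.5 & y < -0.3) |
  (x >= -0.5 & x < 0.4 & y < 0.1 + 0.8 * x) |
  (x >= 0.4)

d_sim <- data.frame(
  x = x,
  y = y,
  class = factor(ifelse(is_a, "A", "B"), levels = c("A", "B"))
)

# Class balance of the simulated data
table(d_sim$class)
```

Plotting d_sim with x and y on the axes and class as the colour shows the three regions that make up class A.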
A demonstration dataset containing two numeric predictors and a binary class variable. The class is set to A when x < (-0.5) & y < (-0.3), or when x >= -0.5 & x < 0.4 & y < 0.1 + 0.8 * x, or when x >= 0.4; it is set to B when y < (-1) + 0.8 * x and everywhere else.
data(d_multitwo)
A tibble with 5,000 rows and 3 variables:
x: Numeric independent variable generated from a uniform distribution on [-1, 1]
y: Numeric independent variable generated from a uniform distribution on [-1, 1]
class: Binary categorical variable with two levels: "A" and "B"
Simple two-dimensional dataset with a combination of two decision boundaries.
A demonstration dataset containing two numeric predictors and a binary class variable. The class boundary is defined based on the inequality: y > 0.1 + 1.4 * x
data(d_oblique)
A tibble with 5,000 rows and 3 variables:
x: Numeric independent variable generated from a uniform distribution on [-1, 1]
y: Numeric independent variable generated from a uniform distribution on [-1, 1]
class: Binary categorical variable with two levels: "A" and "B"
Simple two-dimensional dataset with an oblique decision boundary.
A demonstration dataset containing two numeric predictors and a binary class variable. The class boundary is defined by a vertical split at x = 0.3.
data(d_vertical)
A tibble with 5,000 rows and 3 variables:
x: Numeric independent variable generated from a uniform distribution on [-1, 1]
y: Numeric independent variable generated from a uniform distribution on [-1, 1]
class: Binary categorical variable with two levels: "A" and "B"
Simple two-dimensional dataset with a vertical decision boundary.
Fit Local Linear Model and Compute Feature Importance
fit_local_model(
  perturbations,
  predictor_vars,
  nfolds = 50,
  alpha = 1,
  class_names = c("A", "B")
)
| Argument | Description |
|---|---|
| perturbations | Data frame of perturbations with predictions |
| predictor_vars | Character vector of predictor variable names |
| nfolds | Number of folds for cross-validation (default: 50) |
| alpha | Elastic net mixing parameter (default: 1 for lasso) |
| class_names | Character vector of class names for binary classification |
A list containing glm_predictions, importances, and the fitted model
perturbations <- data.frame(
  x1 = c(1, 2, 3),
  x2 = c(4, 5, 6),
  pred = c("A", "A", "A")
)
result <- fit_local_model(
  perturbations,
  predictor_vars = c("x1", "x2")
)
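The nfolds and alpha arguments suggest a cross-validated, penalised logistic regression. The sketch below shows what such a local fit could look like when calling glmnet directly. It is an assumption that fit_local_model() works along these lines internally; the simulated perturbations, the column name pred, the use of lambda.min, and the choice of absolute coefficients as importances are all illustrative.

```r
library(glmnet)

set.seed(1)
n <- 200
perturbations <- data.frame(x = runif(n, -1, 1), y = runif(n, -1, 1))

# Hypothetical black-box labels: driven mostly by x, with some noise so the
# classes are not perfectly separable
score <- perturbations$x + rnorm(n, sd = 0.2)
perturbations$pred <- factor(ifelse(score < 0.3, "A", "B"), levels = c("A", "B"))

predictor_vars <- c("x", "y")
X <- as.matrix(perturbations[, predictor_vars])

# Cross-validated lasso (alpha = 1) logistic regression on the perturbations
cv_fit <- cv.glmnet(X, perturbations$pred,
                    family = "binomial", alpha = 1, nfolds = 10)

# Coefficients at the cross-validated lambda; drop the intercept and use
# absolute values as a simple importance measure (an assumption)
coefs <- as.matrix(coef(cv_fit, s = "lambda.min"))
importances <- abs(coefs[predictor_vars, 1])

# Class predictions of the local surrogate on the same perturbations
glm_predictions <- predict(cv_fit, newx = X, s = "lambda.min", type = "class")

importances
```

Because the simulated labels depend almost entirely on x, the importance for x should dominate that of y.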
Generate Perturbations Around a Point of Interest
generate_perturbations(
  data,
  poi,
  radius = 0.1,
  step = 0.01,
  predictors = names(data)
)
| Argument | Description |
|---|---|
| data | Training data frame |
| poi | Row number of point of interest |
| radius | Perturbation radius (default: 0.1) |
| step | Step size for perturbations (default: 0.01) |
| predictors | Character vector of predictor variable names |
A data frame of perturbed points with the following properties:
Columns are the predictors
The number of rows depends on radius and step
data <- data.frame(
  x = 1,
  y = 2
)
result <- generate_perturbations(
  data,
  poi = 1,
  radius = 0.1,
  step = 0.1,
  predictors = c("x", "y")
)
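To see why the number of rows depends on radius and step, consider a plain square grid around the point of interest. The helper perturb_grid() below is hypothetical and written in base R; whether generate_perturbations() uses a square grid, a circular neighbourhood, or another scheme is not stated here, so treat this only as one plausible reading.

```r
# Hypothetical helper: square grid of perturbations around a single point.
# `point` is a named numeric vector, one entry per predictor.
perturb_grid <- function(point, radius = 0.1, step = 0.01) {
  # Offsets applied to each coordinate, from -radius to +radius
  offsets <- seq(-radius, radius, by = step)
  # One axis of candidate values per predictor, keeping the predictor names
  axes <- lapply(point, function(v) v + offsets)
  # All combinations of the axis values
  expand.grid(axes)
}

poi <- c(x = 1, y = 2)
grid <- perturb_grid(poi, radius = 0.1, step = 0.1)

# Three offsets per axis (-0.1, 0, 0.1) and two predictors give 3 * 3 rows
nrow(grid)
```

Under this scheme, halving step roughly doubles the number of values per axis, so the row count grows quickly with the number of predictors.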
Complete Local Interpretation Pipeline
kumquat(
  model_bundle,
  data,
  pois,
  perturbations = NULL,
  radius = 0.1,
  step = 0.01,
  predictor_vars = c("x", "y"),
  nfolds = 50,
  alpha = 1,
  class_names = c("A", "B"),
  predict_func = stats::predict
)
| Argument | Description |
|---|---|
| model_bundle | A trained model held in a bundle::bundle() |
| data | Training data |
| pois | Points of interest (row numbers) |
| perturbations | A data.frame of perturbations to be used to fit the local model |
| radius | Perturbation radius (default: 0.1) |
| step | Perturbation step size (default: 0.01) |
| predictor_vars | Character vector of predictor variable names |
| nfolds | Number of CV folds (default: 50) |
| alpha | Elastic net mixing parameter (default: 1) |
| class_names | Character vector of class names |
| predict_func | A function that takes two arguments, model and data, and returns a factor vector |
A list containing perturbations, predictions, and local model results
data(d_vertical)
rfmodel <- randomForest::randomForest(
  class ~ x + y,
  data = d_vertical
)
# Bundle model up
rfmodel_bundled <- bundle::bundle(rfmodel)
ks <- kumquat(
  rfmodel_bundled,
  d_vertical,
  1,
  class_names = unique(d_vertical$class)
)
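The predict_func argument is described as a function of a model and a data frame that returns a factor. For models whose predict() method returns probabilities rather than classes, a small wrapper may be needed. The sketch below uses a base R logistic regression on simulated data; the wrapper name predict_glm_class, the 0.5 threshold, and the simulated training set are assumptions, and how kumquat() calls the function internally is not shown here.

```r
set.seed(3)
n <- 500
train <- data.frame(x = runif(n, -1, 1), y = runif(n, -1, 1))

# Noisy vertical boundary so the logistic regression is not perfectly separable
noisy_x <- train$x + rnorm(n, sd = 0.2)
train$class <- factor(ifelse(noisy_x < 0.3, "A", "B"), levels = c("A", "B"))

model <- glm(class ~ x + y, data = train, family = binomial())

# Hypothetical predict_func: takes (model, data) and returns a factor.
# For a binomial glm with a factor response, the fitted probability refers
# to the second factor level, here "B".
predict_glm_class <- function(model, data) {
  p_b <- stats::predict(model, newdata = data, type = "response")
  factor(ifelse(p_b > 0.5, "B", "A"), levels = c("A", "B"))
}

preds <- predict_glm_class(model, train[1:5, c("x", "y")])
preds
```

A function of this shape could then be supplied as predict_func, provided the model is passed in the form that kumquat() expects.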
pinch_importance() extracts feature importance values from
kumquats and combines them into a single data frame.
pinch_importance(kumquats)
kumquats |
A result from a call to |
A data.frame where each row corresponds to the feature
importances extracted from one kumquat object. Column names represent
feature names.
data(d_vertical)
rfmodel <- randomForest::randomForest(
  class ~ x + y,
  data = d_vertical
)
# Bundle model up
rfmodel_bundled <- bundle::bundle(rfmodel)
ks <- kumquat(
  rfmodel_bundled,
  d_vertical,
  1,
  class_names = unique(d_vertical$class)
)
imps <- pinch_importance(ks)
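The return value described above, one row per object and one column per feature, can be pictured with a small base R sketch. The list kumquats_like is a hypothetical stand-in with made-up importance values; the actual structure of the objects returned by kumquat() may nest the importances differently.

```r
# Hypothetical stand-in for a list of results, each carrying a named
# vector of feature importances (values are made up for illustration)
kumquats_like <- list(
  list(importances = c(x = 0.8, y = 0.1)),
  list(importances = c(x = 0.5, y = 0.4))
)

# Pull out each importance vector and stack them row-wise
imp_rows <- lapply(kumquats_like, function(k) k$importances)
imps_sketch <- as.data.frame(do.call(rbind, imp_rows))

imps_sketch
```

Each row of imps_sketch corresponds to one element of the list, and the column names are the feature names.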
This function generates a ggplot visualizing the perturbations of a local model, overlaying the training data and highlighting a specific point of interest (poi). The plot subtitle shows the importances of the first two features.
plot_interest(kquat)
kquat |
A list-like object containing at least:
|
A ggplot object
data(d_vertical)
rfmodel <- randomForest::randomForest(
  class ~ x + y,
  data = d_vertical
)
# Bundle model up
rfmodel_bundled <- bundle::bundle(rfmodel)
ks <- kumquat(
  rfmodel_bundled,
  d_vertical,
  1,
  class_names = unique(d_vertical$class)
)
plot_obj <- plot_interest(ks)
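For readers who want to build a similar picture by hand, the following ggplot2 sketch overlays training data, a grid of perturbations, and a highlighted point of interest, with importances in the subtitle. It is not the implementation of plot_interest(): the simulated data, the labelling rule, the importance values, and all aesthetic choices are assumptions for illustration.

```r
library(ggplot2)

set.seed(7)
# Simulated training data with a vertical boundary (labelling is an assumption)
train <- data.frame(x = runif(300, -1, 1), y = runif(300, -1, 1))
train$class <- factor(ifelse(train$x < 0.3, "A", "B"), levels = c("A", "B"))

# Point of interest and a square grid of perturbations around it
poi <- train[1, ]
offsets <- seq(-0.1, 0.1, by = 0.02)
perturbations <- expand.grid(x = poi$x + offsets, y = poi$y + offsets)
perturbations$pred <- factor(
  ifelse(perturbations$x < 0.3, "A", "B"),
  levels = c("A", "B")
)

# Made-up importances, used only to show how a subtitle could be built
importances <- c(x = 1.2, y = 0.1)

p <- ggplot() +
  # Training data in the background
  geom_point(data = train, aes(x = x, y = y, colour = class), alpha = 0.3) +
  # Perturbations coloured by their predicted class
  geom_point(data = perturbations, aes(x = x, y = y, colour = pred),
             shape = 15, size = 1) +
  # Point of interest drawn as a cross on top
  geom_point(data = poi, aes(x = x, y = y),
             colour = "black", shape = 4, size = 4, stroke = 1.5) +
  labs(subtitle = sprintf("x: %.2f, y: %.2f",
                          importances[["x"]], importances[["y"]]))

p
```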