Title: | Ceteris Paribus Profiles |
---|---|
Description: | Ceteris Paribus Profiles (What-If Plots) present model responses around selected points in a feature space, for example around a single prediction for an observation of interest. The plots are model-agnostic: they work for any predictive Machine Learning model and allow for model comparisons. Ceteris Paribus Plots supplement the Break Down Plots from the 'breakDown' package. |
Authors: | Przemyslaw Biecek [aut, cre] |
Maintainer: | Przemyslaw Biecek <[email protected]> |
License: | GPL-2 |
Version: | 0.6 |
Built: | 2024-11-24 21:21:48 UTC |
Source: | CRAN |
Add More Layers to a Ceteris Paribus Plot
## S3 method for class 'plot_ceteris_paribus_explainer'
e1 + e2
e1 |
An object of class 'plot_ceteris_paribus_explainer'. |
e2 |
A plot component |
Calculate Oscillations for Ceteris Paribus Explainer
calculate_oscillations(x, sort = TRUE, ...)
x |
a ceteris_paribus explainer produced with the 'ceteris_paribus()' function |
sort |
a logical value. If TRUE then rows are sorted along the oscillations |
... |
other arguments |
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest,
                        y = apartmentsTest$m2.price)
apartment <- apartmentsTest[1,]
cp_rf <- ceteris_paribus(explainer_rf, apartment)
calculate_oscillations(cp_rf)
## End(Not run)
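This page does not spell out how an oscillation is computed. The sketch below illustrates one plausible reading in base R, under the assumption that the oscillation of a variable is the mean absolute deviation of its ceteris paribus profile from the prediction at the observed point. The helper `oscillation()`, the `lm` model and the `mtcars` data are illustrative stand-ins, and the decreasing sort order is likewise an assumption.

```r
# Stand-in model and observation (assumptions for illustration only)
model <- lm(mpg ~ wt + hp, data = mtcars)
obs <- mtcars[1, c("wt", "hp")]
baseline <- as.numeric(predict(model, obs))

# Assumed measure: mean absolute deviation of a profile from the baseline prediction
oscillation <- function(variable, grid_points = 11) {
  grid <- seq(min(mtcars[[variable]]), max(mtcars[[variable]]),
              length.out = grid_points)
  altered <- obs[rep(1, grid_points), , drop = FALSE]
  altered[[variable]] <- grid  # change one variable, keep the others fixed
  mean(abs(as.numeric(predict(model, altered)) - baseline))
}

result <- data.frame(variable = c("wt", "hp"),
                     oscillations = c(oscillation("wt"), oscillation("hp")))
# Assumed ordering: largest oscillation first
result <- result[order(result$oscillations, decreasing = TRUE), ]
result
```

Under this reading, a variable whose profile stays close to the baseline prediction gets a small oscillation, which is why such scores can serve as a local variable importance measure.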
This function calculates ceteris paribus profiles, i.e. series of model predictions calculated for observations in which a single coordinate has been altered.
calculate_profiles( data, variable_splits, model, predict_function = predict, ... )
data |
set of observations. Profile will be calculated for every observation (every row) |
variable_splits |
named list of vectors. Elements of the list are vectors with points in which profiles should be calculated. See an example for more details. |
model |
a model that will be passed to the 'predict_function' |
predict_function |
function that takes data and model and returns numeric predictions. Note that the ... arguments will be passed to this function. |
... |
other parameters that will be passed to the 'predict_function' |
Note that the calculate_profiles function is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it.
a data frame with profiles for selected variables and selected observations
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
vars <- c("construction.year", "surface", "floor", "no.rooms", "district")
variable_splits <- calculate_variable_splits(apartments, vars)
new_apartment <- apartmentsTest[1:10, ]
profiles <- calculate_profiles(new_apartment, variable_splits, apartments_rf_model)
profiles

# only subset of observations
small_apartments <- select_sample(apartmentsTest, n = 10)
small_apartments
small_profiles <- calculate_profiles(small_apartments, variable_splits, apartments_rf_model)
small_profiles

# neighbors for a selected observation
new_apartment <- apartments[1, 2:6]
small_apartments <- select_neighbours(apartmentsTest, new_apartment, n = 10)
small_apartments
small_profiles <- calculate_profiles(small_apartments, variable_splits, apartments_rf_model)
new_apartment
small_profiles
## End(Not run)
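To make the idea of a profile concrete, here is a minimal base R sketch of what "altering a single coordinate" means. It is not the package's implementation: the helper `cp_profile()`, the `lm` model and the `mtcars` data are assumptions chosen so that the snippet runs without extra packages.

```r
# Stand-in for any predictive model (assumption for illustration)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: profile of one observation along one variable
cp_profile <- function(observation, variable, grid, model,
                       predict_function = predict) {
  # Repeat the observation once per grid point
  altered <- observation[rep(1, length(grid)), , drop = FALSE]
  # Alter a single coordinate; everything else stays as observed
  altered[[variable]] <- grid
  altered$yhat <- as.numeric(predict_function(model, altered))
  rownames(altered) <- NULL
  altered
}

obs <- mtcars[1, c("wt", "hp")]
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 11)
profile <- cp_profile(obs, "wt", grid, model)
profile
```

Each row of `profile` is the same car with a different weight, so the `yhat` column traces how the prediction responds to `wt` alone.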
This function calculates Local Conditional Expectation (LCE) profiles.
calculate_profiles_lce( data, variable_splits, model, dataset, predict_function = predict, ... )
data |
set of observations. Profile will be calculated for every observation (every row) |
variable_splits |
named list of vectors. Elements of the list are vectors with points in which profiles should be calculated. See an example for more details. |
model |
a model that will be passed to the 'predict_function' |
dataset |
a data.frame, usually training data of a model, used for calculation of LCE profiles |
predict_function |
function that takes data and model and returns numeric predictions. Note that the ... arguments will be passed to this function. |
... |
other parameters that will be passed to the 'predict_function' |
Note that the calculate_profiles_lce function is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it.
a data frame with profiles for selected variables and selected observations
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartments[,2:6],
                        y = apartments$m2.price)
vars <- c("construction.year", "surface", "floor", "no.rooms", "district")
variable_splits <- calculate_variable_splits(apartments, vars)
new_apartment <- apartments[1, ]
profiles <- calculate_profiles_lce(new_apartment, variable_splits,
                                   apartments_rf_model, explainer_rf$data)
profiles
## End(Not run)
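The notes about overloading an S3 generic for non-standard data sources can be illustrated with a toy example. The generic `row_count()` and the class `remote_table` below are hypothetical and do not exist in this package; they only show the dispatch mechanism that a method for an external data source would rely on.

```r
# Hypothetical generic, standing in for a generic such as calculate_profiles()
row_count <- function(data, ...) UseMethod("row_count")

# Default method: ordinary in-memory data frames
row_count.default <- function(data, ...) nrow(data)

# Method for a hypothetical external source represented by class "remote_table"
row_count.remote_table <- function(data, ...) {
  # A real method would query the backend; here the size is stored in the object
  data$n_rows
}

local_data <- data.frame(x = 1:3)
remote_data <- structure(list(n_rows = 1000L), class = "remote_table")

row_count(local_data)
row_count(remote_data)
```

Calling code does not change: R picks the method from the class of the first argument, so supporting a new data source means adding a method and leaving the generic alone.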
This function calculates candidate splits for each selected variable. For numerical variables, splits are calculated as percentiles (in general, uniform quantiles of length grid_points). For all other variables, splits are calculated as unique values.
calculate_variable_splits( data, variables = colnames(data), grid_points = 101, variable_splits_type = "quantiles" )
data |
validation dataset. Is used to determine distribution of observations. |
variables |
names of variables for which splits shall be calculated |
grid_points |
number of points used for response path |
variable_splits_type |
how variable grids shall be calculated? Use "quantiles" (default) for percentiles or "uniform" to get uniform grid of points |
Note that the calculate_variable_splits function is an S3 generic. If you want to work with non-standard data sources (such as H2O ddf or external databases), you should overload it.
A named list with splits for selected variables
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
vars <- c("construction.year", "surface", "floor", "no.rooms", "district")
calculate_variable_splits(apartments, vars)
## End(Not run)
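The difference between the "quantiles" and "uniform" grid types, and the treatment of non-numeric variables, can be sketched in base R as follows. The helper `variable_grid()` is hypothetical and only mirrors the description above; the actual function may differ in details such as the handling of ties or missing values.

```r
# Hypothetical helper mirroring the two grid types described above
variable_grid <- function(x, grid_points = 101,
                          type = c("quantiles", "uniform")) {
  type <- match.arg(type)
  if (!is.numeric(x)) {
    return(unique(x))  # non-numeric variables: all distinct values
  }
  if (type == "quantiles") {
    # Equally spaced probabilities, so points follow the data distribution
    probs <- seq(0, 1, length.out = grid_points)
    unique(unname(quantile(x, probs = probs, na.rm = TRUE)))
  } else {
    # Equally spaced values between the observed minimum and maximum
    seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = grid_points)
  }
}

# Named list of splits, one element per variable
splits <- lapply(iris[c("Sepal.Length", "Species")], variable_grid, grid_points = 5)
splits
uniform_grid <- variable_grid(iris$Sepal.Length, grid_points = 5, type = "uniform")
uniform_grid
```

A quantile grid places more points where observations are dense, while a uniform grid covers the range evenly regardless of where the data lie.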
This function calculates ceteris paribus profiles for selected data points.
ceteris_paribus( explainer, observations, y = NULL, variable_splits = NULL, variable_splits_type = "quantiles", variables = NULL, grid_points = 101 )
explainer |
a model to be explained, preprocessed by function 'DALEX::explain()'. |
observations |
set of observations for which profiles are to be calculated |
y |
true labels for 'observations'. If specified then will be added to ceteris paribus plots. |
variable_splits |
named list of splits for variables, in most cases created with 'calculate_variable_splits()'. If NULL then it will be calculated based on validation data available in the 'explainer'. |
variable_splits_type |
how variable grids shall be calculated? Use "quantiles" (default) for percentiles or "uniform" to get uniform grid of points |
variables |
names of variables for which profiles shall be calculated. Will be passed to 'calculate_variable_splits()'. If NULL then all variables from the validation data will be used. |
grid_points |
number of points for profile. Will be passed to 'calculate_variable_splits()'. |
An object of the class 'ceteris_paribus_explainer'. It's a data frame with calculated average responses.
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6],
                        y = apartmentsTest$m2.price)
apartments_small <- select_sample(apartmentsTest, 10)
cp_rf <- ceteris_paribus(explainer_rf, apartments_small)
cp_rf
cp_rf <- ceteris_paribus(explainer_rf, apartments_small, y = apartments_small$m2.price)
cp_rf
## End(Not run)
Function 'ceteris_paribus_layer()' adds a layer to a plot created with 'plot.ceteris_paribus_explainer()'. Various parameters control what is plotted: profiles, aggregated profiles, points or rugs.
ceteris_paribus_layer( x, ..., size = 1, alpha = 0.3, color = "black", size_points = 2, alpha_points = 1, color_points = color, size_rugs = 0.5, alpha_rugs = 1, color_rugs = color, size_residuals = 1, alpha_residuals = 1, color_residuals = color, only_numerical = TRUE, show_profiles = TRUE, show_observations = TRUE, show_rugs = FALSE, show_residuals = FALSE, aggregate_profiles = NULL, as.gg = FALSE, facet_ncol = NULL, selected_variables = NULL, init_plot = FALSE )
x |
a ceteris paribus explainer produced with function 'ceteris_paribus()' |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between 0 and 1. Opacity of lines |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
size_points |
a numeric. Size of points to be plotted |
alpha_points |
a numeric between 0 and 1. Opacity of points |
color_points |
a character. Either name of a color or name of a variable that should be used for coloring |
size_rugs |
a numeric. Size of rugs to be plotted |
alpha_rugs |
a numeric between 0 and 1. Opacity of rugs |
color_rugs |
a character. Either name of a color or name of a variable that should be used for coloring |
size_residuals |
a numeric. Size of line and points to be plotted for residuals |
alpha_residuals |
a numeric between 0 and 1. Opacity of points and lines for residuals |
color_residuals |
a character. Either name of a color or name of a variable that should be used for coloring for residuals |
only_numerical |
a logical. If TRUE then only numerical variables will be plotted. If FALSE then only categorical variables will be plotted. |
show_profiles |
a logical. If TRUE then profiles will be plotted. Either individual or aggregate (see 'aggregate_profiles') |
show_observations |
a logical. If TRUE then individual observations will be marked as points |
show_rugs |
a logical. If TRUE then individual observations will be marked as rugs |
show_residuals |
a logical. If TRUE then residuals will be plotted as a line ended with a point |
aggregate_profiles |
function. If NULL (default) then individual profiles will be plotted. If a function (e.g. mean or median) then profiles will be aggregated and only the aggregate profile will be plotted |
as.gg |
if TRUE then the returned plot will have the gg class |
facet_ncol |
number of columns for the 'facet_wrap()'. |
selected_variables |
if not NULL then only 'selected_variables' will be presented |
init_plot |
technical parameter, do not use. |
a ggplot2 object
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6],
                        y = apartmentsTest$m2.price)

apartments_small_1 <- apartmentsTest[1,]
apartments_small_2 <- select_sample(apartmentsTest, n = 20)
apartments_small_3 <- select_neighbours(apartmentsTest, apartments_small_1, n = 20)

cp_rf_y1 <- ceteris_paribus(explainer_rf, apartments_small_1, y = apartments_small_1$m2.price)
cp_rf_y2 <- ceteris_paribus(explainer_rf, apartments_small_2, y = apartments_small_2$m2.price)
cp_rf_y3 <- ceteris_paribus(explainer_rf, apartments_small_3, y = apartments_small_3$m2.price)

tmp <- plot(cp_rf_y3, show_profiles = TRUE, show_observations = TRUE,
            show_residuals = TRUE, color = "black", alpha = 0.2,
            color_residuals = "darkred",
            selected_variables = c("construction.year", "surface"))

tmp <- plot(cp_rf_y3, show_profiles = TRUE, show_observations = TRUE,
            show_residuals = TRUE, color = "black", alpha = 0.2,
            color_residuals = "darkred")
tmp

tmp +
  ceteris_paribus_layer(cp_rf_y2, show_profiles = TRUE, show_observations = TRUE,
                        alpha = 0.2, color = "darkblue")

tmp +
  ceteris_paribus_layer(cp_rf_y2, show_profiles = TRUE, show_observations = TRUE,
                        alpha = 0.2, color = "darkblue") +
  ceteris_paribus_layer(cp_rf_y2, show_profiles = TRUE, show_observations = FALSE,
                        alpha = 1, size = 2, color = "blue",
                        aggregate_profiles = mean) +
  ceteris_paribus_layer(cp_rf_y1, show_profiles = TRUE, show_observations = FALSE,
                        alpha = 1, size = 2, color = "red",
                        aggregate_profiles = mean)
## End(Not run)
This explainer works for individual observations. For each observation it calculates Local Conditional Expectation (LCE) profiles for selected variables.
local_conditional_expectations( explainer, observations, y = NULL, variable_splits = NULL, variables = NULL, grid_points = 101 )
explainer |
a model to be explained, preprocessed by function 'DALEX::explain()'. |
observations |
set of observations for which profiles are to be calculated |
y |
true labels for 'observations'. If specified then will be added to local conditional expectations plots. |
variable_splits |
named list of splits for variables, in most cases created with 'calculate_variable_splits()'. If NULL then it will be calculated based on validation data available in the 'explainer'. |
variables |
names of variables for which profiles shall be calculated. Will be passed to 'calculate_variable_splits()'. If NULL then all variables from the validation data will be used. |
grid_points |
number of points for profile. Will be passed to 'calculate_variable_splits()'. |
An object of the class 'ceteris_paribus_explainer'. A data frame with calculated LCE profiles.
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartments[,2:6],
                        y = apartments$m2.price)
new_apartment <- apartments[1, ]

cp_rf <- ceteris_paribus(explainer_rf, new_apartment)
lce_rf <- local_conditional_expectations(explainer_rf, new_apartment)
lce_rf

lce_rf <- local_conditional_expectations(explainer_rf, new_apartment,
                                         y = new_apartment$m2.price)
lce_rf

# Plot LCE
sel_vars <- c("surface", "no.rooms")
plot(lce_rf, selected_variables = sel_vars)

# Compare ceteris paribus profiles with LCE profiles
plot(cp_rf, selected_variables = sel_vars) +
  ceteris_paribus_layer(lce_rf, selected_variables = sel_vars, color = "red")
## End(Not run)
Local Fit / Wangkardu Explanations
local_fit( explainer, observation, selected_variable, grid_points = 101, select_points = 0.1 )
explainer |
a model to be explained, preprocessed by the 'DALEX::explain' function |
observation |
a new observation for which predictions need to be explained |
selected_variable |
variable to be presented in the local fit plot |
grid_points |
number of points used for response path |
select_points |
fraction of points from validation data to be presented in local fit plots |
An object of the class 'local_fit_explainer'. It's a data frame with calculated average responses.
library("DALEX")
## Not run: 
library("randomForest")
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6],
                        y = apartmentsTest$m2.price)
new_apartment <- apartmentsTest[1, ]
new_apartment

cr_rf <- local_fit(explainer_rf, observation = new_apartment,
                   select_points = 0.002, selected_variable = "surface")
cr_rf
## End(Not run)
Function 'plot.ceteris_paribus_explainer' draws Ceteris Paribus Plots for selected observations. Various parameters control what is plotted: profiles, aggregated profiles, points or rugs.
## S3 method for class 'ceteris_paribus_explainer'
plot( x, ..., size = 1, alpha = 0.3, color = "black", size_points = 2, alpha_points = 1, color_points = color, size_rugs = 0.5, alpha_rugs = 1, color_rugs = color, size_residuals = 1, alpha_residuals = 1, color_residuals = color, only_numerical = TRUE, show_profiles = TRUE, show_observations = TRUE, show_rugs = FALSE, show_residuals = FALSE, aggregate_profiles = NULL, as.gg = FALSE, facet_ncol = NULL, selected_variables = NULL )
x |
a ceteris paribus explainer produced with function 'ceteris_paribus()' |
... |
other explainers that shall be plotted together |
size |
a numeric. Size of lines to be plotted |
alpha |
a numeric between 0 and 1. Opacity of lines |
color |
a character. Either name of a color or name of a variable that should be used for coloring |
size_points |
a numeric. Size of points to be plotted |
alpha_points |
a numeric between 0 and 1. Opacity of points |
color_points |
a character. Either name of a color or name of a variable that should be used for coloring |
size_rugs |
a numeric. Size of rugs to be plotted |
alpha_rugs |
a numeric between 0 and 1. Opacity of rugs |
color_rugs |
a character. Either name of a color or name of a variable that should be used for coloring |
size_residuals |
a numeric. Size of line and points to be plotted for residuals |
alpha_residuals |
a numeric between 0 and 1. Opacity of points and lines for residuals |
color_residuals |
a character. Either name of a color or name of a variable that should be used for coloring for residuals |
only_numerical |
a logical. If TRUE then only numerical variables will be plotted. If FALSE then only categorical variables will be plotted. |
show_profiles |
a logical. If TRUE then profiles will be plotted. Either individual or aggregate (see 'aggregate_profiles') |
show_observations |
a logical. If TRUE then individual observations will be marked as points |
show_rugs |
a logical. If TRUE then individual observations will be marked as rugs |
show_residuals |
a logical. If TRUE then residuals will be plotted as a line ended with a point |
aggregate_profiles |
function. If NULL (default) then individual profiles will be plotted. If a function (e.g. mean or median) then profiles will be aggregated and only the aggregate profile will be plotted |
as.gg |
if TRUE then the returned plot will have the gg class |
facet_ncol |
number of columns for the 'facet_wrap()' |
selected_variables |
if not NULL then only 'selected_variables' will be presented |
a ggplot2 object
library("DALEX")
## Not run: 
library("randomForest")
library("ggplot2")  # provides scale_color_gradient() used below
set.seed(59)
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms + district, data = apartments)
explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6],
                        y = apartmentsTest$m2.price)

apartments_small <- apartmentsTest[1:20,]
apartments_small_1 <- apartmentsTest[1,]
apartments_small_2 <- select_sample(apartmentsTest, n = 20)
apartments_small_3 <- select_neighbours(apartmentsTest, apartments_small_1, n = 20)

cp_rf <- ceteris_paribus(explainer_rf, apartments_small)
cp_rf_1 <- ceteris_paribus(explainer_rf, apartments_small_1)
cp_rf_2 <- ceteris_paribus(explainer_rf, apartments_small_2)
cp_rf_3 <- ceteris_paribus(explainer_rf, apartments_small_3)
cp_rf

cp_rf_y <- ceteris_paribus(explainer_rf, apartments_small, y = apartments_small$m2.price)
cp_rf_y1 <- ceteris_paribus(explainer_rf, apartments_small_1, y = apartments_small_1$m2.price)
cp_rf_y2 <- ceteris_paribus(explainer_rf, apartments_small_2, y = apartments_small_2$m2.price)
cp_rf_y3 <- ceteris_paribus(explainer_rf, apartments_small_3, y = apartments_small_3$m2.price)

plot(cp_rf_y, show_profiles = TRUE, show_observations = TRUE, show_residuals = TRUE,
     color = "black", alpha = 0.3, alpha_points = 1, alpha_residuals = 0.5,
     size_points = 2, size_rugs = 0.5)

plot(cp_rf_y, show_profiles = TRUE, show_observations = TRUE, show_residuals = TRUE,
     color = "black", selected_variables = c("construction.year", "surface"),
     alpha = 0.3, alpha_points = 1, alpha_residuals = 0.5,
     size_points = 2, size_rugs = 0.5)

plot(cp_rf_y1, show_profiles = TRUE, show_observations = TRUE, show_rugs = TRUE,
     show_residuals = TRUE, alpha = 0.5, size_points = 3, alpha_points = 1,
     size_rugs = 0.5)

plot(cp_rf_y2, show_profiles = TRUE, show_observations = TRUE, show_rugs = TRUE,
     alpha = 0.2, alpha_points = 1, size_rugs = 0.5)

plot(cp_rf_y3, show_profiles = TRUE, show_rugs = TRUE, show_residuals = TRUE,
     alpha = 0.2, color_residuals = "orange", size_rugs = 0.5)

plot(cp_rf_y, show_profiles = TRUE, show_observations = TRUE, show_rugs = TRUE,
     size_rugs = 0.5, show_residuals = TRUE, alpha = 0.5, color = "surface",
     as.gg = TRUE) +
  scale_color_gradient(low = "darkblue", high = "darkred")

plot(cp_rf_y1, show_profiles = TRUE, show_observations = TRUE, show_rugs = TRUE,
     show_residuals = TRUE, alpha = 0.5, color = "surface", size_points = 3)

plot(cp_rf_y2, show_profiles = TRUE, show_observations = TRUE, show_rugs = TRUE,
     size = 0.5, alpha = 0.5, color = "surface")

plot(cp_rf_y, show_profiles = TRUE, show_rugs = TRUE, size_rugs = 0.5,
     show_residuals = FALSE, aggregate_profiles = mean, color = "darkblue")
## End(Not run)
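The effect of 'aggregate_profiles' can be illustrated without the plotting layer. The base R sketch below computes individual profiles for a few observations and then collapses them with `mean` at each grid point. The `lm` model, the `mtcars` data and the column names `id` and `yhat` are assumptions made for the illustration; how the package aggregates internally is not shown on this page.

```r
# Stand-in model and observations (assumptions for illustration only)
model <- lm(mpg ~ wt + hp, data = mtcars)
observations <- mtcars[1:5, c("wt", "hp")]
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 11)

# One profile per observation: vary 'wt', keep 'hp' as observed
profiles <- do.call(rbind, lapply(seq_len(nrow(observations)), function(i) {
  altered <- observations[rep(i, length(grid)), , drop = FALSE]
  altered$wt <- grid
  data.frame(id = i, wt = grid, yhat = as.numeric(predict(model, altered)))
}))

# Aggregate across observations at each grid point (median works the same way)
aggregated <- aggregate(yhat ~ wt, data = profiles, FUN = mean)
aggregated
```

The individual profiles correspond to plotting with `aggregate_profiles = NULL`, while `aggregated` is the single averaged curve that results from passing a function such as `mean`.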
Function 'plot.ceteris_paribus_oscillations' plots variable importance plots.
## S3 method for class 'ceteris_paribus_oscillations'
plot(x, ...)
x |
a ceteris paribus oscillation explainer produced with function 'calculate_oscillations()' |
... |
other explainers that shall be plotted together |
a ggplot2 object
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest, y = apartmentsTest$m2.price) apartment <- apartmentsTest[1:2,] cp_rf <- ceteris_paribus(explainer_rf, apartment) plot(cp_rf, color = "_ids_") vips <- calculate_oscillations(cp_rf) vips plot(vips) ## End(Not run)
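This section does not define how an oscillation is computed. The base-R sketch below assumes it is the mean absolute deviation of a ceteris paribus profile from the model's prediction for the observation itself, and ranks variables by that score. The helper 'oscillation', the linear model, and the use of 'mtcars' are all illustrative assumptions and not the package's implementation.

```r
# Sketch only: 'oscillation' is a hypothetical helper, not a package function.
model <- lm(mpg ~ hp + wt + qsec, data = mtcars)
observation <- mtcars[1, ]

oscillation <- function(model, data, observation, variable, grid_points = 101) {
  # Vary one variable over its observed range, keep all others fixed
  grid <- seq(min(data[[variable]]), max(data[[variable]]), length.out = grid_points)
  profile <- observation[rep(1, grid_points), , drop = FALSE]
  profile[[variable]] <- grid
  y_hat <- predict(model, newdata = profile)
  y_obs <- predict(model, newdata = observation)
  # Assumed definition: mean absolute deviation from the observation's prediction
  mean(abs(y_hat - y_obs))
}

variables <- c("hp", "wt", "qsec")
scores <- sapply(variables, function(v) oscillation(model, mtcars, observation, v))
scores <- sort(scores, decreasing = TRUE)
scores
```

Under this assumption, a variable whose profile is flat around the observation gets a score close to zero, which is why the sorted scores can be read as a local variable importance ranking.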
Function 'plot.local_fit_explainer' plots Local Fit Plots for a single prediction / observation.
## S3 method for class 'local_fit_explainer' plot(x, ..., plot_residuals = TRUE, palette = "default")
x |
a local fit explainer produced with the 'local_fit' function |
... |
other explainers that shall be plotted together |
plot_residuals |
if TRUE (default) then residuals are plotted as red/blue bars |
palette |
color palette. Currently the choice is limited to 'wangkardu' and 'default' |
a ggplot2 object
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment cr_rf <- local_fit(explainer_rf, observation = new_apartment, select_points = 0.002, selected_variable = "surface") plot(cr_rf, plot_residuals = FALSE) plot(cr_rf) cr_rf <- local_fit(explainer_rf, observation = new_apartment, select_points = 0.002, selected_variable = "surface") plot(cr_rf, plot_residuals = FALSE, palette = "wangkardu") plot(cr_rf, palette = "wangkardu") new_apartment <- apartmentsTest[10, ] cr_rf <- local_fit(explainer_rf, observation = new_apartment, select_points = 0.002, selected_variable = "surface") plot(cr_rf, plot_residuals = FALSE) plot(cr_rf) new_apartment <- apartmentsTest[302, ] cr_rf <- local_fit(explainer_rf, observation = new_apartment, select_points = 0.002, selected_variable = "surface") plot(cr_rf, plot_residuals = FALSE) plot(cr_rf) new_apartment <- apartmentsTest[720, ] cr_rf <- local_fit(explainer_rf, observation = new_apartment, select_points = 0.002, selected_variable = "surface") plot(cr_rf, plot_residuals = FALSE) plot(cr_rf) ## End(Not run)
Function 'plot.what_if_2d_explainer' plots What-If Plots for a single prediction / observation.
## S3 method for class 'what_if_2d_explainer' plot( x, ..., split_ncol = NULL, add_raster = TRUE, add_contour = TRUE, add_observation = TRUE, bins = 3 )
x |
a ceteris paribus explainer produced with the 'what_if_2d' function |
... |
currently will be ignored |
split_ncol |
number of columns for the 'facet_wrap' |
add_raster |
if TRUE then 'geom_raster' will be added to present levels with diverging colors |
add_contour |
if TRUE then 'geom_contour' will be added to present contours |
add_observation |
if TRUE then 'geom_point' will be added to present observation that is explained |
bins |
number of contours to be added |
a ggplot2 object
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment wi_rf_2d <- what_if_2d(explainer_rf, observation = new_apartment) wi_rf_2d plot(wi_rf_2d) plot(wi_rf_2d, add_contour = FALSE) plot(wi_rf_2d, add_observation = FALSE) plot(wi_rf_2d, add_raster = FALSE) # HR data model <- randomForest(status ~ gender + age + hours + evaluation + salary, data = HR) pred1 <- function(m, x) predict(m, x, type = "prob")[,1] explainer_rf_fired <- explain(model, data = HR[,1:5], y = HR$status == "fired", predict_function = pred1, label = "fired") new_emp <- HR[1, ] new_emp wi_rf_2d <- what_if_2d(explainer_rf_fired, observation = new_emp) wi_rf_2d plot(wi_rf_2d) ## End(Not run)
Function 'plot.what_if_explainer' plots What-If Plots for a single prediction / observation.
## S3 method for class 'what_if_explainer' plot( x, ..., quantiles = TRUE, split = "models", split_ncol = NULL, color = "variables" )
x |
a ceteris paribus explainer produced with the 'what_if' function |
... |
other explainers that shall be plotted together |
quantiles |
if TRUE (default) then quantiles will be presented on the OX axis. If FALSE then the original values will be presented on the OX axis |
split |
a character, either 'models' or 'variables'. Sets the variable for faceting |
split_ncol |
number of columns for the 'facet_wrap' |
color |
a character, either 'models' or 'variables'. Sets the variable for coloring |
a ggplot2 object
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment wi_rf <- what_if(explainer_rf, observation = new_apartment) wi_rf plot(wi_rf, split = "variables", color = "variables") plot(wi_rf) ## End(Not run)
Print Ceteris Paribus Explainer Summary
## S3 method for class 'ceteris_paribus_explainer' print(x, ...)
x |
a ceteris_paribus explainer produced with the 'ceteris_paribus()' function |
... |
other arguments that will be passed to 'head()' |
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) apartments_small <- select_sample(apartmentsTest, 10) cp_rf <- ceteris_paribus(explainer_rf, apartments_small) cp_rf ## End(Not run)
Print Ceteris Paribus Profiles
## S3 method for class 'ceteris_paribus_profile' print(x, ...)
x |
a ceteris paribus profile produced with the 'calculate_profiles' function |
... |
other arguments that will be passed to head() |
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) vars <- c("construction.year", "surface", "floor", "no.rooms", "district") variable_splits <- calculate_variable_splits(apartments, vars) new_apartment <- apartmentsTest[1:10, ] profiles <- calculate_profiles(new_apartment, variable_splits, apartments_rf_model) profiles # only subset of observations small_apartments <- select_sample(apartmentsTest, n = 10) small_apartments small_profiles <- calculate_profiles(small_apartments, variable_splits, apartments_rf_model) small_profiles # neighbors for a selected observation new_apartment <- apartments[1, 2:6] small_apartments <- select_neighbours(apartmentsTest, new_apartment, n = 10) small_apartments small_profiles <- calculate_profiles(small_apartments, variable_splits, apartments_rf_model) new_apartment small_profiles ## End(Not run)
Prints Local Fit / Wangkardu Summary
## S3 method for class 'local_fit_explainer' print(x, ...)
x |
a local fit explainer produced with the 'local_fit' function |
... |
other arguments that will be passed to 'head' function |
library("DALEX") ## Not run: library("randomForest") apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment cr_rf <- local_fit(explainer_rf, observation = new_apartment, select_points = 0.002, selected_variable = "surface") cr_rf ## End(Not run)
See more examples in the documentation of the 'ceteris_paribus_layer' function.
## S3 method for class 'plot_ceteris_paribus_explainer' print(x, ...)
x |
a plot_ceteris_paribus_explainer object to plot |
... |
other arguments that will be passed to 'print.ggplot()' |
Print What If 2D Explainer Summary
## S3 method for class 'what_if_2d_explainer' print(x, ...)
x |
a what_if_2d explainer produced with the 'what_if_2d' function |
... |
other arguments that will be passed to head() |
library("DALEX") ## Not run: library("randomForest") apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment ## End(Not run)
Print What If Explainer Summary
## S3 method for class 'what_if_explainer' print(x, ...)
x |
a what_if explainer produced with the 'what_if' function |
... |
other arguments that will be passed to head() |
library("DALEX") ## Not run: library("randomForest") apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment ## End(Not run)
This function selects the subset of rows from a data set that are closest to a given observation. This is useful if the data is large and we need just a sample to calculate profiles.
select_neighbours( data, observation, variables = NULL, distance = gower::gower_dist, n = 20, frac = NULL )
data |
set of observations |
observation |
single observation |
variables |
variables that shall be used for calculation of distance. By default these are all variables present in 'data' and 'observation' |
distance |
distance function, by default the 'gower_dist' function. |
n |
number of neighbors to select |
frac |
if 'n' is not specified (NULL), then it will be calculated as 'frac' * number of rows in 'data'. Either 'n' or 'frac' needs to be specified. |
Note that the 'select_neighbours' function is an S3 generic. If you want to work with non-standard data sources (like H2O ddf, external databases), you should overload it.
a data frame with selected rows
library("DALEX") new_apartment <- apartments[1, 2:6] small_apartments <- select_neighbours(apartmentsTest, new_apartment, n = 10) new_apartment small_apartments
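The idea of picking the 'n' rows nearest to an observation can be sketched in base R. The helper 'nearest_rows' below is hypothetical and uses a simplified, numeric-only Gower-style distance (absolute difference scaled by each column's range, averaged over the chosen variables) as a stand-in for 'gower::gower_dist'; it is not the package's implementation.

```r
# Sketch only: 'nearest_rows' is a hypothetical helper, not a package function.
nearest_rows <- function(data, observation, variables = colnames(data), n = 20) {
  # Range of every selected column, used to put variables on a common scale
  ranges <- sapply(data[, variables, drop = FALSE], function(col) diff(range(col)))
  # One column of scaled absolute differences per variable
  scaled <- sapply(variables, function(v) {
    abs(data[[v]] - observation[[v]]) / ranges[[v]]
  })
  distance <- rowMeans(scaled)
  # Keep the n rows with the smallest distance to the observation
  data[order(distance)[seq_len(min(n, nrow(data)))], , drop = FALSE]
}

observation <- mtcars[1, ]
neighbours <- nearest_rows(mtcars, observation,
                           variables = c("mpg", "hp", "wt"), n = 5)
neighbours
```

Because the observation is itself a row of 'mtcars' here, it comes back as the first neighbour with distance zero.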
This function selects a subset of rows from a data set. This is useful if the data is large and we need just a sample to calculate profiles.
select_sample(data, n = 100, seed = 1313)
data |
set of observations from which rows will be sampled |
n |
number of observations (rows) to be sampled |
seed |
seed for random number generator. |
Note that the 'select_sample' function is an S3 generic. If you want to work with non-standard data sources (like H2O ddf, external databases), you should overload it.
a data frame with selected rows
library("DALEX") small_apartments <- select_sample(apartmentsTest) head(small_apartments)
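The note about overloading an S3 generic for a non-standard data source can be illustrated as follows. To avoid masking the package function, the sketch defines its own stand-in generic 'sample_rows'; that name, the 'csv_source' class, and the temporary CSV file are all hypothetical and only show the dispatch pattern.

```r
# Sketch only: 'sample_rows' and the 'csv_source' class are hypothetical.
sample_rows <- function(data, n = 100, seed = 1313) UseMethod("sample_rows")

# Default method: seeded random sample of rows from an in-memory data frame
# (note that set.seed() changes the global random number generator state)
sample_rows.default <- function(data, n = 100, seed = 1313) {
  set.seed(seed)
  ids <- sample.int(nrow(data), min(n, nrow(data)))
  data[ids, , drop = FALSE]
}

# Method for a data source that only stores a file path
sample_rows.csv_source <- function(data, n = 100, seed = 1313) {
  sample_rows(utils::read.csv(data$path), n = n, seed = seed)
}

# Usage: the same call works for a data frame and for the custom source
path <- tempfile(fileext = ".csv")
utils::write.csv(mtcars, path, row.names = FALSE)
source_obj <- structure(list(path = path), class = "csv_source")

from_df <- sample_rows(mtcars, n = 10)
from_csv <- sample_rows(source_obj, n = 10)
```

With the same seed and the same number of rows, both calls draw the same row positions, so the two samples contain the same observations.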
What-If Plot
what_if(explainer, observation, grid_points = 101, selected_variables = NULL)
explainer |
a model to be explained, preprocessed by the 'DALEX::explain' function |
observation |
a new observation for which predictions need to be explained |
grid_points |
number of points used for response path |
selected_variables |
if specified, then only these variables will be explained |
An object of the class 'what_if_explainer'. It's a data frame with calculated average responses.
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment wi_rf <- what_if(explainer_rf, observation = new_apartment) wi_rf wi_rf <- what_if(explainer_rf, observation = new_apartment, selected_variables = c("surface", "floor", "no.rooms")) wi_rf ## End(Not run)
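The computation behind a one-dimensional what-if profile can be sketched in base R: copy the observation once per grid point, overwrite a single variable with the grid values, and ask the model for predictions. The helper 'what_if_profile', the linear model on 'mtcars', and the choice of a quantile-based grid are assumptions made for illustration; this is not the package's implementation.

```r
# Sketch only: 'what_if_profile' is a hypothetical helper, not a package function.
model <- lm(mpg ~ hp + wt, data = mtcars)
observation <- mtcars[1, ]

what_if_profile <- function(model, data, observation, variable, grid_points = 101) {
  # Grid of empirical quantiles of the selected variable (an assumption)
  probs <- seq(0, 1, length.out = grid_points)
  grid <- unname(quantile(data[[variable]], probs = probs))
  # Replicate the observation and change only the selected variable
  profile <- observation[rep(1, grid_points), , drop = FALSE]
  profile[[variable]] <- grid
  data.frame(
    variable = variable,
    quantile = probs,
    value = grid,
    y_hat = unname(predict(model, newdata = profile))
  )
}

profile_hp <- what_if_profile(model, mtcars, observation, "hp")
head(profile_hp)
```

Keeping both the 'quantile' and 'value' columns lets a plot switch between quantiles and original values on the horizontal axis.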
This function calculates what if scores for grid of values spanned by two variables.
what_if_2d( explainer, observation, grid_points = 101, selected_variables = NULL )
explainer |
a model to be explained, preprocessed by the 'DALEX::explain' function |
observation |
a new observation for which predictions need to be explained |
grid_points |
number of points used for response path. Will be used for both variables |
selected_variables |
if specified, then only these variables will be explained |
An object of the class 'what_if_2d_explainer'. It's a data frame with calculated average responses.
library("DALEX") ## Not run: library("randomForest") set.seed(59) apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments) explainer_rf <- explain(apartments_rf_model, data = apartmentsTest[,2:6], y = apartmentsTest$m2.price) new_apartment <- apartmentsTest[1, ] new_apartment wi_rf_2d <- what_if_2d(explainer_rf, observation = new_apartment, selected_variables = c("surface", "floor", "no.rooms")) wi_rf_2d plot(wi_rf_2d) ## End(Not run)
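The grid spanned by two variables can be sketched in base R with 'expand.grid': every combination of the two grids replaces the corresponding values of the observation, while all remaining variables stay fixed. The helper 'what_if_grid', the linear model on 'mtcars', and the evenly spaced grid are illustrative assumptions and not the package's implementation.

```r
# Sketch only: 'what_if_grid' is a hypothetical helper, not a package function.
model <- lm(mpg ~ hp + wt + qsec, data = mtcars)
observation <- mtcars[1, ]

what_if_grid <- function(model, data, observation, variables, grid_points = 101) {
  stopifnot(length(variables) == 2)
  # The same number of grid points is used for both variables
  axes <- lapply(variables, function(v) {
    seq(min(data[[v]]), max(data[[v]]), length.out = grid_points)
  })
  names(axes) <- variables
  grid <- expand.grid(axes)
  # Replicate the observation and overwrite only the two selected variables
  newdata <- observation[rep(1, nrow(grid)), , drop = FALSE]
  newdata[variables] <- grid[variables]
  grid$y_hat <- unname(predict(model, newdata = newdata))
  grid
}

surface <- what_if_grid(model, mtcars, observation, c("hp", "wt"), grid_points = 11)
head(surface)
```

The number of predictions grows with the square of 'grid_points', so a smaller grid is used here than the default of 101.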