Title: | Fairness Auditing and Debiasing for 'mlr3' |
---|---|
Description: | Integrates fairness auditing and bias mitigation methods for the 'mlr3' ecosystem. This includes fairness metrics, reporting tools, visualizations and bias mitigation techniques such as "Reweighing" described in 'Kamiran, Calders' (2012) <doi:10.1007/s10115-011-0463-8> and "Equalized Odds" described in 'Hardt et al.' (2016) <https://papers.nips.cc/paper/2016/file/9d2682367c3935defcb1f9e247a97c0d-Paper.pdf>. Integration with 'mlr3' allows for auditing of ML models as well as convenient joint tuning of machine learning algorithms and debiasing methods. |
Authors: | Florian Pfisterer [cre, aut], Wei Siyi [aut], Michel Lang [aut] |
Maintainer: | Florian Pfisterer <[email protected]> |
License: | LGPL-3 |
Version: | 0.3.2 |
Built: | 2024-11-11 07:28:42 UTC |
Source: | CRAN |
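The description mentions joint tuning of machine learning algorithms and debiasing methods. The following is a minimal sketch of such a setup; it assumes the 'mlr3pipelines', 'mlr3tuning' and 'paradox' packages are available, and the tuning ranges, resampling and number of evaluations are purely illustrative.
library("mlr3")
library("mlr3pipelines")
library("mlr3tuning")
library("paradox")
library("mlr3fairness")

# Chain the reweighing debiasing PipeOp with a decision tree and mark both the
# debiasing strength and the tree complexity for tuning.
graph = po("reweighing_wts", alpha = to_tune(0, 1)) %>>%
  lrn("classif.rpart", cp = to_tune(1e-4, 1e-1, logscale = TRUE))
learner = as_learner(graph)

# Jointly tune both parameters, using a fairness metric as the tuning criterion.
instance = tune(
  tuner = tnr("random_search"),
  task = tsk("compas"),
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("fairness.acc"),
  term_evals = 5
)
instance$result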
Dataset used to predict whether income exceeds $50K/yr based on census data.
Also known as the "Census Income" dataset.
The train dataset contains 13 features and 30178 observations.
The test dataset contains 13 features and 15315 observations.
The target column is "target": a binary factor where 1: <=50K and 2: >50K annual income.
The column "sex" is set as the protected attribute.
adult_train: Original train split for the adult task available at UCI.
adult_test: Original test split for the adult task available at UCI.
The adult dataset has several known limitations, such as its age, limited documentation, and outdated feature encodings (Ding et al., 2021). Furthermore, the selected threshold ($50K annual income) has strong implications for the outcome of analyses, such that "In many cases, the $50k threshold understates and misrepresents the broader picture" (Ding et al., 2021). As a result, conclusions with respect to real-world implications are severely limited.
We nevertheless replicate the dataset here, as it is a widely used benchmark dataset and can still serve this purpose.
fnlwgt: Removed final weight, which is the number of people the census believes the entry represents.
native-country: Removed native country, which is the country of origin of an individual.
Rows containing NA in workclass and occupation have been removed.
Pre-processing inspired by the following article: https://cseweb.ucsd.edu//classes/sp15/cse190-c/reports/sp15/048.pdf
(integer) age: The age of an individual.
(factor) workclass: A general term representing the employment status of an individual.
(factor) education: The highest level of education achieved by an individual.
(integer) education_num: The highest level of education achieved in numerical form.
(factor) marital_status: The marital status of an individual.
(factor) occupation: The general type of occupation of an individual.
(factor) relationship: Whether the individual is in a relationship.
(factor) race: A description of an individual's race.
(factor) sex: The biological sex of an individual.
(integer) capital_gain: Capital gains of an individual.
(integer) capital_loss: Capital losses of an individual.
(integer) hours_per_week: The hours an individual reported to work per week.
(factor) target: Whether or not an individual makes more than $50,000 annually.
Dua, Dheeru, Graff, Casey (2017). "UCI Machine Learning Repository." http://archive.ics.uci.edu/ml/.
Ding, Frances, Hardt, Moritz, Miller, John, Schmidt, Ludwig (2021). "Retiring adult: New datasets for fair machine learning." In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).
library("mlr3") data("adult_test", package = "mlr3fairness") data("adult_train", package = "mlr3fairness")
library("mlr3") data("adult_test", package = "mlr3fairness") data("adult_train", package = "mlr3fairness")
Compare learners with respect to one or multiple metrics. Metrics can be, but are not limited to, fairness metrics.
compare_metrics(object, ...)
object (PredictionClassif | BenchmarkResult | ResampleResult): The object to compute and compare metrics for.
...: Arguments passed on to methods, such as the measure(s) to compare and, for Prediction objects, the corresponding task.
Returns a 'ggplot2' object.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
library("mlr3") library("mlr3learners") # Setup the Fairness Measures and tasks task = tsk("adult_train")$filter(1:500) learner = lrn("classif.ranger", predict_type = "prob") learner$train(task) predictions = learner$predict(task) design = benchmark_grid( tasks = task, learners = lrns(c("classif.ranger", "classif.rpart"), predict_type = "prob", predict_sets = c("train", "predict")), resamplings = rsmps("cv", folds = 3) ) bmr = benchmark(design) fairness_measure = msr("fairness.tpr") fairness_measures = msrs(c("fairness.tpr", "fairness.fnr", "fairness.acc")) # Predictions compare_metrics(predictions, fairness_measure, task) compare_metrics(predictions, fairness_measures, task) # BenchmarkResult and ResamplingResult compare_metrics(bmr, fairness_measure) compare_metrics(bmr, fairness_measures)
library("mlr3") library("mlr3learners") # Setup the Fairness Measures and tasks task = tsk("adult_train")$filter(1:500) learner = lrn("classif.ranger", predict_type = "prob") learner$train(task) predictions = learner$predict(task) design = benchmark_grid( tasks = task, learners = lrns(c("classif.ranger", "classif.rpart"), predict_type = "prob", predict_sets = c("train", "predict")), resamplings = rsmps("cv", folds = 3) ) bmr = benchmark(design) fairness_measure = msr("fairness.tpr") fairness_measures = msrs(c("fairness.tpr", "fairness.fnr", "fairness.acc")) # Predictions compare_metrics(predictions, fairness_measure, task) compare_metrics(predictions, fairness_measures, task) # BenchmarkResult and ResamplingResult compare_metrics(bmr, fairness_measure) compare_metrics(bmr, fairness_measures)
The COMPAS dataset includes processed COMPAS data collected between 2013 and 2014. The data cleaning process followed the guidance in the original COMPAS repository.
It contains 6172 observations and 14 features.
The target column can be either "is_recid" or "two_year_recid", but "two_year_recid" is often preferred.
The column "sex"
is set as protected attribute, but more often "race"
is used.
Derived tasks:
compas: A classification task for the compas data set with the protected attribute 'sex'.
compas_race_binary: A classification task for the compas data set with the protected attribute 'race'. The observations have been filtered, keeping only observations with race "Caucasian" and "African-American". The protected attribute has been set to "race".
R6::R6Class inheriting from TaskClassif.
The COMPAS dataset was collected as part of the ProPublica analysis of machine bias in criminal sentencing. It is important to note that using COMPAS is generally discouraged for the following reasons:
The prediction task derived from this dataset has little connection to actually relevant tasks in the context of risk assessment instruments.
Collected data and labels suffer from disparate measurement bias.
The dataset should therefore not be used to benchmark new fairness algorithms or measures. For a more in-depth treatment, see Bao et al. (2021): It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks. We replicate the dataset here to raise awareness of this issue. Similar issues exist across a wide variety of datasets widely used in the context of fairness auditing; we therefore consider issues such as those derived from disparate measurement bias an important concern in the context of fairness audits.
Identifying columns were removed.
Removed outliers where abs(days_b_screening_arrest) >= 30.
Removed observations where is_recid == -1.
Removed observations where c_charge_degree == "O".
Removed observations where score_text == 'N/A'.
Categorical features were converted to factors.
Added length of stay (c_jail_out - c_jail_in) to the dataset.
Pre-processing resource: https://github.com/propublica/compas-analysis/blob/master/Compas%20Analysis.ipynb
(integer) age: The age of the defendant.
(factor) c_charge_degree: The charge degree of the defendant. F: Felony, M: Misdemeanor.
(factor) race: The race of the defendant.
(factor) age_cat: The age category of the defendant.
(factor) score_text: The score category of the defendant.
(factor) sex: The sex of the defendant.
(integer) priors_count: The number of prior criminal records of the defendant.
(integer) days_b_screening_arrest: The number of days between the screening date and the (original) arrest date. If they are too far apart, this may indicate an error. A negative value indicates that the screening date happened before the arrest date.
(integer) decile_score: Indicates the risk of recidivism (Min = 1, Max = 10).
(integer) is_recid: Binary variable indicating whether the defendant was rearrested at any time.
(factor) two_year_recid: Binary variable indicating whether the defendant was rearrested within two years.
(numeric) length_of_stay: The number of days spent in jail.
mlr_tasks$get("compas") tsk("compas")
mlr_tasks$get("compas_race_binary") tsk("compas_race_binary")
ProPublica Analysis: https://github.com/propublica/compas-analysis
Bao, Michelle, Zhou, Angela, Zottola, A S, Brubach, Brian, Desmarais, Sarah, Horowitz, Seth A, Lum, Kristian, Venkatasubramanian, Suresh (2021). “It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks.” In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).
library("mlr3") data("compas", package = "mlr3fairness")
library("mlr3") data("compas", package = "mlr3fairness")
Allows computing metrics for predictions that do not stem from mlr3, e.g. predictions made by models outside of mlr3. Currently, only classif- and regr-style predictions are supported.
compute_metrics(data, target, protected_attribute, prediction, metrics = NULL)
data (data.table()): The data to compute metrics on, containing the target and protected attribute columns.
target (character(1)): The name of the target column in data.
protected_attribute (character()): The name of the protected attribute column(s) in data.
prediction: The predictions, e.g. a vector of predicted labels or values.
metrics (Measure | list of Measures): The metric(s) to compute. Defaults to NULL.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
library("mlr3") # Get adult data as a data.table train = tsk("adult_train")$data() mod = rpart::rpart(target ~ ., train) # Predict on test data test = tsk("adult_test")$data() yhat = predict(mod, test, type = "vector") # Convert to a factor with the same levels yhat = as.factor(yhat) levels(yhat) = levels(test$target) compute_metrics( data = test, target = "target", prediction = yhat, protected_attribute = "sex", metrics = msr("fairness.acc") )
library("mlr3") # Get adult data as a data.table train = tsk("adult_train")$data() mod = rpart::rpart(target ~ ., train) # Predict on test data test = tsk("adult_test")$data() yhat = predict(mod, test, type = "vector") # Convert to a factor with the same levels yhat = as.factor(yhat) levels(yhat) = levels(test$target) compute_metrics( data = test, target = "target", prediction = yhat, protected_attribute = "sex", metrics = msr("fairness.acc") )
Provides a visualization of trade-offs between fairness and accuracy metrics across learners and resampling iterations. This can assist in gauging the optimal model from a set of options, along with estimates of variance (through individual resampling iterations).
fairness_accuracy_tradeoff(object, ...)
object (PredictionClassif | BenchmarkResult | ResampleResult): The object to compute the trade-off for.
...: Arguments passed on to methods, such as the fairness measure and, for Prediction objects, the corresponding task.
Returns a 'ggplot2' object.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
library("mlr3") library("mlr3learners") library("ggplot2") # Setup the Fairness measure and tasks task = tsk("adult_train")$filter(1:500) learner = lrn("classif.ranger", predict_type = "prob") fairness_measure = msr("fairness.tpr") # Example 1 - A single prediction learner$train(task) predictions = learner$predict(task) fairness_accuracy_tradeoff(predictions, fairness_measure, task = task) # Example2 - A benchmark design = benchmark_grid( tasks = task, learners = lrns(c("classif.featureless", "classif.rpart"), predict_type = "prob", predict_sets = c("train", "test")), resamplings = rsmps("cv", folds = 2) ) bmr = benchmark(design) fairness_accuracy_tradeoff(bmr, fairness_measure)
library("mlr3") library("mlr3learners") library("ggplot2") # Setup the Fairness measure and tasks task = tsk("adult_train")$filter(1:500) learner = lrn("classif.ranger", predict_type = "prob") fairness_measure = msr("fairness.tpr") # Example 1 - A single prediction learner$train(task) predictions = learner$predict(task) fairness_accuracy_tradeoff(predictions, fairness_measure, task = task) # Example2 - A benchmark design = benchmark_grid( tasks = task, learners = lrns(c("classif.featureless", "classif.rpart"), predict_type = "prob", predict_sets = c("train", "test")), resamplings = rsmps("cv", folds = 2) ) bmr = benchmark(design) fairness_accuracy_tradeoff(bmr, fairness_measure)
Visualizes per-subgroup prediction densities across learners, tasks and classes. The plot is a combination of boxplot and violin plot. The y-axis shows the levels of the protected column and the x-axis shows the predicted probability. The plot title indicates the class for which the predicted probabilities are shown.
fairness_prediction_density(object, ...)
object (PredictionClassif | ResampleResult | BenchmarkResult): The object to visualize prediction densities for.
...: Arguments passed on to methods, such as the corresponding task for Prediction objects.
Returns a 'ggplot2' object.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
library("mlr3") library("mlr3learners") task = tsk("adult_train")$filter(1:500) learner = lrn("classif.rpart", predict_type = "prob", cp = 0.001) learner$train(task) # For prediction predictions = learner$predict(task) fairness_prediction_density(predictions, task) # For resampling rr = resample(task, learner, rsmp("cv")) fairness_prediction_density(rr)
library("mlr3") library("mlr3learners") task = tsk("adult_train")$filter(1:500) learner = lrn("classif.rpart", predict_type = "prob", cp = 0.001) learner$train(task) # For prediction predictions = learner$predict(task) fairness_prediction_density(predictions, task) # For resampling rr = resample(task, learner, rsmp("cv")) fairness_prediction_density(rr)
A fairness tensor is a list of groupwise confusion matrices.
fairness_tensor(object, normalize = "all", ...)

## S3 method for class 'data.table'
fairness_tensor(object, normalize = "all", task, ...)

## S3 method for class 'PredictionClassif'
fairness_tensor(object, normalize = "all", task, ...)

## S3 method for class 'ResampleResult'
fairness_tensor(object, normalize = "all", ...)
object (data.table | PredictionClassif | ResampleResult): The object to compute the fairness tensor for.
normalize (character(1)): How the confusion matrices should be normalized. Defaults to "all".
...: Further arguments passed on to methods.
task (TaskClassif): The task containing the protected attribute; required for data.table and PredictionClassif objects.
Returns a list() of confusion matrices, one for every group in "pta".
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
library("mlr3") task = tsk("compas") prediction = lrn("classif.rpart")$train(task)$predict(task) fairness_tensor(prediction, task = task)
library("mlr3") task = tsk("compas") prediction = lrn("classif.rpart")$train(task)$predict(task) fairness_tensor(prediction, task = task)
groupdiff_tau() computes min(x/y, y/x), i.e. the smallest symmetric ratio between two metric values x and y, which is smaller than 1. If x is a vector, the smallest symmetric ratio between all elements in x is computed.
groupdiff_absdiff() computes max(|x - y|), i.e. the largest absolute difference between two metric values x and y. If x is a vector, the largest absolute difference between all elements in x is computed.
groupdiff_tau(x)
groupdiff_absdiff(x)
groupdiff_diff(x)
x (numeric()): A numeric vector of group-wise metric values.
Returns a single numeric value.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
groupdiff_tau(1:3)
groupdiff_diff(1:3)
groupdiff_absdiff(1:3)
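As a worked example, suppose a metric (e.g. accuracy) evaluates to 0.8 and 0.9 in two groups; the expected values in the comments follow from the definitions above:
library("mlr3fairness")

x = c(0.8, 0.9)
groupdiff_tau(x)      # 0.8 / 0.9, approximately 0.889; a value of 1 indicates parity
groupdiff_absdiff(x)  # |0.8 - 0.9| = 0.1; a value of 0 indicates parity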
Instantiates one new measure per protected attribute group in a task. Each metric is then evaluated only on predictions made for the given specific subgroup.
groupwise_metrics(base_measure, task, intersect = TRUE)
base_measure (Measure()): The base measure to instantiate for each protected attribute group.
task (Task): The task whose protected attribute groups the measures are instantiated for.
intersect (logical(1)): Should intersecting groups be considered when more than one protected attribute is specified? Defaults to TRUE.
Returns a list of mlr3::Measures.
library("mlr3") t = tsk("compas") l = lrn("classif.rpart") m = groupwise_metrics(msr("classif.acc"), t) l$train(t)$predict(t)$score(m, t)
library("mlr3") t = tsk("compas") l = lrn("classif.rpart") m = groupwise_metrics(msr("classif.acc"), t) l$train(t)$predict(t)$score(m, t)
This measure extends mlr3::Measure() with statistical group fairness: a common approach to quantifying a model's fairness is to compute the difference between a protected and an unprotected group with respect to some performance metric, e.g. classification error (mlr_measures_classif.ce) or false positive rate (mlr_measures_classif.fpr).
The operation for comparison (e.g., difference or quotient) can be specified using the operation parameter, e.g. groupdiff_absdiff() or groupdiff_tau().
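For instance, a fairness measure based on the false positive rate that compares groups via their smallest symmetric ratio rather than the absolute difference can be constructed as follows (a minimal sketch; msr() forwards these arguments to the MeasureFairness constructor):
library("mlr3")
library("mlr3fairness")

# Absolute difference in false positive rates across groups (default operation):
msr("fairness", base_measure = msr("classif.fpr"))

# Smallest symmetric ratio of false positive rates across groups:
msr("fairness", base_measure = msr("classif.fpr"), operation = groupdiff_tau)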
Composite measures encompassing multiple fairness metrics can be built using MeasureFairnessComposite.
Some popular predefined measures can be found in the dictionary mlr_measures.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
mlr3::Measure
-> MeasureFairness
base_measure (Measure()): The base measure to be used by the fairness measure, e.g. mlr_measures_classif.fpr for the false positive rate.
operation (function()): The operation used to compare the metric across groups. A function that returns a single value given the metric computed for each subgroup. Defaults to groupdiff_absdiff().
new()
Creates a new instance of this R6 class.
MeasureFairness$new( id = NULL, base_measure, operation = groupdiff_absdiff, minimize = TRUE, range = c(-Inf, Inf) )
id (character(1)): The measure's id. Set to 'fairness.<base_measure_id>' if omitted.
base_measure (Measure()): The base metric evaluated within each subgroup.
operation (function()): The operation used to compute the difference. A function that returns a single value given the metric computed for each subgroup. Defaults to groupdiff_absdiff.
minimize (logical(1)): Should the measure be minimized? Defaults to TRUE.
range (numeric(2)): Range of the resulting measure. Defaults to c(-Inf, Inf).
clone()
The objects of this class are cloneable with this method.
MeasureFairness$clone(deep = FALSE)
deep
Whether to make a deep clone.
library("mlr3") # Create MeasureFairness to measure the Predictive Parity. t = tsk("adult_train") learner = lrn("classif.rpart", cp = .01) learner$train(t) measure = msr("fairness", base_measure = msr("classif.ppv")) predictions = learner$predict(t) predictions$score(measure, task = t)
library("mlr3") # Create MeasureFairness to measure the Predictive Parity. t = tsk("adult_train") learner = lrn("classif.rpart", cp = .01) learner$train(t) measure = msr("fairness", base_measure = msr("classif.ppv")) predictions = learner$predict(t) predictions$score(measure, task = t)
Computes a composite measure from multiple fairness metrics and aggregates them using aggfun (defaulting to mean()).
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
mlr3::Measure
-> MeasureFairnessComposite
new()
Creates a new instance of this R6 class.
MeasureFairnessComposite$new( id = NULL, measures, aggfun = function(x) mean(x), operation = groupdiff_absdiff, minimize = TRUE, range = c(-Inf, Inf) )
id (character(1)): Id of the measure. Defaults to the concatenation of the ids in measures.
measures (list of MeasureFairness): List of fairness measures to aggregate.
aggfun (function()): Aggregation function used to aggregate the results from the respective measures. Defaults to mean().
operation (function()): The operation used to compute the difference. A function that returns a single value given the metric computed for each subgroup. Defaults to groupdiff_absdiff. See MeasureFairness for more information.
minimize (logical(1)): Should the measure be minimized? Defaults to TRUE.
range (numeric(2)): Range of the resulting measure. Defaults to c(-Inf, Inf).
clone()
The objects of this class are cloneable with this method.
MeasureFairnessComposite$clone(deep = FALSE)
deep
Whether to make a deep clone.
library("mlr3") # Equalized Odds Metric MeasureFairnessComposite$new(measures = msrs(c("fairness.fpr", "fairness.tpr"))) # Other metrics e.g. based on negative rates MeasureFairnessComposite$new(measures = msrs(c("fairness.fnr", "fairness.tnr")))
library("mlr3") # Equalized Odds Metric MeasureFairnessComposite$new(measures = msrs(c("fairness.fpr", "fairness.tpr"))) # Other metrics e.g. based on negative rates MeasureFairnessComposite$new(measures = msrs(c("fairness.fnr", "fairness.tnr")))
This measure allows constructing 'constraint' measures of the following form: optimize the performance_measure subject to the constraint that the fairness_measure does not exceed a given threshold epsilon.
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
mlr3::Measure
-> MeasureFairnessConstraint
performance_measure (Measure()): The performance measure to be used.
fairness_measure (Measure()): The fairness measure to be used.
epsilon (numeric()): Deviation from perfect fairness that is allowed.
new()
Creates a new instance of this R6 class.
MeasureFairnessConstraint$new( id = NULL, performance_measure, fairness_measure, epsilon = 0.01, range = c(-Inf, Inf) )
id (character(1)): The measure's id. Set to 'fairness.<base_measure_id>' if omitted.
performance_measure (Measure()): The measure used to measure performance (e.g. accuracy).
fairness_measure (Measure()): The measure used to measure fairness (e.g. equalized odds).
epsilon (numeric(1)): Allowed divergence from perfect fairness. Initialized to 0.01.
range (numeric(2)): Range of the resulting measure. Defaults to c(-Inf, Inf).
clone()
The objects of this class are cloneable with this method.
MeasureFairnessConstraint$clone(deep = FALSE)
deep
Whether to make a deep clone.
mlr_measures_fairness
# Accuracy subject to equalized odds fairness constraint:
library("mlr3")
t = tsk("adult_train")
learner = lrn("classif.rpart", cp = .01)
learner$train(t)
measure = msr("fairness.constraint", id = "acc_tpr", msr("classif.acc"), msr("fairness.tpr"))
predictions = learner$predict(t)
predictions$score(measure, task = t)
Allows for the calculation of arbitrary mlr3::Measure()s on a selected subgroup.
mlr3::Measure
-> MeasureSubgroup
base_measure (Measure()): The base measure to be used by the fairness measures, e.g. mlr_measures_classif.fpr for the false positive rate.
subgroup (character() | integer()): Subgroup identifier.
intersect (logical()): Should groups be intersected?
new()
Creates a new instance of this R6 class.
MeasureSubgroup$new(id = NULL, base_measure, subgroup, intersect = TRUE)
id (character(1)): The measure's id. Set to 'fairness.<base_measure_id>' if omitted.
base_measure (Measure()): The measure used to measure fairness.
subgroup (character() | integer()): Subgroup identifier. Either a value of the protected attribute or a position in task$levels.
intersect (logical()): Should multiple pta groups be intersected? Defaults to TRUE. Only relevant if more than one pta column is provided.
clone()
The objects of this class are cloneable with this method.
MeasureSubgroup$clone(deep = FALSE)
deep
Whether to make a deep clone.
MeasureFairness, groupwise_metrics
library("mlr3") # Create MeasureFairness to measure the Predictive Parity. t = tsk("adult_train") learner = lrn("classif.rpart", cp = .01) learner$train(t) measure = msr("subgroup", base_measure = msr("classif.acc"), subgroup = "Female") predictions = learner$predict(t) predictions$score(measure, task = t)
library("mlr3") # Create MeasureFairness to measure the Predictive Parity. t = tsk("adult_train") learner = lrn("classif.rpart", cp = .01) learner$train(t) measure = msr("subgroup", base_measure = msr("classif.acc"), subgroup = "Female") predictions = learner$predict(t) predictions$score(measure, task = t)
Fair Learners in mlr3
mlr_learners_fairness
An object of class data.table (inherits from data.frame) with 5 rows and 3 columns.
A data.table containing an overview of available fair learners.
mlr3fairness comes with a set of predefined fair learners listed below:
key | package | reference |
regr.fairfrrm | fairml | Scutari et al., 2021 |
classif.fairfgrrm | fairml | Scutari et al., 2021 |
regr.fairzlm | fairml | Zafar et al., 2019 |
classif.fairzlrm | fairml | Zafar et al., 2019 |
regr.fairnclm | fairml | Komiyama et al., 2018 |
The protected attribute is specified as a col_role in the corresponding Task(): <Task>$col_roles$pta = "name_of_attribute". This also allows specifying more than one protected attribute, in which case fairness will be considered on the level of intersecting groups defined by all columns selected as protected attributes.
library("mlr3") # Available learners: mlr_learners_fairness
library("mlr3") # Available learners: mlr_learners_fairness
Fairness Measures in mlr3
mlr_measures_fairness
An object of class data.table (inherits from data.frame) with 18 rows and 2 columns.
A data.table containing an overview of available fairness metrics.
mlr3fairness comes with a set of predefined fairness measures as listed below. For full flexibility, MeasureFairness can be used to construct classical group fairness measures based on the difference of a performance metric across groups, by combining a performance measure with an operation for measuring differences. Furthermore, MeasureSubgroup can be used to measure performance in a given subgroup, or alternatively groupwise_metrics(measure, task) to instantiate a measure for each subgroup in a Task.
key | description |
fairness.acc | Absolute differences in accuracy across groups |
fairness.mse | Absolute differences in mean squared error across groups |
fairness.fnr | Absolute differences in false negative rates across groups |
fairness.fpr | Absolute differences in false positive rates across groups |
fairness.tnr | Absolute differences in true negative rates across groups |
fairness.tpr | Absolute differences in true positive rates across groups |
fairness.npv | Absolute differences in negative predictive values across groups |
fairness.ppv | Absolute differences in positive predictive values across groups |
fairness.fomr | Absolute differences in false omission rates across groups |
fairness.fp | Absolute differences in false positives across groups |
fairness.tp | Absolute differences in true positives across groups |
fairness.tn | Absolute differences in true negatives across groups |
fairness.fn | Absolute differences in false negatives across groups |
fairness.cv | Difference in positive class prediction, also known as Calders-Wevers gap or demographic parity |
fairness.eod | Equalized Odds: Mean of absolute differences between true positive and false positive rates across groups |
fairness.pp | Predictive Parity: Mean of absolute differences between ppv and npv across groups |
fairness.acc_eod=.05 | Accuracy under equalized odds < 0.05 constraint |
fairness.acc_ppv=.05 | Accuracy under ppv difference < 0.05 constraint |
library("mlr3") # Predefined measures: mlr_measures_fairness$key
library("mlr3") # Predefined measures: mlr_measures_fairness$key
Returns the probability of a positive prediction, which is used e.g. to compute the 'Calders-Wevers' gap. It is defined as the count of positive predictions divided by the number of observations.
mlr3::Measure
-> MeasurePositiveProbability
new()
Initialize a Measure Positive Probability Object
MeasurePositiveProbability$new()
clone()
The objects of this class are cloneable with this method.
MeasurePositiveProbability$clone(deep = FALSE)
deep
Whether to make a deep clone.
library("mlr3") # Create Positive Probability Measure t = tsk("adult_train") learner = lrn("classif.rpart", cp = .01) learner$train(t) measure = msr("classif.pp") predictions = learner$predict(t) predictions$score(measure, task = t)
library("mlr3") # Create Positive Probability Measure t = tsk("adult_train") learner = lrn("classif.rpart", cp = .01) learner$train(t) measure = msr("classif.pp") predictions = learner$predict(t) predictions$score(measure, task = t)
Fairness post-processing method to achieve equalized odds fairness.
Works by randomly flipping a subset of predictions with pre-computed
probabilities in order to satisfy equalized odds constraints.
NOTE: Carefully assess the correct privileged group.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpEOd$new(id = "EOd", param_vals = list())
id (character(1)): Identifier of the resulting object.
param_vals (list()): List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction.
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskClassif is used as input and output during training and prediction.
The output during training is the input Task. The output during prediction is a PredictionClassif with partially flipped predictions.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
alpha (numeric()): A number between 0 (no debiasing) and 1 (full debiasing). Controls the debiasing strength by multiplying the flipping probabilities with alpha.
privileged (character()): The privileged group.
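As with other PipeOps, these hyperparameters can be set at construction or adjusted afterwards via the parameter set; the values below are purely illustrative (a minimal sketch):
library("mlr3pipelines")
library("mlr3fairness")

eod = po("EOd")
eod$param_set$values$alpha = 0.5                # partial debiasing
eod$param_set$values$privileged = "Caucasian"   # e.g. for the compas_race_binary task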
Only fields inherited from PipeOpTaskPreproc/PipeOp.
Methods inherited from PipeOpTaskPreproc/PipeOp.
mlr3pipelines::PipeOp
-> PipeOpEOd
new()
Creates a new instance of this R6 class.
PipeOpEOd$new(id = "EOd", param_vals = list())
id (character(1)): The PipeOp's identifier in the PipeOps library.
param_vals (list()): The parameter values to be set. See Parameters.
clone()
The objects of this class are cloneable with this method.
PipeOpEOd$clone(deep = FALSE)
deep
Whether to make a deep clone.
Hardt M, Price E, Srebro N (2016). “Equality of Opportunity in Supervised Learning.” In Advances in Neural Information Processing Systems, volume 29, 3315–3323. https://papers.nips.cc/paper/2016/file/9d2682367c3935defcb1f9e247a97c0d-Paper.pdf.
Pleiss, Geoff, Raghavan, Manish, Wu, Felix, Kleinberg, Jon, Weinberger, Q K (2017). “On Fairness and Calibration.” In Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds.), Advances in Neural Information Processing Systems, volume 30. https://proceedings.neurips.cc/paper/2017/file/b8b9c74ac526fffbeb2d39ab038d1cd7-Paper.pdf.
https://mlr3book.mlr-org.com/list-pipeops.html
Other PipeOps: mlr_pipeops_explicit_pta, mlr_pipeops_reweighing
library("mlr3") library("mlr3pipelines") eod = po("EOd") learner_po = po("learner_cv", learner = lrn("classif.rpart"), resampling.method = "insample" ) task = tsk("compas") graph = learner_po %>>% eod glrn = GraphLearner$new(graph) glrn$train(task) # On a Task glrn$predict(task) # On newdata glrn$predict_newdata(task$data(cols = task$feature_names))
library("mlr3") library("mlr3pipelines") eod = po("EOd") learner_po = po("learner_cv", learner = lrn("classif.rpart"), resampling.method = "insample" ) task = tsk("compas") graph = learner_po %>>% eod glrn = GraphLearner$new(graph) glrn$train(task) # On a Task glrn$predict(task) # On newdata glrn$predict_newdata(task$data(cols = task$feature_names))
Turns the column with column role 'pta' into an explicit separate column prefixed with "..internal_pta". This keeps it from getting changed or adapted by subsequent PipeOps that operate on features.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
PipeOpExplicitPta$new(id = "reweighing", param_vals = list())
id
(character(1)
).
param_vals
(list()
)
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskClassif is used as input and output during training and prediction.
The output during training and prediction is the input Task with the column with role 'pta' copied into an additional, explicitly named column.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
The PipeOp does not have any hyperparameters.
Copies the existing pta column to a new column.
Only fields inherited from PipeOpTaskPreproc/PipeOp.
Methods inherited from PipeOpTaskPreproc/PipeOp.
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> PipeOpExplicitPta
new()
Creates a new instance of this R6 class.
PipeOpExplicitPta$new(id = "explicit_pta", param_vals = list())
id (character(1)): The PipeOp's identifier in the PipeOps library.
param_vals (list()): The parameter values to be set. See Parameters.
clone()
The objects of this class are cloneable with this method.
PipeOpExplicitPta$clone(deep = FALSE)
deep
Whether to make a deep clone.
https://mlr3book.mlr-org.com/list-pipeops.html
Other PipeOps: mlr_pipeops_equalized_odds, mlr_pipeops_reweighing
library("mlr3") library("mlr3pipelines") epta = po("explicit_pta") new = epta$train(list(tsk("adult_train")))
library("mlr3") library("mlr3pipelines") epta = po("explicit_pta") new = epta$train(list(tsk("adult_train")))
Adjusts class balance and protected group balance in order to achieve fair(er) outcomes.
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Adds a class weight column to the Task that Learners may use. In case initial weights are present, those are multiplied with the new weights.
Caution: Only fairness tasks are supported, i.e. tasks that have a protected attribute set via tsk$col_roles$pta.
Oversamples a Task for more balanced ratios in subgroups and protected groups. Can be used if a learner does not support weights.
Caution: Only fairness tasks are supported, i.e. tasks that have a protected attribute set via tsk$col_roles$pta.
PipeOpReweighing*$new(id = "reweighing", param_vals = list())
id (character(1)): Identifier of the resulting object.
param_vals (list()): List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction.
Input and output channels are inherited from PipeOpTaskPreproc. Instead of a Task, a TaskClassif is used as input and output during training and prediction.
The output during training is the input Task with an added weights column according to target class. The output during prediction is the unchanged input.
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
alpha (numeric()): A number between 0 (no debiasing) and 1 (full debiasing).
Introduces, or overwrites, the "weights" column in the Task. However, the Learner method needs to respect weights for this to have an effect.
The newly introduced column is named reweighing.WEIGHTS
; there will be a naming conflict if this
column already exists and is not a weight column itself.
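The added weights can be inspected by applying the PipeOp on its own; a minimal sketch, assuming the task's $weights accessor returns the row-wise weights set by the PipeOp:
library("mlr3")
library("mlr3pipelines")
library("mlr3fairness")

task = tsk("adult_train")$filter(1:500)
rw = po("reweighing_wts")
out = rw$train(list(task))[[1]]
head(out$weights)   # row ids and weights introduced by reweighing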
Only fields inherited from PipeOpTaskPreproc/PipeOp.
Methods inherited from PipeOpTaskPreproc/PipeOp.
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> PipeOpReweighingWeights
new()
Creates a new instance of this R6 class.
PipeOpReweighingWeights$new(id = "reweighing_wts", param_vals = list())
id (character(1)): The PipeOp's identifier in the PipeOps library.
param_vals (list()): The parameter values to be set.
alpha: Controls the proportion between the initial weight (1 if not present) and the reweighing weight. Defaults to 1. It works as follows:
new_weight = (1 - alpha) * 1 + alpha * reweighing_weight
final_weight = old_weight * new_weight
clone()
The objects of this class are cloneable with this method.
PipeOpReweighingWeights$clone(deep = FALSE)
deep
Whether to make a deep clone.
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> PipeOpReweighingOversampling
new()
PipeOpReweighingOversampling$new(id = "reweighing_os", param_vals = list())
id (character(1)): The PipeOp's id.
param_vals (list()): A list of parameter values.
clone()
The objects of this class are cloneable with this method.
PipeOpReweighingOversampling$clone(deep = FALSE)
deep
Whether to make a deep clone.
Kamiran, Faisal, Calders, Toon (2012). “Data preprocessing techniques for classification without discrimination.” Knowledge and Information Systems, 33(1), 1–33.
https://mlr3book.mlr-org.com/list-pipeops.html
Other PipeOps: mlr_pipeops_equalized_odds, mlr_pipeops_explicit_pta
library("mlr3") library("mlr3pipelines") reweighing = po("reweighing_wts") learner_po = po("learner", learner = lrn("classif.rpart")) data = tsk("adult_train") graph = reweighing %>>% learner_po glrn = GraphLearner$new(graph) glrn$train(data) tem = glrn$predict(data) tem$confusion
library("mlr3") library("mlr3pipelines") reweighing = po("reweighing_wts") learner_po = po("learner", learner = lrn("classif.rpart")) data = tsk("adult_train") graph = reweighing %>>% learner_po glrn = GraphLearner$new(graph) glrn$train(data) tem = glrn$predict(data) tem$confusion
Creates a new rmarkdown template with a skeleton questionnaire for dataset documentation. Uses the markdown template created by Chris Garbin on GitHub.
report_datasheet(filename = "datasheet.Rmd", edit = FALSE, build = FALSE)
filename (character(1)): The file path of the newly created report template.
edit (logical(1)): Should the new file be opened for editing? Defaults to FALSE.
build (logical(1)): Should the report additionally be rendered after creation? Defaults to FALSE.
Invisibly returns the path to the newly created file(s).
Gebru, Timnit, Morgenstern, Jamie, Vecchione, Briana, Vaughan, Wortman J, Wallach, Hanna, III D, Hal, Crawford, Kate (2018). “Datasheets for datasets.” arXiv preprint arXiv:1803.09010.
Other fairness_reports: report_fairness(), report_modelcard()
report_file = tempfile()
report_datasheet(report_file)
Creates a new rmarkdown template with a skeleton of reported metrics and visualizations. Uses the markdown template created by Chris Garbin on GitHub.
report_fairness( filename = "fairness_report.Rmd", objects, edit = FALSE, check_objects = FALSE, build = FALSE )
filename (character(1)): The file path of the newly created report template.
objects (named list()): A named list of objects required for the report, e.g. the task and resample_result (see the example below).
edit (logical(1)): Should the new file be opened for editing? Defaults to FALSE.
check_objects (logical(1)): Should the supplied objects be checked before the report is created? Defaults to FALSE.
build (logical(1)): Should the report additionally be rendered after creation? Defaults to FALSE.
Invisibly returns the path to the newly created file(s).
Other fairness_reports: report_datasheet(), report_modelcard()
library("mlr3") report_file = tempfile() task = tsk("compas") learner = lrn("classif.rpart", predict_type = "prob") rr = resample(task, learner, rsmp("cv", folds = 3L)) report_fairness(report_file, list(task = task, resample_result = rr))
library("mlr3") report_file = tempfile() task = tsk("compas") learner = lrn("classif.rpart", predict_type = "prob") rr = resample(task, learner, rsmp("cv", folds = 3L)) report_fairness(report_file, list(task = task, resample_result = rr))
Creates a new rmarkdown template with a skeleton questionnaire for a model card. Uses the markdown template created by Chris Garbin on GitHub.
report_modelcard(filename = "modelcard.Rmd", edit = FALSE, build = FALSE)
filename (character(1)): The file path of the newly created report template.
edit (logical(1)): Should the new file be opened for editing? Defaults to FALSE.
build (logical(1)): Should the report additionally be rendered after creation? Defaults to FALSE.
Invisibly returns the path to the newly created file(s).
Mitchell, Margaret, Wu, Simone, Zaldivar, Andrew, Barnes, Parker, Vasserman, Lucy, Hutchinson, Ben, Spitzer, Elena, Raji, Deborah I, Gebru, Timnit (2019). “Model cards for model reporting.” In Proceedings of the conference on fairness, accountability, and transparency, 220–229.
Other fairness_reports: report_datasheet(), report_fairness()
report_file = tempfile()
report_modelcard(report_file)
Creates the general task documentation in a data.frame for the fairness report. The reported information includes:
Audit Date
Task Name
Number of observations
Number of features
Target Name
Feature Names
The Protected Attribute
task_summary(task)
task (Task): The task to summarize.
Returns a data.frame containing the reported information.
library("mlr3") task_summary(tsk("adult_train"))
library("mlr3") task_summary(tsk("adult_train"))