<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md Fahim Sikder</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Resmi Ramachandranpillai</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel de Leng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fredrik Heintz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science (IDA), Linköping University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Experiential AI, Northeastern University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present FairX, an open-source Python-based benchmarking tool designed for the comprehensive analysis of models under the umbrella of fairness, utility, and eXplainability (XAI). FairX enables users to train benchmarking bias-mitigation models, evaluate their fairness using a wide array of fairness and data utility metrics, and generate explanations for model predictions, all within a unified framework. Existing benchmarking tools lack the means to evaluate synthetic data generated by fair generative models, and they also lack support for training fair generative models. In FairX, we add fair generative models to our fair-model library (pre-processing, in-processing, post-processing) and evaluation metrics for evaluating the quality of synthetic fair data. This version of FairX supports both tabular and image datasets. It also allows users to provide their own custom datasets. The open-source FairX benchmarking package is publicly available at https://github.com/fahim-sikder/FairX.</p>
      </abstract>
      <kwd-group>
        <kwd>Fair evaluation</kwd>
        <kwd>Benchmarking tool</kwd>
        <kwd>Synthetic data</kwd>
        <kwd>Data utility</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the rapid development of artificial intelligence-based systems to aid us in our daily lives,
it is important for these systems to produce outcomes that are acceptable to all users, including, but
not limited to, users from different demographic groups. Troublingly, as the available data is filled with
human or machine bias, models trained on these datasets often give unfair outcomes towards
some demographics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It is therefore critical to mitigate bias in the dataset and the model. Over the
years, researchers have used different techniques to achieve this [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. These techniques can
be roughly grouped into three families: 1) Pre-processing, i.e. where the dataset is processed in
such a manner that it produces less biased outcomes, before passing it to a model for training;
2) In-processing, i.e. where the model learns the original data distribution and shifts the data
distribution to a fair distribution by adding constraints during the training process; and 3)
Post-processing, i.e. where the model’s outcome is changed in such a manner that it gives fair
outcomes relative to protected attributes. The performance of these models or datasets can be
measured by evaluation metrics that reflect both fairness and data utility. To ease the
work of training and evaluating models, researchers have developed benchmarking tools
that bring training and evaluation into one framework. Recently, research on fair generative
models has received a lot of attention, and measuring the quality of the synthetic data is as crucial
as evaluating fairness and data utility.
      </p>
      <p>
        Existing fairness-related benchmarking tools focus on creating benchmarks and measuring
fairness on different datasets. For example, Fairlearn [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by Microsoft contains several fair
models and evaluation metrics for checking fairness and data utility. AI Fairness 360 (AIF360) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
by IBM also contains fairness evaluation metrics and basic data utility metrics. However,
both of these frameworks lack the ability to train fair generative models and to measure the data
utility of synthetic data. For synthetic fair data, it is important to validate the quality of the
generated data alongside measuring the fairness and other data utilities. Explainability is an
essential property of fair models because it aids in making the model’s decision-making process
more transparent. These modules should therefore be included in such benchmarking tools.
      </p>
      <p>In this work, we present FairX, an open-source modular fairness benchmarking tool,
available at https://github.com/fahim-sikder/FairX. A high-level system overview
is given in Figure 1. FairX contains data processing techniques and benchmarking fairness
models (incorporating pre-processing, in-processing, and post-processing), including generative
fair models. We evaluate these models in terms of fairness and data utility. We also add evaluation
methods for synthetic fair data (Advanced Utility) to check the quality of the generated samples.
FairX supports both tabular and image data and can plot feature importance for downstream
tasks using explainability algorithms.</p>
      <p>The remainder of this paper is organised as follows. In Section 2 we discuss some background
information that will help the reader understand the rest of the paper. We then present FairX in
Section 3. Section 4 shows some fairness results obtained by FairX for a number of datasets and
models. Finally, the paper looks ahead towards future improvements in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>In this section, we provide the necessary details to follow the paper.</title>
        <sec id="sec-2-1-1">
          <title>2.1. Bias mitigation methods</title>
          <p>A variety of bias mitigation methods have been proposed in the literature, operating on the data,
the training procedure, or the predictions. These methods can be broadly categorized into three main
approaches: pre-processing, in-processing, and post-processing techniques.</p>
          <p>
            Pre-processing. These techniques involve altering the training data to remove potential
causes of bias before it is fed to the model. Various techniques exist in the literature,
such as the disparate impact remover [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], data cleaning and augmentation, and fair representation
learning [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. This involves balancing the representation of different groups or generating
synthetic data to augment underrepresented groups, assigning weights to support minority
groups, and transforming the data representation into a format that obscures protected features
while maintaining feature attributions.
          </p>
          <p>
            In-processing. This involves mitigating biases during training. The techniques include
fairness constraints, adversarial de-biasing [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], and fairness-aware learning. In fairness-constrained
training, a multi-objective optimization combining a prediction loss and a fairness penalty is
used, for example by adding regularization terms to the objective function that penalize unfairness,
or by incorporating fairness metrics into the optimization process. In adversarial de-biasing
[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], adversarial training is used to reduce bias. The model is trained to perform well on the
primary classification/prediction task while simultaneously trying to prevent an adversary from
predicting the protected features, thus forcing the model to learn less biased representations.
          </p>
          <p>
            Post-processing. These methods are applied to the predictions of a classifier. Techniques
such as threshold adjustment, calibration [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], and Reject Option Classification [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] fall under
this category. In threshold adjustment, the decision thresholds of a trained model are adjusted
to ensure that the outcomes meet the chosen fairness metric. Calibration [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] ensures that
the predicted probabilities reflect the true likelihood of outcomes equally across different
demographic groups. Techniques like equalized odds post-processing are used, where the model’s
outputs are adjusted to satisfy fairness constraints. Reject Option-Based Classification (ROC)
[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] allows the model to refrain from making a decision when the confidence is low with respect to the
chosen sensitive attributes. This can reduce the likelihood of biased or unfair decisions in
uncertain instances.
          </p>
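          <p>To make the threshold adjustment idea concrete, the short sketch below (a generic illustration, not FairX code nor any specific published method) chooses one decision threshold per demographic group so that the groups end up with roughly equal positive-prediction rates; the variable names and target rate are illustrative assumptions.</p>
          <preformat>
# Generic illustration of group-wise threshold adjustment (not FairX code):
# pick one threshold per group so every group has roughly the same positive rate.
import numpy as np

def groupwise_thresholds(scores, groups, target_rate=0.3):
    """Return one threshold per group so each group's positive rate is ~target_rate."""
    thresholds = {}
    for g in np.unique(groups):
        s = scores[groups == g]
        # the (1 - target_rate) quantile flags roughly target_rate of group g
        thresholds[g] = np.quantile(s, 1.0 - target_rate)
    return thresholds

# toy usage: two demographic groups with different score distributions
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])
groups = np.array(['A'] * 500 + ['B'] * 500)
thr = groupwise_thresholds(scores, groups)
preds = np.array([scores[i] &gt;= thr[groups[i]] for i in range(len(scores))])
print(preds[groups == 'A'].mean(), preds[groups == 'B'].mean())  # roughly equal rates
          </preformat>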
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. Evaluation metrics</title>
          <p>
            To measure the performance of models or datasets, various evaluation methods are used.
For evaluating a fair model or checking a dataset for potential bias, different kinds of fairness
metrics exist. For example, demographic parity checks whether the decision from a downstream
task is equal for each class of the sensitive attribute. Fairness through unawareness [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] checks
how the accuracy of a downstream task is affected if no sensitive attributes are used during the
training and prediction phases. Adding fairness constraints to the models or datasets may change
the data distributions and thereby affect the performance of the dataset or models [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. To
check data utility, we commonly use the accuracy score, F1-score, precision and
recall. To evaluate the quality of synthetic data, researchers use α-precision [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], β-recall
[
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. Also, to check whether the generative model is truly generating new content or not, the
authenticity metric [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] is being used.
          </p>
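          <p>As an illustration of the first metric above, the short sketch below restates the demographic parity ratio directly from predictions and group membership; it is a generic example, not the implementation used in any particular tool.</p>
          <preformat>
# Generic illustration of the demographic parity ratio: the smallest group-wise
# positive-prediction rate divided by the largest (1.0 means perfect parity).
# This restates the metric; it is not the implementation of any particular tool.
import numpy as np

def demographic_parity_ratio(y_pred, sensitive):
    rates = [np.mean(y_pred[sensitive == g]) for g in np.unique(sensitive)]
    return min(rates) / max(rates)

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
sensitive = np.array(['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M'])
print(demographic_parity_ratio(y_pred, sensitive))  # 0.25 / 0.75 = 0.33...
          </preformat>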
        </sec>
        <sec id="sec-2-1-3">
          <title>2.3. Comparison of existing benchmarking tools</title>
          <p>
            Over the years, researchers have developed various fairness benchmarking tools, which
commonly include a dataset loader, different bias mitigation techniques and evaluation metrics.
Fairlearn [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] by Microsoft is one such benchmarking tool. It has support for different algorithms
for bias mitigation and for measuring the fairness of a model. AIF360 [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] by IBM is another
benchmarking tool. It supports a wide range of evaluation metrics (both for fairness and data utility)
and bias-removal algorithms (in-processing, pre-processing and post-processing). Another
example is Jurity [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. It contains recommender system evaluations, and various fairness and
data utility functions. AEQUITAS [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], FairBench [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] generate fairness reports, and REVISE
[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] is a tool to detect and mitigate bias in image datasets. More recently, in the area of
generative models, there has been an increased interest in generating fair data in the image,
tabular and medical domains [
            <xref ref-type="bibr" rid="ref1 ref18 ref19 ref20 ref21 ref22">18, 19, 20, 1, 21, 22</xref>
            ]. However, the aforementioned benchmarking tools
do not contain these models. Also, when evaluating models, other benchmarking tools only
measure the fairness and data utility of the models themselves, but evaluation methods for generated
data are also needed. We need to verify the quality of the synthetic data, and we need to verify its
authenticity, to show that the generative models are actually generating new
content rather than just copying the training data. FairX bridges this gap: we add support for
evaluating synthetic data and include generative models in our benchmarking tool. Table 1 shows
the comparison of these tools with FairX.
          </p>
        </sec>
    </sec>
    <sec id="sec-3">
      <title>3. FairX</title>
      <p>In this section, we present FairX in detail. FairX is built from three primary modules: 1) the Data
Loading Module, 2) the Bias-mitigating Techniques Module, and 3) the Evaluation Module. The
main pipeline (shown in Figure 1) works as follows. Given a dataset, FairX pre-processes it in
a way that is compatible with the benchmarking model. Next, the model is trained on
the dataset. After training, the evaluation module reports results in terms of fairness and
data utility and explains the outcomes using explainability methods.</p>
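      <p>To make the pipeline concrete, the following condensed sketch chains the three modules together (load, train, evaluate). It follows the class and function names used in the listings of Appendix A; exact signatures may differ between FairX versions, so treat it as an illustrative sketch rather than a verbatim recipe.</p>
      <preformat>
# Condensed sketch of the FairX pipeline: load a dataset, train a fair generative
# model, then evaluate fairness and data utility. Names follow Appendix A; exact
# signatures may differ between versions.
from fairx.dataset import BaseDataClass
from fairx.models.inprocessing import TabFairGAN
from fairx.metrics import FairnessUtils, DataUtilsMetrics

# 1) Data Loading Module: dataset name, sensitive attribute, attach-target flag
data_module = BaseDataClass('Adult-Income', 'race', True)

# 2) Bias-mitigating Techniques Module: train a fair generative model
under_prev, y_desire = 'Female', '&gt;50K'
model = TabFairGAN(under_prev, y_desire)
model.fit(data_module, batch_size=256, epochs=1000)

# 3) Evaluation Module: fairness and data utility on the pre-processed, split data
_, _, transformed_data = data_module.preprocess_data()
splitted_data = data_module.split_data(transformed_data)
print(FairnessUtils(splitted_data).evaluate_fairness())
print(DataUtilsMetrics(splitted_data).evaluate_utility())
      </preformat>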
      <sec id="sec-3-1">
        <title>3.1. Data loading module</title>
        <p>The BaseDataClass handles the internal processing of datasets and makes them compatible with
the bias-mitigating models that are present in our framework, as well as making them easier to
handle for bias-mitigating models that are not part of this tool. This class contains
different methods for handling different kinds of data formats (CSV, and others). We add
three widely used tabular datasets (Adult-Income, COMPAS and Credit Card) and two image
datasets (Colored MNIST and CelebA) to the benchmarking tool, and we plan to add more. The
BaseDataClass processes datasets based on numerical and categorical features. It also provides
methods to normalize the dataset and is equipped with functionality for various encodings
(e.g. one-hot encoding, QuantileTransformer). It also has a dataset-splitting function to split
the dataset for training and testing purposes. We also add functionality to prepare the dataset
for explainability algorithms. Sample usage of the datasets is described in Appendix Section A,
Listing 1.</p>
        <p>Custom Dataset Loader. Besides adding widely used benchmarking datasets for fair data
research, we also provide the option to use custom datasets. By using the CustomDataClass,
users can load their own dataset (CSV, TXT, etc.) and train the models. Users need to specify the
sensitive attributes and the target attribute when using the CustomDataClass. Pre-processing
and other functionalities are also available in this class, as in the BaseDataClass. We present
sample usage of the CustomDataClass in Listing 4 of Appendix Section A.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Bias-mitigating techniques module</title>
        <p>
          One of FairX’s main aims is to benchmark different bias-mitigation techniques on various
datasets. Over the years, different techniques have been proposed, and we add models from
these techniques to the tool. For the benchmarking process, we use the same hyper-parameters
used in their respective works. We create a common format for all the bias-mitigation techniques
to make them easy for users to apply. For example, each bias-mitigation technique has its own class,
which has a fit() function. This fit() function takes the dataset and processes it (if needed
for the specific model). For the generative models (in-processing techniques), this function
also generates synthetic data and saves it as a Pandas dataframe. Sample usage of models is
described in Appendix Section A, Listing 2.
        </p>
        <p>
          Pre-processing. We add support for the Correlation Remover [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] (CorrRemover in FairX)
to the benchmarking. The Correlation Remover removes the correlation between the sensitive
attributes and the other data features by using a linear transformation, while keeping as much
information as possible. It is also possible to control how much correlation we want to remove
by using the remove_intensity parameter, where the value 1.0 results in maximum correlation
removal and 0.0 does the opposite. The pre-processing algorithms can be accessed via
fairx.models.preprocessing.
        </p>
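        <p>As a usage illustration under the shared fit() convention described above, the following minimal sketch loads a dataset and applies CorrRemover; the constructor argument shown (remove_intensity) and its placement are assumptions based on this description, not a verbatim API reference.</p>
        <preformat>
# Hypothetical usage sketch of the pre-processing Correlation Remover; it assumes
# CorrRemover follows the common fit() convention and accepts the remove_intensity
# parameter described above (exact signature may differ).
from fairx.dataset import BaseDataClass
from fairx.models.preprocessing import CorrRemover

data_module = BaseDataClass('Adult-Income', 'race', True)
corr_remover = CorrRemover(remove_intensity=1.0)  # 1.0 removes as much correlation as possible
corr_remover.fit(data_module)
        </preformat>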
        <p>
          In-processing. Most recent in-processing bias mitigation techniques are based on generative
models, and the fairness benchmarking tools mentioned in this work do not contain these
models. One of the contributions of FairX is that we add several fair generative models, such as
TabFairGAN [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], Decaf [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and FairDisCo [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The in-processing algorithms can be accessed via the
fairx.models.inprocessing module. After training, these models generate and save
the samples automatically.
        </p>
        <p>
          Post-processing. For the post-processing bias mitigation technique, we add the Threshold
Optimizer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. This technique operates on a classifier and improves its output based
on a fairness constraint. In this case, we use demographic_parity as the fairness constraint
to improve the outcome of the classifier, as presented in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The post-processing
algorithms can be accessed via the fairx.models.postprocessing module.
        </p>
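        <p>A corresponding hedged sketch for the post-processing step is shown below; the class name ThresholdOptimizer and its constraint argument are assumptions based on the description above and on the shared fit() convention, not a verbatim API reference.</p>
        <preformat>
# Hypothetical sketch of the post-processing step; the class name and its arguments
# are assumptions based on the description above (exact API may differ).
from fairx.dataset import BaseDataClass
from fairx.models.postprocessing import ThresholdOptimizer

data_module = BaseDataClass('Adult-Income', 'sex', True)
post_model = ThresholdOptimizer(constraint='demographic_parity')  # fairness constraint from [4]
post_model.fit(data_module)
        </preformat>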
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Evaluation module</title>
        <p>In FairX, we aim to evaluate the performance of a model or dataset using a wide range of evaluation
metrics. We evaluate in terms of fairness and data utility. Other existing fairness benchmarking
tools lack the capability to measure the data quality of synthetic data, yet it is necessary to
check the quality of synthetic data as well as the fairness criteria. Here, we present the
evaluation modules FairX provides; we use XGBoost as the classifier, and we also keep the option to use
scikit-learn’s LogisticRegression.</p>
        <p>Fairness Evaluation. We create the FairnessUtils class to accommodate fairness
evaluation metrics. This class currently supports the Demographic
Parity Ratio, the Equalized Odds Ratio, and Fairness Through Unawareness (FTU), and we
plan to add more metrics over time. Fairness metrics can be accessed using the
fairx.metrics.FairnessUtils module.</p>
        <p>Data Utility. Besides checking the fairness criteria of the datasets or models, we also add
functionality to check data utility using FairX. We add support for checking
accuracy, precision, recall, AUROC, and F1-score. These functions can be accessed via
the fairx.metrics.DataUtilsMetrics module.</p>
        <p>
          Synthetic Data Evaluation. In FairX, we add functionality to evaluate the quality of
the data generated by the fair generative models. It is important to validate the quality of the
synthetic data along with validating the fairness and data utility criteria. Existing fairness
benchmarks do not have the functionality to evaluate synthetic data quality. We
evaluate the synthetic data quality in terms of fidelity and diversity, and check whether the synthetic data
has any trace of the original data in it [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. We use α-precision [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] to evaluate the fidelity of the
synthetic data, β-recall [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] to check the diversity and Authenticity [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] is used to check whether
the generative models are just memorising the training data. The synthetic data evaluation
module can be accessed from fairx.metrics.SyntheticEvaluation. We also add t-SNE
and PCA plots to check the fidelity and diversity of the synthetic data; the plots
are discussed further in Section 3.4.
        </p>
        <p>
          Explainability. We add explainability functionality in FairX to explain the predictions of
a model. We train a classifier (XGBoost) on the benchmarking datasets, and then we explain
the predictions using the fairx.explainability.ExplainUtils module. This module is based on
the TreeExplainer of SHAP [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. Besides this, we provide functionality to show the feature
importance when making a decision. This functionality is especially useful when we want to
see how much importance is given to the sensitive attributes while making a decision.
        </p>
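        <p>The underlying idea can be illustrated directly with the shap and xgboost libraries; the generic sketch below is not the ExplainUtils module itself, but shows the TreeExplainer-based workflow it builds on, including inspecting how much weight the encoded sensitive attribute receives.</p>
        <preformat>
# Generic sketch of a SHAP TreeExplainer-based explanation (not the FairX
# ExplainUtils module itself): train an XGBoost classifier and plot per-feature
# importance, including the encoded sensitive attribute.
import shap
import xgboost
import pandas as pd

def explain_predictions(X: pd.DataFrame, y):
    model = xgboost.XGBClassifier().fit(X, y)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    shap.summary_plot(shap_values, X)  # global feature-importance summary
    return shap_values
        </preformat>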
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Plotting</title>
        <p>We add various plotting support in FairX. They can be accessed under the fairx.utils.plotting
module. We add support to show the trade-off between model accuracy and fairness
performance. Also, we plot the feature importance to show which features are responsible for a prediction and
how much the fair model reduces the feature importance of the sensitive attributes.</p>
        <p>[Residue of Tables 3 and 4 removed; recoverable caption fragment: the protected attributes are Gender and Race, higher metric scores are better, and Synthetic Data Evaluation is only applicable to the fair generative models (i.e. TabFairGAN and Decaf).]</p>
        <p>To show the quality of the synthetic data generated by the fair generative models, we add
PCA and t-SNE plots. These plots show how close the synthetic data is to the original data.</p>
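        <p>The overlap check behind these plots can be reproduced with standard tooling; the generic sketch below (independent of the fairx.utils.plotting module) projects real and synthetic tables into two PCA components and overlays the point clouds.</p>
        <preformat>
# Generic sketch of the PCA overlap check between real and synthetic data
# (independent of fairx.utils.plotting): if the generator captured the original
# distribution, the two projected point clouds should largely overlap.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def pca_overlap_plot(real: np.ndarray, synthetic: np.ndarray):
    pca = PCA(n_components=2).fit(real)  # fit the projection on the real data only
    real_2d = pca.transform(real)
    synth_2d = pca.transform(synthetic)
    plt.scatter(real_2d[:, 0], real_2d[:, 1], s=5, alpha=0.4, label='real')
    plt.scatter(synth_2d[:, 0], synth_2d[:, 1], s=5, alpha=0.4, label='synthetic')
    plt.legend()
    plt.title('PCA projection: real vs. synthetic data')
    plt.show()
        </preformat>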
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>We now consider the fairness, data utility and synthetic data evaluation (only for the in-processing
generative models) of the models presented in this benchmarking tool. We also present an
explainability analysis where we use the data generated by the in-processing generative models
and show how the fair generated data performs on downstream tasks and how the prediction
is affected by the sensitive attributes. We also show the feature importance using this
explainability analysis.</p>
      <p>Tables 3 and 4 show the performance of the bias mitigation algorithms for the Adult-Income
dataset and the COMPAS dataset, respectively. We run experiments using different protected
attributes (1). Besides fairness and data utility, we add synthetic data evaluation for the outputs of
TabFairGAN (2) and Decaf (3).</p>
      <p>From the tables, we see that among the fair generative models, TabFairGAN performs well
compared with Decaf on both datasets and for both protected attributes. The α-precision and β-recall
scores of TabFairGAN are better than those of Decaf, which indicates that the synthetic data quality of
TabFairGAN is superior to that of Decaf. On the other hand, TabFairGAN performs poorly in the fairness
evaluation for the ‘race’ protected attribute of the Adult-Income dataset, whereas the in-processing
technique FairDisCo (4) performs well in terms of fairness and data utility.</p>
      <p>For the visual evaluation of fair synthetic data, we use the synthetic data generated by
TabFairGAN. Figure 2 shows the PCA and t-SNE plots of the synthetic data generated by
TabFairGAN. We show how closely the synthetic data distribution matches the original
data. If the generative model can capture the original data distribution, the original and synthetic
data should overlap with each other in the PCA and t-SNE plots. Figure 2 shows that the data
generated by TabFairGAN partially learned the data distribution of the original data.</p>
      <p>In Figure 3, we show the feature importance for a downstream task predicting the target
attribute of the Adult-Income dataset, where the sensitive attribute is ‘sex’. We compare the
feature importance of the original data with that of the synthetic data generated by TabFairGAN. We can
see that the feature importance of the synthetic data is lower than that of the original data. This means the
synthetic data generated by TabFairGAN is less biased with respect to the sensitive attribute.</p>
      <p>Footnotes: (1) For the sake of brevity, we could not include additional results using other datasets; we refer the reader to the
FairX repository for these results. Some metrics like precision, recall, fairness through unawareness (FTU), and
plots like fairness-accuracy trade-offs were similarly omitted. (2) https://github.com/amirarsalan90/TabFairGAN
(3) https://github.com/vanderschaarlab/synthcity (4) https://github.com/SoftWiser-group/FairDisCo</p>
      <p>Finally, Figure 4 shows the intersectional bias on the Adult-Income dataset. We plot the
percentage of ‘salary-income’ for both the ‘race’ and ‘sex’ protected attributes. We see that, in the dataset,
decisions are given in favor of white people.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>Massive amounts of data are being produced every day. Unfortunately, much of this data
contains human or machine biases. Furthermore, the usage of recommendation systems has
increased with advancements in artificial intelligence. But if we use biased data to train a
recommendation system, there is a high chance that the recommendation system will yield
unfair decisions towards some demographics. To mitigate this issue, researchers have developed
various measures to mitigate the bias in the dataset, or to train the model in such a way that
the model produces bias-free outcomes. To help in this process, benchmarking tools equipped with
different bias-mitigation techniques and evaluation metrics have been developed over the years. But
these benchmarking tools commonly lack the option to evaluate generative models or to train
them. We therefore presented FairX, an open-source, modular, fairness benchmarking tool.
FairX comes with a data loader, supports model training, and has an evaluation module. FairX
provides support for training fair generative models and for evaluating the synthetic data created
by them. FairX also contains various fairness evaluation metrics, data utility evaluation metrics
and different plotting techniques to help users evaluate models and visualize outcomes. FairX
comes with support for explainability analysis of a prediction using the dataset (both original
and synthetic) and shows feature importance. We believe FairX will help researchers by bridging
the gap left by the lack of fair generative models, and of ways to evaluate synthetic data, in existing tools.</p>
      <p>
        In the future, we intend to extend FairX to be able to handle other modalities in addition to
tabular and image data, for example text and video. Also, we will add a wider range of evaluation
metrics for both synthetic data utility and fairness. For the models, we plan to add text-based
and more tabular and image-based fair generative models [
        <xref ref-type="bibr" rid="ref18 ref19 ref20 ref27">19, 20, 27, 18</xref>
        ]. In this version
of FairX, there is no option to add custom models, but we plan to add this feature in a future
version, so that users can use their own models with all the functionalities of FairX.
We also plan to add a hyper-parameter optimization feature for the models, so that we can find the
optimal parameters and the best results. Finally, we plan to add functionalities to evaluate the output
of large language models.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The work was partially funded by the Knut and Alice Wallenberg Foundation, and the TAILOR
Network of Excellence for trustworthy AI (EC Grant Agreement 952215). Portions of this
work were carried out using the AIOps/Stellar facilities funded by the Excellence Center at
Linköping–Lund in Information Technology (ELLIIT).</p>
    </sec>
    <sec id="sec-7">
      <title>A. Detailed Usage</title>
      <p>In this section, we present different sample code examples for our tool. We give a brief description
of each module and its corresponding class and function details.
Dataset usage. To use a dataset already pre-loaded with the tool, we need to use the
BaseDataClass. This class takes three parameters as input: dataset_name,
sensitive_attribute and a boolean flag for attaching the target variable to the main dataframe.
BaseDataClass has two functions, preprocess_data() and split_data(), to preprocess the
dataset using categorical and numerical transformations and to split the dataset for training and testing
purposes, respectively.</p>
      <preformat>
from fairx.dataset import BaseDataClass

dataset_name = 'Adult-Income'
sensitive_attribute = 'race'
attach_target = True
data_module = BaseDataClass(dataset_name, sensitive_attribute, attach_target)
      </preformat>
      <sec id="sec-6-1">
        <title>Listing 1: Using the BaseDataClass.</title>
        <p>Model usage. We add three kinds of bias-removal techniques under the models folder of
FairX. The list of available models can be found in Table 2. Here is an example usage of an
in-processing algorithm called TabFairGAN. After initializing the model, we train it by calling
the fit() function, which takes the dataset, batch size and number of epochs as parameters.
After training, for the fair generative models (TabFairGAN and Decaf), the synthetic data will be
automatically saved in the working directory.</p>
        <preformat>
from fairx.dataset import BaseDataClass
from fairx.models.inprocessing import TabFairGAN

data_module = BaseDataClass(dataset_name, sensitive_attribute, attach_target)
under_prev = 'Female'
y_desire = '&gt;50K'
tabfairgan = TabFairGAN(under_prev, y_desire)
tabfairgan.fit(data_module, batch_size = 256, epochs = 1000)
        </preformat>
      </sec>
      <sec id="sec-6-2">
        <title>Listing 2: Using Models.</title>
        <p>Metrics usage. Here, we give sample code for measuring fairness and data utility
with a dataset that is already part of the FairX system. Both the FairnessUtils and
DataUtilsMetrics classes take the dataset as input, and we then call the evaluate_fairness() and
evaluate_utility() functions to measure fairness and data utility, respectively. The result is
stored as a dictionary.</p>
        <preformat>
from fairx.metrics import FairnessUtils
from fairx.metrics import DataUtilsMetrics
from fairx.dataset import BaseDataClass

data_module = BaseDataClass(dataset_name, sensitive_attribute, attach_target)
cat_transformer, num_scaler, transformed_data = data_module.preprocess_data()
splitted_data = data_module.split_data(transformed_data)

fairness_measurement = FairnessUtils(splitted_data)
utility_measurement = DataUtilsMetrics(splitted_data)
fairness_res = fairness_measurement.evaluate_fairness()
datautils_res = utility_measurement.evaluate_utility()
print(fairness_res)
print(datautils_res)
        </preformat>
      </sec>
      <sec id="sec-6-3">
        <title>Listing 3: Using Fairness &amp; Data utility Metrics.</title>
        <p>The following code example shows how to use the CustomDataClass to load a custom dataset into
FairX. We need to give the dataset path, the list of sensitive attributes and a boolean flag for
attaching the target. This code also shows the usage of synthetic data evaluation using the
SyntheticEvaluation class.</p>
        <preformat>
from fairx.metrics import SyntheticEvaluation
from fairx.dataset import BaseDataClass
from fairx.dataset import CustomDataClass

original_data = BaseDataClass(dataset_name, sensitive_attribute, attach_target)
generated_data = CustomDataClass(generated_data_path, sensitive_attribute,
                                 attach_target)

synthetic_evaluation_class = SyntheticEvaluation(original_data, generated_data)
synthetic_data_measurement = synthetic_evaluation_class.evaluate_synthetic()
print(synthetic_data_measurement)
        </preformat>
        <p>Listing 4: Using Synthetic Data Evaluation Metrics with Custom Data Loader.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          , M. Xu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <article-title>Fair representation learning: An alternative to mutual information</article-title>
          ,
          <source>in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1088</fpage>
          -
          <lpage>1097</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ntoutsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fafalios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Gadiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Iosifidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-E. Vidal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ruggieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Turini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Papadopoulos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Krasanakis</surname>
          </string-name>
          , et al.,
          <article-title>Bias in data-driven artificial intelligence systems-an introductory survey</article-title>
          ,
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <article-title>e1356</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Morstatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          ,
          <article-title>A survey on bias and fairness in machine learning</article-title>
          ,
          <source>ACM computing surveys (CSUR) 54</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Weerts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dudík</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Edgar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jalali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Madaio</surname>
          </string-name>
          ,
          <article-title>Fairlearn: Assessing and improving fairness of ai systems</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Bellamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Houde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kannan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lohia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mojsilović</surname>
          </string-name>
          , et al.,
          <article-title>AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias</article-title>
          ,
          <source>IBM Journal of Research and Development</source>
          <volume>63</volume>
          (
          <year>2019</year>
          )
          <fpage>4</fpage>
          -
          <lpage>1</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Friedler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moeller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scheidegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkatasubramanian</surname>
          </string-name>
          ,
          <article-title>Certifying and removing disparate impact</article-title>
          ,
          <source>in: proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>268</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pitassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <article-title>Learning fair representations</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lemoine</surname>
          </string-name>
          , M. Mitchell,
          <article-title>Mitigating unwanted biases with adversarial learning</article-title>
          ,
          <source>in: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pleiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>On fairness and calibration</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kamiran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karim</surname>
          </string-name>
          ,
          <string-name>
            <surname>X. Zhang,</surname>
          </string-name>
          <article-title>Decision theory for discrimination-aware classification</article-title>
          ,
          <source>in: 2012 IEEE 12th international conference on data mining, IEEE</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>924</fpage>
          -
          <lpage>929</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cornacchia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Anelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Biancofiore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Narducci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ragone</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. Di Sciascio</surname>
          </string-name>
          ,
          <article-title>Auditing fairness under unawareness through counterfactual reasoning</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>60</volume>
          (
          <year>2023</year>
          )
          <fpage>103224</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. E.</given-names>
            <surname>Wang</surname>
          </string-name>
          , Y. Liu,
          <article-title>Understanding instance-level impact of fairness constraints</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>23114</fpage>
          -
          <lpage>23130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Van</given-names>
            <surname>Breugel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Saveliev</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. van der Schaar</surname>
          </string-name>
          ,
          <article-title>How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>290</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thielbar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kadıoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pack</surname>
          </string-name>
          , L. Dannull,
          <article-title>Surrogate membership for inferred metrics in fairness evaluation</article-title>
          ,
          <source>in: International Conference on Learning and Intelligent Optimization</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>424</fpage>
          -
          <lpage>442</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Saleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kuester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hinkson</surname>
          </string-name>
          , J. London,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anisfeld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. T.</given-names>
            <surname>Rodolfa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghani</surname>
          </string-name>
          ,
          <article-title>Aequitas: A bias and fairness audit toolkit</article-title>
          ,
          <source>arXiv preprint arXiv:1811.05577</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Krasanakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          ,
          <article-title>Towards standardizing ai bias exploration</article-title>
          ,
          <source>arXiv preprint arXiv:2405.19022</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <surname>O. Russakovsky,</surname>
          </string-name>
          <article-title>REVISE: A tool for measuring and mitigating bias in visual datasets</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramachandranpillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sikder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bergström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Heintz</surname>
          </string-name>
          , Bt-GAN:
          <article-title>Generating Fair Synthetic Healthdata via Bias-transforming Generative Adversarial Networks</article-title>
          ,
          <source>Journal of Artificial Intelligence Research (JAIR) 79</source>
          (
          <year>2024</year>
          )
          <fpage>1313</fpage>
          -
          <lpage>1341</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Deng</surname>
          </string-name>
          , Fairgan:
          <article-title>Gans-based fairness-aware learning for recommendations with implicit feedback</article-title>
          ,
          <source>in: Proceedings of the ACM web conference</source>
          <year>2022</year>
          ,
          <year>2022</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramachandranpillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sikder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Heintz</surname>
          </string-name>
          ,
          <article-title>Fair Latent Deep Generative Models (FLDGMs) for Syntax-Agnostic and Fair Synthetic Data Generation</article-title>
          ,
          <source>in: ECAI</source>
          <year>2023</year>
          , IOS Press,
          <year>2023</year>
          , pp.
          <fpage>1938</fpage>
          -
          <lpage>1945</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. O.</given-names>
            <surname>Garibay</surname>
          </string-name>
          , Tabfairgan:
          <article-title>Fair tabular data generation with generative adversarial networks</article-title>
          ,
          <source>Machine Learning and Knowledge Extraction</source>
          <volume>4</volume>
          (
          <year>2022</year>
          )
          <fpage>488</fpage>
          -
          <lpage>501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Van Breugel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kyono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berrevoets</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. Van der Schaar</surname>
          </string-name>
          ,
          <article-title>Decaf: Generating fair synthetic data using causally-aware generative networks</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>22221</fpage>
          -
          <lpage>22233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Bryant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Yarnold</surname>
          </string-name>
          ,
          <article-title>Principal-Components Analysis and Exploratory and Confirmatory Factor Analysis</article-title>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Van der Maaten</surname>
          </string-name>
          , G. Hinton,
          <article-title>Visualizing Data using t-SNE</article-title>
          ,
          <source>Journal of machine learning research 9</source>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sikder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramachandranpillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Heintz</surname>
          </string-name>
          ,
          <article-title>TransFusion: Generating long, high fidelity time series using diffusion models with transformers</article-title>
          ,
          <source>arXiv preprint arXiv:2307.12667</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          , G. Erion,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          , A. DeGrave,
          <string-name>
            <surname>J. M. Prutkin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Himmelfarb</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>S.-I. Lee</given-names>
          </string-name>
          ,
          <article-title>From local explanations to global understanding with explainable ai for trees</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>2522</fpage>
          -
          <lpage>5839</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ermon</surname>
          </string-name>
          ,
          <article-title>Fair generative modeling via weak supervision</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1887</fpage>
          -
          <lpage>1898</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>