A Methodology based on Rebalancing Techniques to measure
   and improve Fairness in Artificial Intelligence algorithms
                                Ana Lavalle                                                          Alejandro Maté
                      Lucentia Research (DLSI)                                                  Lucentia Research (DLSI)
                     University of Alicante (Spain)                                            University of Alicante (Spain)
                         alavalle@dlsi.ua.es                                                        amate@dlsi.ua.es

                               Juan Trujillo                                                           Jorge García
                      Lucentia Research (DLSI)                                                 Lucentia Research (DLSI)
                     University of Alicante (Spain)                                           University of Alicante (Spain)
                          jtrujillo@dlsi.ua.es                                                       jorge.g@ua.es

ABSTRACT                                                                         Unfortunately, several studies have shown that the recidivism
Artificial Intelligence (AI) has become one of the key drivers                   prediction scores are biased [1, 2]. This algorithm showed dis-
for the next decade. As important decisions are increasingly                     criminatory behavior towards African-American inmates, which
supported or directly made by AI systems, concerns regarding                     were almost three times more likely to be classified as high risk
the rationale and fairness in their outputs are becoming more                    inmates than Caucasian inmates [1].
and more prominent nowadays. Following the recent interest in                       As a result of this trend, AI research communities have recently
fairer predictions, several metrics for measuring fairness have                  increased their attention towards the issue of AI algorithm’s fair-
been proposed, leading to different objectives which may need to                 ness. The IEEE Standards Association pays attention to the mean-
be addressed in different fashion. In this paper, we propose (i) a               ing and impact of algorithmic transparency [18]. Moreover, these
methodology for analyzing and improving fairness in AI predic-                   issues are also aligned to the ethical guidelines for a trustworthy
tions by selecting sensitive attributes that should be protected;                AI presented by the European Commission [8].Therefore, it is
(ii) We analyze how the most common rebalance approaches                         essential to ensure that the decisions made by AI solutions do
affect the fairness of AI predictions and how they compare to the                not reflect discriminatory behavior.
alternatives of removing or creating separate classifiers for each                  Nevertheless, to the best of our knowledge, most of the ap-
group within a protected attribute. Finally, (iii) our methodology               proaches are mainly focused on improving the accuracy of algo-
generates a set of tables that can be easily computed for choosing               rithms in the prediction, while the fairness of the output is rele-
the best alternative in each particular case. The main advantage                 gated to a second-class metric [5, 11, 14]. Thus, there has not been
of our methodology is that it allows AI practitioners to measure                 any proposal or methodology that guides the AI practitioners
and improve fairness in AI algorithms in a systematic way. In                    in choosing the best features to avoid unfair and discriminatory
order to check our proposal, we have properly applied it to the                  outputs from AI algorithms.
COMPAS dataset, which has been widely demonstrated to be                            In this paper, we propose a methodology that considers fair-
biased by several previous studies.                                              ness as a first-class citizen. Our methodology measures and eval-
                                                                                 uates the impact of the dataset rebalancing techniques on AI
                                                                                 fairness. The novelty of our methodology is that it introduces
1    INTRODUCTION                                                                new steps with respect to the traditional process of AI develop-
The use of Artificial Intelligence (AI) systems is rapidly spreading             ment such as: (i) a bias analysis, (ii) fairness definition and (iii)
across many different sectors and organizations. More and more                   fairness evaluation. Moreover, another novelty of our methodol-
important decisions are being made supported by AI algorithms.                   ogy is that it helps to improve fairness by applying rebalancing
Therefore, it is essential to ensure that these decisions do not                 approaches considering not only the target variable/s, but also
reflect discriminatory behavior towards certain groups. How-                     sensitive attributes in the dataset that should be protected from
ever, given the lack of an adequate methodology, creating fair AI                discrimination. In order to both exemplify our approach and test
systems has proven to be a complex and challenging task [16].                    the impact of each rebalancing alternative, we implement a classi-
   As it is becoming more and more used, big companies and                       fier over the COMPAS dataset, calculating the degree of fairness
governments are delegating responsibilities to AI systems which                  obtained according to three different fairness definitions.
have not been thoroughly evaluated. In turn, some taken de-
cisions have often been biased and unfair (e.g. the AI system
from Amazon to qualify job applicants [22] or the granting of
                                                                                 2   RELATED WORK
credit for the Apple credit card [20]). One of the most notorious                Bias can appear in many forms. [16] groups and lists different
cases where AI tools have acted in a biased and unfair way is                    types of biases that can affect AI solutions according to where
COMPAS (Correctional Offender Management Profiling for Al-                       they appear: from Data to Algorithm, when AI algorithms are
ternative Sanctions). This software has been used by judges in                   trained with biased data, the output of these algorithms might
order to decide whether to grant parole to criminals or keep them                be also biased. From Algorithm to User, when bias arises as a
in prison. The output is provided by an algorithm that evaluates                 result of an algorithm output it affects users’ behavior. Or from
the probability that a criminal defendant becoming a recidivist.                 User to Data, when data sources used for training AI algorithms
                                                                                 are generated by users, historical socio-cultural issues can be
© Copyright 2022 for this paper by its author(s). Use permitted under Creative   introduced into the data even when perfect sampling and feature
Commons License Attribution 4.0 International (CC BY 4.0)
                                                                                 selection are carried out.
   To tackle these situations, researchers have proposed differ-        3.1     Dataset Description
ent techniques that can be grouped into the next perspectives.          The dataset chosen in order to apply our methodology in a real
Data Perspective when class distribution is artificially rebal-         case study has been the ProPublica COMPAS dataset available in
anced by sampling the data. This rebalancing can be done by:            [17]. This dataset includes information about criminal defendants
Oversampling [14], creating more data in the minority classes.          who were evaluated with COMPAS scores in the Broward County
Undersampling [11], eliminating data from the majority classes          Sheriff’s Office in Florida, during 2013 and 2014.
or other like SMOTE [5], where minority classes are oversam-               For each accused (case), this dataset contain information re-
pled by interpolating between neighboring data points. However,         lated to their demographic information (race, gender, etc), crimi-
these techniques must be used with tremendous care as they can          nal history and administrative information. Finally, the dataset
lead to the loss of certain characteristics of the data. An alterna-    also contains information about whether the accused was really
tive perspective is the Algorithmic Perspective, these solutions        a recidivists or not in the next 2 years. This dataset is highly
adjust the hyperparameters of the learning algorithms. Or, the          imbalanced, the representation of the different races is heavily
Ensemble Approach that mixes aspects from both the data and             skewed. Then, we will apply our methodology step by step.
algorithmic perspectives.
   Most of these approaches mainly focus on improving the ac-           3.2     Target Variable Definition
curacy of algorithms in the prediction, while the fairness of the
                                                                        The first step of our proposed methodology is to define the target
output is relegated to a second-class metric. As [21] states, ac-
                                                                        variable. In this case, the target variable is “v_score_text” which
curacy is no longer the only concern when developing models.
                                                                        uses 3 attributes (Low, Middle, High) to classify the risk of re-
Fairness must be taken into account as well in order to avoid
                                                                        cidivism. For the sake of simplicity, we will binarize the target
more cases as those presented in the introduction.
                                                                        variable by mapping the Low class to Non-Recividist, and the
   Moreover, as [9] argues, modifying data sources or restricting
                                                                        Middle and High classes to the Recividist class thereby facilitating
models in order to improve the fairness can harm the predictive
                                                                        following the analysis presented. Therefore the Target variable
accuracy. The fairness of predictions should be evaluated in the
                                                                        is defined as Risk of recidivism (0 Non-recividist, 1 Recividist).
context of data. Unfairness induced by inadequate samples sizes
or unmeasured predictive variables should be addressed through
                                                                        3.3     Bias Analysis
data collection, rather than by constraining the model [6].
   Thus, differently from the above-presented proposals, we pro-        The second step is to perform a Bias Analysis. As previously-
pose a novel methodology that considers fairness as a first-class       summarized in Section 2, and according to [16], the different data
citizen from the very beginning of the AI process. We drive the         bias that can be used in our case study context are (i) Data to
whole process considering protected attributes during the rebal-        Algorithm, (ii) Algorithm to Use and (iii) User to Data. As in our
ance step and leading the AI practitioner to a conscious decision       particular case, we are analyzing how biased data sets affect AI
on the trade-off (if necessary) between accuracy and fairness.          algorithms, we will apply the Data to Algorithm bias.
                                                                            In order to analyze how data bias affect the behavior of AI algo-
                                                                        rithms, firstly we apply our previously published algorithm [15]
3    IMPROVING FAIRNESS IN ARTIFICIAL                                   that automatically detects and visualizes bias in data analytics.
     INTELLIGENCE                                                           This algorithm examines the dataset returning us as output a
Tackling AI challenges requires awareness of the context where          number between 0 and 10 that establishes the bias ratio of the
algorithms will be not only trained, but also, where they generate      attributes (being 0 equally distributed and 10 very biased). This
outputs. Biases and errors that go unnoticed lead into wrong or         number is visually represented in order to present an overview
unfair decisions. Moreover, since training AI algorithms is a time-     of the data bias for a better understanding and exploration.
consuming task (several days or weeks), developing them without             In this case of study, the bias ratios were (Race: 9.95, Sex:
a clear direction may result in considerable waste of resources.        7.60 and Age category: 6.28), the most biased attribute was race
   For this reason, we propose the methodology shown in Fig. 1.         and it was selected as a protected attribute. The main reason
By following this methodology, AI practitioners will be able to         is that the race of the accused should never be a characteristic
analyze and improve fairness in AI predictions.                         that influences the classification of risk of recidivism (the target
   The first step in our methodology (Fig. 1) starts with the defini-   variable). Therefore the Protected attribute is defined as Race.
tion of the Target Variable by AI practitioners. Then, during the           Furthermore, a visualization (Fig. 2) that groups the predicted
Bias Analysis step, the algorithm proposed in [15] is executed          target variable (risk of recidivism) by the attributes selected as
in order to detect existing biases in the dataset. This algorithm       protected (race) is created. As cleary observed, there is a high
output will provide an overview of how biased the attributes of         risk in accused of African-American race than in the rest of races.
the dataset are. Moreover, this information will help practition-           Once the dataset has been analyzed and the bias has been
ers to select the Protected Attribute/s such as race, gender, or        located, AI practitioners will have more detailed knowledge in
any other that requires special attention to ensure fair treatment.     order to detect the types of bias that might arise. Among the types
Whether protected attributes have been detected in the dataset,         of bias which can appear, those relevant for our methodology are
a Definition of Fairness will be launched in order to allow             categorized in Data to Algorithm bias as described by [16]:
practitioners to measure whether the AI system is really being                • Measurement Bias: Arises when we choose and mea-
fair. Then, a Data Rebalancing (whether necesary) will be ac-                   sure features of interest. If a group is monitored more
complished and AI practitioners will proceed to the Algorithm                   frequently, more errors will be observed in that group.
Training. Finally, we propose a set of tables and visualizations              • Omitted Variable Bias: When important variables are
in order to interpret the Algorithm Results.                                    left out of the model.
   In the following, we will further describe all the steps of our            • Representation Bias: Arises in the data collection pro-
methodology by applying it in a real case study.                                cess when data does not represent the real population.
                                                      Target
                                                     Variable     Bias Analysis
                                                     Definition
                                  AI Practitioners

                                                                           Protected    Yes     Fairness
                                                                          attributes?           Definition
                                                                           No

                                                                        Data                        Algorithm
                                                                     Rebalancing                    Training

                                                                                                Algorithm
                                                                                                 Results
                                                                                              Interpretation


                                         Figure 1: Methodology to mitigate bias in AI algorithms


      • Aggregation Bias: When false conclusions are drawn                            In our case study, the race attribute was considerably biased.
        about individuals from observing the entire population.                   As this is a protected attribute, it is important to define one or
        Data from several groups (i.e. cities, races, age groups, etc.)           more metrics that quantify the fairness of the results.
        can be correlated differently across classes. However, if an                  As [19] argues, in general terms, fairness can be defined as the
        aggregation is performed, the general correlation of the                  absence of any prejudice or favoritism towards an individual or a
        aggregated data could be completely different from the                    group. However, although fairness is a quality highly desired by
        earlier correlations.                                                     society, it can be surprisingly difficult to achieve in practice [16].
      • Sampling Bias: Trends estimated for one population may                        Therefore, with the aim of defining, limiting and being able
        not generalize to data collected from a new population.                   to measure whether fairness is being achieved, our proposed
      • Longitudinal Data Fallacy: When temporal data is mod-                     method makes AI practitioners reflect on the type of justice that
        eled using a cross-sectional analysis, which combines mul-                they want to achieve. Among the types of justice we can find:
        tiple groups at a single point in time.                                         • Individual Fairness: Give similar predictions to similar
      • Linking Bias: When network attributes obtained from                               individuals [10], i.e. points that are closer to each other in
        user connections, activities, or interactions differ and mis-                     the feature space should have similar predictions.
        represent the true behavior of the users.                                       • Group Fairness: Treat different groups equally [10].
   In this specific case study, the bias analysis leads to consider as                  • Subgroup Fairness: Try to obtain the best properties of
potential biases both Measurement Bias, some individuals tend                             the group and individual notions of fairness. It picks a
to live in zones with high criminal activity, hence a higher level                        statistical fairness constraint (like equalizing false posi-
of surveillance by the police is needed and it could derive into a                        tive rates across protected groups) and asks whether this
feedback loop. And Representation Bias since data presents a                              constraint holds over a large collection of subgroups [13].
significantly different distribution compared to the demographic                      In this case of study, “Group Fairness” has been selected, since
distribution of Florida state [3] (where the data was collected).                 the race attribute has been selected as protected and fairness is
                                                                                  sought between the different groups of races. Specifically, the
3.4     Fairness Definition                                                       following definitions of “Group Fairness” have been followed:
The presence of bias can eventually derive into unfair results,                         • Equalized Odds: Groups within protected attributes must
especially when the bias is present in protected attributes. Thus,                        have the same ratio of true and false positives [12]. As
analyzing which biases might be present in the current problem is                         equality of odds can be really difficult to achieve, it can be
essential to determine which fairness metrics are more important.                         decomposed into two more relaxed versions:
                                                                                        • Equal Opportunity: Groups within protected attributes
                                                                                          must have equal true positive rates [12].
                                                                                        • Predictive Equality: False positive rates must be equal
                                                                                          across all groups of the protected attribute [4].
                                                                                      Depending on the problem, one definition could be more im-
                                                                                  portant than the other. For example, in building a model to predict
                                                                                  if a subject is eligible for a grant, it is relevant for the rate of true
                                                                                  positives of both sexes to be equal, i.e. equal opportunity should
                                                                                  be achieved. On the other side, a risk assessing model should
                                                                                  focus on having the same false positive rates across protected
                                                                                  groups, as missclassifying an individual as high risk can be really
                                                                                  harmful, hence the importance to prioritize predictive equality.

                                                                                  3.5    Data Rebalancing
                                                                                  By choosing and studying which fairness metrics are more suit-
                                                                                  able for the current problem, AI practitioners are now able to
                                                                                  focus on applying several techniques and evaluate its impact
           Figure 2: Risk of recidivism score by race
                                                                                  based on these fairness definitions.
   In this case, different data rebalancing techniques will be used       most sensitive classification, since classifying non-recidivists as
to modify the dataset distribution in terms of race and recidivism        a high risk of recidivism can bring them negative consequences.
rate, in order to assess its impact in terms of fairness.                     As we can see in Table 1, the techniques that achieve the best
   Usually, data rebalancing techniques are used in problems of           FPR for the Caucasian race are Original Train - Original Test and
imbalanced classification, where the target variable to be pre-           Remove race attribute with a 0.172 rate. Meanwhile, Remove race
dicted has a majority and a minority class.                               attribute obtains the best FPR for African-American race with a
   In this case study, the dataset could be rebalanced to be com-         0.347 rate. It is remarkable how the Caucasian race obtains the
posed of 50% non-recidivists and 50% recidivists, which is the            best results when the data is original, while the African-American
target variable. However, this approach does not take into ac-            race obtains the best results when the race attribute is removed.
count the different groups where fairness has to be assessed and              However, even though using these techniques we get better
preserved. Therefore, as an alternative view on the problem, we           False Positive rates, the difference between getting a 17,2% of
propose to treat the bias and unfairness in the protected attributes      Caucasian defendants wrongly accused as a recidivists and that
as a rebalancing problem. In this sense, we extend the rebalancing        the 34,7% of African-American defendants were wrongly accused
methods to consider the protected attribute in addition to their          as a recidivists would still be considered highly unfair.
associated target variable, thus allowing to control the proportion           Additionally, our methodology generates Table 2 that calcu-
of each group in the sample.                                              lates and compares the fairness definitions chosen in Section
   In other words, by extending the rebalancing techniques, the           3.4. Using Table 2 is possible to know, depending on the type
dataset of this case study can be modified as follows: 25% African-       of fairness pursued, which technique will bring better results.
American non-recidivists, 25% African-American recidivists, 25%           We have marked the best (green) and worst (red) techniques
Caucasian non-recidivists and 25% Caucasian recidivists.                  for each definition of fairness and for the overall accuracy. We
   As there are several techniques for rebalancing, in this case          should clarify that a lower fairness number represents that there
study we will focus on three different data rebalancing techniques:       is less difference between the protected groups, i.e. it is more fair.
Undersampling [11], Oversampling [14], and SMOTE [5].                     However, the accuracy is better when its value is higher, since it
                                                                          means that there have been fewer errors in the classification.
3.6     Algorithm Training                                                    As we can observe, the technique that gets the best score in
In this case, the XGBoost [7] classifier has been used with the de-       terms of Equal Opportunity, Equalized Odds and Accuracy is to
fault hyperparameters. In order to complement the experiments             remove the protected attribute, in this case the attribute race.
related to rebalancing using the previous techniques, three extra         However, other highly used techniques as SMOTE gets the worst
experiments have been carried out to provide further insights:            results in terms of Equal Opportunity and Equalized Odds.
      • Baseline: It is important to evaluate the model obtained
        without applying any rebalancing so that it acts as a base-       4    CONCLUSIONS AND FUTURE WORK
        line model in order to compare the results.                       The use of Artificial Intelligence (AI) systems is rapidly spreading
      • Split by race: Two separate classifiers will be trained, one      across different sectors and organizations. More and more impor-
        for each of the race studied.                                     tant decisions are being made supported by AI systems which
      • Remove race attribute: Same experiment as baseline,               have not been thoroughly evaluated. It is essential to ensure that
        but omitting the race attribute.                                  these decisions do not reflect discriminatory behavior towards
   Regarding the accomplished experiments, the whole training             certain groups. Nevertheless, most of the approaches mainly fo-
process can be described as follows: (1) Split the dataset into           cus on improving the prediction accuracy of algorithms without
a training and test sets, (2) Rebalance the dataset by using the          considering fairness in their development.
forementioned techniques (depending on the experiment, either                Thus, in this paper we have presented a methodology that
the training set or both sets are rebalanced), (3) Train the classifier   allows AI practitioners to measure and improve fairness in AI
to predict the risk of recividism given variables such as sex, age,       algorithms in a systematic way. Our novel methodology consid-
race and prior criminal history of the subject, and (4) Once the          ers fairness as a first-class citizen and introduces new steps with
classifier is trained, it is evaluated on the test set by computing       respect to the traditional process of AI development such as: (i) a
the metrics above-mentioned.                                              bias analysis, (ii) fairness definition and (iii) fairness evaluation.
   In total, nine experiments will be performed: the baseline,            We have also analyzed how the most common data rebalancing
training one separate model for each race, completely omitting            approaches affect the fairness of AI predictions taking into ac-
the race variable, and six related to rebalancing either the training     count both (i) the target variable and (ii) the protected attributes.
set or both the training and test set, with each of the rebalancing       Furthermore, our methodology generates a set of tables for choos-
techniques presented: undersampling, oversampling and SMOTE.              ing the best rebalancing alternative for each particular definition
The code of the experiments is publicly available in https://gitlab.      of fairness. Both our methodology as well as the interpretation
com/lucentia/DOLAP2022.                                                   of the algorithms results (tables and visualizations) can be easily
                                                                          replicated in any AI algorithm.
3.7     Algorithm Results Interpretation                                     In order to both exemplify our approach and test the impact
Finally, in order to compare the output of the XGBoost classifier         of each rebalancing alternative, we have applied it in a real case
algorithm and to be able to measure if it has been fair, we have          of study. We have implemented a classifier over the COMPAS
created Table 1 and Table 2. It should be noted that this tables          dataset, calculating the degree of fairness obtained according to
can be easily replicated in any Artificial Intelligence challenge.        three different fairness definitions.
   First, Table 1 represents the True Positive Rates (TPR) and               Given the obtained results, we consider that by following our
False Positive Rates (FPR) for Caucasian and African-American             proposed methodology we can avoid falling into the usual pitfalls
groups. In this specific case, False Positive Rates (FPR) were the        that lead to controversial outputs when the input datasets include
                      Table 1: True Positive and False Positive Rates for African-American and Caucasian races

                                                                              TPR Cauc.     TPR Afr.       FPR Cauc.        FPR Afr.
                                    Original Train - Original Test              0.356        0.714           0.172            0.381
                                    SMOTE Train - Original Test                 0.340        0.716           0.178            0.384
                                    SMOTE Train - SMOTE Test                    0.294        0.716           0.188            0.391
                                    Over Train - Original Test                  0.397        0.701           0.215            0.370
                                    Over Train - Over Test                      0.372        0.701           0.206            0.380
                                    Under Train - Original Test                 0.371        0.721           0.188            0.418
                                    Under Train - Under Test                    0.371        0.722           0.227            0.454
                                    Split by race                               0.371        0.703           0.198            0.372
                                    Remove race attribute                       0.407        0.674           0.172           0.347

                                                              Table 2: Fairness rates comparison

                                                                        Eq. Oportunity      Pred. Equality        Eq. Odds        Accuracy
                              Original Train - Original Test                 0.358              0.209               0.567           0.659
                              SMOTE Train - Original Test                    0.376              0.206               0.582           0.654
                              SMOTE Train - SMOTE Test                       0.422              0.203              0.625            0.608
                              Over Train - Original Test                     0.304              0.155               0.459           0.654
                              Over Train - Over Test                         0.328              0.174               0.503           0.622
                              Under Train - Original Test                    0.350              0.230               0.580           0.649
                              Under Train - Under Test                       0.351              0.227               0.577          0.603
                              Split by race                                  0.332              0.174               0.506           0.654
                              Remove race attribute                          0.267              0.175              0.442           0.664


biased protected attributes. In addition, it allows us to discover                        [9] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq.
which is the most appropriate data rebalancing techniques to try                              2017. Algorithmic Decision Making and the Cost of Fairness. (2017), 797–806.
                                                                                         [10] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard
to maximize different definitions of fairness.                                                Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd innova-
   Regarding the limitations of our proposal, we should take                                  tions in theoretical computer science conference. Association for Computing
                                                                                              Machinery, 214–226.
into account that our proposal has achieved successful results                           [11] Salvador García and Francisco Herrera. 2009. Evolutionary undersampling for
when protected attributes are individual and binary. However,                                 classification with imbalanced datasets: Proposals and taxonomy. Evolutionary
when as the number of protected attributes increases, rebalancing                             computation 17, 3 (2009), 275–306.
                                                                                         [12] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in
becomes more difficult. Future work is needed in order to study                               supervised learning. Advances in neural information processing systems 29
the best approach to carry out rebalancing techniques in the                                  (2016), 3315–3323.
cases where there are several protected attributes defined and                           [13] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2018. Prevent-
                                                                                              ing fairness gerrymandering: Auditing and learning for subgroup fairness. In
the classes contain a large number of different attribute groups.                             Proceedings of the 35th International Conference on Machine Learning, Vol. 80.
                                                                                              PMLR, 2564–2572.
                                                                                         [14] György Kovács. 2019. An empirical comparison and evaluation of minority
ACKNOWLEDGMENTS                                                                               oversampling techniques on a large number of imbalanced datasets. Applied
                                                                                              Soft Computing 83 (2019), 105662.
This work has been co-funded by the AETHER-UA project (PID                               [15] Ana Lavalle, Alejandro Maté, and Juan Trujillo. 2020. An Approach to Auto-
2020-112540RB-C43), funded by Spanish Ministry of Science                                     matically Detect and Visualize Bias in Data Analytics. In Proceedings of the
and Innovation and the BALLADEER (PROMETEO/2021/088)                                          22nd International Workshop on Design, Optimization, Languages and Analyti-
                                                                                              cal Processing of Big Data, DOLAP@EDBT/ICDT 2020, Vol. 2572. CEUR-WS.org,
projects, funded by the Conselleria de Innovación, Universidades,                             84–88. http://ceur-ws.org/Vol-2572/short11.pdf
Ciencia y Sociedad Digital (Generalitat Valenciana).                                     [16] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and
                                                                                              Aram Galstyan. 2021. A survey on bias and fairness in machine learning. ACM
                                                                                              Computing Surveys (CSUR) 54, 6 (2021), 1–35.
REFERENCES                                                                               [17] Broward County Clerk’s Office, Broward County Sherrif’s Office, Florida De-
                                                                                              partment of Corrections, and ProPublica. 2021. COMPAS Recidivism Risk
 [1] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Ma-
                                                                                              Score Data and Analysis. https://www.propublica.org/datastore/dataset/
     chine Bias - There’s software used across the country to predict future crim-
                                                                                              compas-recidivism-risk-score-data-and-analysis.
     inals. And it’s biased against blacks. https://www.propublica.org/article/
                                                                                         [18] The IEEE Global Initiative on Ethics of Autonomous and Intelligent
     machine-bias-risk-assessments-in-criminal-sentencing.
                                                                                              Systems. 2017.        Ethically Aligned Design: A Vision for Prioritizing
 [2] Matias Barenstein. 2019. ProPublica’s COMPAS Data Revisited. CoRR
                                                                                              Human Well-being with Autonomous and Intelligent Systems, Version
     abs/1906.04711 (2019). arXiv:1906.04711
                                                                                              2. https://standards.ieee.org/content/dam/ieee-standards/standards/web/
 [3] The U.S. Census Bureau. 2010. Population percent change. https://www.
                                                                                              documents/other/ead_v2.pdf.
     census.gov/quickfacts/FL.
                                                                                         [19] Nripsuta Ani Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic,
 [4] Alessandro Castelnovo, Riccardo Crupi, Greta Greco, and Daniele Regoli. 2021.
                                                                                              David C Parkes, and Yang Liu. 2019. How do fairness definitions fare? Examin-
     The zoo of Fairness metrics in Machine Learning. CoRR abs/2106.00467 (2021).
                                                                                              ing public attitudes towards algorithmic definitions of fairness. In Proceedings
 [5] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip
                                                                                              of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 99–106.
     Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique.
                                                                                         [20] Neil Vigdor. 2019.           Apple Card Investigated After Gender Dis-
     Journal of artificial intelligence research 16 (2002), 321–357.
                                                                                              crimination Complaints. https://www.nytimes.com/2019/11/10/business/
 [6] Irene Chen, Fredrik D Johansson, and David Sontag. 2018. Why Is My Classifier
                                                                                              Apple-credit-card-investigation.html.
     Discriminatory?. In Advances in Neural Information Processing Systems, Vol. 31.
                                                                                         [21] Christina Wadsworth, Francesca Vera, and Chris Piech. 2018. Achieving
     Curran Associates, Inc.
                                                                                              fairness through adversarial learning: an application to recidivism prediction.
 [7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting
                                                                                              CoRR abs/1807.00199 (2018). arXiv:1807.00199
     system. Proceedings of the 22nd ACM SIGKDD International Conference on
                                                                                         [22] Jordan Weissmann. 2018. Amazon Created a Hiring Tool Using A.I. It Imme-
     Knowledge Discovery and Data Mining (Aug 2016).
                                                                                              diately Started Discriminating Against Women. https://slate.com/business/
 [8] European Commission. 2021. Ethics guidelines for trustworthy AI. https:
                                                                                              2018/10/amazon-artificial-intelligence-hiring-discrimination-women.html.
     //digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai.