      FAIR-DB: FunctionAl DependencIes to discoveR Data Bias
                  Fabio Azzalini                                         Chiara Criscuolo                                Letizia Tanca
              Politecnico di Milano                                   Politecnico di Milano                           Politecnico di Milano
                    Milan, Italy                                            Milan, Italy                                    Milan, Italy
             fabio.azzalini@polimi.it                            chiara.criscuolo@mail.polimi.it                     letizia.tanca@polimi.it

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
Computers and algorithms have become essential tools that pervade all aspects of our daily lives; this technology is based on data and, for it to be reliable, we have to make sure that the data on which it is based is fair and without bias.
   In this context, Fairness has become a relevant topic of discussion within the field of Data Science Ethics, and in general in Data Science. Today's applications should therefore be associated with tools to discover bias in data, in order to avoid (possibly unintentional) unethical behavior and its consequences; as a result, technologies that accurately discover discrimination and bias in databases are of paramount importance.
   In this work we propose FAIR-DB (FunctionAl dependencIes to discoveR Data Bias), a novel solution to detect biases and discover discrimination in datasets, which exploits the notion of Functional Dependency, a particular type of constraint on the data. The proposed solution is implemented as a framework that focuses on the mining of such dependencies, and also proposes some new metrics for evaluating the bias found in the input dataset. Our tool can identify the attributes of the database that encompass discrimination (e.g. gender, ethnicity or religion) and the ones that instead satisfy various fairness measures; moreover, based on special aspects of these metrics and the intrinsic nature of dependencies, the framework provides very precise information about the groups treated unequally, obtaining more insights regarding the bias present in the dataset than other existing tools. Finally, our system also suggests possible future steps, by indicating the most appropriate (already existing) algorithms to correct the dataset on the basis of the computed results.

1 INTRODUCTION
In recent years, fairness has become an important topic of interest in the Data Science community. Indeed, computers and algorithms have made our lives efficient and easier, but among the prices we risk paying is the possible presence of discrimination and unfairness in the decisions we make with their support.
   A famous example of unfairness in a data science application is the ProPublica analysis of the COMPAS Recidivism Algorithm (https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing), a decision-making tool used by judges, probation and parole officers to assess a criminal defendant's likelihood of becoming a recidivist; the study found that black defendants were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism.
   Data Science technologies are based on data, and for them to be reliable we have to make sure that the data we feed them are fair and without bias. As a consequence, in these particular applications of data analysis, data can be considered of good quality only if it conforms to high ethical standards [8]; to avoid (possibly unintentional) unethical behaviors and their consequences, data cleaning tools should also include tools to discover bias in data, and technologies that accurately discover discrimination and bias to obtain fair databases are badly needed [13].
   In this paper we present FAIR-DB, a framework that, by discovering and analyzing some special types of functional dependencies, is able to find unfair behaviors in a dataset and guide its correction.
   A Functional Dependency (FD: X → Y) is a class of database integrity constraints that hold between two sets X and Y of attributes in a relation of a database. It specifies that the values of the attributes of X uniquely (or functionally) determine the values of the attributes of Y. Looking at Table 1 we may spot the following FD: Education-Num → Education, meaning that the number of years already attended at school functionally determines the school level. In an FD, X is called antecedent or left-hand side (LHS), while Y is called consequent or right-hand side (RHS).
   The constraints that Functional Dependencies impose are often too strict for real-world datasets, since they must hold for all the values of the attribute sets X and Y. For this reason, researchers have begun to study generalizations of FDs, called Relaxed Functional Dependencies [4], which relax one or more constraints of canonical FDs.
   Among these, we consider Approximate Functional Dependencies (AFDs), that is, uncertain FDs that hold only on a subset of the tuples, and Conditional Functional Dependencies (CFDs), where conditions are used to specify the subset of tuples on which a dependency holds.
   In particular, a CFD is a pair (X → Y, tp), where X and Y are sets of attributes, X → Y is a standard functional dependency and tp is a pattern tuple over the attributes in X and Y; for each A in X ∪ Y, tp[A] is either a constant ‘a’ in dom(A) or an unnamed variable ‘_’.
   Looking at Table 1 we may have the CFD: (Education, Income=‘>50K’) → Native-Country, meaning that, for people whose income is higher than 50K, their education degree functionally determines their native country.
   With this type of dependency we can spot specific concrete patterns in the dataset, and thus we are able to analyze behaviors in correspondence to precise values. Combining the two kinds of relaxed dependencies we obtain Approximate Conditional Functional Dependencies (ACFDs), i.e., uncertain CFDs.
   In this work we use ACFDs to detect biases and discover discrimination in the datasets subject to analysis, by recognizing cases where the value of a certain attribute (e.g. gender, ethnicity or religion) frequently determines the value of another one (such as the range of the proposed salary or the social state).
   The paper is organized as follows. Section 2 contains the related work. Section 3 details the methodology. Section 4 presents the experiments and, finally, Section 5 concludes the paper.
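   To make the pattern-tuple notation concrete, here is a minimal Python sketch (ours, purely illustrative and not part of the FAIR-DB implementation) that represents the constant-plus-variable CFD above and checks whether a tuple matches its antecedent pattern.

# Illustrative only: the CFD (Education, Income='>50K') -> Native-Country,
# written as a pattern tuple over LHS and RHS; '_' stands for an unnamed variable.
cfd = {
    "lhs": {"Education": "_", "Income": ">50K"},   # antecedent pattern
    "rhs": {"Native-Country": "_"},                # consequent pattern
}

def matches(pattern: dict, row: dict) -> bool:
    """A tuple matches a pattern if it agrees with every constant in the pattern."""
    return all(v == "_" or row.get(a) == v for a, v in pattern.items())

row = {"Education": "Bachelors", "Income": ">50K", "Native-Country": "United-States"}
print(matches(cfd["lhs"], row))   # True: the row falls under the condition Income='>50K'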
2 STATE OF THE ART
Most of the research in the area of Data Science Ethics is carried out by the Machine Learning community; in this context, researchers start by defining a notion of fairness in data, and then apply it to solve the problem of learning from unfair data in a prediction task. This notion of fairness is based on the idea that data have to satisfy three requirements: Diversity, Coverage and Stability.
   Three possible approaches can be adopted when trying to enforce fairness in a data analysis application: (i) preprocessing techniques, i.e. procedures that, before the application of a prediction algorithm, make sure that the learning data are fair; (ii) inprocessing techniques, i.e. procedures that ensure that, during the learning phase, the algorithm does not pick up the bias present in the data; and (iii) postprocessing techniques, i.e. procedures that correct the algorithm's decisions with the aim of making them fair.
   Most of the machine learning research concerns the second approach; recently, several open-source libraries have been developed to solve the "unfairness problem" during the learning phase of the prediction task. An example is FairML [1], which provides an auditing tool for predictive models by quantifying the relative effects of providing various different inputs to a model and comparing its predictions. FairTest [15], on the other hand, approaches the task of detecting bias by looking for correlations between predicted labels and protected attributes.
   A project that provides the implementation of many interesting techniques is AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias [2]; this work presents a new open-source framework whose aim is to reach algorithmic fairness. The system tries to mitigate algorithmic bias by exploiting techniques such as Reweighing [9], Optimized Preprocessing [3], Learning Fair Representations [10] and the Disparate Impact Remover [7].
   In the machine learning context, the majority of works that try to enforce fairness are related to a prediction task, and more specifically to classification algorithms in decision-making tools. The main difference between these approaches and our framework, which addresses unfairness with a preprocessing technique, is that our system does not need a classifier to work, because it is based on finding conditions (constraints) that are already present in the data, even though possibly with some level of approximation. Furthermore, building a classifier to solve the fairness problem requires the policy to be application-oriented, greatly limiting the applicability of these systems to scenarios where other tasks are needed.
   A very interesting work on fairness in Machine Learning that employs a preprocessing technique is Nutritional Labels for Data and Models by Stoyanovich and Howe [12]. The authors developed an interpretability and transparency tool based on the concept of Nutritional Labels, drawing an analogy to the food industry, where simple and standardized labels convey information about the ingredients and the production processes. Nutritional labels are derived semi-automatically as part of the complex process that gave rise to the data or model they describe. The final system is called Ranking Facts, and automatically derives nutritional labels for rankings. Ranking Facts is a collection of visual widgets based on stability, fairness and diversity concepts. In particular, the Fairness widget quantifies whether the ranked output exhibits statistical parity (a particular definition of group fairness) with respect to one or more protected attributes.
   Even though Ranking Facts and our framework have a similar objective (they both analyze a dataset to discover unfair behaviors), they achieve this goal using two very different strategies. Moreover, the output of the two systems is very different: our framework provides very precise indications of unfairness intended to be used in the correction of the dataset, while Ranking Facts employs data visualization tools to display the general unfair behaviors found in the dataset.

3 METHODOLOGY
This section presents the details of the FAIR-DB framework.

3.1 Preliminary Notions and Framework Overview
We now introduce some fundamental notions that will accompany us along our discussion.
   Given a dataset D, the support of a CFD (X → Y, tp) is defined as the proportion of tuples t in the dataset D that contain tp, that is:

   Support(X → Y, tp) = |t ∈ D; tp ⊆ t| / |D|

   Confidence is an indication of how often the CFD has been found to be true. Let tp = (x ∪ y), where x is a tuple over X and y is a tuple over Y. The confidence of a CFD (X → Y, tp) is the proportion of the tuples t containing x which also contain y:

   Confidence(X → Y, tp) = |t ∈ D; tp ⊆ t| / |t ∈ D; x ⊆ t|
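   As a purely illustrative reading of the two formulas above, the following sketch computes the support and confidence of a constant ACFD over a pandas DataFrame; the helper names and the toy data are our own and do not come from the FAIR-DB code.

import pandas as pd

def match_count(df: pd.DataFrame, pattern: dict) -> int:
    """Number of tuples of df that contain the given constant pattern."""
    mask = pd.Series(True, index=df.index)
    for attr, value in pattern.items():
        mask &= df[attr] == value
    return int(mask.sum())

def acfd_support(df, lhs: dict, rhs: dict) -> float:
    # |t in D; tp ⊆ t| / |D|, with tp = lhs ∪ rhs
    return match_count(df, {**lhs, **rhs}) / len(df)

def acfd_confidence(df, lhs: dict, rhs: dict) -> float:
    # |t in D; tp ⊆ t| / |t in D; x ⊆ t|
    return match_count(df, {**lhs, **rhs}) / match_count(df, lhs)

df = pd.DataFrame({
    "Sex":    ["Female", "Female", "Male", "Male", "Female"],
    "Income": ["<=50K", "<=50K", ">50K", "<=50K", ">50K"],
})
lhs, rhs = {"Sex": "Female"}, {"Income": "<=50K"}
print(acfd_support(df, lhs, rhs), acfd_confidence(df, lhs, rhs))  # 0.4  0.666...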
   Figure 1 represents the workflow of FAIR-DB, composed of the following main steps:
   (1) Data Preparation and Exploration: in this phase we import the data, perform (if needed) data integration and apply the typical data preprocessing steps (handling missing values, discretization, etc.) needed to clean and prepare the data. During this phase, we might also visualize the attribute features using different Data Visualization techniques.
   (2) ACFD Discovery and Filtering: in this phase we apply an ACFD Discovery algorithm to extract Approximate Conditional Functional Dependencies from the dataset. This algorithm takes as input the prepared dataset and three threshold parameters: the minimum support, the minimum confidence and the maximum antecedent size of the ACFDs we are searching for. From the output of ACFD Discovery we discard the dependencies that contain variables. In this phase we also discard further dependencies, keeping only those that might reveal unfairness.
   (3) ACFDs Selection: for each ACFD, we compute some metrics capturing the "ethical level" of the dependency, in particular: (i) the Difference metric, a novel score introduced to discover dependencies that highlight the violation of fairness, and (ii) for each protected attribute P, its specific P-Difference. According to the values of these metrics, we select the most interesting ACFDs.
   (4) ACFDs Ranking: we rank the ACFDs in descending order of importance, according to the three metrics.
   (5) ACFDs User Selection and Scoring: from the ranking, the user selects the N ACFDs she perceives as the most problematic, and then the system computes some metrics that summarize the level of unfairness of the dataset.
   [Figure 1 shows the FAIR-DB workflow as a chain of five boxes: Data Preparation and Exploration (data preprocessing and visualization); ACFDs Discovery and Filtering (apply an ACFD discovery algorithm and filter its result); ACFDs Selection (according to a novel metric, select unethical ACFDs); ACFDs Ranking (rank the ACFDs to facilitate user interaction); ACFDs User Selection and Scoring (the user selects N ACFDs and the summarizing metrics are presented).]
   Figure 1: Steps of the FAIR-DB framework

   In the next subsections we give a more detailed description of these phases, with the help of the following running example.

   Example 3.1 (Running Example). For a part of our experiments, we have used the U.S. Census Adult Dataset (https://archive.ics.uci.edu/ml/datasets/Adult) [5], containing information about many social factors of US adults, like ‘Income’, ‘Age’, ‘Workclass’, ‘Education’, ‘Education-Num’ (i.e. the number of years already attended at school), ‘Marital-Status’, ‘Race’, ‘Sex’ (i.e. gender), (work) ‘Hours-Per-Week’, ‘Native-Country’, and some more. Table 1 contains examples of its tuples.

3.2 Data Preparation and Exploration
Data Preparation starts with Data Acquisition, when the user gathers one or more datasets that she will use in the analysis. After Data Acquisition, we compute Summary Statistics for each column of the dataset. This primary phase gives a general idea of the dataset and, during this step, we can hypothesize the protected columns and identify, if present, the target variable.
   We then perform Data Cleaning, since poor data quality could lead to wrong or incomplete results. Depending on the dataset we are considering, we may need to face several different error types. We continue with Data Integration, where, if present, multiple datasets are merged into a single one. After Data Cleaning and Data Integration we may need to perform other steps: Feature Selection and Discretization [14].
   Feature Selection refers to choosing the set of features that is appropriate for the subsequent analysis. The goal of feature selection is to come up with the smallest set of features that best captures the characteristics of the problem being addressed [14].
   Another important step is Discretization, performed by transforming data from the numeric to the nominal data type [14]. In our case discretization is particularly important, since Functional Dependencies built on numerical attributes that have many different values are very precise, but they do not give an overview of the attribute values. For instance, if we have a dependency that contains a specific value of the attribute ‘Age’ (let us assume that it is 18), we do not know whether the dependency could be valid also for values near 18, like 20 or 16; furthermore, the dependency could be of little use if it involves only 18-year-old people. For this reason, we suggest using Discretization for numerical attributes that have many different values. A similar procedure is often needed also when a categorical attribute can assume a wide range of values.
   As a last step of this first phase, Data Visualization can be of great help in analyzing the characteristics of the data; in fact, from the plots, a user can understand whether groups are present in the dataset and, more specifically, whether there is a majority class for a certain attribute and, if present, can identify minorities. Visualization and summary statistics are very important to find protected attributes and to understand whether the data contain minorities that need to be analyzed in more detail.

   Example 3.2 (Data Preparation and Exploration). Before being able to gather useful insights from the running example, we have to perform some preprocessing operations such as Data Cleaning, Feature Selection and Discretization. We noticed that the majority of the missing values belong to attributes that are not relevant for our analysis (e.g. ‘Marital-Status’); we therefore decided to first perform feature selection and then to remove the few tuples that still had missing values. Regarding feature selection, we also removed interesting columns that were related to another one expressing the same meaning; we noticed that two of the columns, ‘Education’ and ‘Education-Num’, represented the same information, the latter being a numerical encoding of the former. To extract Functional Dependencies that do not depend on specific values of an attribute, it is useful to group into bins the values of attributes that are continuous. In particular, we created five bins for the ‘Hours-Per-Week’ attribute: ‘0-20’, ‘21-40’, ‘41-60’, ‘61-80’, ‘81-100’. We concluded the analysis by identifying the protected attributes ‘Sex’, ‘Race’ and ‘Native-Country’, and selecting ‘Income’ as the target variable. We decided to keep both the ‘Race’ and ‘Native-Country’ attributes because they are correlated, but not in all cases; for example, the offspring of migrants have the same race as their parents, but a different native country, because in many cases they may be born in the U.S. To have readable and more effective Functional Dependencies, we keep the attribute ‘Race’ as it is in the original dataset and we group the values of the attribute ‘Native-Country’ into 4 different values: ‘NC-US’, ‘NC-Hispanic’, ‘NC-Non-US/Hispanic’ and ‘NC-Asian-Pacific’.
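   The binning of ‘Hours-Per-Week’ described in Example 3.2 can be reproduced with standard pandas primitives; the following sketch is only an illustration of that discretization step (the bin labels follow the example, while the DataFrame is a toy of ours).

import pandas as pd

df = pd.DataFrame({"Hours-Per-Week": [40, 18, 40, 65, 90, 25]})

# Five equal-width bins as in Example 3.2: 0-20, 21-40, 41-60, 61-80, 81-100
bins = [0, 20, 40, 60, 80, 100]
labels = ["0-20", "21-40", "41-60", "61-80", "81-100"]
df["Hours-Per-Week"] = pd.cut(df["Hours-Per-Week"], bins=bins, labels=labels,
                              include_lowest=True)
print(df["Hours-Per-Week"].value_counts().sort_index())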
3.3 ACFD Discovery and Filtering
In this phase we extract the ACFDs from the dataset, using the ACFD Discovery algorithm of [11]. The algorithm expects as input the following three parameters: the (minimum) support threshold, the (minimum) confidence threshold, and the size of the largest antecedent, maxSize.
   Given an instance D of a schema R, a support threshold δ, a confidence threshold ϵ, and a maximum antecedent size α, the approximate CFD discovery problem is to find all ACFDs ϕ: (X → Y, tp) over R with:
   • support(ϕ, D) ≥ δ
   • confidence(ϕ, D) ≥ ϵ
   • |X| ≤ α.
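   For intuition only, the brute-force sketch below enumerates constant-pattern ACFDs with a single-attribute RHS over a small DataFrame and keeps those meeting the thresholds δ, ϵ and α. It is our own naive illustration of the problem statement above, not the optimized discovery algorithm of [11].

from itertools import combinations
import pandas as pd

def mine_constant_acfds(df: pd.DataFrame, min_support: float,
                        min_confidence: float, max_lhs_size: int):
    """Naive enumeration of constant ACFDs (A1=a1, ..., Ak=ak) -> (B=b)."""
    n, results = len(df), []
    for rhs in df.columns:
        lhs_pool = [c for c in df.columns if c != rhs]
        for size in range(1, max_lhs_size + 1):
            for lhs in combinations(lhs_pool, size):
                # For brevity we only test the most frequent RHS value of each LHS group.
                for lhs_vals, rhs_series in df.groupby(list(lhs))[rhs]:
                    if not isinstance(lhs_vals, tuple):
                        lhs_vals = (lhs_vals,)
                    rhs_val = rhs_series.value_counts().idxmax()
                    together = int((rhs_series == rhs_val).sum())
                    support, confidence = together / n, together / len(rhs_series)
                    if support >= min_support and confidence >= min_confidence:
                        results.append((dict(zip(lhs, lhs_vals)), {rhs: rhs_val},
                                        round(support, 3), round(confidence, 3)))
    return results

# toy usage with delta=0.2, epsilon=0.8, alpha=1
df = pd.DataFrame({"Sex": ["Female", "Female", "Male", "Male", "Female"],
                   "Income": ["<=50K", "<=50K", ">50K", "<=50K", "<=50K"]})
for lhs, rhs, sup, conf in mine_constant_acfds(df, 0.2, 0.8, 1):
    print(lhs, "->", rhs, "support:", sup, "confidence:", conf)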
       Age   Workclass      Education       Education-Num       Marital-Status     Race        Sex       Hours-Per-Week         Native-Country   Income
 0      90        ?          HS-grad                9             Widowed         White      Female              40              United-States       <=50K
 1      82     Private       HS-grad                9             Widowed         White      Female              18              United-States       <=50K
 2      66        ?        Some-college            10             Widowed         Black      Female              40              United-States       <=50K
 3      54     Private       7th-8th                4             Divorced        White      Female              40              United-States       <=50K
                                   Table 1: A few exemplar tuples of the U.S. Census Adult dataset



   The ACFDs obtained have the following form:

   (lhsAttr1 = v1, ..., lhsAttrN = vN) → (rhsAttr = v)

   The algorithm returns all the dependencies that satisfy the aforementioned criteria. The discovered dependencies may suffer from the following shortcomings:
   • A dependency may involve neither a protected attribute nor the target attribute, and thus be useless for the detection of unfair behavior.
   • Some of the dependencies may be ACFDs that contain variables; in this case the dependency holds for all the values of each non-constant attribute in its LHS. In our application this type of dependency does not highlight a significant difference corresponding to some specific values of the data, and thus is useless for our aims.

   Example 3.3 (CFD Discovery). The algorithm, applied to the dataset resulting from the previous phase, finds 118 ACFDs. Table 2 reports a few of them. We can easily detect the ACFDs that contain variables: for example, dependency ϕ3: (Education-Degree, Income = ‘>50K’) → (Native-Country) does not specify the values of the attributes ‘Education-Degree’ and ‘Native-Country’. As the user can notice, there are also dependencies that should be removed from the list because they do not contain any protected attribute.

      (Education-Degree=‘Middle-school’) → Income=‘≤ 50K’
              (Age-Range=‘15-30’) → Income=‘≤ 50K’
      (Education-Degree, Income=‘≤ 50K’) → Native-Country
        (Native-Country=‘NC-Hispanic’) → Income=‘≤ 50K’
           (Income=‘≤ 50K’) → Native-Country=‘NC-US’
                 Table 2: CFD Discovery output

   Now we filter the dependencies, discarding the ones that do not satisfy the following two constraints:
   • all the attributes of the dependency must be assigned a value (ACFDs consisting of only constants in both their LHS and RHS are called Constant ACFDs [6]);
   • at least one protected attribute and the target variable have to be present in the dependency, so that the ACFD might show bias.
   During the ACFD Filtering phase, the user can also add some constraints, such as deciding which values must appear in the dependencies (for instance requiring that all the ACFDs involve only ‘Females’, because the researcher is interested only in peculiar aspects of women), or deciding which values must not appear in the dependencies.

   Example 3.4 (ACFDs Filtering). From Table 2 we discarded the third dependency for not being a Constant ACFD and the first three dependencies for not containing any protected attribute. After this phase we are left with 84 of the original 118 dependencies.
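   A minimal sketch of the filtering step just described, under our own toy representation of an ACFD as a pair of dictionaries (none of these names come from the FAIR-DB code): it keeps only constant dependencies that mention at least one protected attribute together with the target variable.

PROTECTED = {"Sex", "Race", "Native-Country"}
TARGET = "Income"

def is_constant(acfd) -> bool:
    """True if every attribute in LHS and RHS is bound to a constant (no '_' variables)."""
    lhs, rhs = acfd
    return all(v != "_" for v in {**lhs, **rhs}.values())

def keep_for_fairness(acfds):
    kept = []
    for lhs, rhs in acfds:
        attrs = set(lhs) | set(rhs)
        if is_constant((lhs, rhs)) and attrs & PROTECTED and TARGET in attrs:
            kept.append((lhs, rhs))
    return kept

acfds = [
    ({"Education-Degree": "Middle-school"}, {"Income": "<=50K"}),            # no protected attribute
    ({"Education-Degree": "_", "Income": ">50K"}, {"Native-Country": "_"}),  # contains variables
    ({"Native-Country": "NC-Hispanic"}, {"Income": "<=50K"}),                # kept
]
print(keep_for_fairness(acfds))  # [({'Native-Country': 'NC-Hispanic'}, {'Income': '<=50K'})]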
3.4 ACFDs Selection
This phase is responsible for finding the dependencies that actually reveal unfairness in the dataset; in fact, even if the ACFDs identified in the previous step contain protected attributes, they do not all necessarily show some unethical behavior.
   To do so we have devised two unfairness measures:
   • Difference: it indicates how much a dependency is ‘unethical’. The higher this metric, the more the ACFD reveals an unfair behavior.
   • Protected Attribute Difference (P-Difference): it indicates how much the dependency shows bias with respect to a specific protected attribute.
   In order to assess the unfair behavior of a dependency, we also take into consideration its support, which indicates the pervasiveness of the ACFD; unethical dependencies with high support impact many tuples, and are thus more important.
   For each dependency ϕ, we define the Difference metric of ϕ as the difference between the confidence of the dependency and the confidence of the dependency computed without the protected attributes of its LHS.
   Given a dependency of the form ϕ: (X → Y, tp), let Z = X − {ProtectedAttributes}, that is, the LHS of the dependency without its protected attributes, and let z be the restriction of tp to Z. We define the Difference as:

   Difference(ϕ) = Confidence(ϕ) − NoProtectedAttributeConfidence(ϕ)

   where

   NoProtectedAttributeConfidence(ϕ) = |t ∈ D; tp ⊆ t| / |t ∈ D; z ⊆ t|

   That is:

   Difference(ϕ) = |t ∈ D; tp ⊆ t| / |t ∈ D; x ⊆ t| − |t ∈ D; tp ⊆ t| / |t ∈ D; z ⊆ t|

   The Difference metric gives us an idea of how much the values of the protected attributes influence the value of Y.
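   Turning the definition into code: the sketch below (our own illustration, reusing the same simple pattern-counting helper as in the earlier sketches; the function names are not from the FAIR-DB implementation) computes Confidence, NoProtectedAttributeConfidence and their Difference for a constant ACFD over a pandas DataFrame, following the displayed formula.

import pandas as pd

def count(df: pd.DataFrame, pattern: dict) -> int:
    """Number of tuples of df that contain the given constant pattern."""
    mask = pd.Series(True, index=df.index)
    for attr, value in pattern.items():
        mask &= df[attr] == value
    return int(mask.sum())

def difference(df: pd.DataFrame, lhs: dict, rhs: dict, protected: set) -> float:
    """Difference(phi) = Confidence(phi) - NoProtectedAttributeConfidence(phi)."""
    tp = {**lhs, **rhs}                                        # full pattern tuple
    z = {a: v for a, v in lhs.items() if a not in protected}   # LHS without protected attrs
    confidence = count(df, tp) / count(df, lhs)                # |tp in t| / |x in t|
    no_prot_confidence = count(df, tp) / count(df, z)          # |tp in t| / |z in t|
    return confidence - no_prot_confidence

df = pd.DataFrame({
    "Sex":       ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Workclass": ["Private"] * 6,
    "Income":    ["<=50K", "<=50K", ">50K", ">50K", ">50K", "<=50K"],
})
# phi1: (Sex='Female', Workclass='Private') -> (Income='<=50K'), protected attribute 'Sex'
print(difference(df, {"Sex": "Female", "Workclass": "Private"},
                 {"Income": "<=50K"}, protected={"Sex"}))      # 2/3 - 2/6 = 0.333...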



   Example 3.5 (Difference score of a dependency). Analyzing the following dependency:

   ϕ1: (Sex = ‘Female’, Workclass = ‘Private’) → (Income = ‘≤ 50K’)

   we can compute the Difference as:

   Diff(ϕ1) = Conf(ϕ1) − NoProtAttrConf(ϕ1′)

   where

   ϕ1′: (Workclass = ‘Private’) → (Income = ‘≤ 50K’)

   Three different behaviors can emerge:
   • If the Difference is close to zero, fairness is respected, since it means that females are treated equally to all the elements of the population that have the same characteristics (without specifying the protected attribute).
   • If the Difference is positive, it means that the women who work in the private sector and earn less than 50K dollars/year are overall treated worse than the generality of people who work in the private sector and earn less than 50K dollars/year.
   • If the Difference is negative, the opposite situation is detected.

   A dependency could contain more than one protected attribute in its LHS at the same time. For this reason, we introduce a last metric: the Protected Attribute Difference (P-Difference), which is very similar to the Difference measure but is computed separately, excluding one protected attribute P at a time (Z = X − P).
   Finally, we choose the ACFDs whose Difference is above a minimum threshold (minThreshold), which means that there is a significant inequality between the group involved in the ACFD and the general behaviour of the population.

   Example 3.6 (Selected ACFDs). Table 3 reports three of the seventeen dependencies that satisfied the selection criteria, along with their relevant metrics. From the example, the “Hispanic”, “Female” and “Black” groups suffer from discrimination with respect to the rest of the population: in fact, people that belong to one or more of these groups have an income below 50000 dollars/year because of their nationality, sex or race.

                            ACFD                              Support    Difference    RaceDiff    SexDiff    NativeCountryDiff
   (Native-Country = ‘NC-Hispanic’) → (Income = ‘≤50K’)       0.0439      0.1570          0           0            0.1570
            (Sex = ‘Female’) → (Income = ‘≤50K’)              0.2874      0.1352          0        0.1352             0
            (Race = ‘Black’) → (Income = ‘≤50K’)              0.0813      0.1190       0.1190         0               0
        Table 3: A few of the selected ACFDs using the Difference metric
3.5 ACFDs Ranking
In a real-world dataset, the number of ACFDs selected in the previous step could be very large, even in the order of thousands; therefore, looking at all these dependencies would be a very demanding task for the user. Thus, it is necessary to order the dependencies according to some criterion, enabling the user to analyze the most important and interesting ones first, speeding up the process and reducing its cost.
   In our framework the user can order the dependencies according to one of the following criteria:
   • Support-based: the support indicates the proportion of tuples impacted by the dependency: the higher the support, the more tuples are involved by the ACFD. Ordering dependencies by support highlights the pervasiveness of the dependency.
   • Difference-based: this criterion highlights the dependencies where the values of the protected attributes most influence the value of their RHS; therefore this ordering privileges the unethical aspect of the dependencies.
   • Mean-based: this method tries to combine both aspects of a dependency, the unethical perspective and its pervasiveness. Sorting the ACFDs using this criterion positions first the dependencies that have the best trade-off between difference and support.
3.6 ACFDs User Selection and Scoring
In this last phase the user selects from the ranked list the N dependencies that are interesting for her research needs. Using only the N selected ACFDs, the framework computes a set of scores that summarize the properties of the entire dataset:
   • Cumulative Support: the percentage of tuples in the dataset involved by the selected ACFDs. The closer this value is to 1, the more tuples are impacted by unfair dependencies.
   • Difference Mean: the mean of all the Difference scores of the selected ACFDs. It indicates how unethical the dataset is according to the selected dependencies: the greater the value, the higher the bias in the dataset.
   • Protected Attribute Difference Mean: for each protected attribute P, we report the mean of its P-Difference over all the selected ACFDs. It indicates how ethical the dataset is over P according to the selected dependencies.
   These summarizing metrics entirely depend on the specific ACFDs selected by the user; thus, by selecting different sets of dependencies, the user can highlight different aspects of the dataset.
   Note also that the framework allows the user to see some exemplar tuples impacted by the selected ACFDs.
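   A compact, purely illustrative computation of these summary scores, under our own toy representation of the selected ACFDs (each carries the set of row indices it matches, its Difference and its P-Differences); Cumulative Support is computed here by counting each covered tuple once, which is one plausible reading of the definition above.

def summarize(selected_acfds, n_rows, protected_attributes):
    """Cumulative Support, Difference Mean and per-attribute P-Difference Mean."""
    covered = set()                      # distinct tuples touched by at least one ACFD
    for acfd in selected_acfds:
        covered |= acfd["rows"]
    cumulative_support = len(covered) / n_rows
    difference_mean = sum(a["difference"] for a in selected_acfds) / len(selected_acfds)
    p_difference_mean = {
        p: sum(a["p_difference"].get(p, 0.0) for a in selected_acfds) / len(selected_acfds)
        for p in protected_attributes
    }
    return cumulative_support, difference_mean, p_difference_mean

# toy input: two selected ACFDs over a 10-tuple dataset
selected = [
    {"rows": {0, 1, 2, 3}, "difference": 0.14, "p_difference": {"Sex": 0.14}},
    {"rows": {2, 3, 4},    "difference": 0.18, "p_difference": {"Race": 0.18}},
]
print(summarize(selected, n_rows=10, protected_attributes=["Sex", "Race"]))
# (0.5, 0.16, {'Sex': 0.07, 'Race': 0.09})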
   Example 3.7 (ACFDs User Selection and Scoring). The user chooses N = 15 ACFDs that are interesting according to her needs among the dependencies obtained after the ranking step. The total number of tuples involved by the ACFDs is 13296, while the total number of tuples in the dataset is 30169; this results in a Cumulative Support of 0.44. The Difference Mean is 0.16. These two scores indicate that a considerable number of tuples, 44%, show a behavior that is very different, on average by 16%, from the fair one. Finally, the P-Difference Mean metrics confirm that the dataset is unfair with respect to all the protected attributes; the most discriminated groups are ‘Female’, ‘Black’, ‘NC-Hispanic’ and ‘Amer-Indian-Eskimo’. Table 4 reports a few interesting ACFDs.

                 (Sex = ‘Female’) → Income = ‘≤ 50K’
                 (Race = ‘Black’) → Income = ‘≤ 50K’
         (Race = ‘Amer-Indian-Eskimo’) → Income = ‘≤ 50K’
        (Native-Country = ‘NC-Hispanic’) → Income = ‘≤ 50K’
   Table 4: A few user-selected dependencies from the U.S. Census Adult Income Dataset

4 EXPERIMENTAL RESULTS
We now present the results obtained by FAIR-DB on two real-world datasets, and a comparison between our framework and the Ranking Facts system [12].

4.1 Datasets
The first dataset we considered is the U.S. Census Adult Income Dataset, already briefly presented in Example 3.1. The version we considered contains 32561 tuples and 13 attributes, 5 of which are numerical and 8 categorical.
   The second dataset we considered is the Titanic Dataset, containing information about the passengers of the Titanic, a British passenger liner operated by the ‘White Star Line’ that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking one of modern history's deadliest peacetime commercial marine disasters. The version we considered contains 891 samples and 12 attributes, 8 of which are categorical and 4 numerical. Specifically, the following attributes are present: ‘PassengerId’, ‘Survived’, ‘Pclass’, ‘Name’, ‘Sex’, ‘Age’, ‘Ticket’, ‘Fare’, ‘Cabin’ and a few more. We chose ‘Sex’ and ‘Pclass’ (the passenger class, which can be first, second or third) as the protected attributes, and selected ‘Survived’ as the target variable.

4.2 FAIR-DB Results
We recall from Example 3.7 the results of FAIR-DB on the U.S. Census Adult Income Dataset, which scores a Cumulative Support of 0.44 and a Difference Mean of 0.16, indicating that many tuples show an unfair behavior. Specifically, the dataset highlights bias towards all the protected attributes ‘Sex’, ‘Race’ and ‘Native-Country’; a deeper analysis of the dependencies confirms that the most discriminated groups are ‘Female’, ‘Black’, ‘NC-Hispanic’ and ‘Amer-Indian-Eskimo’.
   The Titanic Dataset scores a Cumulative Support of 0.74 and a Difference Mean of 0.35, indicating that many tuples are subject to an unfair behavior. Specifically, the dataset shows bias towards the protected attributes ‘Sex’ and ‘Pclass’. Indeed, the most discriminated groups are ‘Third class passenger’ and ‘Male’, showing that (i) in the third class, to which more than half of the passengers belong, the majority of people did not survive, and (ii) the policy ‘women and children first’ was adopted for the evacuation into the lifeboats. Table 5 reports the corresponding ACFDs.
           (Pclass = 1, Sex = Female) → Survived = 1
           (Pclass = 2, Sex = Female) → Survived = 1
          (Survived = 0, Sex = Female) → (Pclass = 3)
           (Pclass = 3, Sex = Male) → (Survived = 0)
           (Survived = 1, Sex = Male) → (Pclass = 1)
   Table 5: A few user-selected dependencies from the Titanic Dataset

4.3 Comparison with Ranking Facts
The results obtained with Ranking Facts [12] are in complete accordance with the ones obtained with our framework. Regarding the first dataset, Ranking Facts finds unfair behaviors across all three protected attributes, with discrimination against ‘Female’, ‘Black’, ‘NC-Hispanic’ and ‘Amer-Indian-Eskimo’. As for the Titanic dataset, Ranking Facts detects unfair behaviors on both protected attributes, with discrimination against ‘Male’ and ‘Third class passenger’. A deeper comparison with Ranking Facts will be included in an extension of this work.
   Ranking Facts checks fairness only for one attribute at a time, while, since an ACFD can involve more than one attribute at a time, our tool can report information about subgroup fairness, actually detecting unfair behaviors at a finer level of granularity. Instead, the results of Ranking Facts do not contain information about the existing bias in subgroups or in minorities.
   This is very important, because the discrimination might be limited to some specific scenario (e.g. not all the women but only the black women working in the private sector), and this information is very useful to guide the phase of DB repair.
   The importance of analyzing more attributes simultaneously is even clearer if we analyze the ACFDs in Table 5; even though, overall, women obtain a better treatment (for the ‘Sex’ attribute, both Ranking Facts and our framework found discrimination only against men), by analyzing the dependencies we can see that if a woman died she most probably was a third-class passenger, and if a man survived he most probably was a first-class passenger.

5 CONCLUSION AND FUTURE WORKS
We presented FAIR-DB, a novel framework that, through the extraction of a particular type of Functional Dependencies, can discover bias and discrimination present in a dataset.
   Future work will include: (i) the addition to the system of a dependency repair phase that, starting from the selected ACFDs, will correct the dataset by removing all the unfair behaviors from it; (ii) the study of dependencies with high confidence and low support, to highlight interesting, not necessarily frequent, behaviors; (iii) the development of a graphical user interface to facilitate the interaction of the user with the system; (iv) the study of other (possibly interesting) classes of functional dependencies [4]; (v) a deeper comparison with Ranking Facts and other similar methods.

REFERENCES
[1] Julius A Adebayo et al. 2016. FairML: ToolBox for diagnosing bias in predictive modeling. Ph.D. Dissertation. Massachusetts Institute of Technology.
[2] Rachel KE Bellamy, Kuntal Dey, Michael Hind, Samuel C Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, A Mojsilović, et al. 2019. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development 63, 4/5 (2019), 4–1.
[3] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. 2017. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems. 3992–4001.
[4] Loredana Caruccio, Vincenzo Deufemia, and Giuseppe Polese. 2015. Relaxed functional dependencies—a survey of approaches. IEEE Transactions on Knowledge and Data Engineering 28, 1 (2015), 147–165.
[5] Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[6] Wenfei Fan, Floris Geerts, Jianzhong Li, and Ming Xiong. 2010. Discovering conditional functional dependencies. IEEE Transactions on Knowledge and Data Engineering 23, 5 (2010), 683–698.
[7] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 259–268.
[8] Donatella Firmani, Letizia Tanca, and Riccardo Torlone. 2019. Ethical Dimensions for Data Quality. Journal of Data and Information Quality (JDIQ) 12, 1 (2019), 1–5.
[9] Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33, 1 (2012), 1–33.
[10] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. 2018. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309.
[11] Joeri Rammelaere and Floris Geerts. 2018. Revisiting conditional functional dependency discovery: Splitting the "C" from the "FD". In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 552–568.
[12] Julia Stoyanovich and Bill Howe. 2019. Nutritional Labels for Data and Models. IEEE Data Eng. Bull. 42, 3 (2019), 13–23.
[13] Julia Stoyanovich, Bill Howe, and HV Jagadish. 2020. Responsible data management. Proceedings of the VLDB Endowment 13, 12 (2020), 3474–3488.
[14] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2016. Introduction to data mining. Pearson Education India.
[15] Florian Tramer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari Juels, and Huang Lin. 2017. FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 401–416.