FAIR-DB: FunctionAl DependencIes to discoveR Data Bias

Fabio Azzalini, Chiara Criscuolo, Letizia Tanca
Politecnico di Milano, Milan, Italy
fabio.azzalini@polimi.it, chiara.criscuolo@mail.polimi.it, letizia.tanca@polimi.it

ABSTRACT

Computers and algorithms have become essential tools that pervade all aspects of our daily lives; this technology is based on data and, for it to be reliable, we have to make sure that the data on which it is based are fair and without bias.

In this context, Fairness has become a relevant topic of discussion within the field of Data Science Ethics, and in general in Data Science. Today's applications should therefore be associated with tools to discover bias in data, in order to avoid (possibly unintentional) unethical behavior and its consequences; as a result, technologies that accurately discover discrimination and bias in databases are of paramount importance.

In this work we propose FAIR-DB (FunctionAl dependencIes to discoveR Data Bias), a novel solution to detect biases and discover discrimination in datasets that exploits the notion of Functional Dependency, a particular type of constraint on the data. The proposed solution is implemented as a framework that focuses on the mining of such dependencies, and also proposes some new metrics for evaluating the bias found in the input dataset. Our tool can identify the attributes of the database that encompass discrimination (e.g. gender, ethnicity or religion) and the ones that instead verify various fairness measures; moreover, based on special aspects of these metrics and the intrinsic nature of dependencies, the framework provides very precise information about the groups treated unequally, obtaining more insights regarding the bias present in the dataset compared to other existing tools. Finally, our system also suggests possible future steps, by indicating the most appropriate (already existing) algorithms to correct the dataset on the basis of the computed results.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 INTRODUCTION

In recent years, fairness has become an important topic of interest in the Data Science community. Indeed, computers and algorithms have made our lives efficient and easier, but among the prices we risk paying is the possible presence of discrimination and unfairness in the decisions we make with their support.

A famous example of unfairness in a data science application regards the ProPublica analysis of the COMPAS Recidivism Algorithm (https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing), a decision-making tool used by judges, probation and parole officers to assess a criminal defendant's likelihood of becoming a recidivist; the study found that black defendants were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism.

Data Science technologies are based on data, and for them to be reliable we have to make sure that the data we feed them are fair and without bias. As a consequence, in these particular applications of data analysis, data can be considered of good quality only if it conforms to high ethical standards [8] and, to avoid (possibly unintentional) unethical behaviors and their consequences, data cleaning tools should also include tools to discover bias in data; technologies that accurately discover discrimination and bias to obtain fair databases are badly needed [13].

In this paper we present FAIR-DB, a framework that, by discovering and analyzing some special types of functional dependencies, is able to find unfair behaviors in a dataset and guide its correction.

A Functional Dependency (FD: X → Y) is a class of database integrity constraints that hold between two sets X and Y of attributes in a relation of a database. It specifies that the values of the attributes of X uniquely (or functionally) determine the values of the attributes of Y. Looking at Table 1 we may spot the following FD: Education-Num → Education, meaning that the number of years already attended at school functionally determines the school level. In an FD, X is called antecedent or left-hand side (LHS), while Y is called consequent or right-hand side (RHS).

The constraints that Functional Dependencies impose are often too strict for real-world datasets, since they must hold for all the values of the attribute sets X and Y. For this reason, researchers have begun to study generalizations of FDs, called Relaxed Functional Dependencies [4], which relax one or more constraints of canonical FDs. Among these, we consider Approximate Functional Dependencies (AFDs), that are uncertain FDs, i.e. they hold only on a subset of the tuples, and Conditional Functional Dependencies (CFDs), where conditions are used to specify the subset of tuples on which a dependency holds.

In particular, a CFD is a pair (X → Y, tp), where X and Y are sets of attributes, X → Y is a standard functional dependency and tp is a pattern tuple over the attributes in X and Y; for each A in X ∪ Y, tp[A] is a constant 'a' in dom(A), or an unnamed variable '_'. Looking at Table 1 we may have the CFD: (Education, Income = '>50K') → Native-Country, meaning that, for people whose income is higher than 50K, their education degree functionally determines their native country. With this type of dependencies we can spot specific concrete patterns in the dataset, and thus we are able to analyze behaviors in correspondence to precise values. Combining the two kinds of relaxed dependencies we obtain Approximate Conditional Functional Dependencies (ACFDs), i.e., uncertain CFDs.

In this work we use ACFDs to detect biases and discover discrimination in the datasets subject to analysis, by recognizing cases where the value of a certain attribute (e.g. gender, ethnicity or religion) frequently determines the value of another one (such as the range of the proposed salary or social state).

The paper is organized as follows. Section 2 contains the related work. Section 3 details the methodology. Section 4 presents the experiments and, finally, Section 5 concludes the paper.
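To make the FD notion above concrete, the following minimal pandas sketch (an illustration of ours, not part of FAIR-DB) checks whether a candidate FD such as Education-Num → Education holds exactly on a table shaped like Table 1; the toy rows are hypothetical.

```python
import pandas as pd

def fd_holds(df: pd.DataFrame, lhs: list, rhs: list) -> bool:
    """True iff every combination of LHS values determines exactly one
    combination of RHS values, i.e. the FD lhs -> rhs holds with no exception."""
    distinct_rhs = df.groupby(lhs)[rhs].nunique()
    return bool((distinct_rhs == 1).all().all())

# Toy rows shaped like Table 1 (values are illustrative only).
df = pd.DataFrame({
    "Education-Num": [9, 9, 10, 4],
    "Education":     ["HS-grad", "HS-grad", "Some-college", "7th-8th"],
})
print(fd_holds(df, ["Education-Num"], ["Education"]))  # True on this sample
```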
2 STATE OF THE ART

Most of the research done in the area of Data Science Ethics is carried out by the Machine Learning community; in this context, researchers start from defining a notion of fairness in data, and then apply it to solve the problem of learning from unfair data in a prediction task. This notion of fairness is based on the idea that data have to satisfy three requirements: Diversity, Coverage and Stability.

Three possible approaches can be adopted when trying to enforce fairness in a data analysis application: (i) preprocessing techniques, i.e. procedures that, before the application of a prediction algorithm, make sure that the learning data are fair; (ii) inprocessing techniques, i.e. procedures that ensure that, during the learning phase, the algorithm does not pick up the bias present in the data; and (iii) postprocessing techniques, i.e. procedures that correct the algorithm's decisions with the scope of making them fair.

Most of the machine learning research concerns the second approach; recently, several open-source libraries have been developed to solve the "unfairness problem" during the learning phase of the prediction task. An example is FairML [1], which provides an auditing tool for predictive models by quantifying the relative effects of providing various different inputs to a model and comparing its predictions. FairTest [15], on the other hand, approaches the task of detecting bias by looking for correlations between predicted labels and protected attributes. A project that provides the implementation of many interesting techniques is AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias [2]; this work presents a new open-source framework whose aim is to reach algorithmic fairness. The system tries to mitigate algorithmic bias by exploiting techniques such as Reweighing [9], Optimized Preprocessing [3], Learning Fair Representations [10] and Disparate Impact Remover [7].

In the machine learning context, the majority of works that try to enforce fairness are related to a prediction task, and more specifically to classification algorithms in decision-making tools.
The main difference between these approaches and our framework, which addresses unfairness with a preprocessing technique, is that our system does not need a classifier to work, because it is based on finding conditions (constraints) that are already present in the data, even though possibly with some level of approximation. Furthermore, building a classifier to solve the fairness problem requires the policy to be application-oriented, greatly limiting the applicability of these systems to scenarios where other tasks are needed.

A very interesting work on fairness in Machine Learning that employs a preprocessing technique is Nutritional Labels for Data and Models by Stoyanovich and Howe [12]. The authors developed an interpretability and transparency tool based on the concept of Nutritional Labels, drawing an analogy to the food industry, where simple and standardized labels convey information about the ingredients and the production processes. Nutritional labels are derived semi-automatically as part of the complex process that gave rise to the data or model they describe. The final system, called Ranking Facts, automatically derives nutritional labels for rankings. Ranking Facts is a collection of visual widgets based on stability, fairness and diversity concepts. In particular, the Fairness widget quantifies whether the ranked output exhibits statistical parity (a particular definition of group fairness) with respect to one or more protected attributes.

Even though Ranking Facts and our framework have a similar objective (they both analyze a dataset to discover unfair behaviors), they achieve this goal using two very different strategies. Moreover, the output of the two systems is very different: our framework provides very precise indications of unfairness intended to be used in the correction of the dataset, while Ranking Facts employs data visualization tools to display the general unfair behaviors found in the dataset.

3 METHODOLOGY

This section presents the details of the FAIR-DB framework.

3.1 Preliminary Notions and Framework Overview

We now introduce some fundamental notions that will accompany us along our discussion.

Given a dataset D, the support of a CFD (X → Y, tp) is defined as the proportion of tuples t in the dataset D which contain tp, that is:

Support(X → Y, tp) = |{t ∈ D : tp ⊆ t}| / |D|

Confidence is an indication of how often the CFD has been found to be true. Let tp = (x ∪ y), where x is a tuple over X and y is a tuple over Y. The confidence value of a CFD (X → Y, tp) is the proportion of the tuples t containing x which also contain y:

Confidence(X → Y, tp) = |{t ∈ D : tp ⊆ t}| / |{t ∈ D : x ⊆ t}|
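As an illustration of these two formulas (not the framework's actual code), the following Python sketch computes the support and confidence of a constant-pattern CFD over a pandas DataFrame; the name adult_df and the values in the usage comment are hypothetical.

```python
import pandas as pd

def matches(df: pd.DataFrame, pattern: dict) -> pd.Series:
    """Boolean mask of the tuples t such that the constant pattern is contained in t."""
    mask = pd.Series(True, index=df.index)
    for attr, value in pattern.items():
        mask &= df[attr] == value
    return mask

def support(df, x_pattern: dict, y_pattern: dict) -> float:
    tp = {**x_pattern, **y_pattern}              # pattern tuple over X ∪ Y
    return matches(df, tp).sum() / len(df)

def confidence(df, x_pattern: dict, y_pattern: dict) -> float:
    tp = {**x_pattern, **y_pattern}
    x_count = matches(df, x_pattern).sum()
    return matches(df, tp).sum() / x_count if x_count else 0.0

# Hypothetical usage on a DataFrame with the columns of Table 1:
# support(adult_df, {"Sex": "Female"}, {"Income": "<=50K"})
# confidence(adult_df, {"Sex": "Female"}, {"Income": "<=50K"})
```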
Figure 1 represents the workflow of FAIR-DB, composed of the following main steps:

(1) Data Preparation and Exploration: in this phase we import the data, perform (if needed) data integration and apply the typical data preprocessing steps (solve missing values, apply discretization, etc.) needed to clean and prepare the data. During this phase, we might also visualize the attribute features using different Data Visualization techniques.

(2) ACFD Discovery and Filtering: in this phase we apply an ACFD Discovery algorithm to extract Approximate Conditional Functional Dependencies from the dataset. This algorithm takes as input the prepared dataset and three threshold parameters: minimum support, minimum confidence and maximum antecedent size of the ACFDs we are searching for. From the output of ACFD Discovery we discard the dependencies that contain variables. In this phase, we also discard some other dependencies, keeping only those that might reveal unfairness.

(3) ACFDs Selection: for each ACFD, we compute some metrics capturing the "ethical level" of the dependency, in particular: (i) the Difference metric is a novel score introduced to discover dependencies that highlight the violation of fairness, and (ii) for each protected attribute p we compute its specific p-Difference. According to the values of the metrics, we select the most interesting ACFDs.

(4) ACFDs Ranking: we rank the ACFDs in descending order of importance, according to the three metrics.

(5) ACFDs User Selection and Scoring: from the ranking, the user selects the N ACFDs she perceives as the most problematic, and then the system computes some metrics that summarize the level of unfairness of the dataset.

[Figure 1: Steps of the FAIR-DB framework — Data Preparation and Exploration (data preprocessing and visualization) → ACFDs Discovery and Filtering (apply an ACFD discovery algorithm and filter its result) → ACFDs Selection (according to a novel metric, select unethical ACFDs) → ACFDs Ranking (rank the ACFDs to facilitate user interaction) → ACFDs User Selection and Scoring (the user selects N ACFDs; presentation of the summarizing metrics).]

In the next subsections we give a more detailed description of these phases, with the help of the following running example.

Example 3.1 (Running Example). For a part of our experiments, we have used the U.S. Census Adult Dataset (https://archive.ics.uci.edu/ml/datasets/Adult) [5], containing information about many social factors of US adults, like 'Income', 'Age', 'Workclass', 'Education', 'Education-Num' (i.e. the number of years already attended at school), 'Marital-Status', 'Race', 'Sex' (i.e. gender), (work) 'Hours-Per-Week', 'Native-Country', and some more. Table 1 contains examples of its tuples.

Table 1: A few exemplar tuples of the U.S. Census Adult dataset

Age | Workclass | Education | Education-Num | Marital-Status | Race | Sex | Hours-Per-Week | Native-Country | Income
90 | ? | HS-grad | 9 | Widowed | White | Female | 40 | United-States | <=50K
82 | Private | HS-grad | 9 | Widowed | White | Female | 18 | United-States | <=50K
66 | ? | Some-college | 10 | Widowed | Black | Female | 40 | United-States | <=50K
54 | Private | 7th-8th | 4 | Divorced | White | Female | 40 | United-States | <=50K
3.2 Data Preparation and Exploration

Data Preparation starts with Data Acquisition, when the user gathers one or more datasets that she will use in the analysis. After Data Acquisition, we compute Summary Statistics for each column of the dataset. This primary phase gives a general idea of the dataset and, during this step, we can hypothesize the protected columns and identify, if present, the target variable.

We then perform Data Cleaning, since poor data quality could lead to wrong or incomplete results. Depending on the dataset we are considering, we may need to face several different error types. We continue with Data Integration, where, if present, multiple datasets are merged into a single one. After Data Cleaning and Data Integration we may need to perform other steps: Feature Selection and Discretization [14].

Feature selection refers to choosing the set of features to use that is appropriate for the subsequent analysis. The goal of feature selection is to come up with the smallest set of features that best captures the characteristics of the problem being addressed [14].

Another important step is Discretization, performed by transforming data from numeric to nominal data type [14]. In our case discretization is particularly important, since Functional Dependencies built on numerical attributes that have many different values are very precise, but they do not give an overview over the attribute values. For instance, if we have a dependency that contains a specific value of the attribute 'Age' (let us assume that it is 18), we do not know whether the dependency could be valid also for values near 18, like 20 or 16; furthermore, the dependency might not be useful if it involves only 18-year-old people. For this reason, we suggest using Discretization for numerical attributes that have many different values. A similar procedure is often needed also when a categorical attribute can assume a wide range of values.

As a last step of this first phase, Data Visualization can be of great help in analyzing the characteristics of the data; in fact, from the plots, a user can understand whether groups are present in the dataset and, more specifically, whether there is a majority class for a certain attribute, and if present can identify minorities. Visualization and summary statistics are very important to find protected attributes and understand whether the data contain minorities that need to be analyzed in more detail.

Example 3.2 (Data Preparation and Exploration). Before being able to gather useful insights from the running example, we have to perform some preprocessing operations such as Data Cleaning, Feature Selection and Discretization. We noticed that the majority of the missing values belong to attributes that are not relevant for our analysis (e.g. 'Marital-Status'); we therefore decided to first perform feature selection and then to remove the few tuples that still had missing values. Regarding feature selection, we also removed columns whose information was already expressed by another one: we noticed that two of the columns, 'Education' and 'Education-Num', represented the same information, since the latter can be obtained through a numerical encoding of the first one. To extract Functional Dependencies that do not depend on specific values of an attribute, it is useful to group into bins the values of attributes that are continuous. In particular, we created five bins for the 'Hours-Per-Week' attribute: '0-20', '21-40', '41-60', '61-80', '81-100'. We concluded the analysis by identifying the protected attributes, 'Sex', 'Race' and 'Native-Country', and selecting 'Income' as the target variable. We decided to keep both the 'Race' and 'Native-Country' attributes because they are correlated, but not in all cases: for example, the offspring of migrants have the same race as their parents, but a different native country, because in many cases they may be born in the U.S. To have readable and more effective Functional Dependencies, we keep the attribute 'Race' as it is in the original dataset and we group the values of the attribute 'Native-Country' into 4 different values: 'NC-US', 'NC-Hispanic', 'NC-Non-US/Hispanic' and 'NC-Asian-Pacific'.
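The binning and grouping operations of Example 3.2 can be expressed, for instance, with pandas; the snippet below is only a sketch: the toy rows and the country-to-group membership sets are placeholders, not the exact mapping used in the paper.

```python
import pandas as pd

# Toy rows with the two columns involved (illustrative values).
adult_df = pd.DataFrame({
    "Hours-Per-Week": [40, 18, 72, 95],
    "Native-Country": ["United-States", "Mexico", "Philippines", "Germany"],
})

# Discretize the continuous attribute into the five bins of Example 3.2.
adult_df["Hours-Per-Week"] = pd.cut(
    adult_df["Hours-Per-Week"],
    bins=[0, 20, 40, 60, 80, 100],
    labels=["0-20", "21-40", "41-60", "61-80", "81-100"],
)

# Group the many 'Native-Country' values into the four coarser classes;
# the membership sets below are placeholders.
hispanic = {"Mexico", "Cuba", "Puerto-Rico", "Columbia"}
asian_pacific = {"Philippines", "India", "China", "Japan", "Vietnam"}

def group_country(country: str) -> str:
    if country == "United-States":
        return "NC-US"
    if country in hispanic:
        return "NC-Hispanic"
    if country in asian_pacific:
        return "NC-Asian-Pacific"
    return "NC-Non-US/Hispanic"

adult_df["Native-Country"] = adult_df["Native-Country"].map(group_country)
print(adult_df)
```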
3.3 ACFD Discovery and Filtering

In this phase we extract the ACFDs from the dataset, using the ACFD Discovery algorithm of [11]. The algorithm expects as input the following three parameters: the (minimum) support threshold, the (minimum) confidence threshold, and the size of the largest antecedent, maxSize.

Given an instance D of a schema R, support threshold δ, confidence threshold ϵ, and maximum antecedent size α, the approximate CFD discovery problem is to find all ACFDs ϕ: (X → Y, tp) over R with:

• support(ϕ, D) ≥ δ
• confidence(ϕ, D) ≥ ϵ
• |X| ≤ α.

The ACFDs obtained are in this form:

(lhsAttr1 = v1, ..., lhsAttrN = vN) → (rhsAttr = v)

The algorithm returns all the dependencies that satisfy the aforementioned criteria. The discovered dependencies may suffer from the following shortcomings:

• It can happen that a dependency does not involve any protected attribute, nor the target attribute, and thus it is useless for the detection of unfair behavior.
• Some of the dependencies may be ACFDs that contain variables; in this case the dependency holds for all the values of each non-constant attribute in its LHS. In our application this type of dependency does not highlight a significant difference corresponding to some specific values of the data, and thus is useless for our aims.

Example 3.3 (CFD Discovery). The algorithm, applied to the dataset resulting from the previous phase, finds 118 ACFDs. Table 2 reports a few of them. We can easily detect the ACFDs that contain variables: for example, dependency ϕ3: (Education-Degree, Income = '>50K') → (Native-Country) does not specify the values of the attributes 'Education-Degree' and 'Native-Country'. As the user can notice, there are also dependencies that should be removed from the list because they do not contain any protected attribute.

Table 2: CFD Discovery output

(Education-Degree = 'Middle-school') → Income = '≤50K'
(Age-Range = '15-30') → Income = '≤50K'
(Education-Degree, Income = '≤50K') → Native-Country
(Native-Country = 'NC-Hispanic') → Income = '≤50K'
(Income = '≤50K') → Native-Country = 'NC-US'

Now we filter the dependencies, discarding the ones that do not satisfy the following two constraints:

• all the attributes of the dependency must be assigned to a value (this type of ACFD, consisting of only constants in both its LHS and RHS, is called a Constant ACFD [6]);
• at least one protected attribute and the target variable have to be present inside the dependency, so that the ACFD might show bias.

During the ACFD Filtering phase, the user can also add some constraints, such as deciding which values must appear in the dependencies, for instance requiring that all the ACFDs involve only 'Females' because the researcher is interested only in peculiar aspects of women, or deciding which values must not appear in the dependencies.

Example 3.4 (ACFD Filtering). From Table 2 we discarded the third dependency for not being a Constant ACFD, and the first two dependencies for not containing any protected attribute. After this phase we are left with 84 of the original 118 dependencies.
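The filtering constraints just described can be sketched as follows; the representation of an ACFD as a triple (LHS pattern dictionary, RHS attribute, RHS value), with None standing for an unnamed variable '_', is our own illustration, not the data structure used by FAIR-DB.

```python
PROTECTED = {"Sex", "Race", "Native-Country"}
TARGET = "Income"

def is_constant(lhs_pattern: dict, rhs_value) -> bool:
    # Constant ACFD: every attribute in the LHS and the RHS is bound to a value.
    return rhs_value is not None and all(v is not None for v in lhs_pattern.values())

def keep(acfd) -> bool:
    lhs_pattern, rhs_attr, rhs_value = acfd
    attrs = set(lhs_pattern) | {rhs_attr}
    return (
        is_constant(lhs_pattern, rhs_value)
        and len(attrs & PROTECTED) > 0      # at least one protected attribute
        and TARGET in attrs                 # the target variable must appear
    )

# The dependencies of Table 2 in this representation (None = variable '_'):
acfds = [
    ({"Education-Degree": "Middle-school"}, "Income", "<=50K"),
    ({"Age-Range": "15-30"}, "Income", "<=50K"),
    ({"Education-Degree": None, "Income": "<=50K"}, "Native-Country", None),
    ({"Native-Country": "NC-Hispanic"}, "Income", "<=50K"),
    ({"Income": "<=50K"}, "Native-Country", "NC-US"),
]
print([acfd for acfd in acfds if keep(acfd)])  # only the last two survive the filter
```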
3.4 ACFDs Selection

This phase is responsible for finding the dependencies that actually reveal unfairness in the dataset; in fact, even if the ACFDs identified in the previous step contain protected attributes, they do not all necessarily show some unethical behavior. To do so we have devised two unfairness measures:

• Difference: it indicates how much a dependency is 'unethical'. The higher this metric, the more the ACFD reveals an unfair behavior.
• ProtectedAttributeDifference: it indicates how much the dependency shows bias with respect to a specific protected attribute.

In order to assess the unfair behavior of a dependency, we also take into consideration its support, which indicates the pervasiveness of the ACFD; unethical dependencies with high support will impact many tuples, and thus will be more important.

For each dependency ϕ, we define the Difference metric of ϕ as the difference between the dependency confidence and the confidence of the dependency computed without the protected attributes of the LHS of the ACFD. Given a dependency in the form:

ϕ: (X → Y, tp)

let Z = X − {ProtectedAttributes}, that is the LHS of the dependency without its protected attributes, and let z be the restriction of tp to Z. We define the Difference as:

Difference(ϕ) = Confidence(ϕ) − NoProtectedAttributeConfidence(ϕ)

where

NoProtectedAttributeConfidence(ϕ) = |{t ∈ D : (z ∪ y) ⊆ t}| / |{t ∈ D : z ⊆ t}|

That is:

Difference(ϕ) = |{t ∈ D : tp ⊆ t}| / |{t ∈ D : x ⊆ t}| − |{t ∈ D : (z ∪ y) ⊆ t}| / |{t ∈ D : z ⊆ t}|

The Difference metric gives us an idea of how much the values of the protected attributes influence the value of Y.

Example 3.5 (Difference score of a dependency). Consider the following dependency:

ϕ1: (Sex = 'Female', Workclass = 'Private') → (Income = '≤50K')

We can compute the Difference as:

Diff(ϕ1) = Conf(ϕ1) − NoProtAttrConf(ϕ1′)

where

ϕ1′: (Workclass = 'Private') → (Income = '≤50K')

Three different behaviors can emerge:

• If the Difference is close to zero, fairness is respected, since it means that females are treated equally to all the elements of the population that have the same characteristics (without specifying the protected attribute).
• If the Difference is positive, it means that the women that work privately and gain less than 50K dollars/year are overall treated worse than the generality of people that work privately and gain less than 50K dollars/year.
• If the Difference is negative, the opposite situation is detected.

A dependency could contain in the LHS more than one protected attribute at the same time. For this reason, we introduce the last metric: the Protected Attribute Difference (P-Difference), which is very similar to the Difference measure but is computed separately, excluding one protected attribute P at a time (Z = X − P).

Finally, we choose the ACFDs whose Difference is above a minimum threshold (minThreshold), which means that there is a significant inequality between the group involved in the ACFD and the general behaviour of the population.

Example 3.6 (Selected ACFDs). Table 3 reports three of the seventeen dependencies that satisfied the selection criteria along with their relevant metrics. From the example, the "Hispanic", "Female" and "Black" groups suffer from discrimination with respect to the rest of the population; in fact, people that belong to one or more of these groups have an income below 50000 dollars/year because of their nationality, sex or race.

Table 3: A few of the selected ACFDs using the Difference metric

ACFD | Support | Difference | RaceDiff | SexDiff | NativeCountryDiff
(Native-Country = 'NC-Hispanic') → (Income = '≤50K') | 0.0439 | 0.1570 | 0 | 0 | 0.1570
(Sex = 'Female') → (Income = '≤50K') | 0.2874 | 0.1352 | 0 | 0.1352 | 0
(Race = 'Black') → (Income = '≤50K') | 0.0813 | 0.1190 | 0.1190 | 0 | 0
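As an illustration (again, not the framework's own code), the Difference of a constant ACFD can be computed as the confidence of the dependency minus the confidence of the same dependency with the protected attributes dropped from its antecedent, following the reading of Example 3.5; matches() is the helper sketched in Section 3.1, and the usage values are hypothetical.

```python
import pandas as pd

def matches(df: pd.DataFrame, pattern: dict) -> pd.Series:
    mask = pd.Series(True, index=df.index)
    for attr, value in pattern.items():
        mask &= df[attr] == value
    return mask

def difference(df, x_pattern: dict, y_pattern: dict, protected: set) -> float:
    tp = {**x_pattern, **y_pattern}
    conf = matches(df, tp).sum() / matches(df, x_pattern).sum()
    # Z = X without its protected attributes; z is the restriction of tp to Z.
    z = {a: v for a, v in x_pattern.items() if a not in protected}
    no_prot_conf = matches(df, {**z, **y_pattern}).sum() / matches(df, z).sum()
    return conf - no_prot_conf

# Hypothetical usage, mirroring Example 3.5:
# difference(adult_df,
#            {"Sex": "Female", "Workclass": "Private"},
#            {"Income": "<=50K"},
#            protected={"Sex", "Race", "Native-Country"})
```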
3.5 ACFDs Ranking

In a real-world dataset, the number of ACFDs selected in the previous step could be very large, even in the order of thousands; therefore, looking at all these dependencies would be a very demanding task for the user. Thus, it is necessary to order the dependencies according to some criterion, enabling the user to analyze the most important and interesting ones first, speeding up the process and reducing the cost. In our framework the user can order the dependencies according to one of the following criteria:

• Support-based: the support indicates the proportion of tuples impacted by the dependency: the higher the support, the more tuples are involved by the ACFD. Ordering dependencies by support highlights the pervasiveness of the dependency.
• Difference-based: this criterion highlights the dependencies where the values of the protected attributes most influence the value of their RHS; therefore this ordering privileges the unethical aspect of the dependencies.
• Mean-based: this method tries to combine both aspects of a dependency, the unethical perspective and its pervasiveness. Sorting the ACFDs using this criterion positions first the dependencies that have the best trade-off between difference and support.

3.6 ACFDs User Selection and Scoring

In this last phase the user selects from the ranked list the N dependencies that are interesting for her research needs. Using only the N selected ACFDs, the framework computes a set of scores that summarize the properties of the entire dataset (a sketch of how these scores can be computed is given at the end of this subsection):

• Cumulative Support: the percentage of tuples in the dataset involved by the selected ACFDs. The closer this value is to 1, the more tuples are impacted by unfair dependencies.
• Difference Mean: the mean of all the 'Difference' scores of the selected ACFDs. It indicates how unethical the dataset is according to the selected dependencies. The greater the value, the higher the bias in the dataset.
• Protected Attribute Difference Mean: for each protected attribute P, we report the mean of its P-Difference over all the selected ACFDs. It indicates how ethical the dataset is over P according to the selected dependencies.

These summarizing metrics entirely depend on the specific ACFDs selected by the user; thus, by selecting different sets of dependencies, the user can highlight different aspects of the dataset. Note also that the framework allows the user to see some exemplar tuples impacted by the selected ACFDs.

Example 3.7 (ACFDs User Selection and Scoring). The user chooses N = 15 ACFDs that are interesting according to her needs among the dependencies obtained after the ranking step. The total number of tuples involved by the ACFDs is 13296, while the total number of tuples in the dataset is 30169; this results in a Cumulative Support of 0.44. The Difference Mean is 0.16. These two scores indicate that a considerable number of tuples, 44%, show a behavior that is very different, on average 16%, from the fair one. Finally, the P-Difference Mean metrics confirm that the dataset is unfair with respect to all the protected attributes; the groups most discriminated against are: 'Female', 'Black', 'NC-Hispanic' and 'Amer-Indian-Eskimo'. Table 4 reports a few interesting ACFDs.

Table 4: A few user-selected dependencies from the U.S. Census Adult Income Dataset

(Sex = 'Female') → Income = '≤50K'
(Race = 'Black') → Income = '≤50K'
(Race = 'Amer-Indian-Eskimo') → Income = '≤50K'
(Native-Country = 'NC-Hispanic') → Income = '≤50K'
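The following sketch shows one way the three summarizing scores above could be computed; it is our illustration, it assumes each selected ACFD carries its pattern and precomputed Difference / P-Difference values, and it interprets "involved" tuples as those matching the full pattern of an ACFD.

```python
import pandas as pd

def matches(df: pd.DataFrame, pattern: dict) -> pd.Series:
    mask = pd.Series(True, index=df.index)
    for attr, value in pattern.items():
        mask &= df[attr] == value
    return mask

def summary_scores(df: pd.DataFrame, selected: list):
    """selected: one dict per chosen ACFD, e.g.
    {"pattern": {"Sex": "Female", "Income": "<=50K"},
     "difference": 0.1352,
     "p_difference": {"Sex": 0.1352, "Race": 0.0, "Native-Country": 0.0}}"""
    # Cumulative Support: fraction of tuples touched by at least one selected ACFD.
    involved = pd.Series(False, index=df.index)
    for acfd in selected:
        involved |= matches(df, acfd["pattern"])
    cumulative_support = involved.sum() / len(df)

    # Difference Mean over the selected dependencies.
    difference_mean = sum(a["difference"] for a in selected) / len(selected)

    # Protected Attribute Difference Mean, one value per protected attribute.
    attributes = {p for a in selected for p in a["p_difference"]}
    p_difference_mean = {
        p: sum(a["p_difference"].get(p, 0.0) for a in selected) / len(selected)
        for p in attributes
    }
    return cumulative_support, difference_mean, p_difference_mean
```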
4 EXPERIMENTAL RESULTS

We now present the results obtained by FAIR-DB on two real-world datasets, and a comparison between our framework and the Ranking Facts system [12].

4.1 Datasets

The first dataset we considered is the U.S. Census Adult Income Dataset, already briefly presented in Example 3.1. The version we considered contains 32561 tuples and 13 attributes, 5 of which are numerical and 8 categorical.

The second dataset we considered is the Titanic Dataset, containing information about the passengers of the Titanic, a British passenger liner operated by the 'White Star Line' that sank in the North Atlantic Ocean in the early morning hours of 15 April 1912. Of the estimated 2,224 passengers and crew aboard, more than 1,500 died, making the sinking one of modern history's deadliest peacetime commercial marine disasters. The version we considered contains 891 samples and 12 attributes, 8 of which are categorical and 4 numerical. Specifically, the following attributes are present: 'PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'Ticket', 'Fare', 'Cabin' and a few more. We chose 'Sex' and 'Pclass' (the passenger class, that can be first, second or third) as the protected attributes, and selected 'Survived' as the target variable.

4.2 FAIR-DB Results

We recall from Example 3.7 the results of FAIR-DB on the U.S. Census Adult Income Dataset, which scores a Cumulative Support of 0.44 and a Difference Mean of 0.16, indicating that many tuples show an unfair behavior. Specifically, the dataset highlights bias towards all the protected attributes: 'Sex', 'Race' and 'Native-Country'; a deeper analysis of the dependencies confirms that the most discriminated groups are: 'Female', 'Black', 'NC-Hispanic' and 'Amer-Indian-Eskimo'.

The Titanic Dataset scores a Cumulative Support of 0.74 and a Difference Mean of 0.35, indicating that many tuples are subjected to an unfair behavior. Specifically, the dataset shows bias towards the protected attributes 'Sex' and 'Pclass'. Indeed, the most discriminated groups are 'Third class passenger' and 'Male', showing that (i) in the third class, to which more than half of the passengers belong, the majority of people did not survive, and (ii) the policy 'women and children first' was adopted for the evacuation into the lifeboats. Table 5 reports the corresponding ACFDs.

Table 5: A few user-selected dependencies from the Titanic Dataset

(Pclass = 1, Sex = Female) → Survived = 1
(Pclass = 2, Sex = Female) → Survived = 1
(Survived = 0, Sex = Female) → (Pclass = 3)
(Pclass = 3, Sex = Male) → (Survived = 0)
(Survived = 1, Sex = Male) → (Pclass = 1)

4.3 Comparison with Ranking Facts

The results obtained with Ranking Facts [12] are in complete accordance with the ones obtained with our framework. Regarding the first dataset, Ranking Facts finds unfair behaviors across all the three protected attributes, with discrimination against 'Female', 'Black', 'NC-Hispanic' and 'Amer-Indian-Eskimo'. Regarding the Titanic dataset, Ranking Facts detects unfair behaviors on both the protected attributes, with discrimination against 'Male' and 'Third class passenger'. A deeper comparison with Ranking Facts will be included in an extension of this work.

Ranking Facts checks fairness only for one attribute at a time, while, since the ACFD technique can involve more than one attribute at a time, our tool can report information about subgroup fairness, actually detecting unfair behaviors at a finer level of granularity. Instead, the results of Ranking Facts do not contain information about the existing bias in subgroups or in minorities. This is very important, because the discrimination might be limited to some specific scenario (e.g. not all the women, but only the black women working in the private sector), and this information is very useful to guide the phase of DB repair.

The importance of analyzing more attributes simultaneously is even clearer if we analyze the ACFDs in Table 5: even though, overall, women obtain a better treatment (for the 'Sex' attribute, both Ranking Facts and our framework found discrimination only against men), by analyzing the dependencies we can see that if a woman died she most probably was a third-class passenger, and if a man survived he most probably was a first-class passenger.
5 CONCLUSION AND FUTURE WORKS

We presented FAIR-DB, a novel framework that, through the extraction of a particular type of Functional Dependencies, can discover bias and discrimination present in a dataset. Future work will include: (i) the addition to the system of a dependency repair phase that, starting from the selected ACFDs, will correct the dataset, removing all the unfair behaviors from it; (ii) the study of dependencies with high confidence and low support, to highlight interesting, not necessarily frequent, behaviors; (iii) the development of a graphical user interface to facilitate the interaction of the user with the system; (iv) the study of other (possibly interesting) classes of functional dependencies [4]; (v) a deeper comparison with Ranking Facts and other similar methods.

REFERENCES

[1] Julius A. Adebayo et al. 2016. FairML: ToolBox for diagnosing bias in predictive modeling. Ph.D. Dissertation. Massachusetts Institute of Technology.
[2] Rachel K. E. Bellamy, Kuntal Dey, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Kalapriya Kannan, Pranay Lohia, Jacquelyn Martino, Sameep Mehta, A. Mojsilović, et al. 2019. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development 63, 4/5 (2019), 4–1.
[3] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. 2017. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems. 3992–4001.
[4] Loredana Caruccio, Vincenzo Deufemia, and Giuseppe Polese. 2015. Relaxed functional dependencies—a survey of approaches. IEEE Transactions on Knowledge and Data Engineering 28, 1 (2015), 147–165.
[5] Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[6] Wenfei Fan, Floris Geerts, Jianzhong Li, and Ming Xiong. 2010. Discovering conditional functional dependencies. IEEE Transactions on Knowledge and Data Engineering 23, 5 (2010), 683–698.
[7] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 259–268.
[8] Donatella Firmani, Letizia Tanca, and Riccardo Torlone. 2019. Ethical dimensions for data quality. Journal of Data and Information Quality (JDIQ) 12, 1 (2019), 1–5.
[9] Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33, 1 (2012), 1–33.
[10] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. 2018. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309.
[11] Joeri Rammelaere and Floris Geerts. 2018. Revisiting conditional functional dependency discovery: Splitting the "C" from the "FD". In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 552–568.
[12] Julia Stoyanovich and Bill Howe. 2019. Nutritional labels for data and models. IEEE Data Engineering Bulletin 42, 3 (2019), 13–23.
[13] Julia Stoyanovich, Bill Howe, and H. V. Jagadish. 2020. Responsible data management. Proceedings of the VLDB Endowment 13, 12 (2020), 3474–3488.
[14] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2016. Introduction to Data Mining. Pearson Education India.
[15] Florian Tramer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari Juels, and Huang Lin. 2017. FairTest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 401–416.