<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A short account of FAIR-DB: a system to discover Data Bias</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Fabio</forename><surname>Azzalini</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Politecnico di Milano</orgName>
								<orgName type="department" key="dep2">Dipartimento di Elettronica</orgName>
								<orgName type="department" key="dep3">Informazione e Bioingegneria</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Human Technopole -Center for Analysis, Decisions and Society</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chiara</forename><surname>Criscuolo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Politecnico di Milano</orgName>
								<orgName type="department" key="dep2">Dipartimento di Elettronica</orgName>
								<orgName type="department" key="dep3">Informazione e Bioingegneria</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Letizia</forename><surname>Tanca</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Politecnico di Milano</orgName>
								<orgName type="department" key="dep2">Dipartimento di Elettronica</orgName>
								<orgName type="department" key="dep3">Informazione e Bioingegneria</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">A short account of FAIR-DB: a system to discover Data Bias</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">22A1E14A14BDB963A26FC97825F23346</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Fairness</term>
					<term>Data Bias</term>
					<term>Functional Dependencies</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Computers and algorithms are increasingly pervading our daily lives; to trust these systems, we have to make sure that the data they use are fair and free of bias. As a result, fairness has become a relevant topic of discussion within the field of Data Science, and technologies that accurately discover discrimination and bias in datasets are of paramount importance.</p><p>In this work we present FAIR-DB (FunctionAl dependencIes to discoveR Data Bias), a novel framework to detect biases and discover discrimination in datasets. By exploiting various kinds of functional dependencies, our tool can identify those attributes in a database that encompass discrimination (e.g. gender, ethnicity or religion) and the ones that instead satisfy various fairness criteria.</p><p>We compared our framework with two state-of-the-art systems for detecting unfairness in datasets, obtaining overall similar results on a real-world dataset; specifically, the comparison highlighted that FAIR-DB not only provides very precise information about the groups treated unequally, but also that, in comparison with other existing tools, it may obtain more insights regarding the bias present in datasets.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years fairness has become an important topic of interest in the Data Science community. Indeed, computers and algorithms have made our lives more efficient and easier, but among the prices we risk paying is the possible presence of discrimination and unfairness in the decisions we make with their support 1 . Data Science technologies are based on data, and for them to be reliable we have to make sure that the data we feed them are fair and without bias: data can be considered of good quality only if they conform to high ethical standards <ref type="bibr" target="#b0">[1]</ref>; to avoid (possibly unintentional) unethical behaviors and their consequences, data cleaning tools should also include tools to discover bias in data <ref type="bibr" target="#b1">[2]</ref>. In this paper we present FAIR-DB <ref type="bibr" target="#b2">[3]</ref>, a framework that, by discovering and analyzing particular types of functional dependencies, can find unfair behaviors in a dataset and guide its correction. A Functional Dependency (𝐹 𝐷 ∶ 𝑋 → 𝑌) is a class of database integrity constraints that hold between two sets 𝑋 and 𝑌 of attributes in a relation of a database: it specifies that the values of the attributes of 𝑋 uniquely (or functionally) determine the values of the attributes of 𝑌 <ref type="bibr" target="#b3">[4]</ref>. In a FD, X is called the antecedent or left-hand side (LHS), while Y is called the consequent or right-hand side (RHS). [SEBD 2021: The 29th Italian Symposium on Advanced Database Systems, September 5-9, 2021, Pizzo Calabro (VV), Italy. Contacts: fabio.azzalini@polimi.it (F. Azzalini); chiara.criscuolo@polimi.it (C. Criscuolo); letizia.tanca@polimi.it (L. Tanca). ORCID: 0000-0003-0631-2120 (F. Azzalini); 0000-0003-2607-3171 (L. Tanca).] 
In Conditional Functional Dependencies (or CFDs) <ref type="bibr" target="#b4">[5]</ref>, conditions are used to specify the subset of tuples on which a dependency holds: a CFD is a pair (𝑋 → 𝑌 , 𝑡 𝑝 ), where 𝑋 → 𝑌 is a standard functional dependency and 𝑡 𝑝 is a pattern tuple over the attributes in 𝑋 and 𝑌; for each 𝐴 in 𝑋 ∪ 𝑌, 𝑡 𝑝 [𝐴] is either a constant 'a' in dom(A) or an unnamed variable '_'. In this paper we consider Approximate Conditional Functional Dependencies (ACFDs), i.e., uncertain CFDs that hold only on a subset of the tuples, to detect biases and discover discrimination in the datasets under analysis, by recognizing cases where the value of a certain attribute (e.g. gender, ethnicity or religion) frequently determines the value of another one (such as the proposed salary range or the social status).</p></div>
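To make the pattern-tuple notion concrete, here is a minimal pandas sketch (the toy relation, the attribute values and the helper name matches are ours, purely illustrative, not part of FAIR-DB): it marks the tuples of a relation that contain a given constant pattern tuple.

```python
import pandas as pd

# Toy relation; attributes and values are illustrative only.
df = pd.DataFrame({
    "Sex":    ["Female", "Female", "Male", "Female", "Male"],
    "Income": ["≤50K", "≤50K", ">50K", ">50K", "≤50K"],
})

# A constant pattern tuple t_p for the ACFD (Sex='Female') -> (Income='≤50K').
lhs = {"Sex": "Female"}
rhs = {"Income": "≤50K"}

def matches(df, pattern):
    """Boolean mask of the tuples t such that the pattern is contained in t."""
    return (df[list(pattern)] == pd.Series(pattern)).all(axis=1)

print(matches(df, {**lhs, **rhs}).sum())  # -> 2 tuples contain the whole t_p
```

A tuple "contains" a constant pattern tuple when it agrees with it on every listed attribute, which is exactly the column-wise equality test above.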
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">State of the Art</head><p>Most of the research in the area of Data Science ethics is carried out by the Machine Learning community; three possible approaches can be adopted when trying to enforce fairness in a data analysis application: (i) preprocessing techniques, i.e. procedures that, before the application of a prediction algorithm, make sure that the learning data are fair; (ii) inprocessing techniques, i.e. procedures that ensure that, during the learning phase, the algorithm does not pick up the bias present in the data; and (iii) postprocessing techniques, i.e. procedures that correct the algorithm's decisions with the aim of making them fair. One of the first preprocessing techniques was presented by Pedreschi et al. <ref type="bibr" target="#b5">[6]</ref>: exploiting the concept of association rules and custom metrics, their system can identify potentially discriminatory itemsets. A project that provides the implementation of many interesting techniques is AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias <ref type="bibr" target="#b6">[7]</ref>; this work presents an open-source framework whose aim is to reach algorithmic fairness, mitigating algorithmic bias by exploiting techniques that use statistical measures to compute fairness. In the Machine Learning context, the majority of works that try to enforce fairness are related to a prediction task, and more specifically to classification algorithms in decision-making tools <ref type="bibr" target="#b7">[8]</ref>, <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b6">[7]</ref>. 
The main difference between these approaches and our framework, which addresses unfairness with a preprocessing technique, is that our system does not need a classifier to work, because it is based on finding conditions (in the form of approximate constraints) that are already present in the data, even though possibly with some level of approximation. Furthermore, building a classifier to solve the fairness problem requires the policy to be application-oriented, greatly limiting the applicability of such systems to scenarios where other tasks are needed. A very interesting work on fairness in Machine Learning that employs a preprocessing technique is Nutritional Labels for Data and Models by Stoyanovich and Howe <ref type="bibr" target="#b9">[10]</ref>. The authors developed an interpretability and transparency tool based on the concept of nutritional labels, drawing an analogy to the food industry, where simple and standardized labels convey information about the ingredients and the production processes. Nutritional labels are derived semi-automatically as part of the complex process that gave rise to the data or model they describe. The final system, called Ranking Facts, automatically derives nutritional labels for rankings. Ranking Facts is a collection of visual widgets; in particular, the Fairness widget quantifies whether the ranked output exhibits statistical parity (a particular definition of group fairness) with respect to one or more protected attributes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>Before presenting the methodology, we first introduce some fundamental notions that will accompany us throughout the discussion. Given a dataset D, the support of a CFD (𝑋 → 𝑌 , 𝑡 𝑝 ) is defined as the proportion of tuples t in the dataset D which contain 𝑡 𝑝 , that is:</p><formula xml:id="formula_0">Support(X → Y , t p ) = |{t ∈ D ∶ t p ⊆ t}| / |D|</formula><p>The confidence is an indication of how often the CFD has been found to be true. Let</p><formula xml:id="formula_1">𝑡 𝑝 = (𝑥 ∪ 𝑦)</formula><p>where 𝑥 is a tuple over 𝑋 and 𝑦 is a tuple over 𝑌. The confidence value of a CFD (𝑋 → 𝑌 , 𝑡 𝑝 ) is the proportion of the tuples t containing 𝑥 which also contain 𝑦:</p><formula xml:id="formula_2">Confidence(X → Y , t p ) = |{t ∈ D ∶ t p ⊆ t}| / |{t ∈ D ∶ x ⊆ t}|</formula><p>In the next subsections we present each phase in detail, with the help of the following example.</p><p>Example 1 (Running Example). For our experiments we used the U.S. Census Adult Dataset<ref type="foot" target="#foot_0">2</ref>  <ref type="bibr" target="#b10">[11]</ref>, containing information about many social factors of US adults, like 'Income', 'Age', 'Workclass', 'Education', 'Education-Num' (i.e. the number of years already attended at school), 'Marital-Status', 'Race', 'Sex', (work) 'Hours-Per-Week', 'Native-Country', and some more.</p></div>
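Under the definitions above, support and confidence can be sketched in a few lines of pandas for constant pattern tuples (a simplified illustration; the helper names and toy data are ours):

```python
import pandas as pd

def matches(df, pattern):
    """Boolean mask of the tuples t containing the constant pattern tuple."""
    return (df[list(pattern)] == pd.Series(pattern)).all(axis=1)

def support(df, lhs, rhs):
    # |{t in D : t_p ⊆ t}| / |D|
    return matches(df, {**lhs, **rhs}).mean()

def confidence(df, lhs, rhs):
    # |{t in D : t_p ⊆ t}| / |{t in D : x ⊆ t}|
    return matches(df, {**lhs, **rhs}).sum() / matches(df, lhs).sum()

df = pd.DataFrame({
    "Sex":    ["Female", "Female", "Male", "Female"],
    "Income": ["≤50K", "≤50K", ">50K", ">50K"],
})
phi = ({"Sex": "Female"}, {"Income": "≤50K"})
print(support(df, *phi), confidence(df, *phi))
```

On this toy relation the ACFD holds on 2 of 4 tuples (support 0.5) and on 2 of the 3 tuples matching the antecedent (confidence about 0.67).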
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data Preparation and Exploration</head><p>In this phase we import the data, perform (if needed) data integration, and apply the typical data preprocessing steps (handling missing values, applying discretization, etc.) needed to clean and prepare the data. As a last step of this first phase, data visualization can be of great help in analyzing the characteristics of the data; in fact, from the plots, a user can understand whether groups are present in the dataset and, more specifically, whether there is a majority class for a certain attribute, identifying minorities if present. This preliminary phase gives a general idea of the dataset; during this step we can make an initial guess at the protected columns and identify, if present, the target variable of our study. Regarding the running example, we selected 'Sex', 'Race' and 'Native-Country' as protected attributes and 'Income' as target variable.</p></div>
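As an illustration of this phase, a typical pandas preparation step might drop incomplete tuples and discretize a numeric attribute such as 'Age' into ranges; the tiny data sample and the bin boundaries below are hypothetical, not taken from the paper.

```python
import pandas as pd

# Hypothetical raw slice; column names follow the running example.
df = pd.DataFrame({
    "Age": [25, 38, None, 52],
    "Sex": ["Female", "Male", "Female", None],
    "Income": ["≤50K", ">50K", "≤50K", "≤50K"],
})

df = df.dropna()                        # drop tuples with missing values
df["Age-Range"] = pd.cut(df["Age"],     # discretize a numeric attribute
                         bins=[15, 30, 45, 60, 90],
                         labels=["15-30", "30-45", "45-60", "60-90"])
print(df["Age-Range"].tolist())  # -> ['15-30', '30-45']
```

Discretization is what makes attributes like 'Age-Range' usable as constants in pattern tuples later on.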
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">ACFD Discovery and Filtering</head><p>In this phase we extract the ACFDs from the dataset, using the algorithm of <ref type="bibr" target="#b11">[12]</ref>. The algorithm expects three input parameters: the minimum support threshold, the minimum confidence threshold, and the maximum antecedent size maxSize.</p><p>Given an instance D of a schema R, support threshold 𝛿, confidence threshold 𝜖, and maximum antecedent size 𝛼, the approximate CFD discovery problem is to find all ACFDs 𝜙: (𝑋 → 𝑌 , 𝑡 𝑝 ) over R with support(𝜙, 𝐷) ≥ 𝛿, confidence(𝜙, 𝐷) ≥ 𝜖 and |𝑋 | ≤ 𝛼. The ACFDs obtained have the form (𝑙ℎ𝑠𝐴𝑡𝑡𝑟1 = 𝑣1, ..., 𝑙ℎ𝑠𝐴𝑡𝑡𝑟𝑁 = 𝑣𝑁 ) → (𝑟ℎ𝑠𝐴𝑡𝑡𝑟 = 𝑣).</p><p>The algorithm returns all the dependencies that satisfy these constraints.</p></div>
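FAIR-DB uses the discovery algorithm of [12]; purely for illustration, a naive (exhaustive, far less efficient than the real algorithm) enumeration of constant ACFDs meeting the 𝛿, 𝜖 and 𝛼 thresholds could look as follows. Every name here is our own sketch, not the actual tool.

```python
import itertools
import pandas as pd

def matches(df, pattern):
    return (df[list(pattern)] == pd.Series(pattern)).all(axis=1)

def discover_acfds(df, target, delta, eps, alpha=1):
    """Naive enumeration of constant ACFDs (lhs -> target) with
    support at least delta, confidence at least eps, LHS size at most alpha."""
    found = []
    attrs = [c for c in df.columns if c != target]
    for k in range(1, alpha + 1):
        for combo in itertools.combinations(attrs, k):
            # each group is one candidate constant pattern tuple t_p
            for key, n in df.groupby(list(combo) + [target]).size().items():
                lhs = dict(zip(combo, key[:-1]))
                rhs = {target: key[-1]}
                sup = n / len(df)
                conf = n / matches(df, lhs).sum()
                if sup >= delta and conf >= eps:
                    found.append((lhs, rhs, sup, conf))
    return found

df = pd.DataFrame({
    "Sex":    ["Female", "Female", "Male", "Female"],
    "Income": ["≤50K", "≤50K", ">50K", ">50K"],
})
for lhs, rhs, sup, conf in discover_acfds(df, "Income", delta=0.25, eps=0.6):
    print(lhs, rhs, round(sup, 2), round(conf, 2))
```

On the toy data this keeps two dependencies, including (Sex='Female') → (Income='≤50K') with support 0.5 and confidence about 0.67.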
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 2. CFD Discovery</head><p>The algorithm, applied to the dataset resulting from the previous phase, finds 118 ACFDs. Table <ref type="table">1</ref> reports a few of them. We then filter the dependencies, discarding the ones that do not satisfy two constraints:</p><p>• all the attributes must be assigned a value (the ACFDs that contain only constants in both the LHS and the RHS are called Constant ACFDs <ref type="bibr" target="#b12">[13]</ref>); • at least one protected attribute and the target variable must be present inside the dependency, so that the ACFD might show bias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 3. ACFDs Filtering</head><p>From Table <ref type="table">1</ref> we discarded the third dependency for not being a Constant ACFD and the first two dependencies for not containing any protected attribute. After this phase we are left with 84 of the original 118 dependencies.</p></div>
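The two filtering constraints translate directly into a predicate over the discovered dependencies. In this sketch (our own representation, with '_' standing for an unnamed variable) an ACFD is kept only if it is constant and mentions at least one protected attribute together with the target:

```python
# Filtering sketch: constant ACFDs involving a protected attribute + target.
PROTECTED = {"Sex", "Race", "Native-Country"}
TARGET = "Income"

def keep(acfd):
    """True for constant ACFDs mentioning a protected attribute and the target."""
    lhs, rhs = acfd                        # dicts: attribute -> value
    values = list(lhs.values()) + list(rhs.values())
    is_constant = "_" not in values        # '_' marks an unnamed variable
    attrs = set(lhs) | set(rhs)
    has_protected = bool(attrs.intersection(PROTECTED))
    has_target = TARGET in attrs
    return is_constant and has_protected and has_target

acfds = [
    ({"Education-Degree": "Middle-school"}, {"Income": "≤50K"}),
    ({"Sex": "Female"}, {"Income": "≤50K"}),
    ({"Education-Degree": "_"}, {"Native-Country": "_"}),
]
print([keep(a) for a in acfds])  # -> [False, True, False]
```

Only the second dependency survives: the first lacks a protected attribute and the third is not a Constant ACFD.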
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">ACFD Selection</head><p>This phase is responsible for finding the dependencies that actually reveal unfairness in the dataset. To do so, we have devised an unfairness measure, called Difference, that indicates how 'unethical' a dependency is: the higher the Difference, the more the ACFD reveals an unfair behavior. In order to assess the unfair behavior of a dependency, we also take into consideration its support, which indicates the pervasiveness of the ACFD; unethical dependencies with high support impact many tuples, and thus are more important. We define the Difference of a dependency 𝜙 as the difference between the confidence of 𝜙 and the confidence of the dependency computed without the protected attributes of the LHS of the ACFD. Let 𝜙 ∶ (𝑋 → 𝑌 , 𝑡 𝑝 ) be an ACFD, let 𝑍 = (𝑋 − {𝑃𝑟𝑜𝑡𝑒𝑐𝑡𝑒𝑑𝐴𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠}), that is, the LHS of the dependency without its protected attributes, and let 𝑧 be the restriction of 𝑡 𝑝 to 𝑍. For example, given the dependency: </p><formula xml:id="formula_3">𝜙 1 ∶ (Sex = ‵ Female ′ , Workclass = ‵ Private ′ ) → (Income = ‵ ≤ 50K ′ ) we can compute the Difference as: Diff(𝜙 1 ) ∶ 𝐶𝑜𝑛𝑓 (𝜙 1 ) − Conf(𝜙 ′ 1 ), where 𝜙 ′ 1 ∶ (Workclass = ‵ Private ′ ) → (Income = ‵ ≤ 50K ′ ).</formula></div>
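Following the Diff(𝜙) = Conf(𝜙) − Conf(𝜙′) formulation of the example, the metric can be sketched in pandas as follows; the toy data and helper names are our own, and the reduced dependency 𝜙′ is obtained by simply dropping the protected attributes from the LHS:

```python
import pandas as pd

PROTECTED = {"Sex", "Race", "Native-Country"}

def matches(df, pattern):
    return (df[list(pattern)] == pd.Series(pattern)).all(axis=1)

def confidence(df, lhs, rhs):
    return matches(df, {**lhs, **rhs}).sum() / matches(df, lhs).sum()

def difference(df, lhs, rhs):
    """Conf(phi) minus Conf(phi'), phi' = phi without protected LHS attributes."""
    z = {a: v for a, v in lhs.items() if a not in PROTECTED}
    return confidence(df, lhs, rhs) - confidence(df, z, rhs)

df = pd.DataFrame({
    "Sex":       ["Female", "Female", "Male", "Male", "Female"],
    "Workclass": ["Private", "Private", "Private", "Private", "Gov"],
    "Income":    ["≤50K", "≤50K", ">50K", "≤50K", "≤50K"],
})
lhs, rhs = {"Sex": "Female", "Workclass": "Private"}, {"Income": "≤50K"}
print(round(difference(df, lhs, rhs), 2))  # -> 0.25
```

Here all private-sector women earn ≤50K (confidence 1.0), against 0.75 for private-sector workers in general, so the positive Difference of 0.25 flags a potentially unfair pattern.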
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Three different behaviors can emerge:</head><p>• If the Difference is close to zero, fairness is respected, since it means that females are treated like all the other members of the population having the same characteristics (i.e., without specifying the protected attribute). • If the Difference is positive, it means that the women who work in the private sector and earn less than 50K dollars/year are overall treated worse than the generality of people who work in the private sector and earn less than 50K dollars/year. • If the Difference is negative, the opposite situation is detected.</p><p>Finally, we choose the ACFDs whose Difference is above the minThreshold, which means that there is a significant inequality between the group involved in the ACFD and the general behaviour of the population.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 5. Selected ACFDs</head><p>Table <ref type="table">2</ref> reports three of the seventeen dependencies that satisfied the selection criteria, along with their relevant metrics. From the example, the "Hispanic", "Female" and "Black" groups suffer from discrimination with respect to the rest of the population: people that belong to one or more of these groups tend to have an income below 50,000 dollars/year because of their nationality, sex or race.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">ACFD Ranking</head><p>In a real-world dataset, the number of ACFDs selected in the previous step can be very large, and looking at all of them would be a very demanding task for the user. Thus, it is necessary to order the dependencies according to some criterion, enabling the user to analyze the most important and interesting ones first, speeding up the process and reducing its cost. In our framework the user can order the dependencies according to one of the following criteria:</p><p>• Support-based: the support indicates the proportion of tuples impacted by the dependency; the higher the support, the more tuples are involved in the ACFD. Ordering dependencies by support highlights their pervasiveness. • Difference-based: this criterion highlights the dependencies where the values of the protected attributes most influence the value of the RHS; this ordering therefore privileges the unethical aspect of the dependencies. • Mean-based: this method combines both aspects of a dependency, the unethical perspective and the pervasiveness. Sorting the ACFDs by this criterion positions first the dependencies with the best trade-off between Difference and support.</p></div>
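The three orderings amount to different sort keys over the discovered dependencies. The sketch below uses the two entries of Table 2 and, for the mean-based criterion, assumes the arithmetic mean of support and Difference (our reading of "trade-off", not spelled out in the text):

```python
# Each discovered ACFD carries its support and Difference scores.
acfds = [
    {"acfd": "(Native-Country='NC-Hispanic') -> (Income='≤50K')",
     "support": 0.0439, "difference": 0.1570},
    {"acfd": "(Sex='Female') -> (Income='≤50K')",
     "support": 0.2874, "difference": 0.1352},
]

by_support    = sorted(acfds, key=lambda d: d["support"], reverse=True)
by_difference = sorted(acfds, key=lambda d: d["difference"], reverse=True)
by_mean       = sorted(acfds, reverse=True,
                       key=lambda d: (d["support"] + d["difference"]) / 2)

print(by_difference[0]["acfd"])  # the NC-Hispanic dependency ranks first
```

Note that the two criteria disagree here: the 'Sex' dependency tops the support-based and mean-based orders, while the 'Native-Country' one tops the Difference-based order.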
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">ACFD User Selection and Scoring</head><p>In this last phase the user selects from the ranked list the N dependencies that are interesting for the research needs. Using only the N selected ACFDs, the framework computes two scores that summarize the properties of the entire dataset:</p><p>• Cumulative Support: the percentage of tuples in the dataset involved in the selected ACFDs. The closer this value is to 1, the more tuples are impacted by unfair dependencies. • Difference Mean: the mean of the 'Difference' scores of the selected ACFDs. It indicates how unethical the dataset is according to the selected dependencies; the greater the value, the higher the bias in the dataset.</p></div>
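A sketch of the two scores, assuming (our reading of "involved") that a tuple counts toward the Cumulative Support when it contains the full pattern tuple of at least one selected ACFD; names and data are illustrative:

```python
import pandas as pd

def matches(df, pattern):
    return (df[list(pattern)] == pd.Series(pattern)).all(axis=1)

def dataset_scores(df, selected, differences):
    """Cumulative Support and Difference Mean over the N user-selected ACFDs."""
    covered = pd.Series(False, index=df.index)
    for lhs, rhs in selected:
        covered = covered | matches(df, {**lhs, **rhs})  # union of covered tuples
    return covered.mean(), sum(differences) / len(differences)

df = pd.DataFrame({
    "Sex":    ["Female", "Female", "Male", "Male"],
    "Income": ["≤50K", "≤50K", ">50K", "≤50K"],
})
selected = [({"Sex": "Female"}, {"Income": "≤50K"})]
cum_sup, diff_mean = dataset_scores(df, selected, differences=[0.16])
print(cum_sup, diff_mean)  # -> 0.5 0.16
```

Taking the union of the covered tuples (rather than summing supports) avoids double-counting tuples involved in several selected ACFDs.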
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 6. ACFDs User Selection and Scoring</head><p>The user chooses N = 15 ACFDs that are interesting according to her needs among the dependencies obtained after the ranking step. The total number of tuples involved in the ACFDs is 13296, while the total number of tuples in the dataset is 30169; this results in a Cumulative Support of 0.44. The Difference Mean is 0.16. These two scores indicate that a considerable number of tuples, 44%, show a behavior that is very different, on average by 16%, from the fair one. Finally, a deeper analysis of the dependencies confirms that the dataset is unfair with respect to all the protected attributes; the most discriminated groups are: 'Female', 'Black', 'NC-Hispanic' and 'Amer-Indian-Eskimo'. Table <ref type="table">3</ref> reports a few interesting ACFDs.</p><p>(Sex = 'Female') → Income = '≤ 50K' (Race = 'Black') → Income = '≤ 50K' (Race = 'Amer-Indian-Eskimo') → Income = '≤ 50K' (Native-Country = 'NC-Hispanic') → Income = '≤ 50K' Table <ref type="table">3</ref> A few user-selected dependencies from the U.S. Census Adult Income Dataset</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Results</head><p>We now present the results obtained by FAIR-DB on a real-world dataset, and a comparison between our framework and the competitors presented in the State of the Art section. The dataset we considered is the U.S. Census Adult Income Dataset, already briefly presented in Example 1. The version we considered contains 32561 tuples and 13 attributes, 5 of which are numerical and 8 categorical.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">FAIR-DB Results</head><p>We recall from Example 6 the results of FAIR-DB on the U.S. Census Adult Income Dataset, which scores a Cumulative Support of 0.44 and a Difference Mean of 0.16, indicating that many tuples show an unfair behavior. Specifically, the dataset highlights bias towards all the protected attributes: 'Sex', 'Race' and 'Native-Country'; a deeper analysis confirms that the most discriminated groups are: 'Female', 'Black', 'NC-Hispanic' and 'Amer-Indian-Eskimo'.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Comparison with competitors</head><p>The results obtained by Ranking Facts <ref type="bibr" target="#b9">[10]</ref> and by AI Fairness 360 <ref type="bibr" target="#b6">[7]</ref> are in complete accordance with ours. Ranking Facts finds in the U.S. Census Adult Income Dataset unfair behaviors across all three protected attributes, with discrimination against 'Female', 'Black', 'NC-Hispanic' and 'Amer-Indian-Eskimo'. AI Fairness 360 analyzes fairness through statistical measures computed for two protected attributes, 'Race' and 'Sex', checking fairness for only one binary attribute at a time. For most of the statistical measures, the 'Race' attribute has a privileged group composed of 'White' people and a discriminated one composed of 'Not White' people, while the 'Sex' attribute has a privileged group composed of 'Male' and a discriminated one composed of 'Female'. Both competitors check the fairness property for only one binary attribute at a time; since the ACFD technique can involve more than one attribute at a time, our tool can instead report information about subgroup fairness, actually detecting unfair behaviors at a finer level of granularity. As a result, Ranking Facts and AI Fairness 360 do not provide information about the bias existing in minorities. This is very important, because the discrimination might be limited to some specific scenario (e.g. not all the women but only the black women working in the private sector), and this information is very useful to guide the phase of DB repair. Finally, we present a theoretical comparison between FAIR-DB and the work by Pedreschi et al. <ref type="bibr" target="#b5">[6]</ref>. Their system extracts association rules that signal potentially discriminatory itemsets. 
The process exploits a specific type of association rule that constrains the target class to appear only on the RHS; as a result, the potentially discriminatory itemset can appear only on the LHS, whereas the ACFDs found by our approach do not have this constraint. Moreover, <ref type="bibr" target="#b5">[6]</ref> does not involve user interaction, so the user cannot discard the rules that are not interesting for the specific investigation. Finally, FAIR-DB provides as a final step a set of summarizing metrics that describe the overall degree of unfairness of the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Works</head><p>We presented FAIR-DB, a novel framework that, through the extraction of a particular type of functional dependency, can discover bias and discrimination present in a dataset. Future work will include: (i) the addition to the system of a dependency-repair phase that, starting from the selected ACFDs, will correct the dataset by removing all the unfair behaviors from it; (ii) the study of dependencies with high confidence and low support, to highlight interesting, not necessarily frequent, behaviors; (iii) the development of a graphical user interface to facilitate the interaction of the user with the system; (iv) the study of other classes of functional dependencies <ref type="bibr" target="#b4">[5]</ref>; (v) a deeper comparison with Ranking Facts and other similar methods.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>FAIR-DB</head><label></label><figDesc>FAIR-DB is composed of the following main steps: Data Preparation and Exploration, ACFD Discovery and Filtering, ACFD Selection, ACFD Ranking, and ACFD User Selection and Scoring.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Tables 1 and 2</head><label></label><figDesc>Table 1 (CFD Discovery output): (Education-Degree='Middle-school') → Income='≤ 50K'; (Age-Range='15-30') → Income='≤ 50K'; (Education-Degree, Income='≤ 50K') → Native-Country; (Native-Country='NC-Hispanic') → Income='≤ 50K'; (Income='≤ 50K') → Native-Country='NC-US'. Table 2 (a few of the selected ACFDs, with the Difference metric): (Native-Country = 'NC-Hispanic') → (Income = '≤50K'), Support 0.0439, Difference 0.1570; (Sex = 'Female') → (Income = '≤50K'), Support 0.2874, Difference 0.1352; (Race = 'Black') → (Income = '≤50K').</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Example 4 .</head><label>4</label><figDesc>Difference(𝜙) = Confidence(𝜙) − NoProtectedAttributeConfidence(𝜙) where NoProtectedAttributeConfidence(𝜙) = |𝑡 ∈ 𝐷; 𝑡 𝑝 ⊆ 𝑡| |𝑡 ∈ 𝐷; 𝑧 ⊆ 𝑡| That is: Difference(𝜙) = |𝑡 ∈ 𝐷; 𝑡 𝑝 ⊆ 𝑡| |𝑡 ∈ 𝐷; 𝑥 ⊆ 𝑡| − |𝑡 ∈ 𝐷; 𝑡 𝑝 ⊆ 𝑡| |𝑡 ∈ 𝐷; 𝑧 ⊆ 𝑡| The Difference metric gives us an idea of how much the values of the protected attributes influence the value of 𝑌. Difference score of a dependency Analyzing the following dependency:</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://archive.ics.uci.edu/ml/datasets/Adult</note>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>CEUR Workshop Proceedings (CEUR-WS.org). 1: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<author>
			<persName><forename type="first">D</forename><surname>Firmani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tanca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Torlone</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Ethical dimensions for data quality</title>
				<meeting><address><addrLine>JDIQ</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Responsible data management</title>
		<author>
			<persName><forename type="first">J</forename><surname>Stoyanovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Howe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jagadish</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">FAIR-DB: Functional dependencies to discover data bias</title>
		<author>
			<persName><forename type="first">F</forename><surname>Azzalini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Criscuolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tanca</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop Proceedings of the EDBT/ICDT</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page">2021</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Abiteboul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vianu</surname></persName>
		</author>
		<ptr target="http://webdam.inria.fr/Alice/" />
		<title level="m">Foundations of Databases</title>
				<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Relaxed functional dependencies-a survey of approaches</title>
		<author>
			<persName><forename type="first">L</forename><surname>Caruccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Deufemia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Polese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Data mining for discrimination discovery</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ruggieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pedreschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Turini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Knowledge Discovery from Data (TKDD)</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1" to="40" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Bellamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hind</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Houde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kannan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lohia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mojsilović</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>IBM Journal of Research and Development</publisher>
			<pubPlace>IBM</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Adebayo</surname></persName>
		</author>
		<title level="m">FairML: ToolBox for diagnosing bias in predictive modeling</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
		<respStmt>
			<orgName>Massachusetts Institute of Technology</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Fairtest: Discovering unwarranted associations in data-driven applications</title>
		<author>
			<persName><forename type="first">F</forename><surname>Tramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Atlidakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Geambasu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-P</forename><surname>Hubaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Humbert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Juels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE European Symposium on Security and Privacy</title>
				<imprint>
			<date type="published" when="2017">2017. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Nutritional labels for data and models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Stoyanovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Howe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Data Eng. Bull</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">UCI machine learning repository</title>
		<author>
			<persName><forename type="first">D</forename><surname>Dua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Graff</surname></persName>
		</author>
		<ptr target="http://archive.ics.uci.edu/ml" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Revisiting conditional functional dependency discovery: Splitting the &quot;C&quot; from the</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rammelaere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Geerts</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
	<note>FD</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Discovering conditional functional dependencies</title>
		<author>
			<persName><forename type="first">W</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Geerts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Xiong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
