<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Functional Dependencies to Mitigate Data Bias</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Azzalini</string-name>
          <email>fabio.azzalini@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Criscuolo</string-name>
          <email>chiara.criscuolo@polimi.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Letizia Tanca</string-name>
          <email>letizia.tanca@polimi.it</email>
        </contrib>
        <kwd-group>
          <kwd>Fairness</kwd>
          <kwd>Data Bias</kwd>
          <kwd>Functional Dependencies</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Human Technopole - Center for Analysis, Decisions and Society</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>4883</volume>
      <fpage>19</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Technologies based on data are frequently adopted in many sensitive environments to build models that support important and life-changing decisions. As a result, for an application to be ethically reliable, it should be associated with tools to discover and mitigate bias in data, in order to avoid (possibly unintentional) unethical behaviors and the associated consequences.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years fairness has become an important topic of interest in the Data Science
community. Computers and algorithms have made our lives more efficient and easier, but
among the prices we risk paying is the possible presence of discrimination and unfairness in the
decisions we make with their support. Algorithmic decision systems should therefore work
on unbiased data to obtain fair results, since learning from historical data may also mean learning
the traditional prejudices that are endemic in society, producing unethical decisions.</p>
      <p>
        Last year we presented FAIR-DB [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a framework that, by discovering and analyzing a
particular type of database integrity constraints, can find unfair behaviors in a dataset. Specifically,
the system employs approximate conditional functional dependencies (ACFDs) to recognize
the cases where the values of certain attributes (e.g. gender, ethnicity or religion) frequently
determine the value of another one (such as the range of a proposed salary or social status). In
this paper we enhance the framework with an additional functionality to repair datasets found
unfair by the discovery phase of FAIR-DB. The new module, the ACFD-Repair method, is a procedure
that, based on the discovered dependencies, determines the smallest set of tuples to be added to or
removed from the dataset to mitigate, or completely remove, the previously discovered bias.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Framework Overview</title>
      <p>
        Before presenting the methodology, we first introduce some fundamental notions that will
accompany us throughout the discussion. A protected attribute is a characteristic for which
non-discrimination should be established, such as age, race or sex [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]; the target variable is the feature
of the dataset about which the user wants to gain a deeper understanding, for example the income,
or a boolean label indicating whether a loan is authorized.
      </p>
      <p>Figure 1 shows the framework of the enhanced version of FAIR-DB.</p>
      <p>
        We now give a brief presentation of the Data Bias discovery step. For a deeper understanding
of this phase, we refer the reader to the following papers: [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The Data Bias discovery
module operates according to the following steps: (i) we import the data and perform (if needed)
data cleaning and integration; (ii) we extract the ACFDs from the dataset and discard those that
do not contain protected attributes and the target variable; (iii) to select the ACFDs that show
unfair behaviours, we compute, for each dependency, some metrics capturing its “ethical level”;
(iv) to facilitate user interaction, we rank the ACFDs in descending order of importance; (v) finally,
the user selects the ACFDs she perceives as the most problematic, and the system computes
some metrics that summarize the level of unfairness of the dataset.
      </p>
      <p>The core of the discovery phase is the Difference, a novel metric that indicates how much
‘unethical’ the behaviour highlighted by a dependency is. The higher this metric, the stronger
the alert for an unfair behavior.</p>
      <p>For each dependency φ, we define the Difference metric of φ as the difference between
the dependency confidence and the confidence of the dependency computed without the
protected attributes of the Left-Hand-Side (LHS) of the ACFD. Given a dependency in the form
φ: (X → Y, tp), let Z = (X − {protected attributes}), that is, the LHS of the ACFD without its
protected attributes, and φ′: (Z → Y, tp). We define the Difference as:</p>
      <p>Difference(φ) = Confidence(φ) − Confidence(φ′)</p>
      <p>The Difference metric gives us an idea of how much the values of the protected attributes
influence the value of Y. In order to assess the unfair behavior of a dependency, we also take into
consideration its support, which indicates the pervasiveness of the ACFD; unethical dependencies
with high support impact many tuples, and are thus more important.</p>
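      <p>To make the metric concrete, the following is a minimal sketch (our own illustrative code, not the FAIR-DB implementation) of the Difference computation over a dataset represented as a list of records; the helper names and the toy data are invented.</p>

```python
# Sketch (not the authors' code): computing the Difference metric of an
# ACFD. A dependency is given as two dicts of attribute -> value conditions.

def confidence(rows, lhs, rhs):
    """Conf(lhs -> rhs) = #(rows matching lhs and rhs) / #(rows matching lhs)."""
    lhs_match = [r for r in rows if all(r[a] == v for a, v in lhs.items())]
    both = [r for r in lhs_match if all(r[a] == v for a, v in rhs.items())]
    return len(both) / len(lhs_match)

def difference(rows, lhs, rhs, protected):
    """Difference(phi) = Conf(phi) - Conf(phi'), where phi' drops the
    protected attributes from the LHS of the dependency."""
    lhs_free = {a: v for a, v in lhs.items() if a not in protected}
    return confidence(rows, lhs, rhs) - confidence(rows, lhs_free, rhs)

# Toy data in the spirit of the running example (counts are invented):
rows = (4 * [{"Sex": "Female", "Workclass": "Private", "Income": "<=50K"}]
        + 1 * [{"Sex": "Female", "Workclass": "Private", "Income": ">50K"}]
        + 2 * [{"Sex": "Male", "Workclass": "Private", "Income": "<=50K"}]
        + 3 * [{"Sex": "Male", "Workclass": "Private", "Income": ">50K"}])
d = difference(rows,
               {"Sex": "Female", "Workclass": "Private"},
               {"Income": "<=50K"},
               protected={"Sex"})
# Conf(phi) = 4/5 = 0.8, Conf(phi') = 6/10 = 0.6, so Difference ≈ 0.2
```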
      <p>[Figure 1: the enhanced FAIR-DB pipeline — Input Dataset → Data Bias discovery → Selected ACFDs → Unfair Tuple Count Computation → Greedy Hit-Count algorithm → Correction Algorithm (optional) → Statistics Computation → Final Dataset; the ACFD-Repair module mitigates the discovered bias.]</p>
      <p>The analysis of the dependencies shows that the groups more discriminated against are: ‘Female’, ‘Black’,
‘NC-Hispanic’ and ‘Amer-Indian-Eskimo’.</p>
    </sec>
    <sec id="sec-3">
      <title>3. ACFD-Repair</title>
      <p>The final objective of the repair phase is to create a fair dataset where the unethical behaviors
are greatly mitigated or completely removed. The procedure has to conform to a fundamental
requirement: since the repaired dataset has to be used later on in a data science pipeline, the
number of modifications, consisting of deletions or additions of tuples, has to be minimized in
order to preserve the original distributions of values. Note also that, given the variety of decision
algorithms, the repair procedure must also work for attributes with non-binary values. Figure
2 presents the ACFD-Repair methodology: we now give a detailed description of each phase
using the U.S. Census Adult dataset (https://archive.ics.uci.edu/ml/datasets/Adult) as running example.</p>
      <sec id="sec-3-1">
        <title>3.1. Unfair Tuple Count Computation</title>
        <p>We propose to reduce, or eliminate, unfairness starting from the list of ACFDs computed
in the Data Bias discovery step. The repair is performed by bringing the value of the Difference
of each user-selected dependency below a minimum threshold τ; this can be achieved in two
ways: (i) removing tuples matching the dependency so that, after removal, the Difference
value of the dependency becomes lower than τ; (ii) adding tuples that, combined with
the elements matching the dependency, will lower its initial Difference value below τ.</p>
        <p>The Unfair Tuple Count computation step is therefore responsible for finding, for each
dependency, the number of tuples that have to be added or removed. For each ACFD φ, we define
its Unfair Tuple Count as F(φ) = (a, r), where a represents the number of tuples that should be
added, and r the number of tuples that should be removed, to repair the discriminatory behaviour
shown by the dependency.</p>
        <p>To explain how the Unfair Tuple Count value of each dependency is computed, we make use
of the dependency φ1: (Sex = ‘Female’, Workclass = ‘Private’) → Income = ‘≤50K’. The Difference
of φ1 can be computed as Diff(φ1) = Conf(φ1) − Conf(φ1′), where φ1′: (Workclass = ‘Private’) →
(Income = ‘≤50K’). Rewriting the confidence as a ratio of supports we get:</p>
        <p>Diff(φ1) = supp(φ1) / supp(Sex = ‘Female’, Workclass = ‘Private’) − supp(φ1′) / supp(Workclass = ‘Private’)</p>
        <p>As already mentioned, to repair φ1 we can either add or remove tuples: let us first focus on
the former. To remove the unfair behaviour highlighted by φ1, intuitively, we can either add males that earn less than 50K $/year
and work in the private sector or add females that earn more than 50K $/year and work in the
private sector. Therefore we have to add to the dataset tuples that satisfy the following two
dependencies:
• φ2: (Sex = ‘Male’, Workclass = ‘Private’) → Income = ‘≤50K’
• φ3: (Sex = ‘Female’, Workclass = ‘Private’) → Income = ‘&gt;50K’
The two dependencies above define the tuples that, if added to the initial dataset, will lower the
initial Difference value below the threshold τ; φ2 and φ3, being the “antagonists” of φ1, form
the Opposite Set (OS) of φ1. Specifically, φ2 has been obtained by changing, one at a time, the
values of the protected attributes: we name the set of such ACFDs the Protected attribute Opposite
Set (POS). In the example we have only one, binary protected attribute, therefore the POS
contains only the dependency φ2; if the protected attribute were non-binary, or there were more
than one protected attribute, the POS would include more than one dependency. On the other
hand, φ3 has been obtained by reversing the value of the target class. This ACFD is in the Target
attribute Opposite Set (TOS), which, since the target attribute is required to be binary, always
contains exactly one dependency.</p>
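        <p>The construction of the Opposite Set can be sketched as follows (our own illustrative code, not the FAIR-DB implementation); the attribute domains are assumed to be known in advance.</p>

```python
# Sketch: building the Opposite Set (POS + TOS) of an ACFD.
# lhs/rhs are dicts of attribute -> value; domains maps each attribute
# to the list of its possible values (assumed known).

def opposite_set(lhs, rhs, protected, domains):
    # POS: change the protected LHS attributes, one at a time,
    # to every other value in their domain.
    pos = []
    for attr, val in lhs.items():
        if attr in protected:
            for other in domains[attr]:
                if other != val:
                    flipped = dict(lhs, **{attr: other})
                    pos.append((flipped, dict(rhs)))
    # TOS: reverse the (binary) target value; always one dependency.
    (t_attr, t_val), = rhs.items()
    t_other = next(v for v in domains[t_attr] if v != t_val)
    tos = [(dict(lhs), {t_attr: t_other})]
    return pos, tos

pos, tos = opposite_set(
    {"Sex": "Female", "Workclass": "Private"}, {"Income": "<=50K"},
    protected={"Sex"},
    domains={"Sex": ["Male", "Female"], "Income": ["<=50K", ">50K"]})
# pos -> [({'Sex': 'Male', 'Workclass': 'Private'}, {'Income': '<=50K'})]
# tos -> [({'Sex': 'Female', 'Workclass': 'Private'}, {'Income': '>50K'})]
```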
        <p>In case we have multiple discriminated minorities, adding tuples that satisfy the dependencies
contained in the POS could actually reinforce discrimination. If we consider φ: (Race = ‘Black’)
→ Income = ‘≤50K’, the POS contains the ACFDs that involve all the other values of the protected
attribute ‘Race’: ‘White’, ‘Asian-Pac-Islander’, ‘Other’ and ‘Amer-Indian-Eskimo’; we would
therefore add tuples referring to ‘Amer-Indian-Eskimo’ people that earn less than 50K $/year,
thus aggravating their already discriminated condition.</p>
        <p>Finally, given what we have just presented and the fact that we want to repair the dataset by
improving the condition of discriminated groups, we decide to add, for each ACFD, only the
tuples that satisfy the dependency contained in each respective TOS.</p>
        <p>We now show how to compute the number of tuples to be added to the dataset for the
dependencies belonging to the TOS: we start by analyzing φ3, obtained by reversing the value
of the target attribute. Adding a tuples that satisfy φ3 will impact the Difference of φ1 in the
following way:</p>
        <p>Diff(φ1) = supp(φ1) / (supp(Sex = ‘Female’, Workclass = ‘Private’) + a) − supp(φ1′) / (supp(Workclass = ‘Private’) + a)</p>
        <p>Now, to repair φ1, we impose that Diff(φ1) be lower than τ and find the number of tuples
that need to be added by solving the inequality for a.</p>
        <p>Now that we have shown how to find the number of tuples to be added in order to repair a
dependency, we show how to find the number of tuples that need to be removed to accomplish the
same task. To repair an ACFD by removing tuples, we simply remove tuples that satisfy
the dependency. To compute the number of tuples to be deleted from the dataset, we start by
noticing that removing from the dataset r tuples that satisfy φ1 will modify the Difference of φ1
in the following way:</p>
        <p>Diff(φ1) = (supp(φ1) − r) / (supp(Sex = ‘Female’, Workclass = ‘Private’) − r) − (supp(φ1′) − r) / (supp(Workclass = ‘Private’) − r)</p>
        <p>Now, to repair φ1, we impose that Diff(φ1) be lower than τ and, similarly to the previous
case, find the number of tuples that should be removed by solving the inequality for r.</p>
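        <p>The two inequalities can also be solved numerically, by increasing the count until the Difference drops below the threshold. The sketch below is our own illustration of this idea, with invented support counts; it is not the authors' implementation.</p>

```python
# Sketch: smallest number of tuples to add (a) or remove (r) so that the
# Difference of phi1 drops below a threshold tau. Supports are absolute
# counts; the numbers below are invented for illustration.

def diff_after_add(a, s_phi, n_lhs, s_phi2, n_lhs2):
    # Added TOS tuples match both LHSs but neither RHS:
    # only the denominators grow.
    return s_phi / (n_lhs + a) - s_phi2 / (n_lhs2 + a)

def diff_after_remove(r, s_phi, n_lhs, s_phi2, n_lhs2):
    # Removed tuples satisfy phi1 (and hence phi1'):
    # numerators and denominators shrink together.
    return (s_phi - r) / (n_lhs - r) - (s_phi2 - r) / (n_lhs2 - r)

def solve(diff_fn, tau, *supports):
    # Increase the count until the Difference falls below tau.
    k = 0
    while diff_fn(k, *supports) >= tau:
        k += 1
    return k

# Invented counts: supp(phi1)=400, supp(Female, Private)=500,
# supp(phi1')=600, supp(Private)=1000, tau = 0.05
a = solve(diff_after_add, 0.05, 400, 500, 600, 1000)
r = solve(diff_after_remove, 0.05, 400, 500, 600, 1000)
```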
        <p>[Table 1: examples of selected ACFDs —
(Sex = ‘Female’) → Income = ‘≤50K’;
(Race = ‘Black’) → Income = ‘≤50K’;
(Race = ‘Amer-Indian-Eskimo’) → Income = ‘≤50K’;
(Native-Country = ‘NC-Hispanic’) → Income = ‘≤50K’.]</p>
        <p>Example 1. For the repair phase, we continue using the U.S. Census Adult dataset. Table 1 reports
some examples of ACFDs selected to repair the dataset. Table 2 reports, for each selected ACFD, the
number of tuples that should be added, according to the TOS computation, or removed, to compensate
the Difference metric.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Greedy Hit-Count algorithm</title>
        <p>A basic solution to repair the dataset could simply consist of adding, for each negative ACFD,
the tuples that satisfy its TOS, using, for each tuple, its Unfair Tuple Count as multiplicity.
Unfortunately, this is not a valid option, because it would add too many tuples
to the dataset, violating the key requirement to limit the modifications of the
original dataset.</p>
        <p>
          To add tuples to the initial dataset in an optimized way, we use a modified version of the
Greedy Hit-Count algorithm presented in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. From each dependency in a TOS, we
generate a pattern, an array whose dimension is equal to the number of attributes in
the dataset and where each cell represents the value of a specific attribute. Given a dataset D
with d categorical attributes and an ACFD φ, a pattern P is a vector of size d generated from φ,
where P[i] is either ‘X’ (meaning that its value is unspecified) or the value of the corresponding
attribute in φ. The pattern computation step is useful for two reasons: (i) to transform an ACFD
into a tuple and decide the value of the free attributes (i.e. the non-instantiated attributes in the
dependency); and (ii) to add tuples to the dataset in an optimized way, exploiting the Greedy
Hit-Count algorithm.
        </p>
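        <p>As an illustration (our own sketch, using ‘X’ for unspecified cells; the schema below is invented), an ACFD can be turned into a pattern as follows:</p>

```python
# Sketch: generating a pattern from an ACFD over a fixed attribute schema.
# Cells not constrained by the dependency are left unspecified ('X').

def acfd_to_pattern(lhs, rhs, attributes):
    constrained = {**lhs, **rhs}
    return [constrained.get(attr, "X") for attr in attributes]

schema = ["Sex", "Workclass", "Race", "Income"]
p = acfd_to_pattern({"Sex": "Female"}, {"Income": ">50K"}, schema)
# p -> ['Female', 'X', 'X', '>50K']
```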
        <p>The Greedy Hit-Count algorithm takes as input the set of patterns computed at the previous
step and returns the set of tuples needed to repair the dataset. The idea behind this algorithm is
that a value combination (i.e. a generated tuple) can cover multiple patterns simultaneously,
allowing us to add fewer tuples than the basic solution, and thus minimizing the changes in
the final, repaired dataset. At each iteration the algorithm returns the value combination that
hits the maximum number of uncovered patterns, and it stops when all the patterns are covered.
To summarize, given the set of uncovered patterns as input, this step finds a minimum set of
tuples to repair the dataset, along with the indication of which patterns each tuple covers.</p>
        <p>Finally, to decide how many copies of each tuple we insert in the dataset, we take the average
of the Unfair Tuple Count of the patterns covered by that tuple.</p>
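        <p>A simplified version of the covering idea can be sketched as follows (our own code; the actual algorithm of [4] is more refined): mutually compatible patterns are merged into a single candidate tuple, so that one tuple covers several patterns.</p>

```python
# Sketch: a simplified greedy covering of patterns. A tuple covers a
# pattern if it agrees on all the pattern's specified cells; merging
# compatible patterns lets one tuple cover several of them.

def greedy_cover(patterns):
    uncovered = [dict(p) for p in patterns]
    result = []
    while uncovered:
        base = dict(uncovered[0])
        covered = [uncovered[0]]
        for p in uncovered[1:]:
            # p is compatible if none of its specified cells conflicts with base
            if all(base.get(a, v) == v for a, v in p.items()):
                base.update(p)
                covered.append(p)
        result.append((base, len(covered)))
        uncovered = [p for p in uncovered if p not in covered]
    return result

patterns = [{"Sex": "Female", "Income": ">50K"},
            {"Race": "Black", "Income": ">50K"},
            {"Race": "Amer-Indian-Eskimo", "Income": ">50K"},
            {"Native-Country": "NC-Hispanic", "Income": ">50K"}]
tuples = greedy_cover(patterns)
# Two candidate tuples suffice: the first covers three patterns, the
# second covers the remaining (conflicting) 'Amer-Indian-Eskimo' pattern.
```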
        <p>
          The current implementation of the algorithm does not prevent the generation of unrealistic
tuples (e.g. combining “Sex”=“Male” with “Pregnant”=“Yes”), since there is no constraint on the
value combinations. We plan, in a future work, to improve the system with the addition of a
validation oracle [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to identify and prevent the insertion of inconsistent tuples.
Example 2. For each ACFD in Table 2, the system determines the TOS; then, from these new
ACFDs, it generates the corresponding patterns (reported in Table 3). Note that, in the patterns,
the target attribute value is changed in order to compensate the Difference of the original ACFDs.
The computed set of patterns is the input of the Greedy Hit-Count algorithm. Table 4 reports the
output, composed of 2 tuples with their cardinalities, that should be added to the dataset. The first
tuple derives from 6 uncovered patterns and the second from 2 uncovered patterns. The two tuples
identified by the algorithm have no inconsistencies.
        </p>
        <p>[Table 3: patterns generated from the TOS ACFDs — Sex = ‘Female’ with Income = ‘&gt;50K’; Race = ‘Black’ with Income = ‘&gt;50K’; Race = ‘Amer-Indian-Eskimo’ with Income = ‘&gt;50K’; Native-Country = ‘NC-Hispanic’ with Income = ‘&gt;50K’; all other attributes (Workclass, Hours-Per-Week, Age-Range, Education) unspecified (‘X’).]</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Correction algorithm</title>
        <p>The aim of the Greedy Hit-Count algorithm phase was to enhance the fairness of the
initial dataset by adding a very low number of external elements. Using a greedy approach, and
setting a limit to the maximum number of tuples that can be added/removed, we prevent
overloading the final dataset, but this could potentially not be enough to repair the dataset
completely. During this phase, if any of the initial ACFDs is still present, we remove from the
dataset some tuples matching it. To determine the number of tuples to remove, for each
ACFD still present in the dataset we use r, the second value of the Unfair Tuple Count.</p>
        <p>Example 3. After adding tuples in the previous step, only one of the original ACFDs is still found.
If the user chooses to apply the Correction algorithm, the system computes again, according to
r, the number of tuples to remove for each ACFD and generates the corresponding patterns that,
converted into tuples, are removed from the dataset. In this case, to solve the ACFD we need to
remove 1733 tuples, obtaining a final dataset with 29088 tuples.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Statistics Computation</title>
        <p>We now present a set of metrics to analyze the quality of the repair procedure. Specifically, for
each ACFD given as input and still present in the repaired dataset we compute: the Cumulative
Support, the percentage of tuples in the dataset involved by the selected ACFDs, and the Mean
Difference, the mean of all the ‘Difference’ scores of the selected ACFDs. Moreover, to make
sure that the dataset can still be used for data science tasks, we compare the distribution of
values of each attribute in the initial dataset with its counterpart in the repaired one.</p>
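        <p>The distribution comparison can be done, for instance, with the total variation distance between the value frequencies of each attribute. This is our own choice of measure for illustration (the paper only states that distributions are compared); the toy data is invented.</p>

```python
# Sketch: comparing the value distribution of one attribute before and
# after the repair, using total variation distance (0 = identical).
from collections import Counter

def distribution_shift(before, after, attr):
    cb = Counter(r[attr] for r in before)
    ca = Counter(r[attr] for r in after)
    values = set(cb) | set(ca)
    return 0.5 * sum(abs(cb[v] / len(before) - ca[v] / len(after))
                     for v in values)

before = 4 * [{"Sex": "Female"}] + 6 * [{"Sex": "Male"}]
after = 5 * [{"Sex": "Female"}] + 5 * [{"Sex": "Male"}]
shift = distribution_shift(before, after, "Sex")
# 0.5 * (|0.4 - 0.5| + |0.6 - 0.5|) = 0.1
```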
        <p>Finally, we apply the Data Bias discovery procedure to the repaired dataset to inspect the
ghost ACFDs, i.e. new ACFDs that were not mined from the original dataset, might appear
after this phase and might introduce new unfair behaviours.</p>
        <p>[Table 5: examples of unfair ACFDs discovered on the Titanic dataset —
(Sex = ‘Female’, Class = 1) → Survived = 1;
(Survived = 0, Sex = ‘Female’) → Class = 3;
(Sex = ‘Male’) → Survived = 0.]</p>
        <p>
Example 4. Using the corrected version of the dataset we perform the Data Bias discovery
procedure again; no ACFDs are in common with the ones selected from the input dataset, so
the dataset can be considered repaired. The metrics are indeed lower: the Cumulative Support,
representing the percentage of tuples originally involved by unfair dependencies (around 35%),
after the repair procedure is 0, and the Mean Difference goes from 0.13 to 0. We compare the value
distribution of each attribute between the initial and the corrected dataset (we do not report the
plots for brevity): the distributions remain almost unchanged, apart from a slight increase in
the number of females and private workers. Moreover, no unfair ghost ACFD is discovered.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>We now present the results obtained by our system on another real-world dataset: the Titanic
dataset. The discovery procedure returns the ACFDs that show unfairness; Table 5 reports some
of them. The dataset is unfair with respect to all the protected attributes; specifically, the most
discriminated groups are ‘Male’ and ‘Third-Class’ passengers. We therefore decide
to mitigate the bias by applying the ACFD-Repair procedure. According to the user-selected
dependencies, the algorithm computes the number of tuples that should be added to the dataset
to compensate their Difference value; specifically, we add 233 tuples to the dataset. Since no
initial dependency is extracted from the repaired dataset, the correction procedure is skipped
and we consider this version of the dataset as corrected. The obtained metrics for the final
dataset show that the mitigation procedure worked: the Cumulative Support, which was around
86%, after the repair procedure is 0, and the Mean Difference goes from 0.35 to 0.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Comparison with similar systems</title>
      <p>Three possible approaches can be adopted when trying to enforce fairness in a data analysis
application: (i) preprocessing techniques, i.e. procedures that, before the application of a prediction
algorithm, make sure that the learning data are fair; (ii) inprocessing techniques, i.e. procedures
that ensure that, during the learning phase, the algorithm does not pick up the bias present
in the data; and (iii) postprocessing techniques, i.e. procedures that correct the algorithm’s
decisions with the aim of making them fair.</p>
      <p>
        One of the first preprocessing techniques is [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The process, on the basis of discrimination
measures used in the legal literature, can identify potentially discriminatory itemsets by
discovering association rules. Furthermore, the authors propose a set of sanitization methods:
given a discriminatory rule, it is sanitized by modifying the itemset distribution in order to
prevent discrimination. For each discrimination measure, they propose a method to achieve
a fair dataset by introducing a reasonable (controlled) pattern distortion. Unfortunately, this
system does not involve user interaction, therefore the user cannot discard the rules that are
not interesting for the specific investigation; our approach, moreover, provides as a final step a
set of summarizing metrics that describe the overall degree of unfairness of the dataset.
      </p>
      <p>
        In the Machine Learning context, the majority of works that try to enforce fairness are related
to a prediction task, and more specifically to classification algorithms. One of these works
is AI Fairness 360 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], an open-source framework whose aim is to reach algorithmic fairness
for classifiers. It tries to mitigate data bias, quantified using different statistical measures, by
exploiting pre-processing, in-processing and post-processing techniques. The results obtained
by AI Fairness 360 are in complete accordance with ours. However, the competitor checks the
fairness property only for one binary attribute at a time, while, since the ACFD technique can
involve more than one attribute at a time, our tool can detect unfair behaviors at a finer level of
granularity. The main difference between these approaches and our framework, which addresses
unfairness with a preprocessing technique, is that our system does not need a classifier to work,
because it is based on finding conditions (in the form of approximate constraints) that are already
present in the data, even though possibly with some level of approximation.
      </p>
      <p>
        Another related topic regards rule-based data-cleaning techniques [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The goal of these
approaches is to eliminate the inconsistencies present in a dataset by making sure that all the
rules (integrity constraints) hold on the final, clean one. In our case, the final objective is quite
different: first, we do not modify the values of specific records present in the dataset, but mainly
focus on adding tuples to make it fairer; indeed, we are not interested in ensuring that all
the tuples respect the discovered dependencies, but, rather, that the final repaired dataset does
not contain unfair dependencies.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Works</title>
      <p>
        We presented a novel framework that, through the extraction of a particular type of Functional
Dependencies, can discover and mitigate bias and discrimination present in datasets.
Future work will include: (i) the study of data equity, operationalizing this definition and
studying how it augments or contradicts existing definitions of fairness; (ii) the development of
a graphical user interface to facilitate the interaction of the user with the system; (iii) the study
of other classes of functional dependencies [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Azzalini</surname>
          </string-name>
          , et al.,
          <article-title>FAIR-DB: Functional dependencies to discover data bias</article-title>
          ,
          <source>Workshop Proceedings of the EDBT/ICDT</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Azzalini</surname>
          </string-name>
          , et al.,
          <article-title>A short account of FAIR-DB: a system to discover data bias</article-title>
          ,
          <source>SEBD</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <article-title>Fairness definitions explained</article-title>
          , in: FairWare, IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Asudeh</surname>
          </string-name>
          , et al.,
          <article-title>Assessing and remedying coverage for a given dataset</article-title>
          ,
          in:
          <source>ICDE</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hajian</surname>
          </string-name>
          , et al.,
          <article-title>Discrimination- and privacy-aware patterns</article-title>
          ,
          <source>Data Min. Knowl. Discov</source>
          .
          <volume>29</volume>
          (
          <year>2015</year>
          )
          <fpage>1733</fpage>
          -
          <lpage>1782</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Bellamy</surname>
          </string-name>
          , et al.,
          <article-title>AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias</article-title>
          ,
          <source>IBM Journal of Research and Development</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chu</surname>
          </string-name>
          , Data cleaning, Morgan &amp; Claypool,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          , et al.,
          <article-title>Relaxed functional dependencies-a survey of approaches</article-title>
          ,
          <source>TKDE</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>