<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Italian Symposium on Advanced Database Systems, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Imputation of Missing Values through Profiling Metadata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bernardo Breve</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Loredana Caruccio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Deufemia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Polese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Salerno</institution>
          ,
          <addr-line>via Giovanni Paolo II, 132, Fisciano (SA), 84084</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Among the several problems related to the management of database instances, missing values represent a crucial factor that could severely compromise the integrity and meaningfulness of such data representations. Thus, the data imputation research field focuses its efforts on solutions for filling missing values with plausible candidates, while preserving the overall semantic integrity that characterizes the database instance. To keep imputation times low while maintaining high accuracy, the use of metadata has made its way into research proposals. This discussion paper presents our effort in the definition of RENUVER, a novel data imputation algorithm relying on Relaxed Functional Dependencies (rfds) to identify the candidate values that best guarantee the semantic integrity of data. Experimental results on real-world datasets highlighted the effectiveness of RENUVER in terms of both filling accuracy and imputation times, also compared to other well-known approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>Data imputation</kwd>
        <kwd>Profiling metadata</kwd>
        <kwd>Relaxed Functional Dependencies</kwd>
        <kwd>Data quality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        attributes, yielding an accurate and somewhat fast solution for the imputation of missing values
within relational database instances. In fact, rfds are still widely considered for detecting and
repairing many types of errors, such as duplicates, outliers, and constraint violations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Thus,
we made use of them to identify suitable candidate values for replacing missing ones in the
data imputation process. RENUVER exploits rfds for: i) identifying the candidate tuples useful
for the imputation of missing values, ii) ranking candidate tuples based on their similarity with
respect to the tuples containing missing values, and iii) evaluating each imputation to guarantee
the semantic consistency of the whole dataset.
      </p>
      <p>In particular, RENUVER generates candidate tuples and ranks them according to the rfds implying
the attribute on which a value is missing. Moreover, the imputation strategy of RENUVER does
not alter value consistency with respect to the ones in the original dataset. Finally, RENUVER
exploits rfds to also judge whether it is possible to impute a missing value, in order to preserve
the integrity of data and to avoid the insertion of inconsistent information.</p>
      <p>
        The effectiveness of RENUVER has been evaluated on real-world datasets<sup>1</sup> in terms of accuracy
and execution time. In order to extract rfds, we relied on an existing rfd discovery algorithm
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], since the problem of discovering rfds is outside the scope of this paper. Moreover, we
introduce a novel method for the automatic evaluation of data imputation results, which makes it
possible to judge imputed values even when they have different syntactic representations. Evaluation results
demonstrate that RENUVER outperforms other data imputation approaches [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ].
      </p>
      <p>The paper is organized as follows: Section 2 provides preliminary notions on rfds. Section
3 introduces RENUVER’s logic through the employment of the rfds in the data imputation
problem. An experimental evaluation measuring the effectiveness of RENUVER is presented in
Section 4. Finally, conclusions and further research are reported in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>Before describing how we approached the imputation problem through the employment of
rfds, let us introduce some notions that are preliminary to our methodology.</p>
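      <p>Functional Dependency. Given a relational database schema <inline-formula><tex-math>\mathcal{R}</tex-math></inline-formula>, <inline-formula><tex-math>R = \{A_1, \ldots, A_m\}</tex-math></inline-formula>
one of its relation schemas, and a tuple <inline-formula><tex-math>t \in r</tex-math></inline-formula>, with <inline-formula><tex-math>r</tex-math></inline-formula> an instance of <inline-formula><tex-math>R</tex-math></inline-formula>, we use <inline-formula><tex-math>t[A_i]</tex-math></inline-formula>, with <inline-formula><tex-math>1 \le i \le m</tex-math></inline-formula>, to denote the
projection of <inline-formula><tex-math>t</tex-math></inline-formula> onto <inline-formula><tex-math>A_i</tex-math></inline-formula>; similarly, for a set of attributes <inline-formula><tex-math>X = \{A_{i_1}, \ldots, A_{i_k}\}</tex-math></inline-formula>, with <inline-formula><tex-math>1 \le k \le m</tex-math></inline-formula>,
<inline-formula><tex-math>t[X] \in \mathrm{dom}(A_{i_1}) \times \ldots \times \mathrm{dom}(A_{i_k})</tex-math></inline-formula> represents the projection of <inline-formula><tex-math>t</tex-math></inline-formula> onto <inline-formula><tex-math>X</tex-math></inline-formula>, also denoted with
<inline-formula><tex-math>\Pi_X(t)</tex-math></inline-formula>. An fd on <inline-formula><tex-math>\mathcal{R}</tex-math></inline-formula> is a statement <inline-formula><tex-math>X \rightarrow Y</tex-math></inline-formula> (<inline-formula><tex-math>X</tex-math></inline-formula> implies <inline-formula><tex-math>Y</tex-math></inline-formula>), with <inline-formula><tex-math>X, Y \subseteq \mathrm{attr}(R)</tex-math></inline-formula>, such that,
given an instance <inline-formula><tex-math>r</tex-math></inline-formula> of <inline-formula><tex-math>R</tex-math></inline-formula>, <inline-formula><tex-math>X \rightarrow Y</tex-math></inline-formula> is satisfied in <inline-formula><tex-math>r</tex-math></inline-formula> if and only if for each pair of tuples <inline-formula><tex-math>(t_1, t_2)</tex-math></inline-formula>
in <inline-formula><tex-math>r</tex-math></inline-formula>, whenever <inline-formula><tex-math>t_1[X] = t_2[X]</tex-math></inline-formula>, then <inline-formula><tex-math>t_1[Y] = t_2[Y]</tex-math></inline-formula>. The sets of attributes <inline-formula><tex-math>X</tex-math></inline-formula> and <inline-formula><tex-math>Y</tex-math></inline-formula> are named
Left-Hand-Side (LHS) and Right-Hand-Side (RHS) of the fd, respectively.</p>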
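      <p>The fd definition above can be illustrated with a minimal sketch. It assumes that tuples are encoded as Python dictionaries and that an instance is a list of such tuples; the function names <monospace>project</monospace> and <monospace>fd_holds</monospace> and the toy values are hypothetical and only serve the illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations

def project(t, attrs):
    """Projection t[X] of a tuple t (a dict) onto the attribute list attrs."""
    return tuple(t[a] for a in attrs)

def fd_holds(r, lhs, rhs):
    """X -> Y holds on r if tuples that agree on X also agree on Y."""
    for t1, t2 in combinations(r, 2):
        if project(t1, lhs) == project(t2, lhs) and project(t1, rhs) != project(t2, rhs):
            return False
    return True

# Hypothetical toy instance (values are illustrative, not from the paper).
restaurants = [
    {"City": "Malibu", "Class": 5, "Type": "Californian"},
    {"City": "LA", "Class": 5, "Type": "Californian"},
    {"City": "LA", "Class": 6, "Type": "French"},
]

class_determines_type = fd_holds(restaurants, ["Class"], ["Type"])
city_determines_type = fd_holds(restaurants, ["City"], ["Type"])
print(class_determines_type, city_determines_type)
```
]]></preformat>
      <p>In this toy instance the second and third tuples share the same city but differ on the type of cuisine, so only the first dependency is satisfied.</p>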
      <p>
        With respect to fd definition, the rfd generalizes the comparison paradigm, by including
similarity/distance-based comparisons between tuple projections, also admitting the possibility
for a dependency to hold only on a subset of tuples. The latter can be defined through either
a coverage measure, quantifying the portion of the dataset on which a dependency holds or
a condition restricting the domain on which a dependency can hold [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].<fn><label>1</label><p>https://github.com/DastLab/RENUVER-evaluation-datasets</p></fn> Since the proposed
approach exploits only rfds relying on a similarity/distance-based tuple comparison method,
in what follows we provide only the definition of this type of rfds, known as rfdc. For a more
general definition of rfd, see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
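rfdc. Given a relational database schema <inline-formula><tex-math>\mathcal{R}</tex-math></inline-formula>, and <inline-formula><tex-math>R = \{A_1, \ldots, A_m\}</tex-math></inline-formula> one of its relation
schemas, an rfdc <inline-formula><tex-math>\varphi</tex-math></inline-formula> on <inline-formula><tex-math>\mathcal{R}</tex-math></inline-formula>
        <disp-formula><label>(1)</label><tex-math>X_{\Phi_1} \rightarrow Y_{\Phi_2}</tex-math></disp-formula>
where
        <list list-type="bullet">
          <list-item><p><inline-formula><tex-math>X, Y \subseteq \mathrm{attr}(R)</tex-math></inline-formula>;</p></list-item>
          <list-item><p><inline-formula><tex-math>\Phi_1</tex-math></inline-formula> contains (for each attribute <inline-formula><tex-math>A \in X</tex-math></inline-formula>) a constraint <inline-formula><tex-math>\phi[A]</tex-math></inline-formula> that can be used to determine
whether pairs of tuples with values in <inline-formula><tex-math>\mathrm{dom}(A)</tex-math></inline-formula> are “similar” enough (likewise for each
attribute <inline-formula><tex-math>B \in Y</tex-math></inline-formula> with <inline-formula><tex-math>\phi[B] \in \Phi_2</tex-math></inline-formula>). More specifically, each <inline-formula><tex-math>\phi[A]</tex-math></inline-formula> (<inline-formula><tex-math>\phi[B]</tex-math></inline-formula>, resp.) requires
the specification of a similarity/distance function defined on the domain of <inline-formula><tex-math>A</tex-math></inline-formula> (<inline-formula><tex-math>B</tex-math></inline-formula>, resp.),
an operator, and a threshold setting the boundaries for the satisfaction of the constraint.</p></list-item>
        </list>
holds on a relation instance <inline-formula><tex-math>r</tex-math></inline-formula> (denoted by <inline-formula><tex-math>r \models \varphi</tex-math></inline-formula>) if and only if, for each pair of tuples <inline-formula><tex-math>(t_1, t_2) \in r</tex-math></inline-formula>
for which <inline-formula><tex-math>t_1[A]</tex-math></inline-formula> and <inline-formula><tex-math>t_2[A]</tex-math></inline-formula> satisfy the constraint <inline-formula><tex-math>\phi[A]</tex-math></inline-formula> for each <inline-formula><tex-math>A \in X</tex-math></inline-formula>, <inline-formula><tex-math>t_1[B]</tex-math></inline-formula> and
<inline-formula><tex-math>t_2[B]</tex-math></inline-formula> satisfy the constraint <inline-formula><tex-math>\phi[B]</tex-math></inline-formula> for each <inline-formula><tex-math>B \in Y</tex-math></inline-formula>.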
      </p>
      <p>For the sake of simplicity, in the following we adopt a more compact notation for the constraints,
showing only the operator and the numeric threshold associated with each attribute.
Example. Let us consider the sample relation shown in Table 1, derived from a database
of restaurants in the USA. Within this database, each tuple represents a restaurant and provides
information about its name, address, city, phone number, type of cuisine, and class. The latter
is a numeric id associated with the type of cuisine. On this dataset, the following rfdc holds:
<inline-formula><tex-math>\mathrm{Name}(\le 4) \rightarrow \mathrm{Phone}(\le 1)</tex-math></inline-formula>, which states that, if two restaurants have similar names, then they
also have similar phone numbers. This should be true even when the names and/or the phone
numbers of restaurants are written in different ways or use different abbreviations.</p>
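      <p>As a rough illustration of how an rfdc such as Name(≤ 4) → Phone(≤ 1) could be checked, the sketch below assumes that every attribute is a string compared with the edit (Levenshtein) distance and that each constraint is a pair of attribute and threshold with the operator ≤. The helper names and the sample tuples are hypothetical; the actual distance functions depend on the rfdc discovery process.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similar(t1, t2, constraints):
    """True if, for every attribute, the distance is within its threshold."""
    return all(edit_distance(t1[a], t2[a]) <= thr for a, thr in constraints.items())

def rfdc_holds(r, lhs, rhs):
    """The rfdc holds if every pair similar on the LHS is also similar on the RHS."""
    for t1, t2 in combinations(r, 2):
        if similar(t1, t2, lhs) and not similar(t1, t2, rhs):
            return False
    return True

# Hypothetical tuples; the phone numbers are invented for the example.
restaurants = [
    {"Name": "Citrus", "Phone": "213-857-0034"},
    {"Name": "Citrus", "Phone": "213/857-0034"},
    {"Name": "Fenix", "Phone": "213-848-6677"},
]
lhs, rhs = {"Name": 4}, {"Phone": 1}
holds_before = rfdc_holds(restaurants, lhs, rhs)

# A near-duplicate name with an unrelated phone number breaks the dependency.
restaurants.append({"Name": "Citris", "Phone": "310-555-0000"})
holds_after = rfdc_holds(restaurants, lhs, rhs)
print(holds_before, holds_after)
```
]]></preformat>
      <p>The two spellings of the first phone number differ by a single character, so they remain within the RHS threshold, whereas the appended tuple has a similar name but a phone number that is far from the others.</p>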
      <p>
        From a theoretical point of view, rfdcs allow the use of any type of similarity/distance function,
e.g., edit distance, absolute difference, and so forth. However, these functions are usually inherited from
those involved in the automatic rfdc discovery process [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For the scope of this proposal,
without loss of generality, we consider rfdcs with a single attribute on the RHS and the
associated constraint <inline-formula><tex-math>\phi_2</tex-math></inline-formula>. In particular, we consider <inline-formula><tex-math>\phi_2</tex-math></inline-formula> to be composed of a distance function, the
operator ≤, and a distance threshold.
      </p>
      <p>A particular type of rfdc is the key-rfdc, which is defined in the following.</p>
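      <p>Key rfdc. Given a relation schema <inline-formula><tex-math>R</tex-math></inline-formula>, and an instance <inline-formula><tex-math>r</tex-math></inline-formula> of <inline-formula><tex-math>R</tex-math></inline-formula>, an rfdc <inline-formula><tex-math>\varphi : X_{\Phi_1} \rightarrow A_{\phi_2}</tex-math></inline-formula> is said
to be key if and only if <inline-formula><tex-math>\varphi</tex-math></inline-formula> holds on <inline-formula><tex-math>r</tex-math></inline-formula> (<inline-formula><tex-math>r \models \varphi</tex-math></inline-formula>), but there is no pair of distinct tuples <inline-formula><tex-math>(t_1, t_2) \in r</tex-math></inline-formula>
for which <inline-formula><tex-math>t_1[X]</tex-math></inline-formula> and <inline-formula><tex-math>t_2[X]</tex-math></inline-formula> satisfy all the constraints in <inline-formula><tex-math>\Phi_1</tex-math></inline-formula>.</p>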
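      <p>A key rfdc holds vacuously, since no pair of tuples satisfies its LHS constraints. The following sketch shows one way to detect such dependencies. It assumes that each constraint is given as a pair of distance function and threshold with the operator ≤; the names <monospace>is_key_rfdc</monospace> and <monospace>usable_rfdcs</monospace>, the distance functions, and the data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations

def abs_diff(a, b):
    """Absolute difference, a distance for numeric attributes."""
    return abs(a - b)

def discrete(a, b):
    """0 if equal, 1 otherwise: a simple stand-in for a string distance."""
    return 0 if a == b else 1

def satisfies(t1, t2, constraints):
    """True if every (distance function, threshold) constraint is met."""
    return all(dist(t1[a], t2[a]) <= thr for a, (dist, thr) in constraints.items())

def is_key_rfdc(r, lhs):
    """Key rfdc: no pair of distinct tuples satisfies all LHS constraints."""
    return not any(satisfies(t1, t2, lhs) for t1, t2 in combinations(r, 2))

def usable_rfdcs(r, rfdcs):
    """Keep only the non-key rfdcs, given as (lhs, rhs) pairs."""
    return [(lhs, rhs) for lhs, rhs in rfdcs if not is_key_rfdc(r, lhs)]

# Hypothetical instance and dependencies.
r = [
    {"City": "LA", "Class": 1},
    {"City": "LA", "Class": 5},
    {"City": "Malibu", "Class": 9},
]
rfdcs = [
    ({"Class": (abs_diff, 1)}, {"City": (discrete, 0)}),  # key: no two classes within 1
    ({"City": (discrete, 0)}, {"Class": (abs_diff, 4)}),  # not key: two tuples share a city
]
kept = usable_rfdcs(r, rfdcs)
print(len(kept))
```
]]></preformat>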
    </sec>
    <sec id="sec-3">
      <title>3. The RENUVER imputation approach</title>
      <p>In this section, we formalize the data imputation problem by defining some of its underlying
concepts and then describing the basics of the proposed imputation approach. Let us start by defining
the concept of missing value.</p>
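      <p>Missing value. Given a relation schema <inline-formula><tex-math>R</tex-math></inline-formula>, defined over a set of attributes <inline-formula><tex-math>\mathrm{attr}(R)</tex-math></inline-formula>, an
instance <inline-formula><tex-math>r</tex-math></inline-formula> of <inline-formula><tex-math>R</tex-math></inline-formula>, an attribute <inline-formula><tex-math>A \in \mathrm{attr}(R)</tex-math></inline-formula>, and a tuple <inline-formula><tex-math>t \in r</tex-math></inline-formula>, a missing value of tuple <inline-formula><tex-math>t</tex-math></inline-formula> on the
attribute <inline-formula><tex-math>A</tex-math></inline-formula>, denoted as <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula>, is such that <inline-formula><tex-math>t[A]</tex-math></inline-formula> is null.</p>
      <p>Here, <inline-formula><tex-math>r</tex-math></inline-formula> is said to be an incomplete instance, and <inline-formula><tex-math>\hat{r} \subseteq r</tex-math></inline-formula> contains only incomplete tuples.</p>
      <p>The general missing value imputation problem is formally defined as follows.
Missing value imputation problem. Given a relation schema <inline-formula><tex-math>R</tex-math></inline-formula>, and an instance <inline-formula><tex-math>r</tex-math></inline-formula> of <inline-formula><tex-math>R</tex-math></inline-formula>, for
every tuple <inline-formula><tex-math>t \in r</tex-math></inline-formula> and every attribute <inline-formula><tex-math>A \in \mathrm{attr}(R)</tex-math></inline-formula> for which <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula>, the imputation problem
consists of finding a plausible value <inline-formula><tex-math>v \in \mathrm{dom}(A)</tex-math></inline-formula>, such that the database instance <inline-formula><tex-math>r'</tex-math></inline-formula> resulting
from the imputation process does not contain inconsistent values.</p>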
      <p>A missing value imputation approach also requires the application of constraints for
evaluating the consistency of values at the end of the imputation process. The proposed approach
exploits rfds both to verify semantic consistency and to drive the
search for meaningful candidates for all missing values.</p>
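      <p>Semantically consistent imputation. Given a relation schema <inline-formula><tex-math>R</tex-math></inline-formula>, defined over a set of
attributes <inline-formula><tex-math>\mathrm{attr}(R)</tex-math></inline-formula>, an instance <inline-formula><tex-math>r</tex-math></inline-formula> of <inline-formula><tex-math>R</tex-math></inline-formula>,
and a set of rfdcs, <inline-formula><tex-math>\Sigma</tex-math></inline-formula>, holding on <inline-formula><tex-math>r</tex-math></inline-formula> (<inline-formula><tex-math>r \models \Sigma</tex-math></inline-formula>), an instance <inline-formula><tex-math>r'</tex-math></inline-formula> of <inline-formula><tex-math>R</tex-math></inline-formula> resulting from an imputation
process <inline-formula><tex-math>I</tex-math></inline-formula> over the instance <inline-formula><tex-math>r</tex-math></inline-formula>, denoted as <inline-formula><tex-math>r' = I(r)</tex-math></inline-formula>, is semantically consistent if <inline-formula><tex-math>r' \models \Sigma</tex-math></inline-formula>.</p>
      <p>One of
the possible strategies that could guarantee the semantic consistency of the imputation process
is to find candidate values for <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula> by considering a set <inline-formula><tex-math>T \subseteq r</tex-math></inline-formula> of plausible candidate
tuples for imputing <inline-formula><tex-math>t[A]</tex-math></inline-formula>, such that <inline-formula><tex-math>\forall t' \in T</tex-math></inline-formula>, <inline-formula><tex-math>t'[A] \neq \_</tex-math></inline-formula> and <inline-formula><tex-math>t'</tex-math></inline-formula> is similar to <inline-formula><tex-math>t</tex-math></inline-formula> on some
attributes beyond <inline-formula><tex-math>A</tex-math></inline-formula>.</p>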
      <p>In what follows we define the criteria used by RENUVER for deciding when a tuple can be
considered as a plausible candidate, which is based on rfdcs.</p>
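      <p>Plausible candidate tuple. Given a missing value <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula> over a database instance <inline-formula><tex-math>r</tex-math></inline-formula> of a
relation schema <inline-formula><tex-math>R</tex-math></inline-formula>, and an rfdc <inline-formula><tex-math>\varphi : X_{\Phi_1} \rightarrow A_{\phi_2}</tex-math></inline-formula> holding on <inline-formula><tex-math>r</tex-math></inline-formula>, a tuple <inline-formula><tex-math>t' \in r</tex-math></inline-formula> can be considered
as a plausible candidate tuple for imputing <inline-formula><tex-math>t[A]</tex-math></inline-formula> according to <inline-formula><tex-math>\varphi</tex-math></inline-formula> if <inline-formula><tex-math>t</tex-math></inline-formula> and <inline-formula><tex-math>t'</tex-math></inline-formula> are similar according
to the constraints in <inline-formula><tex-math>\Phi_1</tex-math></inline-formula>.</p>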
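      <p>The definition of plausible candidate tuple can be sketched as a simple filter over the instance. The code below assumes that missing values are encoded as <monospace>None</monospace>, that LHS constraints are pairs of distance function and threshold with the operator ≤, and that a missing value on an LHS attribute never satisfies a constraint; all names and values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
MISSING = None  # assumption: the missing value "_" is encoded as None

def abs_diff(a, b):
    """Absolute difference, a distance for numeric attributes."""
    return abs(a - b)

def similar_on(t1, t2, constraints):
    """True if t1 and t2 satisfy every LHS constraint."""
    for attr, (dist, thr) in constraints.items():
        if t1[attr] is MISSING or t2[attr] is MISSING:
            return False  # assumption: missing values never satisfy a constraint
        if dist(t1[attr], t2[attr]) > thr:
            return False
    return True

def plausible_candidates(r, t, attr, lhs):
    """Tuples of r, other than t, that have a value on attr and are similar to t on the LHS."""
    return [c for c in r
            if c is not t and c[attr] is not MISSING and similar_on(t, c, lhs)]

# Hypothetical instance: the second tuple misses its type of cuisine.
r = [
    {"Class": 5, "Type": "Californian"},
    {"Class": 5, "Type": MISSING},
    {"Class": 6, "Type": "French"},
]
# LHS of an rfdc of the form Class(<= 0) -> Type(...).
candidates = plausible_candidates(r, r[1], "Type", {"Class": (abs_diff, 0)})
print([c["Type"] for c in candidates])
```
]]></preformat>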
      <p>The candidate tuple generation process, performed according to the definition presented
above, has to be generalized in order to perform the imputation process on tuples containing
more than one missing value, and for each <inline-formula><tex-math>t \in \hat{r}</tex-math></inline-formula>.</p>
      <p>
        Missing value imputation for a tuple. Let <inline-formula><tex-math>R</tex-math></inline-formula> be a relational schema defined over a set
of attributes <inline-formula><tex-math>\mathrm{attr}(R)</tex-math></inline-formula>, <inline-formula><tex-math>r</tex-math></inline-formula> an instance of <inline-formula><tex-math>R</tex-math></inline-formula>, <inline-formula><tex-math>t</tex-math></inline-formula> a tuple of <inline-formula><tex-math>r</tex-math></inline-formula>, <inline-formula><tex-math>Z \subset \mathrm{attr}(R)</tex-math></inline-formula> a set of attributes such
that <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula> for each <inline-formula><tex-math>A \in Z</tex-math></inline-formula>, and <inline-formula><tex-math>\Sigma</tex-math></inline-formula> a set of rfdcs holding on <inline-formula><tex-math>r</tex-math></inline-formula>. An imputation process for <inline-formula><tex-math>t</tex-math></inline-formula>
consists of selecting a plausible candidate tuple <inline-formula><tex-math>t'</tex-math></inline-formula> for each <inline-formula><tex-math>A \in Z</tex-math></inline-formula> such that <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula>, so that
<inline-formula><tex-math>t[A]</tex-math></inline-formula> can be set equal to <inline-formula><tex-math>t'[A]</tex-math></inline-formula>. However, when for a <inline-formula><tex-math>t[A] = \_</tex-math></inline-formula> it is not possible to identify a
plausible candidate tuple guaranteeing a semantically consistent imputation, it is better to leave
<inline-formula><tex-math>t[A]</tex-math></inline-formula> unimputed. Although this strategy has been widely applied in other approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], it
raises another important issue that RENUVER deals with, i.e., minimizing the number of
non-imputed values.
      </p>
      <p>[Figure 1. Panels: a) Data pre-processing; b) rfdc selection; c) Imputing missing values. Panel a) lists the rfdcs Name(≤ 8), Phone(≤ 0), Class(≤ 1) → Type(≤ 0); Class(≤ 0) → Type(≤ 5); City(≤ 2) → Phone(≤ 2); Name(≤ 4) → Phone(≤ 1); Name(≤ 8), Phone(≤ 0) → City(≤ 9); Name(≤ 6), City(≤ 9) → Phone(≤ 0); Phone(≤ 1) → Class(≤ 0). Panel b) selects the rfdcs having Phone on the RHS, and panel c) checks Phone(≤ 1) → Class(≤ 0) for violations on the tuples t1 to t7 of the Restaurant sample.]</p>
      <p>
Figure 1 shows how the aforesaid definitions drive the imputation of a missing value in the
Restaurant dataset introduced previously.<fn><label>2</label><p>A deeper overview of RENUVER, together with a more exhaustive evaluation, has been carried out in [<xref ref-type="bibr" rid="ref3">3</xref>].</p></fn> In detail, we can identify three major phases
leading to the imputation of a given missing value:
        <list list-type="bullet">
          <list-item><p>Pre-processing: during this phase, missing values within a database instance are
identified and isolated. Furthermore, RENUVER excludes all key-rfdcs from the set of rfdcs
that can be employed for the imputation of any missing value (see Figure 1.a).</p></list-item>
          <list-item><p>rfdc selection: following the selection of a missing value to impute, during this phase
RENUVER identifies all the rfdcs that can be useful for its imputation. The rfdcs are then
organized in a set of clusters according to their threshold on the RHS (see Figure 1.b).</p></list-item>
          <list-item><p>Imputing missing values: during this phase, RENUVER performs a series of operations
leading to the imputation of a missing value by retrieving the value from a set of plausible
candidate tuples drawn from the same database instance (see Figure 1.c). In particular,
RENUVER iteratively performs the following operations:</p>
            <list list-type="bullet">
              <list-item><p>it generates a set of plausible candidate tuples that satisfy the LHS constraints of an
rfdc belonging to one of the clusters previously generated;</p></list-item>
              <list-item><p>it computes a distance value for each plausible candidate tuple with respect to the
tuple having the missing value. The evaluation is performed by considering the LHS
attributes of the selected rfdcs. The candidate tuple having the minimum
distance is then exploited for the imputation of the missing value;</p></list-item>
              <list-item><p>it verifies whether the imputed value causes a violation of any holding rfdc. In this case,
RENUVER selects the next plausible candidate tuple with the lowest distance value.</p></list-item>
            </list>
          </list-item>
        </list>
      </p>
      <p>These operations are repeated for each cluster, as long as the imputation has not succeeded.</p>
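      <p>The three operations of the imputation phase can be sketched as follows. This is only an illustration under several assumptions that are not stated in the text: all attributes are strings compared with the edit distance, missing values are encoded as <monospace>None</monospace>, clusters are visited by increasing RHS threshold, the distance of a candidate is the mean of its LHS distances, and pairs of tuples with missing values are skipped during verification. The function names and the toy data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations, groupby

MISSING = None  # assumption: the missing value "_" is encoded as None

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def complete_on(t, attrs):
    return all(t[a] is not MISSING for a in attrs)

def within(t1, t2, constraints):
    return all(edit_distance(t1[a], t2[a]) <= thr for a, thr in constraints.items())

def holds(r, lhs, rhs):
    """Check an rfdc, skipping pairs with missing values (assumption)."""
    attrs = list(lhs) + list(rhs)
    for t1, t2 in combinations(r, 2):
        if complete_on(t1, attrs) and complete_on(t2, attrs):
            if within(t1, t2, lhs) and not within(t1, t2, rhs):
                return False
    return True

def impute(r, t, attr, rfdcs):
    """Try to impute t[attr]; return True on success, else leave it missing."""
    relevant = sorted((d for d in rfdcs if attr in d[1]), key=lambda d: d[1][attr])
    # One cluster per RHS threshold, visited in increasing order (assumption).
    for _, cluster in groupby(relevant, key=lambda d: d[1][attr]):
        ranked = []
        for lhs, _rhs in cluster:
            if not complete_on(t, lhs):
                continue
            for c in r:
                if (c is not t and c[attr] is not MISSING
                        and complete_on(c, lhs) and within(t, c, lhs)):
                    score = sum(edit_distance(t[a], c[a]) for a in lhs) / len(lhs)
                    ranked.append((score, c[attr]))
        for _, value in sorted(ranked):  # closest candidates first
            t[attr] = value
            if all(holds(r, lhs, rhs) for lhs, rhs in rfdcs):
                return True
            t[attr] = MISSING  # violation: revert and try the next candidate
    return False

# Hypothetical data; phone numbers and classes are invented.
r = [
    {"Name": "Citrus", "City": "Los Angeles", "Phone": "213-857-0034", "Class": "6"},
    {"Name": "Citrus", "City": "Los Angeles", "Phone": MISSING, "Class": "6"},
    {"Name": "Fenix", "City": "Hollywood", "Phone": "213-848-6677", "Class": "5"},
    {"Name": "Citrus", "City": "Malibu", "Phone": MISSING, "Class": "7"},
]
rfdcs = [({"Name": 4}, {"Phone": 1}), ({"City": 2}, {"Phone": 2}), ({"Phone": 1}, {"Class": 0})]
first = impute(r, r[1], "Phone", rfdcs)
second = impute(r, r[3], "Phone", rfdcs)
print(first, r[1]["Phone"], second, r[3]["Phone"])
```
]]></preformat>
      <p>In this toy run the second tuple receives the phone number of the first one, while the last tuple is left unimputed: every candidate would make it similar on Phone to a tuple with a different Class, which would violate Phone(≤ 1) → Class(≤ 0).</p>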
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <p>
        In this section, we present a comparative evaluation of RENUVER w.r.t. other approaches
exploiting different imputation strategies. In particular, we benchmarked RENUVER against a
holistic, machine learning-based approach, namely Holoclean [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (considering its
attention-based expansion module AimNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]), and a differential dependency-guided approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
named Derand, for which we employed the same rfdcs as RENUVER. All evaluations were
performed under the same conditions on an iMac Pro with an 8-core CPU and 32GB RAM.
      </p>
      <p>Datasets. The considered algorithms have been evaluated on two real-world datasets<sup>2</sup> in
order to perform a stress test on RENUVER and all compared imputation approaches, aiming
to determine their time and memory requirements. To this end, we stopped the executions
exceeding 48 hours of execution time and/or 30GB of memory consumption.</p>
      <p>Furthermore, in order to obtain an accurate comparison between the imputed values and the
expected ones, missing values have been artificially injected in a random manner. Moreover,
to avoid an arrangement of missing values favoring one algorithm, for each missing rate we
produced five different datasets, yielding a total of twenty-five variants of the same dataset. The
metrics adopted for the comparison are then averaged over each missing rate.</p>
      <p>Evaluation metrics. The effectiveness of the data imputation approaches has been evaluated
by considering three different metrics: precision, recall, and F1-measure, which can be formally
defined as:
        <disp-formula><tex-math>\mathit{precision} = \frac{|\mathit{true} \cap \mathit{imputed}|}{|\mathit{imputed}|}, \quad \mathit{recall} = \frac{|\mathit{true} \cap \mathit{missing}|}{|\mathit{missing}|}, \quad \text{F1-measure} = 2 \times \frac{\mathit{precision} \times \mathit{recall}}{\mathit{precision} + \mathit{recall}}</tex-math></disp-formula>
where <italic>true</italic> represents the correctly imputed missing values at the end of the imputation process,
<italic>imputed</italic> represents all the imputed missing values, and <italic>missing</italic> the missing values in the dataset.</p>
      <p>registered the best performances on all the considered qualitative metrics.</p>
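      <p>The precision, recall, and F1-measure defined in this section can be computed with a few lines of code. The sketch below assumes that cells are identified by hypothetical (tuple id, attribute) pairs and that an imputed value is correct only when it is exactly equal to the expected one; this is a simplification, since the evaluation method mentioned in the introduction also accepts different syntactic representations.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def imputation_scores(truth, imputed):
    """Compute precision, recall, and F1-measure of an imputation run.

    truth:   dict mapping every injected missing cell to its expected value
    imputed: dict mapping the cells filled by the algorithm to the chosen value
    """
    correct = sum(1 for cell, value in imputed.items() if truth.get(cell) == value)
    precision = correct / len(imputed) if imputed else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical run: four missing cells, three imputed, two of them correctly.
truth = {
    (1, "Phone"): "213-857-0034",
    (2, "City"): "LA",
    (3, "Type"): "French",
    (4, "Class"): "6",
}
imputed = {
    (1, "Phone"): "213-857-0034",
    (2, "City"): "LA",
    (3, "Type"): "Italian",
}
precision, recall, f1 = imputation_scores(truth, imputed)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```
]]></preformat>
      <p>Leaving a value unimputed lowers recall but not precision, which is why the number of non-imputed values matters alongside the correctness of the imputed ones.</p>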
      <p>The second evaluation session is focused on the Physician dataset, fixing the missing rate
and varying the number of tuples considered. This dataset is particularly complex to
analyze, since it also contains a high number of attributes (i.e., 13 attributes). In fact, this dataset
allowed us to reach a time and/or memory limit for all considered approaches (i.e., RENUVER,
Derand, and Holoclean), as shown in Table 2. In particular, we can notice that, on average,
both RENUVER and Holoclean registered faster execution times than Derand. In fact, the latter
exceeds the time limit of 48h on the datasets having 2072 and 10359 tuples. On the
other hand, Holoclean manages to achieve reasonable execution times, but the huge amount of
consumed memory makes it exceed the 30GB memory limit on the dataset having 10359 tuples.
Finally, RENUVER also exceeds the time limit on the largest dataset, despite a more reasonable
memory consumption. This evaluation session proved the capability of RENUVER to outperform
the compared approaches on the considered qualitative metrics. It also emphasized that Derand’s
execution times are strongly dependent on the number of missing values, whereas
Holoclean, although providing overall faster execution times, proved to be heavily memory-consuming.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we proposed RENUVER, a data imputation algorithm that exploits relaxed
functional dependencies. The latter enable RENUVER to select and evaluate the candidate tuples to be
used during the imputation process. The whole imputation process preserves the semantic
consistency of the data, by guaranteeing that no imputation can violate any rfdc. Evaluation
results demonstrated that RENUVER outperforms recent approaches using different imputation
strategies: machine learning-based (Holoclean) and dependency-based (Derand).</p>
      <p>
        In the future, we would like to extend RENUVER with the possibility of selecting plausible
candidate tuples among multiple datasets. Finally, we would like to study the applicability of
RENUVER to incremental scenarios, such as those related to the imputation of time
series [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which would require the usage of incremental rfdc discovery algorithms [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Molinaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Subrahmanian</surname>
          </string-name>
          ,
          <article-title>Customized policies for handling partial information in relational databases</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>25</volume>
          (
          <year>2012</year>
          )
          <fpage>1254</fpage>
          -
          <lpage>1271</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Montesdeoca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luengo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>A first approach on big data missing values imputation</article-title>
          ,
          in:
          <source>Proceedings of the 5th International Conference on Internet of Things, Big Data and Security (IoTBDS)</source>
          ,
          <source>SciTePress</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>315</fpage>
          -
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Breve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Deufemia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polese</surname>
          </string-name>
          ,
          <article-title>RENUVER: A missing value imputation algorithm based on relaxed functional dependencies</article-title>
          , to appear in:
          <source>Proceedings of the 25th International Conference on Extending Database Technology (EDBT)</source>
          ,
          <source>OpenProceedings.org</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chu</surname>
          </string-name>
          , et al.,
          <article-title>Trends in cleaning relational data: consistency and deduplication</article-title>
          ,
          <source>Foundations and Trends® in Databases</source>
          <volume>5</volume>
          (
          <year>2015</year>
          )
          <fpage>281</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Deufemia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polese</surname>
          </string-name>
          ,
          <article-title>Discovering relaxed functional dependencies based on multi-attribute dominance</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>33</volume>
          (
          <year>2021</year>
          )
          <fpage>3212</fpage>
          -
          <lpage>3228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <article-title>Holoclean: holistic data repairs with probabilistic inference</article-title>
          ,
          <source>Proceedings of VLDB Endowment</source>
          <volume>10</volume>
          (
          <year>2017</year>
          )
          <fpage>1190</fpage>
          -
          <lpage>1201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Enriching data imputation under similarity rule constraints</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>32</volume>
          (
          <year>2020</year>
          )
          <fpage>275</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A grey-based nearest neighbor approach for missing attribute value prediction</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>20</volume>
          (
          <year>2004</year>
          )
          <fpage>239</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Deufemia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polese</surname>
          </string-name>
          ,
          <article-title>Relaxed functional dependencies-A survey of approaches</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>28</volume>
          (
          <year>2016</year>
          )
          <fpage>147</fpage>
          -
          <lpage>165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
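          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          ,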
          <article-title>Attention-based learning for missing data imputation in holoclean</article-title>
          ,
          <source>Proceedings of Machine Learning and Systems</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>307</fpage>
          -
          <lpage>325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Khayati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tymchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          ,
          <article-title>Mind the gap: An experimental evaluation of imputation of missing values techniques in time series</article-title>
          ,
          <source>Proceedings VLDB Endowment</source>
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>768</fpage>
          -
          <lpage>782</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Deufemia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polese</surname>
          </string-name>
          ,
          <article-title>Incremental discovery of functional dependencies with a bit-vector algorithm</article-title>
          , in:
          <source>Proceedings of the Italian Symposium on Advanced Database Systems (SEBD '19)</source>
          , volume
          <volume>2400</volume>
          , CEUR-WS.org,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirillo</surname>
          </string-name>
          ,
          <article-title>Incremental discovery of imprecise functional dependencies</article-title>
          ,
          <source>Journal of Data and Information Quality (JDIQ)</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>