<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Imputation of Missing Values through Profiling Metadata</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Bernardo</forename><surname>Breve</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Salerno</orgName>
								<address>
									<addrLine>via Giovanni Paolo II, 132, Fisciano (SA)</addrLine>
									<postCode>84084</postCode>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Loredana</forename><surname>Caruccio</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Salerno</orgName>
								<address>
									<addrLine>via Giovanni Paolo II, 132, Fisciano (SA)</addrLine>
									<postCode>84084</postCode>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vincenzo</forename><surname>Deufemia</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Salerno</orgName>
								<address>
									<addrLine>via Giovanni Paolo II, 132, Fisciano (SA)</addrLine>
									<postCode>84084</postCode>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giuseppe</forename><surname>Polese</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Salerno</orgName>
								<address>
									<addrLine>via Giovanni Paolo II, 132, Fisciano (SA)</addrLine>
									<postCode>84084</postCode>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Imputation of Missing Values through Profiling Metadata</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1BDAF79D0642BC11F225BA3512EE071A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data imputation</term>
					<term>Profiling metadata</term>
					<term>Relaxed Functional Dependencies</term>
					<term>Data quality</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Among the several problems related to the management of database instances, missing values represent a crucial factor that could severely compromise the integrity and the meaningfulness of such data representations. Thus, the data imputation research field focuses on solutions for filling missing values with plausible candidates, while preserving the overall semantic integrity of the database instance. To keep imputation times low while maintaining high accuracy, research proposals have increasingly employed metadata. This discussion paper presents our effort in the definition of RENUVER, a novel data imputation algorithm relying on Relaxed Functional Dependencies (rfds) for identifying the value candidates that best guarantee the semantic integrity of data. Experimental results on real-world datasets highlighted the effectiveness of RENUVER in terms of both filling accuracy and imputation times, also compared to other well-known approaches.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the advent of big data, the presence of missing values inside database instances has been widely recognized as a complex problem to handle, especially for Relational Database Management Systems <ref type="bibr" target="#b0">[1]</ref>. Moreover, several application contexts require datasets free of this data quality issue. For instance, machine learning models may fail to achieve good accuracy if trained on data with many missing values. In general, it is not possible to infer reliable knowledge from datasets with incomplete information <ref type="bibr" target="#b1">[2]</ref>.</p><p>Identifying the best values in a dataset for imputing the missing ones is an extremely complex task, since it entails evaluating all possible combinations in the value distribution. Most of the approaches proposed in the literature focus on maximizing the number of imputed values, overshadowing the accuracy of single imputations. This discussion paper presents the data imputation algorithm proposed in <ref type="bibr" target="#b2">[3]</ref>, namely RENUVER (Rfd basEd NUll ValuE Repairer), which relies on Relaxed Functional Dependencies (rfds) for imputing missing values within a relational database instance. By adopting rfds as metadata supporting the imputation process, we are able to perform a broader analysis of the correlations among attributes, yielding an accurate and reasonably fast solution for the imputation of missing values within relational database instances.<note place="foot">SEBD 2022: The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy. bbreve@unisa.it (B. Breve); lcaruccio@unisa.it (L. Caruccio); deufemia@unisa.it (V. Deufemia); gpolese@unisa.it (G. Polese). ORCID: 0000-0002-3898-7512 (B. Breve); 0000-0002-2418-1606 (L. Caruccio); 0000-0002-6711-3590 (V. Deufemia); 0000-0002-8496-2658 (G. Polese).</note> 
Indeed, rfds are widely used for detecting and repairing many types of errors, such as duplicates, outliers, and constraint violations <ref type="bibr" target="#b3">[4]</ref>. Thus, we use them to identify suitable candidate values for replacing missing ones in the data imputation process. RENUVER exploits rfds for: i) identifying the candidate tuples useful for the imputation of missing values, ii) ranking candidate tuples based on their similarity to the tuples containing missing values, and iii) evaluating each imputation to guarantee the semantic consistency of the whole dataset.</p><p>In particular, RENUVER generates candidate tuples and ranks them according to the rfds whose RHS is the attribute on which a value is missing. Moreover, the imputation strategy of RENUVER keeps imputed values consistent with those in the original dataset. Finally, RENUVER also exploits rfds to judge whether a missing value can be imputed at all, in order to preserve the integrity of data and to avoid the insertion of inconsistent information.</p><p>The effectiveness of RENUVER has been evaluated on real-world datasets<ref type="foot" target="#foot_0">1</ref> in terms of accuracy and execution time. In order to extract rfds, we relied on an existing rfd discovery algorithm <ref type="bibr" target="#b4">[5]</ref>, since the problem of discovering rfds is out of the scope of this paper. Moreover, we introduce a novel method for the automatic evaluation of data imputation results, which permits judging imputed values even when their syntactic representation differs from that of the expected ones. Evaluation results demonstrate that RENUVER outperforms other data imputation approaches <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>.</p><p>The paper is organized as follows: Section 2 provides preliminary notions on rfds. 
Section 3 introduces RENUVER's logic through the employment of rfds in the data imputation problem. An experimental evaluation measuring the effectiveness of RENUVER is presented in Section 4. Finally, conclusions and future research directions are reported in Section 5.</p></div>
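<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries</head><p>Before describing how we approached the imputation problem through the employment of rfds, let us introduce some notions preliminary to our methodology.</p><p>Functional Dependency. Given a relational database schema $\mathcal{R}$, a relation schema $R = \{A_1, \ldots, A_m\}$ of $\mathcal{R}$, an instance $r$ of $R$, and a tuple $t \in r$, we use $t[A_i]$, with $1 \leq i \leq m$, to denote the projection of $t$ onto $A_i$; similarly, for a set of attributes $X = \{A_{i_1}, \ldots, A_{i_k}\}$, with $1 \leq k \leq m$, $t[X] \in dom(A_{i_1}) \times \cdots \times dom(A_{i_k})$ represents the projection of $t$ onto $X$, also denoted with $\Pi_X(t)$. An fd on $\mathcal{R}$ is a statement $X \rightarrow Y$ ($X$ implies $Y$), with $X, Y \subseteq attr(R)$, such that, given an instance $r$ of $R$, $X \rightarrow Y$ is satisfied in $r$ if and only if for each pair of tuples</p><formula xml:id="formula_0">$(t_1, t_2)$ in $r$, whenever $t_1[X] = t_2[X]$, then $t_1[Y] = t_2[Y]$.</formula><p>The sets of attributes $X$ and $Y$ are named Left-Hand-Side (LHS) and Right-Hand-Side (RHS) of the fd, respectively.</p><p>With respect to the fd definition, the rfd generalizes the comparison paradigm by including similarity/distance-based comparisons between tuple projections, also admitting the possibility for a dependency to hold only on a subset of tuples. The latter can be defined through either a coverage measure, quantifying the portion of the dataset on which a dependency holds, or a condition restricting the domain on which a dependency can hold <ref type="bibr" target="#b8">[9]</ref>. Since the proposed approach exploits only rfds relying on a similarity/distance-based tuple comparison method, in what follows we provide only the definition of this type of rfds, known as rfd$_c$. For a more general definition of rfd, see <ref type="bibr" target="#b8">[9]</ref>.</p><p>rfd$_c$. Given a relational database schema $\mathcal{R}$, and $R = \{A_1, \ldots, A_m\}$ one of its relation schemas, an rfd$_c$ $\phi$ on $\mathcal{R}$ is denoted as $X_{\Phi_1} \rightarrow Y_{\Phi_2}$ (1), where $X, Y \subseteq attr(R)$, and $\Phi_1$ contains, for each attribute $X_i \in X$, a constraint $\varphi_i[X_i]$ that can be used to determine whether a pair of tuples with values in $dom(X_i)$ are "similar" enough (likewise for each attribute $Y_j \in Y$, with $\varphi_j[Y_j] \in \Phi_2$). More specifically, each $\varphi_i[X_i]$ ($\varphi_j[Y_j]$, resp.) requires the specification of a similarity/distance function defined on the domain of $X_i$ ($Y_j$, resp.), an operator, and a threshold setting the boundaries for the satisfaction of the constraint. $\phi$ holds on a relation instance $r$ (denoted by $r \models \phi$) if and only if for each pair of tuples</p><formula xml:id="formula_1">$(t_1, t_2) \in r$ for which $t_1[X]$ and $t_2[X]$ satisfy the constraint $\varphi_i[X_i]$ for each $X_i \in X$, then $t_1[Y]$ and $t_2[Y]$ satisfy the constraint $\varphi_j[Y_j]$ for each $Y_j \in Y$.</formula><p>For the sake of simplicity, in the following we use a more compact notation for the constraints, showing only the operator and the numeric threshold associated with each attribute. 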
</p><p>Example. Let us consider the sample relation shown in Table <ref type="table" target="#tab_0">1</ref>, derived from a database of restaurants in the USA. Within this database, each tuple represents a restaurant, providing information about its name, address, city, phone number, type of cuisine, and class. The latter is a numeric id associated with the type of cuisine. On this dataset, the following rfd$_c$ holds: $Name_{\leq 4} \rightarrow Phone_{\leq 1}$, which states that if two restaurants have similar names, then they also have similar phone numbers. This should hold even when the names and/or the phone numbers of restaurants are written in different ways or use different abbreviations.</p><p>From a theoretical point of view, rfd$_c$s admit any type of similarity/distance function, e.g., edit distance, absolute difference, and so forth. However, the functions are usually inherited from those involved in the automatic rfd$_c$ discovery process <ref type="bibr" target="#b4">[5]</ref>. For the scope of this proposal, without loss of generality, we consider rfd$_c$s with a single attribute $A$ on the RHS, and the associated constraint $\varphi_2$. In particular, we consider $\varphi_2$ composed of a distance function, the operator $\leq$, and a distance threshold.</p><p>A particular type of rfd$_c$ is the key-rfd$_c$, which is defined in the following.</p></div>
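<div xmlns="http://www.tei-c.org/ns/1.0"><p>The fd and rfd$_c$ definitions above can be checked directly by comparing every pair of tuples. The following Python code is a minimal sketch under our own assumptions, not part of RENUVER or of the discovery algorithm in [5]: tuples are dictionaries, a missing value is None, strings are compared with the edit (Levenshtein) distance and numbers with the absolute difference, every constraint uses the operator ≤, and pairs with a missing value on an involved attribute are skipped. All function names are hypothetical.</p>
<p>
```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def distance(x, y):
    """Assumed distance: edit distance on strings, absolute difference otherwise."""
    if isinstance(x, str) and isinstance(y, str):
        return levenshtein(x, y)
    return abs(x - y)


def similar(t1: dict, t2: dict, constraints: dict) -> bool:
    """True if distance(t1[A], t2[A]) ≤ threshold for every (A, threshold)."""
    return all(threshold >= distance(t1[a], t2[a])
               for a, threshold in constraints.items())


def holds(relation: list, lhs: dict, rhs: dict) -> bool:
    """Check r |= X_Phi1 -> Y_Phi2 by inspecting every pair of tuples."""
    attrs = list(lhs) + list(rhs)
    for t1, t2 in combinations(relation, 2):
        if any(t[a] is None for t in (t1, t2) for a in attrs):
            continue  # assumption: pairs with missing values are not evaluated
        if similar(t1, t2, lhs) and not similar(t1, t2, rhs):
            return False  # similar on the LHS but not on the RHS: violation
    return True


def fd_holds(relation: list, lhs: list, rhs: list) -> bool:
    """A canonical fd is the special case where every threshold is 0."""
    return holds(relation, {a: 0 for a in lhs}, {a: 0 for a in rhs})
```
</p>
<p>On a relation made only of two "Citrus" tuples with the same phone number and class 6, plus "Fenix" and "Fenix Argyle" sharing a phone number and class 5, fd_holds(r, ["Class"], ["Name"]) returns False, whereas the relaxed holds(r, {"Class": 0}, {"Name": 7}) returns True, since the two Fenix names are at edit distance 7.</p></div>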
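<div xmlns="http://www.tei-c.org/ns/1.0"><head>Key rfd$_c$</head><p>Given a relation schema $R$, and an instance $r$ of $R$, an rfd$_c$ $\phi : X_{\Phi_1} \rightarrow A_{\varphi_2}$ is said to be key if and only if $\phi$ holds on $r$ ($r \models \phi$), but there is no pair of distinct tuples $(t_1, t_2) \in r$ for which $t_1[X]$ and $t_2[X]$ satisfy all the constraints in $\Phi_1$. In other words, a key-rfd$_c$ holds vacuously: since no two tuples are similar on its LHS, it cannot point to any tuple from which a missing value could be taken.</p></div>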
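<div xmlns="http://www.tei-c.org/ns/1.0"><p>Since a key-rfd$_c$ has no pair of tuples similar on its LHS, detecting it only requires the LHS constraints. The sketch below, which assumes the same distance functions as before (edit distance on strings, absolute difference on numbers) and treats a missing value as never similar, shows how a pre-processing step could discard key-rfd$_c$s. rfd$_c$s are encoded as hypothetical (lhs, rhs) pairs of attribute-to-threshold dictionaries, and the function names are ours.</p>
<p>
```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance(x, y):
    """Assumed distance: edit distance on strings, absolute difference otherwise."""
    if isinstance(x, str) and isinstance(y, str):
        return levenshtein(x, y)
    return abs(x - y)


def lhs_similar(t1: dict, t2: dict, lhs: dict) -> bool:
    """True if both tuples are non-null and within threshold on every LHS attribute."""
    for a, threshold in lhs.items():
        if t1[a] is None or t2[a] is None:
            return False  # assumption: a missing value is never similar
        if distance(t1[a], t2[a]) > threshold:
            return False
    return True


def is_key(relation: list, lhs: dict) -> bool:
    """An rfd_c is key when no pair of distinct tuples is similar on its LHS.

    Such an rfd_c holds vacuously whatever its RHS, so only the LHS is needed.
    """
    return not any(lhs_similar(t1, t2, lhs) for t1, t2 in combinations(relation, 2))


def drop_key_rfds(relation: list, rfds: list) -> list:
    """Pre-processing: keep only the (lhs, rhs) rfd_cs that are not key."""
    return [(lhs, rhs) for lhs, rhs in rfds if not is_key(relation, lhs)]
```
</p>
<p>For example, on the tuples Granita, Citrus, Fenix, and Fenix Argyle of Table 1, an rfd$_c$ with LHS Name (≤ 2) is key, because every pairwise name distance exceeds 2, while one with LHS Phone (≤ 0) is not, since Fenix and Fenix Argyle share the same phone number.</p></div>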
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The RENUVER imputation approach</head><p>In this section, we formalize the data imputation problem by defining some of its underlying concepts, and then describe the basics of the proposed imputation approach. Let us start by defining the concept of missing value.</p><p>Missing value. Given a relation schema $R$, defined over a set of attributes $attr(R)$, an instance $r$ of $R$, an attribute $A \in attr(R)$, and a tuple $t \in r$, a missing value of tuple $t$ on the attribute $A$, denoted as</p><formula xml:id="formula_2">$t[A] = \_$, is such that $t[A]$ is null.</formula><p>Here, $r$ is said to be an incomplete instance, and $\hat{r} \subseteq r$ denotes the subset containing only its incomplete tuples. The general missing value imputation problem is formally defined as follows.</p><p>Missing value imputation problem. Given a relation schema $R$, and an instance $r$ of $R$, for every tuple $t \in r$ and every attribute $A \in attr(R)$ for which $t[A] = \_$, the imputation problem consists of finding a plausible value $a \in dom(A)$, such that the database instance $r'$ resulting from the imputation process does not contain inconsistent values.</p><p>A missing value imputation approach also requires the application of constraints for evaluating the consistency of values at the end of the imputation process. The proposed approach exploits rfds both to verify semantic consistency and to drive the search for meaningful candidates for all missing values. In what follows, we define the rfd$_c$-based criteria used by RENUVER for deciding when a tuple can be considered a plausible candidate.</p></div>
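<div xmlns="http://www.tei-c.org/ns/1.0"><p>Missing values and the set $\hat{r}$ of incomplete tuples can be located with a single scan of the instance. In this minimal sketch, the relation is assumed to be a dictionary from tuple ids to attribute dictionaries, and the null marker _ of Table 1 is modelled as None; both choices, and the function names, are ours.</p>
<p>
```python
# Table 1, with the null marker "_" modelled as None (assumed encoding).
TABLE_1 = {
    "t1": {"Name": "Granita", "City": "Malibu", "Phone": "310/456-0488", "Type": "Californian", "Class": 6},
    "t2": {"Name": "Chinois Main", "City": "LA", "Phone": "310-392-9025", "Type": "French", "Class": 5},
    "t3": {"Name": "Citrus", "City": "Los Angeles", "Phone": "213/857-0034", "Type": "Californian", "Class": 6},
    "t4": {"Name": "Citrus", "City": "Los Angeles", "Phone": None, "Type": "Californian", "Class": 6},
    "t5": {"Name": "Fenix", "City": "Hollywood", "Phone": "213/848-6677", "Type": None, "Class": 5},
    "t6": {"Name": "Fenix Argyle", "City": None, "Phone": "213/848-6677", "Type": "French (new)", "Class": 5},
    "t7": {"Name": "C. Main", "City": "Los Angeles", "Phone": None, "Type": "French", "Class": 5},
}


def missing_cells(relation: dict) -> list:
    """All (tuple id, attribute) pairs such that t[A] = _ (i.e. None)."""
    return [(tid, a) for tid, t in relation.items() for a, v in t.items() if v is None]


def incomplete_tuples(relation: dict) -> list:
    """Ids of the tuples in r-hat, i.e. tuples with at least one missing value."""
    return [tid for tid, t in relation.items() if any(v is None for v in t.values())]
```
</p>
<p>On Table 1, the scan finds four missing values, namely t4[Phone], t5[Type], t6[City], and t7[Phone], so that $\hat{r} = \{t_4, t_5, t_6, t_7\}$.</p></div>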
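<div xmlns="http://www.tei-c.org/ns/1.0"><head>Semantically consistent imputation</head><p>Given a relation schema $R$, defined over a set of attributes $attr(R)$, an instance $r$ of $R$, and a set $\Sigma$ of rfd$_c$s holding on $r$ ($r \models \Sigma$), an instance $r'$ of $R$ resulting from an imputation process $I$ over the instance $r$, denoted as $r' = I(r)$, is semantically consistent iff $r' \models \Sigma$.</p><p>One of the possible strategies that could guarantee the semantic consistency of the imputation process is to find candidate values for $t[A] = \_$ by considering a set $T_{candidate} \subseteq r$ of plausible candidate tuples for imputing $t[A]$, such that $\forall t_k \in T_{candidate}$, $t_k[A] \neq \_$ and $t_k$ is similar to $t$ on some attributes other than $A$.</p></div>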
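<div xmlns="http://www.tei-c.org/ns/1.0"><p>The definition of semantically consistent imputation translates into a check of $r' \models \Sigma$ on a copy of the instance. The sketch below makes the same assumptions as the previous ones (edit distance on strings, absolute difference on numbers, thresholds with ≤, pairs with missing values skipped) and encodes $\Sigma$ as a hypothetical list of (lhs, rhs) threshold dictionaries; the function names are ours.</p>
<p>
```python
import copy
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance(x, y):
    """Assumed distance: edit distance on strings, absolute difference otherwise."""
    if isinstance(x, str) and isinstance(y, str):
        return levenshtein(x, y)
    return abs(x - y)


def similar(t1: dict, t2: dict, constraints: dict) -> bool:
    return all(threshold >= distance(t1[a], t2[a]) for a, threshold in constraints.items())


def satisfies_all(relation: dict, sigma: list) -> bool:
    """r |= Sigma: every (lhs, rhs) rfd_c holds on every pair of complete tuples."""
    for lhs, rhs in sigma:
        attrs = list(lhs) + list(rhs)
        for t1, t2 in combinations(relation.values(), 2):
            if any(t[a] is None for t in (t1, t2) for a in attrs):
                continue  # assumption: incomplete pairs are not evaluated
            if similar(t1, t2, lhs) and not similar(t1, t2, rhs):
                return False
    return True


def is_semantically_consistent(relation: dict, sigma: list, tid: str, attr: str, value) -> bool:
    """Apply one imputation to a copy r' of the instance and test r' |= Sigma."""
    imputed = copy.deepcopy(relation)  # the original instance r is left untouched
    imputed[tid][attr] = value
    return satisfies_all(imputed, sigma)
```
</p>
<p>With $\Sigma$ = {Phone (≤ 1) ➝ Class (≤ 0)} and only the tuples t3 and t7 of Table 1, copying t3[Phone] into t7 breaks consistency, because the two tuples would share a phone number while having classes 6 and 5, whereas using 310-392-9025 (the phone number of t2) keeps $r' \models \Sigma$.</p></div>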
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Plausible candidate tuple</head><p>Given a missing value $t[A] = \_$ over a database instance $r$ of a relation schema $R$, and an rfd$_c$ $\phi : X_{\Phi_1} \rightarrow A_{\varphi_2}$ holding on $r$, a tuple $t' \in r$ can be considered a plausible candidate tuple for imputing $t[A]$ according to $\phi$ iff $t$ and $t'$ are similar according to the constraints in $\Phi_1$.</p><p>The candidate tuple generation process performed according to the definition above has to be generalized in order to handle tuples containing more than one missing value, and to be applied to each $t \in \hat{r}$.</p></div>
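<div xmlns="http://www.tei-c.org/ns/1.0"><p>The plausible candidate tuple definition can be sketched as a filter over the instance. In this illustration, a candidate must differ from $t$, have a non-null value on $A$, and be within every LHS threshold of $\phi$; candidates are additionally sorted by the sum of their LHS distances, an aggregation we assume for illustration only. Distances are again the assumed edit distance on strings and absolute difference on numbers, and the function name is hypothetical.</p>
<p>
```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance(x, y):
    """Assumed distance: edit distance on strings, absolute difference otherwise."""
    if isinstance(x, str) and isinstance(y, str):
        return levenshtein(x, y)
    return abs(x - y)


# Table 1, with the null marker "_" modelled as None (assumed encoding).
TABLE_1 = {
    "t1": {"Name": "Granita", "City": "Malibu", "Phone": "310/456-0488", "Type": "Californian", "Class": 6},
    "t2": {"Name": "Chinois Main", "City": "LA", "Phone": "310-392-9025", "Type": "French", "Class": 5},
    "t3": {"Name": "Citrus", "City": "Los Angeles", "Phone": "213/857-0034", "Type": "Californian", "Class": 6},
    "t4": {"Name": "Citrus", "City": "Los Angeles", "Phone": None, "Type": "Californian", "Class": 6},
    "t5": {"Name": "Fenix", "City": "Hollywood", "Phone": "213/848-6677", "Type": None, "Class": 5},
    "t6": {"Name": "Fenix Argyle", "City": None, "Phone": "213/848-6677", "Type": "French (new)", "Class": 5},
    "t7": {"Name": "C. Main", "City": "Los Angeles", "Phone": None, "Type": "French", "Class": 5},
}


def plausible_candidates(relation: dict, tid: str, attr: str, lhs: dict) -> list:
    """Ids of the tuples that can impute relation[tid][attr] according to an
    rfd_c whose LHS constraints are given as {attribute: threshold}."""
    t = relation[tid]
    scored = []
    for cid, c in relation.items():
        if cid == tid or c[attr] is None:
            continue  # a candidate must be another tuple with a value on attr
        if any(t[a] is None or c[a] is None for a in lhs):
            continue  # assumption: a missing LHS value is never similar
        dists = [distance(t[a], c[a]) for a in lhs]
        if all(threshold >= d for threshold, d in zip(lhs.values(), dists)):
            scored.append((sum(dists), cid))
    return [cid for _, cid in sorted(scored)]  # closest candidates first
```
</p>
<p>For t7[Phone] and the LHS Name (≤ 6), City (≤ 9), the sketch returns t3 (distances 6 + 0) before t2 (distances 6 + 9), which is the same order in which the two tuples are tried in Figure 1.c.</p></div>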
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Missing value imputation for a tuple</head><p>Let $R$ be a relational schema defined over a set of attributes $attr(R)$, $r$ an instance of $R$, $t$ a tuple of $r$, $Z \subset attr(R)$ a set of attributes such that $t[A] = \_$ for each $A \in Z$, and $\Sigma$ a set of rfd$_c$s holding on $r$. An imputation process for $t$ consists of selecting a plausible candidate tuple $t_j$ for each $A \in Z$, so that $t[A]$ can be set equal to $t_j[A]$. However, when for a $t[A] = \_$ it is not possible to identify a plausible candidate tuple guaranteeing a semantically consistent imputation, it is better to leave $t[A]$ unimputed. Although this strategy has been widely applied in other approaches <ref type="bibr" target="#b6">[7]</ref>, it raises another important issue that RENUVER deals with, i.e., minimizing the number of non-imputed values. Figure <ref type="figure" target="#fig_1">1</ref> summarizes the imputation logic of RENUVER<ref type="foot" target="#foot_1">2</ref> through an example. In particular, we show how the aforesaid definitions enable the imputation of a missing value in the Restaurant dataset introduced above. In detail, the imputation of a missing value goes through three major phases:</p><p>• Pre-processing: during this phase, missing values within a database instance are identified and isolated. Furthermore, RENUVER excludes all key-rfd$_c$s from the set of rfd$_c$s that can be employed for the imputation of any missing value (see Figure <ref type="figure" target="#fig_1">1</ref>.a).</p><p>• rfd$_c$ selection: after a missing value to impute has been selected, RENUVER identifies all the rfd$_c$s that can be useful for its imputation, i.e., those having the attribute of the missing value as RHS. These rfd$_c$s are then organized in a set of clusters according to their RHS threshold (see Figure <ref type="figure" target="#fig_1">1</ref>.b).</p><p>• Imputing missing values: during this phase, RENUVER performs a series of operations leading to the imputation of a missing value, by retrieving the value from a set of plausible candidate tuples belonging to the same database instance (see Figure <ref type="figure" target="#fig_1">1</ref>.c). In particular, RENUVER iteratively performs the following operations:</p><p>(i) it generates a set of plausible candidate tuples that satisfy the LHS constraints of an rfd$_c$ belonging to one of the clusters previously generated;</p><p>(ii) it computes a distance value for each plausible candidate tuple with respect to the tuple having the missing value, by considering the LHS attributes of the selected rfd$_c$; the candidate tuple having the minimum distance is then exploited for the imputation of the missing value;</p><p>(iii) it verifies whether the imputed value causes a violation of the holding rfd$_c$s. In this case, RENUVER selects the next plausible candidate tuple with the lowest distance value. These operations are repeated for each cluster until the imputation succeeds; if no cluster yields a consistent imputation, the missing value is left unimputed.</p></div>
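<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three phases can be put together in a compact loop. This is a simplified sketch, not RENUVER's actual implementation: key-rfd$_c$ removal is omitted, each rfd$_c$ of a cluster is processed in turn rather than merging their candidates, clusters are visited in increasing order of RHS threshold, candidates are ranked by the sum of their LHS distances, and the consistency check only inspects pairs involving the updated tuple, which assumes that $r \models \Sigma$ before the imputation. The RFD class and all function names are hypothetical.</p>
<p>
```python
from dataclasses import dataclass
from itertools import groupby


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance(x, y):
    """Assumed distance: edit distance on strings, absolute difference otherwise."""
    if isinstance(x, str) and isinstance(y, str):
        return levenshtein(x, y)
    return abs(x - y)


@dataclass(frozen=True)
class RFD:
    """rfd_c X_Phi1 -> A_phi2 where every constraint is 'distance ≤ threshold'."""
    lhs: tuple            # ((attribute, threshold), ...)
    rhs: str
    rhs_threshold: float


def lhs_distance(t: dict, c: dict, rfd: RFD):
    """Sum of the LHS distances, or None if t and c are not similar on the LHS."""
    total = 0
    for a, threshold in rfd.lhs:
        if t[a] is None or c[a] is None:
            return None
        d = distance(t[a], c[a])
        if d > threshold:
            return None
        total += d
    return total


def violates(relation: dict, tid: str, rfd: RFD) -> bool:
    """Incremental check: only the pairs involving the updated tuple are tested."""
    t = relation[tid]
    for oid, other in relation.items():
        if oid == tid or t[rfd.rhs] is None or other[rfd.rhs] is None:
            continue
        if (lhs_distance(t, other, rfd) is not None
                and distance(t[rfd.rhs], other[rfd.rhs]) > rfd.rhs_threshold):
            return True
    return False


def impute_cell(relation: dict, tid: str, attr: str, sigma: list):
    """Try to fill relation[tid][attr] in place; return the value or None."""
    # rfd_c selection: rfd_cs with attr on the RHS, clustered by RHS threshold.
    useful = sorted((f for f in sigma if f.rhs == attr), key=lambda f: f.rhs_threshold)
    for _, cluster in groupby(useful, key=lambda f: f.rhs_threshold):
        for rfd in cluster:
            scored = []
            for cid, c in relation.items():
                if cid != tid and c[attr] is not None:
                    d = lhs_distance(relation[tid], c, rfd)
                    if d is not None:
                        scored.append((d, cid))
            for _, cid in sorted(scored):  # nearest plausible candidate first
                relation[tid][attr] = relation[cid][attr]
                if not any(violates(relation, tid, f) for f in sigma):
                    return relation[tid][attr]
                relation[tid][attr] = None  # roll back, try the next candidate
    return None  # no consistent candidate: leave t[attr] unimputed
```
</p>
<p>For example, with $\Sigma$ = {X (≤ 2) ➝ A (≤ 100), A (≤ 0) ➝ Z (≤ 0)} and tuples t1 = (10, 100, 1), t2 = (13, 200, 2), and t3 = (11, _, 2) over the attributes (X, A, Z), the nearest candidate t1 would give t3 the same A value as t1 but a different Z, violating A (≤ 0) ➝ Z (≤ 0); the loop therefore rolls back and imputes t3[A] = 200 from t2.</p></div>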
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Evaluation</head><p>In this section, we present a comparative evaluation of RENUVER w.r.t. other approaches exploiting different imputation strategies. In particular, we benchmarked RENUVER against a holistic, machine learning-based approach, namely Holoclean <ref type="bibr" target="#b5">[6]</ref> (considering its attention-based extension module AimNet <ref type="bibr" target="#b9">[10]</ref>), and a differential dependency-guided approach named Derand <ref type="bibr" target="#b6">[7]</ref>, for which we employed the same rfd$_c$s as RENUVER. All evaluations were performed under the same conditions on an iMac Pro with an 8-core CPU and 32GB RAM.</p><p>Datasets. The considered algorithms have been evaluated on two real-world datasets, Restaurant and Physician, in order to perform a stress test on RENUVER and all compared imputation approaches, aiming to determine their time and memory requirements. To this end, we stopped executions exceeding 48 hours of execution time or 30GB of memory consumption. Furthermore, in order to obtain an accurate comparison between the imputed values and the expected ones, missing values have been artificially injected at random. Moreover, to avoid a placement of missing values that favors one algorithm, for each missing rate we produced five different datasets, yielding a total of twenty-five variants of the same dataset. The metrics adopted for the comparison are then averaged over the variants of each missing rate.</p><p>Evaluation metrics. The effectiveness of the data imputation approaches has been evaluated by considering three different metrics, namely precision, recall, and F1-measure, formally defined as $precision = \frac{|true|}{|imputed|}$, $recall = \frac{|true|}{|missing|}$, and $F1\text{-}measure = 2 \times \frac{precision \times recall}{precision + recall}$, where $true$ represents the missing values correctly imputed at the end of the imputation process, $imputed$ represents all the imputed missing values, and $missing$ the missing values in the dataset.</p><p>Results. The first evaluation session focuses on the Restaurant dataset, considering missing rates of 5%, 10%, 20%, 30%, and 40% (see Table <ref type="table" target="#tab_3">2</ref>). 
We can notice that the fastest approach is Holoclean, whereas Derand registered much higher execution times, exceeding the 48h time limit starting from a 10% missing rate. The faster execution times of Holoclean can be explained by the considerably smaller amount of metadata to be processed during the imputation process, i.e., 9 Denial Constraints (DCs), compared to 1961 rfd$_c$s. Nevertheless, RENUVER achieved the best performance on all the considered effectiveness metrics (recall, precision, and F1-measure).</p><p>The second evaluation session focuses on the Physician dataset, fixing the missing rate at 1% and varying the number of tuples to be considered. This dataset is particularly complex to analyze, since it also contains a high number of attributes (i.e., 13 attributes). In fact, on this dataset every considered approach (i.e., RENUVER, Derand, and Holoclean) reached the time and/or memory limit, as shown in Table <ref type="table" target="#tab_3">2</ref>. In particular, we can notice that, on average, both RENUVER and Holoclean registered faster execution times than Derand. In fact, the latter exceeds the 48h time limit on the datasets having 2072 and 10359 tuples. On the other hand, Holoclean achieves reasonable execution times, but its huge memory consumption makes it exceed the 30GB memory limit on the dataset having 10359 tuples. Finally, RENUVER also exceeds the time limit on the largest dataset, despite a more reasonable memory consumption. This evaluation session confirmed the capability of RENUVER to outperform the compared approaches on the considered effectiveness metrics. It also emphasized that Derand's execution times strongly depend on the number of missing values, whereas Holoclean, although providing overall faster execution times, turned out to be heavily memory-consuming.</p></div>
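<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation protocol, i.e., random injection of missing values followed by precision, recall, and F1-measure, can be sketched as follows. The injection function, its seed parameter, and the pluggable matcher are assumptions for illustration; in particular, the paper's own method for comparing syntactically different values is not reproduced here, and exact equality is used by default.</p>
<p>
```python
import random


def inject_missing(relation: dict, rate: float, seed: int) -> dict:
    """Blank a random `rate` fraction of the non-null cells, in place.

    Returns the ground truth {(tuple id, attribute): original value}.
    """
    rng = random.Random(seed)  # one seed per dataset variant
    cells = [(tid, a) for tid, t in relation.items() for a, v in t.items() if v is not None]
    truth = {}
    for tid, a in rng.sample(cells, round(rate * len(cells))):
        truth[(tid, a)] = relation[tid][a]
        relation[tid][a] = None
    return truth


def evaluate(truth: dict, imputed: dict, same=lambda x, y: x == y):
    """Precision, recall and F1-measure of one imputation run.

    `imputed` maps each filled cell to its new value (unfilled cells are absent).
    `same` is a hypothetical hook for a matcher tolerating different syntactic
    representations; exact equality is used by default.
    """
    n_true = sum(1 for cell, v in imputed.items() if cell in truth and same(v, truth[cell]))
    precision = n_true / len(imputed) if imputed else 0.0
    recall = n_true / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```
</p>
<p>To mirror the protocol described above, one would call inject_missing with five different seeds for each missing rate, run the imputation on each variant, and average the three metrics over the five runs.</p></div>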
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this paper, we proposed RENUVER, a data imputation algorithm that exploits relaxed functional dependencies. These enable RENUVER to select and evaluate candidate tuples to be used during the imputation process. The whole imputation process preserves the semantic consistency of the data, by guaranteeing that no imputation can violate any rfd$_c$. Evaluation results demonstrated that RENUVER outperforms recent approaches using different imputation strategies: machine learning-based (Holoclean) and dependency-based (Derand).</p><p>In the future, we would like to extend RENUVER with the possibility of selecting plausible candidate tuples among multiple datasets. Moreover, we would like to study the applicability of RENUVER to incremental scenarios, such as those related to the imputation of time series <ref type="bibr" target="#b10">[11]</ref>, which would require the usage of incremental rfd$_c$ discovery algorithms <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>consistent imputation. Given a relation schema $R$, defined over a set of attributes $attr(R)$, an instance $r$ of $R$, and a set of rfd$_c$s, $\Sigma$, holding on $r$ ($r \models \Sigma$), an instance $r'$ of $R$ resulting from an imputation process $I$ over the instance $r$, denoted as $r' = I(r)$, is semantically consistent iff $r' \models \Sigma$. One of the possible strategies that could guarantee the semantic consistency of the imputation process is to find candidate values for $t[A] = \_$ by considering a set $T_{candidate} \subseteq r$ of plausible candidate tuples for imputing $t[A]$, such that $\forall t_k \in T_{candidate}$, $t_k[A] \neq \_$ and $t_k$ is similar to $t$ on some attributes other than $A$.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1:</head><label>1</label><figDesc>An example of RENUVER imputation on the Restaurant dataset of Table 1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>$F1\text{-}measure = 2 \times \frac{precision \times recall}{precision + recall}$, where $true$ represents the missing values correctly imputed at the end of the imputation process, $imputed$ represents all the imputed missing values, and $missing$ the missing values in the dataset.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>A sample of the Restaurant dataset. Given a relational database schema $\mathcal{R}$, and $R = \{A_1, \ldots, A_m\}$ one of its relation schemas, an rfd$_c$ $\phi$ on $\mathcal{R}$ $X_{\Phi_1} \rightarrow Y_{\Phi_2}$</figDesc><table><row><cell></cell><cell>Name</cell><cell>City</cell><cell>Phone</cell><cell>Type</cell><cell>Class</cell></row><row><cell>$t_1$</cell><cell>Granita</cell><cell>Malibu</cell><cell>310/456-0488</cell><cell>Californian</cell><cell>6</cell></row><row><cell>$t_2$</cell><cell>Chinois Main</cell><cell>LA</cell><cell>310-392-9025</cell><cell>French</cell><cell>5</cell></row><row><cell>$t_3$</cell><cell>Citrus</cell><cell>Los Angeles</cell><cell>213/857-0034</cell><cell>Californian</cell><cell>6</cell></row><row><cell>$t_4$</cell><cell>Citrus</cell><cell>Los Angeles</cell><cell>_</cell><cell>Californian</cell><cell>6</cell></row><row><cell>$t_5$</cell><cell>Fenix</cell><cell>Hollywood</cell><cell>213/848-6677</cell><cell>_</cell><cell>5</cell></row><row><cell>$t_6$</cell><cell>Fenix Argyle</cell><cell>_</cell><cell>213/848-6677</cell><cell>French (new)</cell><cell>5</cell></row><row><cell>$t_7$</cell><cell>C. Main</cell><cell>Los Angeles</cell><cell>_</cell><cell>French</cell><cell>5</cell></row><row><cell cols="6">approach exploits only rfds relying on a similarity/distance-based tuple comparison method,</cell></row><row><cell cols="6">in what follows we provide only the definition of this type of rfds, known as rfd$_c$. 
For a more</cell></row><row><cell cols="2">general definition of rfd, see [9].</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>rfd c .</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note><ref type="bibr" target="#b0">(1)</ref> where• 𝑋, 𝑌 ⊆ 𝑎𝑡𝑡𝑟(𝑅); • Φ 1 contains (for each attribute 𝑋 𝑖 ∈ 𝑋) a constraint 𝜑 𝑖 [𝑋 𝑖 ] that can be used to determine whether pair of tuples with values in 𝑑𝑜𝑚(𝑋 𝑖 ) are "similar" enough (likewise for each attribute 𝑌 𝑗 ∈ 𝑌 with 𝜑 𝑗 [𝑌 𝑗 ] ∈ Φ 2 ). More specifically, each 𝜑 𝑖 [𝑋 𝑖 ] (𝜑 𝑗 [𝑌 𝑗 ] resp.) requires the specification of a similarity/distance function defined on the domain of 𝑋 𝑖 (𝑌 𝑗 , resp.), an operator, and a threshold setting the boundaries for the satisfaction of the constraint. holds on a relation instance 𝑟 (denoted by 𝑟 |= 𝜙) if and only if for each pair of tuples (𝑡 1 , 𝑡 2</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Given a relation schema $R$, and an instance $r$ of $R$, an rfd$_c$ $\phi : X_{\Phi_1} \rightarrow A_{\varphi_2}$ is said to be key if and only if $\phi$ holds on $r$ ($r \models \phi$), but there is no pair of distinct tuples $(t_1, t_2) \in r$ for which $t_1[X]$ and $t_2[X]$ satisfy all the constraints in $\Phi_1$.</figDesc><table /><note>c .</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>for a tuple. Let $R$ be a relational schema defined over a set of attributes $attr(R)$, $r$ an instance of $R$, $t$ a tuple of $r$, $Z \subset attr(R)$ a set of attributes such that $t[A] = \_$ for each $A \in Z$, and $\Sigma$ a set of rfd$_c$s holding on $r$. An imputation process for $t$ consists of selecting a plausible candidate tuple $t_j$ for each $A \in Z$ such that $t[A] = \_$, so that $t[A]$ can be set equal to $t_j[A]$. However, when for a $t[A] = \_$ it is not possible to identify a plausible candidate tuple guaranteeing a semantically consistent imputation, it is better to leave $t[A]$ unimputed. Although this strategy has been widely applied in other approaches<ref type="bibr" target="#b6">[7]</ref>, it</figDesc><table /><note>[Figure 1 residue. Panels: a) pre-processing, listing rfd$_c$s holding on Table 1, e.g., Name (≤ 4) ➝ Phone (≤ 1), City (≤ 2) ➝ Phone (≤ 2), Phone (≤ 1) ➝ Class (≤ 0); b) RFDc selection, grouping the rfd$_c$s with Phone on the RHS by RHS threshold; c) Imputing missing values: imputing t7[Phone] with t3[Phone] is flagged as violated, and t7[Phone] is then imputed with t2[Phone] = 310-932-9025.]</note></figure>
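<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>Comparative evaluation of RENUVER on the Restaurant and Physician datasets.</figDesc><table>
<row><cell>Dataset</cell><cell>#Tuples</cell><cell>#Attributes</cell><cell>#Missing val.</cell><cell>#rfdcs</cell><cell>#DCs</cell></row>
<row><cell>Restaurant</cell><cell>864</cell><cell>6</cell><cell>259 (5%)</cell><cell>1961</cell><cell>9</cell></row>
<row><cell>Restaurant</cell><cell>864</cell><cell>6</cell><cell>518 (10%)</cell><cell>1961</cell><cell>9</cell></row>
<row><cell>Restaurant</cell><cell>864</cell><cell>6</cell><cell>1037 (20%)</cell><cell>1961</cell><cell>9</cell></row>
<row><cell>Restaurant</cell><cell>864</cell><cell>6</cell><cell>1555 (30%)</cell><cell>1961</cell><cell>9</cell></row>
<row><cell>Restaurant</cell><cell>864</cell><cell>6</cell><cell>2074 (40%)</cell><cell>1961</cell><cell>9</cell></row>
<row><cell>Physician</cell><cell>104 (0.05%)</cell><cell>13</cell><cell>13 (1%)</cell><cell>1430</cell><cell>74</cell></row>
<row><cell>Physician</cell><cell>208 (0.1%)</cell><cell>13</cell><cell>27 (1%)</cell><cell>2553</cell><cell>74</cell></row>
<row><cell>Physician</cell><cell>1036 (0.5%)</cell><cell>13</cell><cell>135 (1%)</cell><cell>3895</cell><cell>74</cell></row>
<row><cell>Physician</cell><cell>2072 (1%)</cell><cell>13</cell><cell>269 (1%)</cell><cell>5708</cell><cell>74</cell></row>
<row><cell>Physician</cell><cell>10359 (5%)</cell><cell>13</cell><cell>1319 (1%)</cell><cell>6137</cell><cell>74</cell></row>
<row><cell>Dataset</cell><cell>Setting</cell><cell>Approach</cell><cell>Recall</cell><cell>Precision</cell><cell>F1-Meas.</cell><cell>Time</cell><cell>Mem.</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 5%</cell><cell>RENUVER</cell><cell>0.329</cell><cell>0.864</cell><cell>0.476</cell><cell>14m 29s</cell><cell>1.38 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 10%</cell><cell>RENUVER</cell><cell>0.296</cell><cell>0.832</cell><cell>0.437</cell><cell>23m 21s</cell><cell>1.31 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 20%</cell><cell>RENUVER</cell><cell>0.294</cell><cell>0.845</cell><cell>0.436</cell><cell>33m 20s</cell><cell>1.36 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 30%</cell><cell>RENUVER</cell><cell>0.258</cell><cell>0.828</cell><cell>0.394</cell><cell>36m 37s</cell><cell>1.37 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 40%</cell><cell>RENUVER</cell><cell>0.232</cell><cell>0.726</cell><cell>0.349</cell><cell>30m 23s</cell><cell>1.38 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 5%</cell><cell>Derand</cell><cell>0.295</cell><cell>0.419</cell><cell>0.345</cell><cell>47h 13m</cell><cell>7.21 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 10%</cell><cell>Derand</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 20%</cell><cell>Derand</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 30%</cell><cell>Derand</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 40%</cell><cell>Derand</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 5%</cell><cell>Holoclean</cell><cell>0.275</cell><cell>0.544</cell><cell>0.362</cell><cell>14s</cell><cell>0.99 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 10%</cell><cell>Holoclean</cell><cell>0.099</cell><cell>0.218</cell><cell>0.131</cell><cell>15s</cell><cell>0.99 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 20%</cell><cell>Holoclean</cell><cell>0.071</cell><cell>0.153</cell><cell>0.095</cell><cell>14s</cell><cell>0.99 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 30%</cell><cell>Holoclean</cell><cell>0.064</cell><cell>0.192</cell><cell>0.095</cell><cell>11s</cell><cell>0.78 GB</cell></row>
<row><cell>Restaurant</cell><cell>missing rate 40%</cell><cell>Holoclean</cell><cell>0.165</cell><cell>0.419</cell><cell>0.237</cell><cell>10s</cell><cell>0.79 GB</cell></row>
<row><cell>Physician</cell><cell>104 tuples</cell><cell>RENUVER</cell><cell>0.338</cell><cell>1</cell><cell>0.505</cell><cell>470ms</cell><cell>1.48 GB</cell></row>
<row><cell>Physician</cell><cell>208 tuples</cell><cell>RENUVER</cell><cell>0.328</cell><cell>0.547</cell><cell>0.410</cell><cell>3s</cell><cell>1.79 GB</cell></row>
<row><cell>Physician</cell><cell>1036 tuples</cell><cell>RENUVER</cell><cell>0.326</cell><cell>0.607</cell><cell>0.424</cell><cell>1m 19s</cell><cell>0.71 GB</cell></row>
<row><cell>Physician</cell><cell>2072 tuples</cell><cell>RENUVER</cell><cell>0.254</cell><cell>0.483</cell><cell>0.333</cell><cell>15m 1s</cell><cell>1.30 GB</cell></row>
<row><cell>Physician</cell><cell>10359 tuples</cell><cell>RENUVER</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Physician</cell><cell>104 tuples</cell><cell>Derand</cell><cell>0.121</cell><cell>0.210</cell><cell>0.151</cell><cell>1h 10s</cell><cell>1.25 GB</cell></row>
<row><cell>Physician</cell><cell>208 tuples</cell><cell>Derand</cell><cell>0.125</cell><cell>0.190</cell><cell>0.150</cell><cell>9h 49m</cell><cell>3.32 GB</cell></row>
<row><cell>Physician</cell><cell>1036 tuples</cell><cell>Derand</cell><cell>0.110</cell><cell>0.121</cell><cell>0.115</cell><cell>25h 40m</cell><cell>8.21 GB</cell></row>
<row><cell>Physician</cell><cell>2072 tuples</cell><cell>Derand</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Physician</cell><cell>10359 tuples</cell><cell>Derand</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>TL</cell><cell>-</cell></row>
<row><cell>Physician</cell><cell>104 tuples</cell><cell>Holoclean</cell><cell>0.230</cell><cell>0.300</cell><cell>0.599</cell><cell>7s</cell><cell>3.95 GB</cell></row>
<row><cell>Physician</cell><cell>208 tuples</cell><cell>Holoclean</cell><cell>0.115</cell><cell>0.120</cell><cell>0.117</cell><cell>12s</cell><cell>5.15 GB</cell></row>
<row><cell>Physician</cell><cell>1036 tuples</cell><cell>Holoclean</cell><cell>0.097</cell><cell>0.114</cell><cell>0.104</cell><cell>1m 8s</cell><cell>6.16 GB</cell></row>
<row><cell>Physician</cell><cell>2072 tuples</cell><cell>Holoclean</cell><cell>0.156</cell><cell>0.167</cell><cell>0.161</cell><cell>8m 21s</cell><cell>26.89 GB</cell></row>
<row><cell>Physician</cell><cell>10359 tuples</cell><cell>Holoclean</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>ML</cell></row>
<row><cell cols="8">TL: time limit of 48 hours exceeded; ML: memory limit of 30 GB exceeded</cell></row>
</table></figure>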
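<p>The derived columns of Table 2 can be re-checked with a few lines of arithmetic. The Python sketch below assumes that F1-Meas. is the usual harmonic mean of recall and precision, and that #Missing val. for Restaurant is the chosen rate applied to its 864 × 6 = 5184 cells, rounded to the nearest integer; the function names are hypothetical. Because the table reports rounded recall and precision, a recomputed F1 can differ from the reported one in the third decimal.</p>
<p>```python
# Sketch for re-checking Table 2. ASSUMPTIONS: F1 is the harmonic mean of
# recall and precision; Restaurant missing counts are rate * tuples * attributes,
# rounded. f1_measure() and injected_missing() are hypothetical names.


def f1_measure(recall, precision):
    """Harmonic mean of recall and precision; 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def injected_missing(n_tuples, n_attributes, rate):
    """Number of cells blanked out for a given missing rate (nearest integer)."""
    return round(n_tuples * n_attributes * rate)


# (recall, precision, reported F1) copied from Table 2
rows = {
    "Restaurant, RENUVER, 5%": (0.329, 0.864, 0.476),
    "Restaurant, Derand, 5%": (0.295, 0.419, 0.345),
    "Physician, RENUVER, 104 tuples": (0.338, 1.0, 0.505),
}
for name, (r, p, reported) in rows.items():
    print(f"{name}: computed {f1_measure(r, p):.3f}, reported {reported}")

# Restaurant: 864 tuples x 6 attributes = 5184 cells
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    print(f"{rate:.0%}: {injected_missing(864, 6, rate)} missing values")
```</p>
<p>The recomputed missing-value counts reproduce the Restaurant column exactly, whereas the Physician counts are not a plain 1% of tuples × attributes, so the sketch deliberately covers only Restaurant.</p>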
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/DastLab/RENUVER-evaluation-datasets</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">A detailed overview of RENUVER, together with a more exhaustive evaluation, is given in <ref type="bibr" target="#b2">[3]</ref>.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Customized policies for handling partial information in relational databases</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">V</forename><surname>Martinez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Molinaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Grant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Subrahmanian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="1254" to="1271" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A first approach on big data missing values imputation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Montesdeoca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Luengo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Maillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>García-Gil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Herrera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Conference on Internet of Things, Big Data and Security (IoTBDS)</title>
				<meeting>5th International Conference on Internet of Things, Big Data and Security (IoTBDS)</meeting>
		<imprint>
			<publisher>SciTePress</publisher>
			<biblScope unit="page" from="315" to="323" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">RENUVER: A missing value imputation algorithm based on relaxed functional dependencies</title>
		<author>
			<persName><forename type="first">B</forename><surname>Breve</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Caruccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Deufemia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Polese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Extending Database Technology</title>
				<meeting>the 25th International Conference on Extending Database Technology</meeting>
		<imprint>
			<publisher>OpenProceedings</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Trends in cleaning relational data: consistency and deduplication</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Ilyas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends® in Databases</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="281" to="393" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Discovering relaxed functional dependencies based on multi-attribute dominance</title>
		<author>
			<persName><forename type="first">L</forename><surname>Caruccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Deufemia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Polese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="3212" to="3228" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">HoloClean: holistic data repairs with probabilistic inference</title>
		<author>
			<persName><forename type="first">T</forename><surname>Rekatsinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Ilyas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ré</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="1190" to="1201" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Enriching data imputation under similarity rule constraints</title>
		<author>
			<persName><forename type="first">S</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="275" to="287" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A grey-based nearest neighbor approach for missing attribute value prediction</title>
		<author>
			<persName><forename type="first">C.-C</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-M</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Intelligence</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="239" to="252" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Relaxed functional dependencies: A survey of approaches</title>
		<author>
			<persName><forename type="first">L</forename><surname>Caruccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Deufemia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Polese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="147" to="165" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Attention-based learning for missing data imputation in HoloClean</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Ilyas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rekatsinas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of Machine Learning and Systems</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="307" to="325" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Mind the gap: An experimental evaluation of imputation of missing values techniques in time series</title>
		<author>
			<persName><forename type="first">M</forename><surname>Khayati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lerner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tymchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cudré-Mauroux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="768" to="782" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Incremental discovery of functional dependencies with a bit-vector algorithm</title>
		<author>
			<persName><forename type="first">L</forename><surname>Caruccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cirillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Deufemia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Polese</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Italian Symposium on Advanced Database Systems</title>
				<meeting>Italian Symposium on Advanced Database Systems</meeting>
		<imprint>
			<publisher>CEUR-WS.org</publisher>
			<biblScope unit="volume">2400</biblScope>
			<biblScope unit="page" from="1" to="12" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>SEBD &apos;19</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Incremental discovery of imprecise functional dependencies</title>
		<author>
			<persName><forename type="first">L</forename><surname>Caruccio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Cirillo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Data and Information Quality (JDIQ)</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="1" to="25" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
