<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Injecting Conceptual Constraints into Data Fabrics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paolo Ciaccia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Martinenghi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Torlone</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Elettronica, Informazione e Bioingegneria</institution>
          ,
          <addr-line>Politecnico di Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Informatica - Scienza e Ingegneria, Università di Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dipartimento di Ingegneria, Università Roma Tre</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Unlike traditional sources managed by DBMSs, data lakes do not provide any guarantee about the quality of the data they store, which can severely limit their use for analysis purposes. The recent notion of data fabric, which introduces a semantic layer allowing uniform access to underlying data sources, makes it possible to tackle this problem by specifying conceptual constraints to which data sources must adhere to be considered meaningful. Along these lines, in this discussion paper, we exploit the data fabric approach by proposing a general methodology for data curation in data fabrics based on: (i) the specification of integrity constraints over a conceptual representation of the data lake and (ii) the automatic translation and enforcement of such constraints over the actual data. We discuss the advantages of this idea and the challenges behind its implementation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In traditional big data analysis, activities such as cleaning, transforming, and integrating source
data are essential, but they usually make knowledge extraction a long and tedious process.
For this reason, data-driven organizations have recently adopted an agile strategy that defers
any data processing until the data are actually consumed. This is done by building and maintaining
a repository, called a “data lake”, that stores any kind of data in its native format. A dataset in the
lake is usually just a collection of raw data, gathered either from internal applications (e.g., logs
or user-generated data) or from external sources (e.g., open data), that is made persistent “as is” on a
storage system, usually distributed, without going through an ETL process.</p>
      <p>
        Unfortunately, reducing the engineering effort upfront just delays the traditional issues
of data pre-processing since this approach does not eliminate the need for high-quality data
and schema understanding. Therefore, to guarantee reliable results, a long process of data
preparation (a.k.a. data wrangling) is required over the portion of the data lake that is relevant
for a business purpose before any meaningful analysis can be performed on it [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. This
process typically consists of pipelines of operations such as: source and feature selection, data
enrichment, data transformation, data curation, and data integration. A number of state-of-the-art
applications can support these activities, including (i) data and metadata catalogs, for
understanding and selecting the appropriate datasets [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5, 6, 7, 8</xref>
        ]; (ii) tools for full-text indexing,
for providing keyword search and other advanced search capabilities [
        <xref ref-type="bibr" rid="ref10 ref7 ref9">7, 9, 10</xref>
        ]; (iii) data profilers,
for collecting meta-information from datasets [
        <xref ref-type="bibr" rid="ref1 ref11 ref9">1, 9, 11</xref>
        ]; (iv) distributed data processing engines
like Spark [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and (v) tools and libraries for data manipulation and analysis, such as Pandas<sup>1</sup>
and Scikit-learn,<sup>2</sup> in conjunction with data science notebooks, such as Jupyter<sup>3</sup> and Zeppelin.<sup>4</sup>
Still, data preparation is an involved, fragmented, and time-consuming process, thus making
the extraction of valuable knowledge from the lake hard.
      </p>
      <p>
        In this scenario, the recent data fabric approach comes to the rescue, by proposing the
construction and maintenance of a semantic representation of the underlying data for data
discovery, understanding, and searching [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ]. We argue that this can also be profitably
exploited for evaluating and improving the quality of data. This is because a representation
of the real-world concepts and relationships that the data capture (e.g., employees, customers,
products, locations, sales, and so on) provides an ideal setting for identifying the constraints
that hold in the application domain of reference (e.g., the fact that, for business purposes, all
the products for sale must be classified in categories). If we are able to map and enforce such
constraints on the underlying data, their quality naturally improves and makes the subsequent
analysis more effective and less prone to errors.
      </p>
      <p>
        Building on this idea, in this paper<sup>5</sup> we propose a principled approach to data curation in data
lakes based on the identification and enforcement of conceptual constraints. The approach
consists of the following main activities: (1) the gathering of metadata from the data lake (or from
a portion of interest for a specific business goal) in the form of a conceptual schema, (2) the
analysis of the conceptual schema and the specification of integrity constraints over it, (3) the
automatic translation of the constraints defined at the conceptual level into constraints over
the datasets in the data lake, and (4) the enforcement of the integrity constraints so obtained over
the actual data. While there is a large body of work on extracting and collecting metadata
from data sources [
        <xref ref-type="bibr" rid="ref1 ref11 ref9">1, 9, 11</xref>
        ] and on repairing data given a set of integrity constraints [
        <xref ref-type="bibr" rid="ref17 ref18 ref19">17, 18, 19</xref>
        ],
corresponding to steps (1) and (4) above, to our knowledge the issue of exploiting conceptual
representations for data lake curation has never been explored before.
      </p>
      <p>The rest of the paper is devoted to the presentation of some initial steps towards this goal.</p>
      <p>Specifically, in Section 2 we state the problem by recalling the typical data life-cycle in a data
lake and by illustrating, in this framework, our proposal for data curation. Then, in Section 3 we
state the basic notions (datasets, schemas, constraints, and mappings) underlying our approach.
This is done by means of very general definitions, in order to make the approach independent
of any specific data model and format. In Section 4 we provide some details of our solution
through an example. Finally, in Section 5 we discuss related work, the main issues involved
in the implementation of our proposal, and the work that needs to be done to tackle these issues.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Using Conceptual Constraints in Data Curation</title>
      <p>
        Metadata plays a fundamental role in the typical activities of the data analysis life-cycle, such
as data ingestion, data integration, data preparation, and knowledge extraction. Its management
involves building and maintaining a repository of information describing all the various kinds
of data that are produced in the above stages of data processing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <p><sup>1</sup> https://pandas.pydata.org/</p>
        <p><sup>2</sup> https://scikit-learn.org/</p>
        <p><sup>3</sup> https://jupyter.org/</p>
        <p><sup>4</sup> https://zeppelin.apache.org/</p>
        <p>
          <sup>5</sup> A preliminary version has appeared in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>In order to harmonize the data stored in the data lake with the other components of the
data fabric, a conceptual representation, i.e., a conceptual schema, of the metadata describing
the content of interest of the data lake is needed. This includes concepts (such as entities,
relationships, and generalizations) that map to the actual components (such as attributes,
documents, and labels) of datasets stored in the data lake.</p>
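        <p>The availability of a conceptual schema <inline-formula><tex-math>S</tex-math></inline-formula> of a data lake <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> can provide a number of important
benefits: (i) it gives analysts a general and system-independent view of the data
available in <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>, (ii) it provides an abstract view of the data lake content that can be used to
formulate queries over <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>, and (iii) it allows the specification of real-world
constraints that, enforced on <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>, improve the overall quality of its content.</p>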
        <p>
          In this paper, we focus on point (iii) above, which, to the best of our knowledge, has not been
studied before, apart from our preliminary study [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. As shown graphically in Figure 1, the
methodology we propose requires the tasks that follow.
        </p>
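        <list list-type="order">
          <list-item>
            <p>A (portion of interest of a) data lake <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> is initially transformed into a “standardized” format, obtained by adapting the source data to the format of the system chosen for storing a “curated” version of <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>.</p>
          </list-item>
          <list-item>
            <p>
              The skeleton <inline-formula><tex-math>\hat{S}</tex-math></inline-formula> of a conceptual schema is built from <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>. Basically, <inline-formula><tex-math>\hat{S}</tex-math></inline-formula> includes the main entities and relationships involved in <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>, as well as a mapping between the components of <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> and the elements of <inline-formula><tex-math>\hat{S}</tex-math></inline-formula>. This task can be done manually and/or using available techniques and tools for semantic annotation or column-type discovery in data lakes [
              <xref ref-type="bibr" rid="ref20 ref21 ref22">20, 21, 22</xref>
              ].
            </p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>\hat{S}</tex-math></inline-formula> is refined, possibly incrementally, into an “evolved” schema <inline-formula><tex-math>S</tex-math></inline-formula> by adding a collection of real-world constraints, for instance by stating that an entity is a special case of another entity or that an entity can only participate in a single occurrence of a certain relationship. Typically, this step requires knowledge of the specific domain (e.g., that a department has a single manager).</p>
          </list-item>
          <list-item>
            <p>The constraints represented by <inline-formula><tex-math>S</tex-math></inline-formula> are mapped to constraints <inline-formula><tex-math>\Gamma</tex-math></inline-formula> over the actual data stored in <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>. <inline-formula><tex-math>\Gamma</tex-math></inline-formula> can be expressed in several ways, depending on the system used to store and manage <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>.</p>
          </list-item>
          <list-item>
            <p>
              The constraints <inline-formula><tex-math>\Gamma</tex-math></inline-formula> are checked on <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> and, if any violation occurs, possibly enforced on <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> by means of a repairing technique [
              <xref ref-type="bibr" rid="ref23">23</xref>
              ]. Again, this can be done in several ways, depending on the tools available for storing, querying, and manipulating data in the data lake [
              <xref ref-type="bibr" rid="ref19 ref24">19, 24</xref>
              ].
            </p>
          </list-item>
        </list>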
        <p>Note that, to our knowledge, no prior work has specifically addressed task 4 of the process above.
In the rest of the paper, we focus on this challenging task by first introducing the relevant
elements of the problem (Section 3), and by then illustrating the main ideas for its solution
through an example (Section 4).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data and Metadata Management</title>
      <p>Let us now fix some basic notions that we will refer to in the following. Our definitions are
deliberately abstract so as to be as general as possible, without the need to commit to any
specific data lake model and format.</p>
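      <p>Dataset. We consider that a dataset <inline-formula><tex-math>N(A, I)</tex-math></inline-formula> has a name <inline-formula><tex-math>N</tex-math></inline-formula> and is composed of a set <inline-formula><tex-math>A</tex-math></inline-formula> of
attributes and a set <inline-formula><tex-math>I</tex-math></inline-formula> of data items. Each data item in <inline-formula><tex-math>I</tex-math></inline-formula> is a set of attribute-value pairs, with
attributes taken from <inline-formula><tex-math>A</tex-math></inline-formula>. Figure 2 shows an example of datasets still in a “raw” format, reporting
data about the finance and tech departments of a company. After curation, the so-obtained
datasets also become part of the data lake.</p>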
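      <p>Data Lake. For our purposes, a data lake <inline-formula><tex-math>\mathcal{D} = (D, \mathcal{M})</tex-math></inline-formula> can be modeled as a collection <inline-formula><tex-math>D</tex-math></inline-formula>
of datasets having distinct names, plus a set of metadata <inline-formula><tex-math>\mathcal{M}</tex-math></inline-formula>, including a (possibly empty) set
of constraints on the datasets. We also refer to <inline-formula><tex-math>D</tex-math></inline-formula> as the instance of <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>. Figure 3 shows
a collection of partially curated datasets in <inline-formula><tex-math>D</tex-math></inline-formula> (D_Emp, D_Dept, and D_Act) that have been
obtained from the raw datasets of Figure 2 by unnesting employees from departments and
activities from employees. The metadata include, e.g., cross-dataset constraints, such as the
fact that DeptCodes appearing in D_Emp must also appear in D_Dept, as well as, say, domain
constraints such as the fact that Level must be an integer (so employee E_05 violates this).</p>
      <p>[Figure (dataset tables, only partially recoverable): columns Name (Homer, Marge, Bart, Lisa), Salary (100K, 150K, 80K, 50K), and DeptCode (D01, D02, D02, D01); dataset D_Dept with DeptCode values D01 and D02.]</p>
      <p>Conceptual schema and constraints. We consider that the domain of interest for analysis
purposes is represented by a conceptual schema <inline-formula><tex-math>S</tex-math></inline-formula>, expressed by means of a suitable language
<inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula>. Examples are Entity-Relationship (E-R) diagrams, RDF(S), UML class diagrams, and
Description Logic languages, such as those underlying the OWL 2 standard and its profiles.<sup>6</sup>
Apart from specific differences, each of these languages allows for the definition of concepts (i.e.,
classes of objects, entities), relationships (a.k.a. roles) among them, and properties (of concepts
and relationships). A conceptual schema includes conceptual constraints that characterize the
elements of the schema <inline-formula><tex-math>S</tex-math></inline-formula>. For instance, in the E-R formalism we can state that two entities <inline-formula><tex-math>E_1</tex-math></inline-formula>
and <inline-formula><tex-math>E_2</tex-math></inline-formula> have a common generalizing entity <inline-formula><tex-math>E</tex-math></inline-formula> (subset(<inline-formula><tex-math>E_1, E</tex-math></inline-formula>) and subset(<inline-formula><tex-math>E_2, E</tex-math></inline-formula>)) and that
<inline-formula><tex-math>E_1</tex-math></inline-formula> and <inline-formula><tex-math>E_2</tex-math></inline-formula> are disjoint (disjoint(<inline-formula><tex-math>E_1, E_2</tex-math></inline-formula>)).</p>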
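      <p>Mapping. The connection between the conceptual schema <inline-formula><tex-math>S</tex-math></inline-formula> and the data lake <inline-formula><tex-math>(D, \mathcal{M})</tex-math></inline-formula> is
based on a mapping <inline-formula><tex-math>\mu</tex-math></inline-formula>, i.e., a set of assertions relating the elements in <inline-formula><tex-math>S</tex-math></inline-formula> to the datasets in
<inline-formula><tex-math>D</tex-math></inline-formula>. For instance, an entity Departments in <inline-formula><tex-math>S</tex-math></inline-formula> could be mapped to the projection of dataset
D_Dept on just the attributes DeptCode and DeptName, with the MgrNo attribute representing
a relationship between Departments and Employees.</p>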
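      <p>The notions of dataset, data lake, and mapping introduced above can be illustrated with a minimal Python sketch. It is not the representation used in this paper: the class names Dataset and DataLake, the helper conceptual_instance, the dictionary encoding of a mapping assertion, and the sample rows (department names and manager numbers) are all hypothetical and chosen only for illustration.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named dataset: a set of attributes plus a list of data items."""
    name: str
    attributes: frozenset
    items: list = field(default_factory=list)  # each item: dict attribute -> value

    def project(self, attrs):
        """Project every data item onto attrs, dropping duplicate results."""
        unknown = set(attrs) - self.attributes
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        seen = []
        for item in self.items:
            row = {a: item.get(a) for a in attrs}
            if row not in seen:
                seen.append(row)
        return seen


@dataclass
class DataLake:
    """Datasets with distinct names (the instance) plus metadata."""
    datasets: dict = field(default_factory=dict)  # name -> Dataset
    metadata: dict = field(default_factory=dict)  # e.g. {"constraints": [...]}

    def add(self, ds):
        if ds.name in self.datasets:
            raise ValueError(f"duplicate dataset name: {ds.name}")
        self.datasets[ds.name] = ds


# Hypothetical rows: department names and manager numbers are invented.
lake = DataLake()
lake.add(Dataset(
    name="D_Dept",
    attributes=frozenset({"DeptCode", "DeptName", "MgrNo"}),
    items=[
        {"DeptCode": "D01", "DeptName": "Finance", "MgrNo": "E01"},
        {"DeptCode": "D02", "DeptName": "Tech", "MgrNo": "E02"},
    ],
))

# One mapping assertion (hypothetical encoding): the entity Departments
# is the projection of D_Dept on DeptCode and DeptName.
mapping = {"Departments": ("D_Dept", ("DeptCode", "DeptName"))}


def conceptual_instance(lake, mapping, element):
    """Apply a single mapping assertion to obtain the instances of element."""
    dataset_name, attrs = mapping[element]
    return lake.datasets[dataset_name].project(attrs)


departments = conceptual_instance(lake, mapping, "Departments")
```
</preformat>
      <p>Here departments holds one DeptCode/DeptName pair per department, i.e., the conceptual instances of the entity Departments, while MgrNo is left out because it represents a relationship rather than a property of the entity.</p>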
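      <p>Problem statement. Our goal is to check whether an instance <inline-formula><tex-math>D</tex-math></inline-formula> satisfies the conceptual
constraints represented by schema <inline-formula><tex-math>S</tex-math></inline-formula>. To this end, we formally define constraint satisfaction as
follows.</p>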
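      <p>Definition 1. An instance <inline-formula><tex-math>D</tex-math></inline-formula> is legal with respect to a conceptual schema <inline-formula><tex-math>S</tex-math></inline-formula> through a mapping <inline-formula><tex-math>\mu</tex-math></inline-formula>
if <inline-formula><tex-math>\mu(D)</tex-math></inline-formula> yields a conceptual instance that satisfies all the constraints in <inline-formula><tex-math>S</tex-math></inline-formula>; we denote
this circumstance by <inline-formula><tex-math>S \models \mu(D)</tex-math></inline-formula>.</p>
      <sec id="sec-3-1">
        <p><sup>6</sup> https://www.w3.org/TR/owl2-profiles/</p>
        <p>[Figure 4: E-R schema. Recoverable elements: entity Departments with attributes DeptCode and DeptName; relationships Direct and Work; cardinality labels 1:1 and 1:N.]</p>
        <sec id="sec-3-1-2">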
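          <p>Clearly, explicitly applying the mapping to generate a conceptual instance is impractical for
performance reasons. Therefore, in this paper we propose a different solution, which essentially
consists in transforming the constraints in <inline-formula><tex-math>S</tex-math></inline-formula> into corresponding constraints on the data lake.
This leads to the following problem:
given a data lake <inline-formula><tex-math>\mathcal{D} = (D, \mathcal{M})</tex-math></inline-formula>, a conceptual schema <inline-formula><tex-math>S</tex-math></inline-formula>, and a mapping <inline-formula><tex-math>\mu</tex-math></inline-formula> between <inline-formula><tex-math>S</tex-math></inline-formula> and <inline-formula><tex-math>D</tex-math></inline-formula>,
determine a set of constraints <inline-formula><tex-math>\Gamma</tex-math></inline-formula> on <inline-formula><tex-math>D</tex-math></inline-formula> such that <inline-formula><tex-math>D</tex-math></inline-formula> satisfies <inline-formula><tex-math>\mathcal{M} \cup \Gamma</tex-math></inline-formula> if and only if <inline-formula><tex-math>D</tex-math></inline-formula> is legal
with respect to <inline-formula><tex-math>S</tex-math></inline-formula> through mapping <inline-formula><tex-math>\mu</tex-math></inline-formula>, i.e.:
            <disp-formula><tex-math>\mathcal{M} \cup \Gamma \models D \iff S \models \mu(D).</tex-math></disp-formula>
          </p>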
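          <p>Once the conceptual constraints on the data lake <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> have been generated, they may be used
to check whether <inline-formula><tex-math>D</tex-math></inline-formula> is consistent and, if needed, to repair <inline-formula><tex-math>D</tex-math></inline-formula>.</p>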
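          <p>
            Before proceeding, we remark that, unlike OBDA (Ontology-Based Data Access)
approaches [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ], we do not use <inline-formula><tex-math>\mu</tex-math></inline-formula> for the purpose of obtaining results from <inline-formula><tex-math>D</tex-math></inline-formula> given a query
on <inline-formula><tex-math>S</tex-math></inline-formula>. Rather, <inline-formula><tex-math>\mu</tex-math></inline-formula> is the key ingredient to define and enforce on the data lake the conceptual
constraints in <inline-formula><tex-math>S</tex-math></inline-formula>.
          </p>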
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. An Example</title>
      <p>The E-R schema <inline-formula><tex-math>S</tex-math></inline-formula> in Figure 4 describes a simplified scenario regarding the departments of a
company. The schema includes structural information (such as the fact that Employees have a
Name and a Salary) as well as constraints (such as the fact that Managers are also Employees or
that each Department has at least one Employee). Notice that the schema <inline-formula><tex-math>S</tex-math></inline-formula> deliberately does
not include the NoHours attribute that characterizes each activity of a researcher (see dataset
D_Act in Figure 3). This is to emphasize that <inline-formula><tex-math>S</tex-math></inline-formula> only focuses on the part of the data lake that is
of interest for the analysis, which here does not include, as we assume, the NoHours attribute.</p>
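      <p>Besides basic constraints on attributes, such as non-nullability and domain of admitted values
(which, in the following, we will omit for brevity), relevant constraints in <inline-formula><tex-math>S</tex-math></inline-formula>, here informally
described as self-explanatory predicates, are:</p>
      <list list-type="simple">
        <list-item><p>unique(EmpNo,Employees): every employee is identified by EmpNo</p></list-item>
        <list-item><p>unique(DeptCode,Departments): every department is identified by DeptCode</p></list-item>
        <list-item><p>. . .</p></list-item>
        <list-item><p>subset(Managers,Employees): managers are employees</p></list-item>
        <list-item><p>subset(Researchers,Employees): researchers are employees</p></list-item>
        <list-item><p>disjoint(Managers,Researchers): no manager is a researcher</p></list-item>
        <list-item><p>card(Departments,Direct,1,1): every department has exactly one manager</p></list-item>
        <list-item><p>card(Employees,Work,1,1): every employee works in exactly one department</p></list-item>
        <list-item><p>card(Departments,Work,1,n): every department has at least one employee</p></list-item>
      </list>
      <p>Now, consider the datasets in Figure 3, whose structure is reported below for the sake of clarity:</p>
      <list list-type="simple">
        <list-item><p>D_Emp(EmpNo,Name,Salary,DeptCode,Level,CV,PID,PName,Budget),</p></list-item>
        <list-item><p>D_Dept(DeptCode,DeptName,MgrNo),</p></list-item>
        <list-item><p>D_Act(ResNo,Activity,NoHours).</p></list-item>
      </list>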
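      <p>Then, we can define the mapping <inline-formula><tex-math>\mu</tex-math></inline-formula> by means of the following statements, one for each entity
and relationship in <inline-formula><tex-math>S</tex-math></inline-formula>:<sup>7</sup></p>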
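      <p>The constraints <inline-formula><tex-math>\Gamma</tex-math></inline-formula> corresponding to this mapping include, among others, the following:</p>
      <list list-type="bullet">
        <list-item>
          <p>Uniqueness of DeptCode:
            <disp-formula><tex-math>\gamma_1 : \forall t_1, t_2 \in \mathrm{D\_Dept} : t_1.\mathrm{DeptCode} = t_2.\mathrm{DeptCode} \Rightarrow t_1 = t_2</tex-math></disp-formula>
          </p>
        </list-item>
        <list-item>
          <p>Disjointness of managers and researchers:
            <disp-formula><tex-math>\gamma_2 : \forall t_1 \in \mathrm{D\_Emp} : \neg(\mathrm{NotNull}(t_1.\mathrm{Level}) \wedge \mathrm{NotNull}(t_1.\mathrm{CV}))</tex-math></disp-formula>
          </p>
        </list-item>
        <list-item>
          <p>Departments are directed by managers:
            <disp-formula><tex-math>\gamma_3 : \forall t_1 \in \mathrm{D\_Dept}\ \exists t_2 \in \mathrm{D\_Emp} : t_1.\mathrm{MgrNo} = t_2.\mathrm{EmpNo} \wedge \mathrm{NotNull}(t_2.\mathrm{Level})</tex-math></disp-formula>
          </p>
        </list-item>
        <list-item>
          <p>Each department has at least one employee:
            <disp-formula><tex-math>\gamma_4 : \forall t_1 \in \mathrm{D\_Dept}\ \exists t_2 \in \mathrm{D\_Emp} : t_1.\mathrm{DeptCode} = t_2.\mathrm{DeptCode}</tex-math></disp-formula>
          </p>
        </list-item>
        <list-item>
          <p>Each employee has activities only within a project:
            <disp-formula><tex-math>\gamma_5 : \forall t_1 \in \mathrm{D\_Act}\ \exists t_2 \in \mathrm{D\_Emp} : t_1.\mathrm{ResNo} = t_2.\mathrm{EmpNo} \wedge \mathrm{NotNull}(t_2.\mathrm{PID})</tex-math></disp-formula>
          </p>
        </list-item>
      </list>
      <p><sup>7</sup> The underscore symbol indicates (anonymous) variables not relevant to the statement. The adopted notation is
therefore positional, as in, e.g., Datalog.</p>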
      <p>Once the constraints in <inline-formula><tex-math>\Gamma</tex-math></inline-formula> have been generated, they can be easily converted into
queries that detect possible violations. For instance, constraint <inline-formula><tex-math>\gamma_3</tex-math></inline-formula> corresponds to the
following query expressed in SQL:</p>
      <preformat>SELECT D.DeptCode, E.EmpNo
FROM D_Dept D, D_Emp E
WHERE D.MgrNo = E.EmpNo AND E.Level IS NULL</preformat>
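      <p>As a rough illustration, the query above can be run on an in-memory SQLite database from Python. The table contents below are hypothetical (they are not the data of Figure 3), and the reduced column lists and the department D03 are assumptions made only for this sketch. The second query is not part of the paper: it is one possible way to negate the constraint on departments and managers directly, so that a department whose MgrNo matches no employee at all is also reported, a case the join-based query does not return.</p>
      <preformat>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE D_Dept (DeptCode TEXT, DeptName TEXT, MgrNo TEXT);
    CREATE TABLE D_Emp (EmpNo TEXT, Name TEXT, DeptCode TEXT, Level INTEGER);
    -- Hypothetical rows, not the content of Figure 3.
    INSERT INTO D_Dept VALUES ('D01', 'Finance', 'E01'),
                              ('D02', 'Tech', 'E10'),
                              ('D03', 'Legal', 'E99');
    INSERT INTO D_Emp VALUES ('E01', 'Ann', 'D01', 3),
                             ('E10', 'Bob', 'D02', NULL);
""")

# Query from the text: the designated manager exists but has no Level.
not_a_manager = conn.execute("""
    SELECT D.DeptCode, E.EmpNo
    FROM D_Dept D, D_Emp E
    WHERE D.MgrNo = E.EmpNo AND E.Level IS NULL
""").fetchall()

# Assumed variant: report every department for which no employee row
# both matches MgrNo and has a non-null Level.
all_violations = conn.execute("""
    SELECT D.DeptCode, D.MgrNo
    FROM D_Dept D
    WHERE NOT EXISTS (
        SELECT 1 FROM D_Emp E
        WHERE E.EmpNo = D.MgrNo AND E.Level IS NOT NULL
    )
    ORDER BY D.DeptCode
""").fetchall()
conn.close()
```
</preformat>
      <p>On these rows, the first query reports only department D02, whereas the second also reports D03, whose manager number does not appear in D_Emp.</p>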
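      <p>Consider now the datasets in Figure 3. It is apparent that <inline-formula><tex-math>D</tex-math></inline-formula> violates the following conceptual
constraints in <inline-formula><tex-math>S</tex-math></inline-formula>:</p>
      <list list-type="bullet">
        <list-item><p>Employee E07 has both attributes Level and CV not null, thus violating constraint <inline-formula><tex-math>\gamma_2</tex-math></inline-formula>;</p></list-item>
        <list-item><p>Department D02 is managed by an employee (E10) who is not a manager, contradicting
constraint <inline-formula><tex-math>\gamma_3</tex-math></inline-formula>, as the above SQL query would reveal;</p></list-item>
        <list-item><p>Constraint <inline-formula><tex-math>\gamma_5</tex-math></inline-formula> is also violated, since employee E12 appears in the dataset D_Act although
she does not participate in any project.</p></list-item>
      </list>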
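      <p>The quantified constraints of this section can also be read as executable predicates. The following Python sketch mirrors each formula with all and any; the function names (gamma1 to gamma5) and the rows are hypothetical. The rows are invented so as to reproduce the three violations just described (E07, D02 with E10, and E12) and should not be taken as the actual content of Figure 3.</p>
      <preformat>
```python
def not_null(value):
    return value is not None


def gamma1(d_dept):
    """Uniqueness of DeptCode in D_Dept."""
    return all(t1 == t2
               for t1 in d_dept for t2 in d_dept
               if t1["DeptCode"] == t2["DeptCode"])


def gamma2(d_emp):
    """No employee is both a manager (Level) and a researcher (CV)."""
    return all(not (not_null(t["Level"]) and not_null(t["CV"])) for t in d_emp)


def gamma3(d_dept, d_emp):
    """Every department is directed by an employee who is a manager."""
    return all(any(t1["MgrNo"] == t2["EmpNo"] and not_null(t2["Level"])
                   for t2 in d_emp)
               for t1 in d_dept)


def gamma4(d_dept, d_emp):
    """Every department has at least one employee."""
    return all(any(t1["DeptCode"] == t2["DeptCode"] for t2 in d_emp)
               for t1 in d_dept)


def gamma5(d_act, d_emp):
    """Activities belong only to employees who take part in a project."""
    return all(any(t1["ResNo"] == t2["EmpNo"] and not_null(t2["PID"])
                   for t2 in d_emp)
               for t1 in d_act)


# Hypothetical rows, restricted to the attributes used by the constraints.
d_dept = [
    {"DeptCode": "D01", "MgrNo": "E01"},
    {"DeptCode": "D02", "MgrNo": "E10"},
]
d_emp = [
    {"EmpNo": "E01", "DeptCode": "D01", "Level": 3, "CV": None, "PID": None},
    {"EmpNo": "E07", "DeptCode": "D01", "Level": 2, "CV": "cv07", "PID": "P1"},
    {"EmpNo": "E10", "DeptCode": "D02", "Level": None, "CV": "cv10", "PID": "P1"},
    {"EmpNo": "E12", "DeptCode": "D02", "Level": None, "CV": "cv12", "PID": None},
]
d_act = [
    {"ResNo": "E10", "Activity": "design", "NoHours": 10},
    {"ResNo": "E12", "Activity": "testing", "NoHours": 5},
]

results = {
    "gamma1": gamma1(d_dept),
    "gamma2": gamma2(d_emp),
    "gamma3": gamma3(d_dept, d_emp),
    "gamma4": gamma4(d_dept, d_emp),
    "gamma5": gamma5(d_act, d_emp),
}
```
</preformat>
      <p>A nested-loop evaluation like this one is only meant to make the semantics explicit; on real data lake volumes the same checks would more plausibly be pushed down to the query engine of the curated layer.</p>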
      <p>
        Once the above violations are discovered, the datasets can be cleaned using some of the
available methods (see, e.g., [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusions</title>
      <p>In this paper we have put forward the idea of generating constraints on the datasets of a data
lake by exploiting a high-level, conceptual representation, in order to improve the quality of
data and, consequently, that of subsequent analysis.</p>
      <p>
        Our approach can be regarded as complementary to those that aim to curate data by directly
specifying constraints through ad-hoc languages/tools. For instance, CLAMS [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] adopts the
RDF data model for representing data in the curated layer, and defines conditional denial
constraints over views of the data lake defined using SPARQL queries. Although this is a
powerful approach, able to exploit the expressivity of SPARQL, it leaves the full burden of
specifying constraints (and queries) to the designer/analyst. Furthermore, there is no guarantee
that the set of constraints is consistent, i.e., non-contradictory. The Deequ system [
        <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
        ] is an
open-source library aimed at supporting the automatic verification of data quality. However,
the constraints available in the library apply to a single dataset, thus inter-dataset constraints
cannot be specified.
      </p>
      <p>A major challenge of our approach is to demonstrate that the propagation of conceptual
constraints, i.e., the generation of <inline-formula><tex-math>\Gamma</tex-math></inline-formula>, can be fully automated. Although in the past decades a
large body of work has investigated how to automatically translate E-R schemas into relational
tables (see, e.g., [28]), much less is known for other conceptual models and/or data models such
as RDF. We currently view (automatic) constraint propagation as a
two-step process: (1) first, a canonical transformation turns the conceptual schema
<inline-formula><tex-math>S</tex-math></inline-formula> into a schema <inline-formula><tex-math>S'</tex-math></inline-formula> in the target data model of the curated layer; (2) then, <inline-formula><tex-math>S'</tex-math></inline-formula> is mapped
to the actual <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>. Besides the obvious advantage of splitting the complexity of the problem into
two well-defined sub-problems, this approach can exploit in step (2) all that is known about the
equivalence of schemas (<inline-formula><tex-math>S'</tex-math></inline-formula> and <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> in our case) expressed in the same formalism.</p>
      <p>ACM, 2019, pp. 1993–1996. URL: https://doi.org/10.1145/3299869.3320210. doi:10.1145/3299869.3320210.</p>
      <p>[28] V. M. Markowitz, A. Shoshani, Representing extended entity-relationship structures in
relational databases: A modular approach, ACM Trans. Database Syst. 17 (1992) 423–464.
URL: https://doi.org/10.1145/132271.132273. doi:10.1145/132271.132273.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abedjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Elmagarmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>The data civilizer system</article-title>
          ,
          <source>in: CIDR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Heudecker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <article-title>The data lake fallacy: All water and little substance</article-title>
          ,
          <source>Gartner Report G 264950</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Terrizzano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Schwarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Colino</surname>
          </string-name>
          ,
          <article-title>Data wrangling: The challenging journey from the wild to the lake</article-title>
          ,
          <source>in: CIDR</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciaccia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martinenghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Torlone</surname>
          </string-name>
          ,
          <article-title>Foundations of context-aware preference propagation</article-title>
          ,
          <source>J. ACM</source>
          <volume>67</volume>
          (
          <year>2020</year>
          )
          <fpage>4:1</fpage>
          -
          <lpage>4:43</lpage>
          . URL: https://doi.org/10.1145/3375713. doi:10.1145/3375713.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>CKAN: The open source data portal software</article-title>
          , http://ckan.org/ (accessed November
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Bhardwaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Elmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Parameswaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Subramanyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Collaborative data analytics with DataHub</article-title>
          ,
          <source>PVLDB</source>
          <volume>8</volume>
          (
          <year>2015</year>
          )
          <fpage>1916</fpage>
          -
          <lpage>1927</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Korn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <article-title>Goods: Organizing Google's datasets</article-title>
          ,
          <source>in: SIGMOD</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hellerstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sreekanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramachandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Donsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fierro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>She</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Steinbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Ground: A data context service</article-title>
          ,
          <source>in: CIDR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Geisler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <article-title>Constance: An intelligent data lake system</article-title>
          , in: F. Özcan, G. Koutrika, S. Madden (Eds.),
          <source>Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference</source>
          <year>2016</year>
          , San Francisco, CA, USA, June 26 - July 01,
          <year>2016</year>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>2097</fpage>
          -
          <lpage>2100</lpage>
          . URL: https://doi.org/10.1145/2882903.2899389. doi:10.1145/2882903.2899389.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciaccia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martinenghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Torlone</surname>
          </string-name>
          ,
          <article-title>Preference queries over taxonomic domains</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>1859</fpage>
          -
          <lpage>1871</lpage>
          . URL: http://www.vldb.org/pvldb/vol14/p1859-martinenghi.pdf. doi:10.14778/3467861.3467874.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Papenbrock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Finke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zwiener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <article-title>Data profiling with Metanome</article-title>
          ,
          <source>PVLDB</source>
          <volume>8</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wendell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Armbrust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rosen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkataraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghodsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shenker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <article-title>Apache Spark: a unified engine for big data processing</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Nargesian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Pu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Arocena</surname>
          </string-name>
          ,
          <article-title>Data lake management: Challenges and opportunities</article-title>
          ,
          <source>Proc. VLDB Endow.</source>
          <volume>12</volume>
          (
          <year>2019</year>
          )
          <fpage>1986</fpage>
          -
          <lpage>1989</lpage>
          . URL: https://doi.org/10.14778/3352063.3352116. doi:10.14778/3352063.3352116.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Korn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Polyzotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <article-title>Managing Google's data lake: an overview of the Goods system</article-title>
          ,
          <source>IEEE Data Eng. Bull</source>
          .
          <volume>39</volume>
          (
          <year>2016</year>
          )
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>Data fabric architecture is key to modernizing data management and integration</article-title>
          ,
          <year>2021</year>
          . URL: https://www.gartner.com/smarterwithgartner/data-fabric-architecture-is-key-to-modernizing-data-management-and-integration.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciaccia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martinenghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Torlone</surname>
          </string-name>
          ,
          <article-title>Conceptual constraints for data quality in data lakes</article-title>
          ,
          <source>in: Proceedings of the 1st Italian Conference on Big Data and Data Science (ITADATA 2022)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>122</lpage>
          . URL: https://ceur-ws.org/Vol-3340/paper34.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yakout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Elmagarmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Neville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <article-title>Guided data repair</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>4</volume>
          (
          <year>2011</year>
          )
          <fpage>279</fpage>
          -
          <lpage>289</lpage>
          . URL: https://doi.org/10.14778/1952376.1952378. doi:10.14778/1952376.1952378.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>A unified model for data and constraint repair</article-title>
          , in: S. Abiteboul, K. Böhm, C. Koch, K. Tan (Eds.),
          <source>Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16</source>
          ,
          <year>2011</year>
          , Hannover, Germany, IEEE Computer Society,
          <year>2011</year>
          , pp.
          <fpage>446</fpage>
          -
          <lpage>457</lpage>
          . URL: https://doi.org/10.1109/ICDE.2011.5767833. doi:10.1109/ICDE.2011.5767833.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Geerts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <article-title>Cleaning data with Llunatic</article-title>
          ,
          <source>VLDB J</source>
          .
          <volume>29</volume>
          (
          <year>2020</year>
          )
          <fpage>867</fpage>
          -
          <lpage>892</lpage>
          . URL: https://doi.org/10.1007/s00778-019-00586-5. doi:10.1007/s00778-019-00586-5.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zgraggen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Satyanarayan</surname>
          </string-name>
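          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kraska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,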
          <string-name>
            <given-names>C.</given-names>
            <surname>Hidalgo</surname>
          </string-name>
          ,
          <article-title>Sherlock: A deep learning approach to semantic data type detection</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD, KDD '19</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>1500</fpage>
          -
          <lpage>1508</lpage>
          . URL: https://doi.org/10.1145/3292500.3330993. doi:10.1145/3292500.3330993.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Data-driven domain discovery for structured datasets</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>953</fpage>
          -
          <lpage>967</lpage>
          . URL: https://doi.org/10.14778/3384345.3384346. doi:10.14778/3384345.3384346.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
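          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,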
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Sato: Contextual semantic type detection in tables</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>1835</fpage>
          -
          <lpage>1848</lpage>
          . URL: https://doi.org/10.14778/3407790.3407793. doi:10.14778/3407790.3407793.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Data cleaning: Overview and emerging challenges</article-title>
          , in: F. Özcan, G. Koutrika, S. Madden (Eds.),
          <source>Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference</source>
          <year>2016</year>
          , San Francisco, CA, USA, June 26 - July 01,
          <year>2016</year>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>2201</fpage>
          -
          <lpage>2206</lpage>
          . URL: https://doi.org/10.1145/2882903.2912574. doi:10.1145/2882903.2912574.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Farid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <article-title>CLAMS: bringing quality to data lakes</article-title>
          , in: F. Özcan, G. Koutrika, S. Madden (Eds.),
          <source>Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference</source>
          <year>2016</year>
          , San Francisco, CA, USA, June 26 - July 01,
          <year>2016</year>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>2089</fpage>
          -
          <lpage>2092</lpage>
          . URL: https://doi.org/10.1145/2882903.2899391. doi:10.1145/2882903.2899391.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          ,
          <article-title>Ontology-based data access: A survey</article-title>
          , in: J. Lang (Ed.),
          <source>Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19</source>
          ,
          <year>2018</year>
          , Stockholm, Sweden, ijcai.org,
          <year>2018</year>
          , pp.
          <fpage>5511</fpage>
          -
          <lpage>5519</lpage>
          . URL: https://doi.org/10.24963/ijcai.2018/777. doi:10.24963/ijcai.2018/777.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schelter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Celikel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bießmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Grafberger</surname>
          </string-name>
          ,
          <article-title>Automating large-scale data quality verification</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>1781</fpage>
          -
          <lpage>1794</lpage>
          . URL: http://www.vldb.org/pvldb/vol11/p1781-schelter.pdf. doi:10.14778/3229863.3229867.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schelter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bießmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rukat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seufert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brunelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taptunov</surname>
          </string-name>
          ,
          <article-title>Unit testing data with Deequ</article-title>
          , in: P. A. Boncz, S. Manegold, A. Ailamaki, A. Deshpande, T. Kraska (Eds.),
          <source>Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference</source>
          <year>2019</year>
          , Amsterdam, The Netherlands, June 30 - July 5,
          <year>2019</year>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>