<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Wrangling for Big Data: Towards a Lingua Franca for Data Wrangling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tim Furche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georg Gottlob</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernd Neumayr</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuel Sallinger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Oxford</institution>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We are dealing with ever-growing amounts of data; as some like to put it, we
are at the beginning of the era of big data. The value gained from the analysis of
big data is expected to be huge. Yet data analytics needs well-organized, clean
data to work on. The process of transforming raw data from various sources into
a form suitable for data analytics is called data wrangling [
        <xref ref-type="bibr" rid="ref12 ref15">15, 12</xref>
        ]. It is estimated
that up to 80% of the time needed to gain value from big data is spent on data
wrangling, as opposed to data analytics itself [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Hence, there is a clear need
for data wrangling to be more effective.
      </p>
      <p>
        While the term data wrangling is relatively new, transforming data from one
format into another has been a focus of the data management community for
many years now. There are systems that handle data extraction very well [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Data integration [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and data exchange [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have been studied in detail. We have
gained a deep understanding of data quality, as well as of querying and reasoning
over such data. All of these components are needed for successful data wrangling.
      </p>
      <p>Yet, with the advent of big data, new challenges have arrived, sometimes
characterized by the 4 V's of big data: volume (the scale of data), velocity (speed
of change), variety (different forms and formats) and veracity (uncertainty).
These pose challenges not only for each individual component of data wrangling,
but also for the system as a whole.</p>
      <p>
        So far, the challenges that big data poses for data wrangling have mostly
been met at the level of individual components (such as data extraction or
integration). Yet it is the sharing of knowledge between these components that has the
most potential for improving the data wrangling process, e.g. by allowing the data
management system to optimize execution based on such shared knowledge [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
In this paper, we describe the vision and design principles of a Datalog-based
language for data wrangling that facilitates such sharing of knowledge.
      </p>
      <p>Overall, data wrangling today is an area in high demand in industry, but
without a clearly defined research community or clearly defined research
objectives. The data management community is a natural fit for taking up this task
and shaping this new field. The VADA (Value-Added Data Systems) programme,
with the University of Oxford, the University of Manchester and the University
of Edinburgh at its core, seeks to significantly advance data wrangling, both in
practice, by providing an architecture and prototype implementation for data
wrangling, and in academic research, by solving the challenges of wrangling big
data in collaboration with the data management community.</p>
    </sec>
    <sec id="sec-2">
      <title>A Lingua Franca for Wrangling Big Data</title>
      <p>
        The need to share data and knowledge between components of a data
wrangling system makes it clear that a common language is needed to express such
knowledge and to enable reasoning over it. In this paper, our goal is to describe
the vision of a lingua franca for data wrangling. Such a language should provide
a uniform way to address the different needs in the data wrangling process:
        <list list-type="bullet">
          <list-item>
            <p>expressing knowledge in a shared knowledge base;</p>
          </list-item>
          <list-item>
            <p>reasoning about data and the transformation of data within the components;</p>
          </list-item>
          <list-item>
            <p>specifying the workflow between the components.</p>
          </list-item>
        </list>
The language should address these needs by providing a uniform view of data
(independent of its source), while supporting the components by being suited to
data extraction, integration and exchange. At the same time, it must allow for
efficient processing and scalability when confronted with big data.
      </p>
      <p>
        One of the best-established languages in the data management community for
knowledge-based reasoning is Datalog. Over the years, it has been studied in
great detail and extended in various ways [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the area of data exchange [
        <xref ref-type="bibr" rid="ref16 ref9">9,
16</xref>
        ] and data integration [
        <xref ref-type="bibr" rid="ref14 ref19">14, 19</xref>
        ], extended Datalog rules called tuple-generating
dependencies (tgds; sometimes also called existential rules [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) are used to specify
schema mappings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. They have been successfully applied in IBM's Clio system
and form the core of products offered by companies such as LogicBlox to empower
data analytics. Complementing this, their theoretical properties are well studied
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], including their management [
        <xref ref-type="bibr" rid="ref3 ref7">7, 3</xref>
        ], composition [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ], optimization [
        <xref ref-type="bibr" rid="ref18 ref21">21, 18</xref>
        ]
and reasoning [
        <xref ref-type="bibr" rid="ref17 ref22">22, 17</xref>
        ], in particular the computational limits of reasoning [
        <xref ref-type="bibr" rid="ref10 ref13">10,
13</xref>
        ]. The family of languages often called Datalog± [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] seeks to extend Datalog's
expressive power without sacrificing efficiency and scalability.
      </p>
      <p>
        A key challenge for such a language is the large volume of data on the one hand
and the requirement for highly expressive reasoning on the other. Clearly,
meeting both requirements at the same time is hard. Yet there is a full spectrum
of possibilities in between:
        <list list-type="bullet">
          <list-item>
            <p>small volume of data: complex reasoning;</p>
          </list-item>
          <list-item>
            <p>large volume of data: simple processing;</p>
          </list-item>
          <list-item>
            <p>very large volume of data: parallel processing.</p>
          </list-item>
        </list>
Offering both expressiveness and scalability in a single system poses particular
design challenges for a language for data wrangling. A single monolithic, highly
expressive language is not enough to meet the requirement of scalability in the
presence of big data.
      </p>
      <p>Design Principles</p>
      <p>In this section, we present our vision for VADALOG, our proposed language for
data wrangling, by walking through the major features, themes and
properties of the language. For space reasons, we focus on the design
principles of the language rather than on the details of its syntactic representation.</p>
      <p>Solid Foundation: Datalog. The language is based on Datalog, extended by
features that are well known in the data management community, in particular
existential quantification (as in tgds, existential rules or Datalog±), as well as
numerous other features motivated by the theoretical and practical needs of data
wrangling. Having Datalog at its foundation gives VADALOG a well-understood
core that has been the topic of research for many years.</p>
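      <p>Existential rules such as tgds are commonly evaluated with the chase, which repeatedly fires rules and invents fresh labelled nulls for existentially quantified variables. The Python sketch below shows a minimal restricted chase. It assumes a hypothetical encoding (atoms as tuples, variables as strings prefixed with "?") and is not VADALOG syntax, which this paper deliberately leaves open. The first rule is a tgd in the style of a schema mapping; the second is a plain Datalog rule.</p>
      <preformat preformat-type="code">
```python
from itertools import count

# Hypothetical encoding: an atom is (predicate, (term, ...)); terms starting
# with "?" are variables, all other terms are constants or labelled nulls.


def is_var(term):
    return term.startswith("?")


def match(atoms, facts, binding=None):
    """Yield every extension of `binding` that maps all atoms onto facts."""
    binding = binding or {}
    if not atoms:
        yield binding
        return
    pred, args = atoms[0]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(binding)
        consistent = True
        for term, value in zip(args, fargs):
            # A variable binds (or must agree); a constant must be equal.
            bound = new.setdefault(term, value) if is_var(term) else term
            if bound != value:
                consistent = False
                break
        if consistent:
            yield from match(atoms[1:], facts, new)


def chase(facts, rules, max_rounds=100):
    """Restricted chase: fire a rule only if its head is not yet satisfied."""
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        new_facts = set()
        for body, head in rules:
            for b in list(match(body, facts)):
                if next(match(head, facts | new_facts, b), None) is not None:
                    continue  # head already satisfied for this binding
                nulls = {}  # existential variable -> fresh labelled null
                for pred, args in head:
                    terms = []
                    for t in args:
                        if is_var(t) and t not in b:
                            if t not in nulls:
                                nulls[t] = f"_:n{next(fresh)}"
                            terms.append(nulls[t])
                        else:
                            terms.append(b[t] if is_var(t) else t)
                    new_facts.add((pred, tuple(terms)))
        if new_facts.issubset(facts):
            return facts  # fixpoint reached
        facts |= new_facts
    raise RuntimeError("chase did not terminate within max_rounds")


RULES = [
    # tgd (schema mapping): Emp(e, d) -> exists m. Dept(d, m)
    ([("Emp", ("?e", "?d"))], [("Dept", ("?d", "?m"))]),
    # plain Datalog rule: Dept(d, m) -> Manager(m)
    ([("Dept", ("?d", "?m"))], [("Manager", ("?m",))]),
]
FACTS = {("Emp", ("alice", "sales")), ("Emp", ("bob", "sales"))}

result = chase(FACTS, RULES)
for fact in sorted(result):
    print(fact)
# ('Dept', ('sales', '_:n0'))
# ('Emp', ('alice', 'sales'))
# ('Emp', ('bob', 'sales'))
# ('Manager', ('_:n0',))
```
      </preformat>
      <p>Since the chase need not terminate for arbitrary tgds, the sketch caps the number of rounds; restricting programs to well-behaved fragments is one way to avoid this problem.</p>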
      <p>Family of Languages. No single, all-encompassing language can be highly
expressive and have low computational complexity at the same time. VADALOG
therefore consists of profiles of the language (in the same sense as the profiles
of languages such as OWL), each providing a specific subset suited to a particular
purpose. This allows simple computations to be executed efficiently and scalably
over big data, while at the same time allowing complex reasoning over a smaller
knowledge base.</p>
      <p>Combining Strengths. There exist a number of powerful knowledge-based
systems, database systems, and systems able to deal with big data, which
often embody the expertise of large groups of researchers and engineers. The
language is thus designed to make use of systems specifically suited
to particular tasks. For example, if a VADALOG program is formulated in a
profile of the language that is particularly suited to an existing knowledge-based
system, then this system is used as a backend engine. This design choice does
not come for free: many interesting research challenges have to be addressed to
deal with multiple engines working in a unified system.</p>
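      <p>Given a profile, choosing a backend can be sketched as a lookup in a registry of engines. Everything in this sketch is hypothetical: the engine names, the profiles each engine supports, the scales_out flag and the size threshold. It only illustrates the idea of routing simple programs over large data to a scalable engine while keeping expressive programs on a reasoner.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str            # hypothetical engine name
    profiles: frozenset  # profiles the engine can evaluate
    scales_out: bool     # designed for very large data volumes?


# Hypothetical registry, listed in order of preference.
ENGINES = [
    Engine("rule-reasoner", frozenset({"datalog", "linear", "full"}), scales_out=False),
    Engine("parallel-engine", frozenset({"datalog"}), scales_out=True),
]


def select_engine(profile, data_size, engines=ENGINES, big_data_threshold=10**9):
    """Pick a backend for a program in `profile` over `data_size` facts."""
    candidates = [e for e in engines if profile in e.profiles]
    if not candidates:
        raise LookupError(f"no engine supports profile {profile!r}")
    if data_size >= big_data_threshold:
        # Large inputs: prefer an engine built for scale, if one fits the profile.
        scalable = [e for e in candidates if e.scales_out]
        if scalable:
            return scalable[0]
    return candidates[0]


print(select_engine("datalog", 5 * 10**9).name)  # parallel-engine
print(select_engine("linear", 5 * 10**9).name)   # rule-reasoner
print(select_engine("datalog", 1_000).name)      # rule-reasoner
```
      </preformat>
      <p>A real dispatcher would also need to translate programs into each engine's input format and combine their results; the sketch covers only the selection step.</p>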
      <p>Handling Volume. Volume may be coped with by using an engine that is
suited to handling huge amounts of data. Yet this is not always the most
efficient approach. The language supports, as "first-class citizens",
the partitioning of data into dataspheres, which may be parameterized using
domain-dependent or data-dependent parameters. This design principle, handling
volume through clever partitioning of the data combined with engines that can
handle large amounts of data when necessary, allows VADALOG to deal efficiently
with huge amounts of data.</p>
      <p>Modularity. Reasoning and transformation tasks are organized into
self-contained modules, based on the concept of a data transducer, which receives
data from different dataspheres and produces data in different dataspheres. Such
transducers modularly encapsulate their dependencies (which dataspheres must be
present) and their guards (which conditions must be met for them to be
executed). All of these parts of a transducer are defined in VADALOG,
in a single, maintainable module per transducer.</p>
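      <p>The interplay of dataspheres and transducers can be sketched in Python as follows. The partition helper, the Transducer class, the run_transducer function and datasphere names such as "listings/oxford" are hypothetical; in VADALOG, dependencies and guards would be stated in the language itself. Facts are partitioned by a data-dependent parameter (the city), and a transducer runs only when the dataspheres it depends on exist and its guard holds.</p>
      <preformat preformat-type="code">
```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


def partition(facts, sphere_of):
    """Split facts into dataspheres; `sphere_of` maps a fact to a sphere name."""
    spheres = defaultdict(set)
    for fact in facts:
        spheres[sphere_of(fact)].add(fact)
    return dict(spheres)


@dataclass
class Transducer:
    name: str
    requires: list                 # dataspheres that must be present
    guard: Callable[[dict], bool]  # condition checked before running
    run: Callable[[dict], dict]    # returns {output sphere: new facts}

    def ready(self, store):
        return all(s in store for s in self.requires) and self.guard(store)


def run_transducer(t, store):
    """Run `t` if its dependencies and guard hold; merge its outputs."""
    if not t.ready(store):
        return False
    for sphere, facts in t.run(store).items():
        store.setdefault(sphere, set()).update(facts)
    return True


# Hypothetical listing facts (city, price), partitioned by the city.
LISTINGS = {("oxford", 300), ("oxford", 500), ("leeds", 200)}
store = partition(LISTINGS, lambda fact: f"listings/{fact[0]}")


def average_price(store):
    prices = [price for _, price in store["listings/oxford"]]
    return {"stats/oxford": {("avg_price", sum(prices) / len(prices))}}


avg_oxford = Transducer(
    name="avg-oxford",
    requires=["listings/oxford"],
    guard=lambda s: len(s["listings/oxford"]) >= 2,  # need a minimal sample
    run=average_price,
)
avg_cambridge = Transducer(
    name="avg-cambridge",
    requires=["listings/cambridge"],  # no such datasphere exists yet
    guard=lambda s: True,
    run=lambda s: {},
)

print(run_transducer(avg_oxford, store))     # True
print(run_transducer(avg_cambridge, store))  # False
print(store["stats/oxford"])                 # {('avg_price', 400.0)}
```
      </preformat>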
      <p>Dynamic Orchestration. Defining single modules (transducers) is only one
part of a data wrangling system. A key part of such a system is that all its
components are able to share data and knowledge and, importantly, to
react to such knowledge by dynamically selecting which steps to take next. For
example, as a result of quality analysis, the system may choose to redo data
extraction with different background knowledge, or to add a data source. A key
part of VADALOG is thus a profile for specifying such transducer networks, which
dynamically orchestrate the components of the data wrangling system.</p>
      <p>Extensibility. A rule-based language based on Datalog is clearly suited to
knowledge-based reasoning tasks. Yet, while a data wrangling system has
reasoning-intensive tasks, it also has tasks that are better implemented
in other languages. To harness components implemented in other languages,
VADALOG allows extensibility at a number of levels: at the transducer level,
which gives components wide-ranging freedom in how to approach their task
(such as an existing component analysing data quality); at the level of actions,
which are middle-scale tasks (such as navigating web pages); and at the level of
external functions, which can add small-scale functionality not supported by the
language (such as unsupported string or number functions).</p>
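      <p>The following sketch mimics dynamic orchestration, together with extensibility at the level of external functions, in Python. In VADALOG, per the text, orchestration would be specified in a dedicated profile of the language; here the step names, the quality threshold of 0.9, the attempt limit and the EXTERNAL registry are all hypothetical. The orchestrator reacts to quality knowledge by redoing extraction, as in the quality-analysis example given in the text.</p>
      <preformat preformat-type="code">
```python
# Hypothetical registry of external functions (extensibility at the
# level of external functions).
EXTERNAL = {}


def external(name):
    """Decorator registering a Python function under a name rules could call."""
    def register(fn):
        EXTERNAL[name] = fn
        return fn
    return register


@external("parse_price")
def parse_price(text):
    # Small-scale functionality assumed to be missing from the rule language.
    return float(text.strip().lstrip("$").replace(",", ""))


def extract(kb):
    # Hypothetical extraction step: a second attempt with different
    # background knowledge yields cleaner rows.
    kb["rows"] = ["$1,200", "n/a", "$950"] if kb["attempt"] == 0 else ["$1,200", "$950"]
    kb["attempt"] += 1
    kb.pop("quality", None)  # earlier quality knowledge is now stale


def assess_quality(kb):
    prices = []
    for text in kb["rows"]:
        try:
            prices.append(EXTERNAL["parse_price"](text))
        except ValueError:
            pass  # an unparseable row lowers the quality score
    kb["prices"] = prices
    kb["quality"] = len(prices) / max(len(kb["rows"]), 1)


STEPS = {"extract": extract, "assess": assess_quality}


def next_step(kb, min_quality=0.9, max_attempts=3):
    """Choose the next step by reacting to the shared knowledge base."""
    if "rows" not in kb:
        return "extract"
    if "quality" not in kb:
        return "assess"
    if kb["quality"] >= min_quality or kb["attempt"] >= max_attempts:
        return "done"
    return "extract"  # low quality: redo extraction


def orchestrate(kb, max_steps=20):
    trace = []
    for _ in range(max_steps):
        step = next_step(kb)
        trace.append(step)
        if step == "done":
            return trace
        STEPS[step](kb)
    raise RuntimeError("orchestration did not finish within max_steps")


kb = {"attempt": 0}
print(orchestrate(kb))  # ['extract', 'assess', 'extract', 'assess', 'done']
print(kb["prices"])     # [1200.0, 950.0]
```
      </preformat>
      <p>Keeping the decision logic in next_step, separate from the steps themselves, mirrors the separation between a transducer network and the transducers it orchestrates.</p>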
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>Data wrangling is an important challenge in practice and for research. The data
management community is in a good position to take on this challenge, giving
this growing field a scientific community and shaping its research objectives. The
data management community should take an active role in this area.</p>
      <p>Developing a language for data wrangling is an important and clearly needed
step towards that goal. In this paper, we described the design principles of
VADALOG, our proposed language for data wrangling. While a number of
design principles address the challenges of data wrangling itself, two
important ones promote a collaborative approach: the language allows existing
knowledge-based reasoning systems to be harnessed, and it promotes
extensibility by allowing the integration of external components.</p>
      <p>We invite the data management community to collaborate with us, both in
harnessing the power of the systems currently being developed for data wrangling
and in shaping the future of VADALOG.</p>
      <p>Acknowledgements. This work was supported by the EPSRC programme
grant EP/M025268/1. Bernd Neumayr receives funding from a habilitation grant
of the state of Upper Austria. Emanuel Sallinger was partially supported by the
Austrian Science Fund projects (FWF):P25207-N23 and (FWF):Y698.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barceló</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Libkin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murlak</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Foundations of Data Exchange</article-title>
          . Cambridge University Press (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nash</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Composition with target constraints</article-title>
          .
          <source>Logical Methods in Computer Science</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          ) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutter</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riveros</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Foundations of schema mapping management</article-title>
          .
          <source>In: PODS</source>
          . pp.
          <fpage>227</fpage>
          –
          <lpage>238</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutter</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riveros</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The language of plain so-tgds: Composition, inversion and structural properties</article-title>
          .
          <source>JCSS</source>
          <volume>79</volume>
          (
          <issue>6</issue>
          ),
          <fpage>763</fpage>
          –
          <lpage>784</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Baget</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leclère</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mugnier</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salvat</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>On rules with existential variables: Walking the decidability line</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>175</volume>
          (
          <issue>9-10</issue>
          ),
          <fpage>1620</fpage>
          –
          <lpage>1654</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Barceló</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>R</given-names>
          </string-name>
          . (eds.): Datalog in Academia and Industry - Second International Workshop, Datalog 2.0, LNCS, vol.
          <volume>7494</volume>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Model management 2.0: manipulating richer mappings</article-title>
          .
          <source>In: SIGMOD Conference</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>12</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Calì</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukasiewicz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A general datalog-based framework for tractable query answering over ontologies</article-title>
          .
          <source>In: PODS</source>
          . pp.
          <fpage>77</fpage>
          –
          <lpage>86</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data exchange: semantics and query answering</article-title>
          .
          <source>Theor. Comput. Sci</source>
          .
          <volume>336</volume>
          (
          <issue>1</issue>
          ),
          <fpage>89</fpage>
          –
          <lpage>124</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Feinerer</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savenkov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>On the undecidability of the equivalence of second-order tuple generating dependencies</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>48</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Furche</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grasso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orsi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schallhart</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>DIADEM: thousands of websites to a single database</article-title>
          .
          <source>PVLDB</source>
          <volume>7</volume>
          (
          <issue>14</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Furche</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Libkin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orsi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>N.W.</given-names>
          </string-name>
          :
          <article-title>Data wrangling for big data: Challenges and opportunities</article-title>
          . In: EDBT (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Function symbols in tuple-generating dependencies: Expressive power and computability</article-title>
          .
          <source>In: PODS</source>
          . pp.
          <fpage>65</fpage>
          –
          <lpage>77</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ordille</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          :
          <article-title>Data integration: The teenage years</article-title>
          .
          <source>In: VLDB</source>
          . pp.
          <fpage>9</fpage>
          –
          <lpage>16</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kandel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaisant</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Ham</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riche</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weaver</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brodbeck</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buono</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Research directions in data wrangling: Visualizations and transformations for usable and credible data</article-title>
          .
          <source>Information Visualization</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ),
          <fpage>271</fpage>
          –
          <lpage>288</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          :
          <article-title>Schema mappings, data exchange, and metadata management</article-title>
          .
          <source>In: PODS</source>
          . pp.
          <fpage>61</fpage>
          –
          <lpage>75</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savenkov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Nested dependencies: structure and reasoning</article-title>
          . In: PODS. pp.
          <fpage>176</fpage>
          –
          <lpage>187</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savenkov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Limits of schema mappings</article-title>
          .
          <source>In: ICDT. LIPIcs</source>
          , vol.
          <volume>48</volume>
          , pp.
          <fpage>19:1</fpage>
          –
          <lpage>19:17</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data integration: A theoretical perspective</article-title>
          . In:
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          (eds.) PODS. pp.
          <fpage>233</fpage>
          –
          <lpage>246</lpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lohr</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>For big-data scientists, ‘janitor work’ is key hurdle to insights</article-title>
          . The New York Times (
          <year>2015</year>
          ), http://nyti.ms/1Aqif2X
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Pichler</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savenkov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Relaxed notions of schema mapping equivalence revisited</article-title>
          .
          <source>Theory Comput. Syst</source>
          .
          <volume>52</volume>
          (
          <issue>3</issue>
          ),
          <fpage>483</fpage>
          –
          <lpage>541</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Sallinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Reasoning about schema mappings</article-title>
          .
          In: Data Exchange, Information, and Streams,
          <source>Dagstuhl Follow-Ups</source>
          , vol.
          <volume>5</volume>
          , pp.
          <fpage>97</fpage>
          –
          <lpage>127</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>