<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ODIN: A Dataspace Management System</article-title>
      </title-group>
      <abstract>
        <p>Odin is a system that supports the incremental, pay-as-you-go integration of data sources into dataspaces and provides user-friendly querying mechanisms on top of them. We describe its main characteristics and underlying assumptions, including the user interactions required. Odin's novelty lies in a largely automated bottom-up approach (i.e., driven by the sources at hand) that includes the user in the loop for disambiguation purposes. The on-site demonstration will feature an ongoing project with the World Health Organization (WHO). Online demo and videos: www.essi.upc.edu/dtim/odin/</p>
        <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This work was partly supported by the GENESIS project, funded by the Spanish Ministerio de Ciencia e Innovacion under grant TIN2016-79269-R.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A prominent approach to virtual data integration is to expose an ontology, which conceptualizes the domain of interest, in order to offer a uniform query interface over the sources. Queries over the ontology are rewritten over the sources via schema mappings. Maintaining such constructs (i.e., evolving the ontology, or adding new sources and mappings) is well known to be an arduous, largely manual task that hinders the ability of such systems to adapt flexibly and provide right-time integration. This limitation has been termed the data variety challenge, which refers to the complexity of providing on-demand integration over a vast and evolving set of data sources.</p>
      <p>Dataspaces represent a major step towards tackling the variety challenge. With the vision of reducing the usual upfront and maintenance costs, dataspaces advocate a flexible and dynamic pay-as-you-go approach in which different integration tasks are automated [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      <p>Supporting the end-to-end lifecycle of dataspaces is a technically challenging task. The state of the art in the automatic construction of an ontology from the data sources (and their respective mappings), commonly known as bootstrapping, is BootOX [<xref ref-type="bibr" rid="ref2">2</xref>]. Targeted at ontology-based data integration, BootOX generates OWL 2 QL ontologies from relational databases, together with R2RML mappings to the sources. Yet, this approach falls short in settings where managing data variety is a key requirement. On the one hand, the extraction is restricted to relational databases and misses widely used semi-structured data formats such as CSV, JSON or XML. On the other hand, such mappings belong to the global-as-view (GaV) family: they characterize the ontological classes and properties in terms of SQL queries over the sources, which is not well suited to highly dynamic and evolving settings.</p>
      <p>
        Odin (short for On-demand Data INtegration), a dataspace management
system grounded in knowledge graphs, was conceived to overcome the
aforementioned challenges. Fig. 1 depicts how Odin supports the complete
dataspace lifecycle. Odin automatically extracts the schemata from structured (e.g.,
relational) and semi-structured (e.g., JSON) data sources and translates them into
a canonical data model, namely RDFS. To this end, a set of production rules
parses their metadata and automatically generates RDFS-compliant source graphs.
Next, the source graphs are aligned, taking user feedback into account
throughout the process. As a result, Odin generates provenance graphs (PGs) that trace the
results of the previous stages. A PG is a target-agnostic metadata construct (i.e.,
not tailored to a specific tool) describing the integration of a particular set of data
sources. It captures the results of bootstrapping the sources and aligning their
schemata, and guarantees that target-specific metadata can be generated from it<sup>1</sup>.
Thus, PGs are used to generate the specific constructs of a given integration
tool. In this demo, Odin generates the constructs required by [<xref ref-type="bibr" rid="ref3">3</xref>], namely
conjunctive query (CQ)-oriented graphs, which expose the source schemata in
first normal form and are linked to the global graph via local-as-view (LaV) schema
mappings (represented as graphs). LaV mappings characterize the
sources in terms of a query over the ontology, which makes them inherently more
suitable for data variety settings. This entails a more complex query answering
process, which boils down to the problem of answering queries using views. In
our demo, however, we will show the feasibility of our approach in real cases.
      </p>
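      <p>To illustrate why LaV mappings suit evolving settings, the following minimal sketch describes each wrapper by the set of global graph edges it covers and searches for the smallest wrapper combinations that jointly cover a query. It is a simplified coverage check under stated assumptions, not the rewriting algorithm of [<xref ref-type="bibr" rid="ref3">3</xref>]; the wrapper names, the global graph vocabulary and the function <monospace>covering_wrapper_sets</monospace> are hypothetical.</p>
      <preformat><![CDATA[
```python
from itertools import combinations

# Hypothetical LaV mappings: each wrapper (a view over one source) is described
# by the set of global graph edges (class, property) that it covers.
LAV = {
    "w_un_data": {("Country", "hasName"), ("Country", "hasHealthExpenditure")},
    "w_widp": {("Country", "hasName"), ("Country", "hasDiagnosedCases")},
    "w_wimeds": {("Country", "hasName"), ("Country", "hasDrugsDistributed")},
}


def covering_wrapper_sets(query, lav):
    """Return the minimal wrapper combinations whose mappings jointly cover the query."""
    names = sorted(lav)
    found = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Skip supersets of combinations that already cover the query.
            if any(set(prev).issubset(combo) for prev in found):
                continue
            covered = set().union(*(lav[w] for w in combo))
            if query.issubset(covered):
                found.append(combo)
    return found


# A query over the global graph, expressed as the edges it needs.
query = {
    ("Country", "hasName"),
    ("Country", "hasDiagnosedCases"),
    ("Country", "hasDrugsDistributed"),
}
result = covering_wrapper_sets(query, LAV)
print(result)
```
]]></preformat>
      <p>In this sketch, registering a new source only adds one entry to the mapping dictionary and leaves the existing entries untouched, which is the property that makes LaV attractive when sources change frequently.</p>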
      <sec id="sec-1-2">
        <p>[Fig. 1: diagram of the dataspace lifecycle supported by Odin. Recoverable labels: Schema Extraction; Source Graph; Wrapper (1NF); Alignment; Feedback; Data Analyst; Provenance Graph; Target-oriented merging &amp; consolidation; CQ-oriented Graph; LAV mappings; Global Graph; Query; Iterative Processing; Metadata Flow. Region labels: Provenance Metadata; Dataspaces Metadata.]</p>
        <p><sup>1</sup> Although the focus of this paper is on answering queries, during the demo we
will highlight the ability to generate, from a PG, the metadata required by other ontological
reasoning services (e.g., DL-Lite for satisfiability checking).</p>
        <p>Generation of RDFS schemata from sources. Odin adopts a meta-modeling
approach to bootstrap disparate sources and create an RDFS
representation of their schemata. For each source data model, Odin defines an equivalent
first-order logic representation of its meta-model. Given a source model (e.g.,
relational or JSON), a set of predefined production rules (i.e., tuple-generating
dependencies defined at the meta-model level) generates an equivalent RDFS
model<sup>2</sup>. Source graphs are the result of this bootstrapping phase.</p>
        <p>User-driven source graph alignment. From the source graphs, Odin
incrementally generates the PG, in which it annotates source graph alignments in the
form of taxonomies. To discover alignments, Odin uses an enhanced version of
LogMap<sup>3</sup> that considers WordNet synonyms. Candidate alignments are ranked,
and Odin prompts the user to accept or reject them. Further, since aligning two
ontologies is a hard task, Odin also provides an intuitive interface to manually
assert alignments.</p>
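        <p>The two steps above can be sketched as follows. The sketch assumes two toy production rules (JSON objects become classes, keys become properties) and uses plain string similarity from Python's <monospace>difflib</monospace> as a stand-in for LogMap; neither reflects Odin's actual rule set or matcher, and the sample documents, IRIs and function names are invented for illustration.</p>
        <preformat><![CDATA[
```python
import json
from difflib import SequenceMatcher


def bootstrap_json(source_name, document):
    """Turn one JSON object into RDFS-style triples using two toy production rules."""
    triples = []

    def visit(node, class_iri):
        # Rule 1: every JSON object becomes an rdfs:Class.
        triples.append((class_iri, "rdf:type", "rdfs:Class"))
        for key, value in node.items():
            # Rule 2: every key becomes an rdf:Property whose domain is that class.
            prop_iri = f"{class_iri}.{key}"
            triples.append((prop_iri, "rdf:type", "rdf:Property"))
            triples.append((prop_iri, "rdfs:domain", class_iri))
            if isinstance(value, dict):
                # Nested objects introduce a new class, linked through rdfs:range.
                nested_iri = f"{prop_iri}.Object"
                triples.append((prop_iri, "rdfs:range", nested_iri))
                visit(value, nested_iri)

    visit(document, f"src:{source_name}")
    return triples


def properties(triples):
    """Collect the IRIs typed as rdf:Property in a source graph."""
    return [s for s, p, o in triples if p == "rdf:type" and o == "rdf:Property"]


def rank_alignments(props_a, props_b):
    """Rank candidate alignments by local-name similarity, best first."""
    candidates = []
    for a in props_a:
        for b in props_b:
            name_a = a.split(".")[-1].lower()
            name_b = b.split(".")[-1].lower()
            score = round(SequenceMatcher(None, name_a, name_b).ratio(), 2)
            candidates.append((score, a, b))
    return sorted(candidates, reverse=True)


# Placeholder documents; the keys and values are invented for illustration.
un_data = json.loads('{"country": "X", "indicator": {"name": "Y", "value": 0}}')
wimeds = json.loads('{"Country": "X", "drug": "Z"}')

graph_a = bootstrap_json("un_data", un_data)
graph_b = bootstrap_json("wimeds", wimeds)
ranked = rank_alignments(properties(graph_a), properties(graph_b))
print(ranked[0])
```
]]></preformat>
        <p>In an interactive setting, the ranked list would be shown to the user from the top, and only the accepted pairs would be recorded as alignments.</p>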
        <p>
          Querying the sources via the ontology. This final step consists of generating
the metadata constructs required to pose and resolve queries over the dataspace.
To this end, from PGs, Odin automatically generates the global graph (i.e.,
a merged view of the aligned source graphs) and CQ-oriented graphs, which
expose a first normal form structure of the sources. To guarantee the incremental
evolution of the system, Odin also generates LaV mappings from CQ-oriented
graphs to the global graph. Since PGs are created following a bottom-up approach,
we are able to automate the definition of all required constructs. Consequently,
given that Odin explicitly models the schema that the sources expose, the LaV mappings
are exact and need not deal with incompleteness in the sources.
Finally, Odin provides a user-friendly interface to pose conjunctive queries (CQs)
on the global graph, which are automatically translated to SPARQL. A rewriting
algorithm interprets such a query and generates the certain answers under the
closed-world assumption in terms of unions of CQs [<xref ref-type="bibr" rid="ref3">3</xref>]. The demo will show that
such constructs are automatically generated in time linear in the size of the PG.
        </p>
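        <p>As a rough illustration of the translation from a conjunctive query to SPARQL, the sketch below renders a CQ, given as head variables plus triple atoms, as a SELECT query over a basic graph pattern. It covers only the translation of the user-posed query, not the rewriting over the sources; the function <monospace>cq_to_sparql</monospace>, the <monospace>g:</monospace> namespace and the property names are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
def cq_to_sparql(head_vars, atoms, prefixes):
    """Render a conjunctive query (head variables plus triple atoms) as SPARQL."""
    body_terms = {term for atom in atoms for term in atom}
    missing = [v for v in head_vars if v not in body_terms]
    if missing:
        # A CQ is only safe if every head variable also occurs in its body.
        raise ValueError(f"head variables not bound in the body: {missing}")
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    lines.append("SELECT DISTINCT " + " ".join(head_vars))
    lines.append("WHERE {")
    for subject, predicate, obj in atoms:
        # Each atom of the CQ becomes one triple pattern.
        lines.append(f"  {subject} {predicate} {obj} .")
    lines.append("}")
    return "\n".join(lines)


PREFIXES = {
    "g": "http://example.org/global#",  # hypothetical global graph namespace
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
# Hypothetical query: names and diagnosed cases of every country.
ATOMS = [
    ("?c", "rdf:type", "g:Country"),
    ("?c", "g:hasName", "?name"),
    ("?c", "g:hasDiagnosedCases", "?cases"),
]
sparql = cq_to_sparql(["?name", "?cases"], ATOMS, PREFIXES)
print(sparql)
```
]]></preformat>
        <p>The <monospace>DISTINCT</monospace> modifier is used here because conjunctive queries are usually evaluated under set semantics.</p>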
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Demo</title>
      <p>We will present the functionalities of Odin via the WHO Information System
to Control and Eliminate Neglected Tropical Diseases (WISCENTD)<sup>4</sup>. The goal
of WISCENTD is to support the collection, integration and analysis
of data coming from different monitoring systems that surveil different aspects
of neglected tropical diseases (NTDs). Data related to NTDs are largely
fragmented, and their integration is mandatory to shed light on NTDs around the
world. The demo will simulate the day-to-day work of a WHO data analyst and show how
Odin is used to first collect and integrate different sources relevant to a
certain NTD, and later cross-query them. We will use relevant datasets, such
as UN Data (open-data JSON datasets) about health economics indicators and
migrant information per country<sup>5</sup>, data about diagnosis and treatment per
country periodically extracted from WIDP<sup>6</sup> (which hosts a relational database), and data
about drug distribution periodically extracted from WIMEDS<sup>7</sup> as CSV files, among others. We
will first showcase how the data analyst, just by interacting with Odin's interface,
is able to integrate and query such sources in a user-friendly manner. Odin allows
interested users to browse the metadata generated throughout the whole
process: (i) source bootstrapping, (ii) source alignment to construct the PG (Fig. 2),
and (iii) the automatic creation of the constructs for query answering. The
audience will be encouraged to participate by including new sources incrementally,
querying the global graph, or even applying Odin to other domains.</p>
      <sec id="sec-2-1">
        <p>Implementation details. Odin follows a service-oriented architecture, which
enables extensibility and separation of concerns. The frontend is implemented
in JavaScript and runs on a Node.js web server. Odin uses WebVOWL to
visualize and interact with graphs. The backend is implemented as a set of REST
APIs defined using Jersey for Java. To deal with RDF graphs, this component
makes heavy use of Jena and its persistence engine, Jena TDB.</p>
      </sec>
      <sec id="sec-2-2">
        <p><sup>2</sup> http://essi.upc.edu/dtim/ardi</p>
        <p><sup>3</sup> https://github.com/ernestojimenezruiz/logmap-matcher</p>
        <p><sup>4</sup> https://www.who.int/neglected_diseases/disease_management/wiscentds/en</p>
        <p><sup>5</sup> http://data.un.org</p>
        <p><sup>6</sup> http://bit.ly/whowidp</p>
        <p><sup>7</sup> http://bit.ly/whowimeds</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Franklin</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>From databases to dataspaces: a new abstraction for information management</article-title>
          .
          <source>SIGMOD Record</source>
          <volume>34</volume>
          (
          <issue>4</issue>
          ),
          <fpage>27</fpage>
          –
          <lpage>33</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinkel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skjæveland</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorstensen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mora</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>BootOX: Practical Mapping of RDBs to OWL 2</article-title>
          . In:
          <source>ISWC 2015</source>
          . pp.
          <fpage>113</fpage>
          –
          <lpage>132</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Nadal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassiliadis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vansummeren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An integration-oriented ontology to govern evolution in big data ecosystems</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>79</volume>
          ,
          <fpage>3</fpage>
          –
          <lpage>19</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>