<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting Ontologies for Explaining Data Sources Semantics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gianluca Cima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Lenzerini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonella Poggi</string-name>
          <email>poggi@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria Informatica, Automatica e Gestionale “Antonio Ruberto”</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Lettere e Culture Moderne</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sapienza Università di Roma</institution>
        </aff>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>35</lpage>
      <abstract>
        <p>We study the problem of associating formal semantic descriptions with data services. We base our proposal on the Ontology-Based Data Access paradigm, where a domain ontology is used to provide a semantic layer mapped to the data sources of an organization. The basic idea is to explain the semantics of a data service in terms of a query over the ontology. We illustrate a formal framework for this problem, based on the notion of source-to-ontology rewriting, which comes in three variants, called sound, complete, and perfect. We present a thorough complexity analysis of two computational problems, namely verification (checking whether a query is a rewriting of a given data service) and computation (computing a rewriting of a data service).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        The architecture of many modern Information Systems is based on data services
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], i.e., services deployed on top of data stores, other services, and/or applications to
encapsulate a wide range of data-centric operations. To realize the promise of
data services, and in particular to foster their reuse, it is vital to document them well
and to specify their semantics clearly. While most current techniques manually associate
APIs (Application Programming Interfaces) with data services and describe their intended
meaning with ad-hoc methods, often using natural language or complex metadata [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
we propose a new approach whose goal is to automatically associate formal semantic
descriptions with data services. We base our proposal on the Ontology-Based Data Access
(OBDA) paradigm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. An OBDA specification consists of an ontology expressed in
Description Logic (DL) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the schema of the data sources forming the information
system, and a mapping between the source schema and the ontology. The ontology is a
formal representation of the underlying domain, and the mapping specifies the
relationship between the data at the sources and the elements of the ontology. The semantics of
data services can thus be expressed in terms of the domain ontology, which
is assumed to be familiar to the consumers of data services.
      </p>
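      <p>To make the role of the mapping more concrete, the following Python sketch applies a set of mapping assertions to a source database and collects the resulting ontology facts. It is a minimal illustration under simplifying assumptions introduced only for this example, not the formalization of [6]: atoms are encoded as tuples, terms starting with <monospace>?</monospace> are variables, the ontology axioms are ignored (only the mapping is applied), and the source relations <monospace>emp</monospace> and <monospace>staff</monospace> as well as the ontology predicates <monospace>Employee</monospace> and <monospace>worksFor</monospace> are hypothetical. Each mapping assertion pairs a CQ over the source schema with a CQ over the ontology; variables occurring only in the ontology-side CQ are instantiated with fresh labeled nulls.</p>
      <preformat>
```python
from dataclasses import dataclass
from itertools import count

# Hypothetical encoding: an atom is a tuple (predicate, term, ...);
# terms starting with "?" are variables, all other strings are constants.


@dataclass
class MappingAssertion:
    source_cq: list    # CQ over the source schema
    ontology_cq: list  # CQ over the ontology


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(atoms, db, binding):
    """Yield every extension of `binding` that makes all `atoms` true in `db`."""
    if not atoms:
        yield binding
        return
    pred, *terms = atoms[0]
    for fact in db.get(pred, set()):
        if len(fact) != len(terms):
            continue
        extended = dict(binding)
        # Variables are bound (or checked against an earlier binding);
        # constants must match the stored value exactly.
        if all(
            (extended.setdefault(t, v) == v) if is_var(t) else (t == v)
            for t, v in zip(terms, fact)
        ):
            yield from matches(atoms[1:], db, extended)


def retrieve_abox(mapping, db):
    """Apply every mapping assertion to `db` and collect the ontology facts."""
    fresh = count()
    abox = set()
    for assertion in mapping:
        for binding in matches(assertion.source_cq, db, {}):
            local = dict(binding)
            for pred, *terms in assertion.ontology_cq:
                row = []
                for t in terms:
                    if is_var(t) and t not in local:
                        # Existential variable: invent a fresh labeled null.
                        local[t] = f"_:n{next(fresh)}"
                    row.append(local[t] if is_var(t) else t)
                abox.add((pred, *row))
    return abox


# Hypothetical mapping and source database.
mapping = [
    MappingAssertion([("emp", "?n", "?d")],
                     [("Employee", "?n"), ("worksFor", "?n", "?d")]),
    MappingAssertion([("staff", "?n")],
                     [("worksFor", "?n", "?d")]),  # ?d is existential
]
db = {"emp": {("ann", "sales")}, "staff": {("bob",)}}
print(sorted(retrieve_abox(mapping, db)))
# [('Employee', 'ann'), ('worksFor', 'ann', 'sales'), ('worksFor', 'bob', '_:n0')]
```
      </preformat>
      <p>The second assertion shows how a mapping may convey incomplete information: the data state that bob works for some department without naming it, so the corresponding ontology fact contains a labeled null.</p>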
      <p>
        But how can we automatically produce a semantic characterization of a data service
when an OBDA specification is available? The idea is to exploit a new reasoning task
over the OBDA specification, which works as follows: we express the data service as
a query over the sources, and we aim at automatically deriving the query over
the ontology that best describes the data service, given the mapping. Note that most
(if not all) of the literature on managing data sources through an ontology [
        <xref ref-type="bibr" rid="ref10 ref7">7,10</xref>
        ] deals
with user queries expressed over the ontology, and studies the problem of finding an
ontology-to-source rewriting, i.e., a query over the source schema that, once executed
over the data, provides the answers to the original query. Here, the problem is reversed,
because we start with a source query and aim at deriving a corresponding query over
the ontology, called a source-to-ontology rewriting.</p>
      <p>
        The notions introduced in this paper are relevant in many scenarios. For the
sake of brevity, we mention only two of them. Following the ideas in [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ], it can be
shown that our notions of source-to-ontology rewriting can be used to provide the
semantics of open datasets and open APIs published by organizations, which is crucial
for unlocking the full potential of open data. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the concept of realization
of source queries, which corresponds to one of the notions studied here, is used to
check whether the mapping provides the right coverage for expressing the relevant data
services at the ontology level.
      </p>
      <p>The contributions of this work can be summarized as follows. We
propose a formal framework for the problem of semantically characterizing a data service
through an ontology. We introduce the notions of perfect, sound, and complete
source-to-ontology rewritings, and we define two basic reasoning tasks, namely verification
and computation. The former checks whether a given query is a source-to-ontology
rewriting of a data service, whereas the latter computes one such rewriting. We show
that, although the ideal notion is that of perfect source-to-ontology rewriting, there
are cases where, with the given mapping, no query over the ontology can precisely
characterize the data service at hand. Thus, we introduce maximally sound and
minimally complete source-to-ontology rewritings, which intuitively aim at best
approximating the perfect rewriting of a data service, with the goal of either precision
(sound rewriting) or recall (complete rewriting).</p>
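      <p>The three notions can be illustrated with a toy example. The Python sketch below assumes the reading suggested by the precision/recall analogy: a query over the ontology is sound if its certain answers are always among the answers of the data service, complete if it always returns at least those answers, and perfect if both hold. Because these properties quantify over all source databases, testing a finite sample of databases can only refute soundness or completeness, never establish them. The source relation <monospace>emp</monospace>, the mapping assertion emp(n, d) → Employee(n), the data service q_S(n) :- emp(n, 'sales'), and the candidate query q_O(x) :- Employee(x) are all hypothetical. Since the example has no ontology axioms and no existential variables in the mapping, the certain answers of q_O are obtained by evaluating it directly over the retrieved facts.</p>
      <preformat>
```python
# Hypothetical toy setting: one source relation emp(name, dept) and one
# mapping assertion emp(n, d) -> Employee(n), which forgets the department.


def retrieve_abox(db):
    """Ontology facts produced by the mapping emp(n, d) -> Employee(n)."""
    return {("Employee", n) for (n, _dept) in db.get("emp", set())}


def data_service(db):
    """Source query q_S(n) :- emp(n, 'sales')."""
    return {(n,) for (n, dept) in db.get("emp", set()) if dept == "sales"}


def candidate(abox):
    """Ontology query q_O(x) :- Employee(x). With no ontology axioms and no
    existential variables in the mapping, evaluating q_O over the retrieved
    facts yields its certain answers."""
    return {(x,) for (pred, x) in abox if pred == "Employee"}


def refute(samples):
    """Check soundness and completeness of `candidate` on sample databases.

    A False entry is a genuine counterexample; a True entry only means that
    no sample refuted the property."""
    sound = complete = True
    for db in samples:
        expected = data_service(db)
        answers = candidate(retrieve_abox(db))
        sound = sound and answers.issubset(expected)
        complete = complete and expected.issubset(answers)
    return {"sound": sound, "complete": complete, "perfect": sound and complete}


samples = [
    {"emp": {("ann", "sales"), ("bob", "hr")}},
    {"emp": set()},
]
print(refute(samples))
# {'sound': False, 'complete': True, 'perfect': False}
```
      </preformat>
      <p>Here the candidate is refuted as a sound (hence as a perfect) rewriting, because bob is returned although he does not work in sales; it is complete on every database, since each emp tuple yields an Employee fact. In this toy setting no perfect rewriting can exist at all: the databases {emp(a, sales)} and {emp(a, hr)} produce the same ontology facts, so no query over the ontology can tell them apart, although the data service answers differently on them.</p>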
      <p>
        We study the verification and the computation problems for complete and sound
source-to-ontology rewritings in one of the most popular OBDA settings considered
in the literature, namely the one where the ontology language is DL-Lite<sub>R</sub> [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], each mapping
assertion maps a conjunctive query (CQ) over the source to a CQ over the ontology, and
both the data service and the source-to-ontology rewriting are expressed as unions of
CQs. For complete source-to-ontology rewritings we present algorithms for verification
and computation, and characterize the complexity of both tasks. For the case of sound
rewritings, we do the same for verification, and we precisely determine the cases where
a maximally sound rewriting is not guaranteed to exist.
      </p>
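      <p>Checking whether a query is a sound or complete rewriting is, at its core, a containment-style comparison between queries that must hold on every source database. As background only, the Python sketch below implements the classical homomorphism test for plain CQs: a CQ q1 is contained in a CQ q2 exactly when there is a homomorphism from q2 into q1 that maps the answer terms of q2 onto those of q1. It works by brute-force search over a single schema with no ontology and no mapping, so it is not the algorithm of [6], which must additionally take the DL-Lite<sub>R</sub> ontology and the mapping into account. The tuple encoding and the predicates in the example are hypothetical.</p>
      <preformat>
```python
from itertools import product

# Hypothetical encoding: a CQ is (answer_terms, body_atoms); an atom is a tuple
# (predicate, term, ...); terms starting with "?" are variables.


def is_var(term):
    return term.startswith("?")


def terms_of(cq):
    head, body = cq
    return set(head) | {t for _pred, *ts in body for t in ts}


def contained_in(q1, q2):
    """Decide whether q1 is contained in q2 (on every database, each answer of
    q1 is an answer of q2) by searching for a homomorphism from q2 into q1.
    The search is exponential brute force, adequate only for tiny queries."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    targets = sorted(terms_of(q1))       # terms of q1, variables treated as frozen
    variables = sorted(t for t in terms_of(q2) if is_var(t))
    facts1 = set(body1)                  # canonical database of q1
    for image in product(targets, repeat=len(variables)):
        h = dict(zip(variables, image))

        def subst(t):
            return h.get(t, t)           # constants are mapped to themselves

        if tuple(subst(t) for t in head2) != tuple(head1):
            continue
        if all((pred, *map(subst, ts)) in facts1 for pred, *ts in body2):
            return True
    return False


q1 = (("?x",), [("worksFor", "?x", "sales"), ("Employee", "?x")])
q2 = (("?y",), [("worksFor", "?y", "?z")])
print(contained_in(q1, q2))  # True
print(contained_in(q2, q1))  # False
```
      </preformat>
      <p>In the example, q1 is contained in q2 because mapping ?y to ?x and ?z to sales sends the only atom of q2 into q1; the converse fails because the Employee atom of q1 has no image in q2.</p>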
      <p>
        This discussion paper describes results recently published in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To the best of
our knowledge, the problem studied in this work has been (partially) addressed only
in [
        <xref ref-type="bibr" rid="ref4 ref8">4,8</xref>
        ]. The former provides upper-bound complexity results for complete rewritings,
whereas the latter considers both DL-Lite<sub>R</sub> and the ℰℒ family of ontology languages, and
studies only perfect rewritings, under a semantics slightly different from the
one proposed here.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F.</given-names>
          </string-name>
          . (eds.):
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          . Cambridge University Press (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. of Automated Reasoning</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carey</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Onose</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petropoulos</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data services</article-title>
          .
          <source>Comm. of the ACM</source>
          <volume>55</volume>
          (
          <issue>6</issue>
          ),
          <fpage>86</fpage>
          -
          <lpage>97</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cima</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Preliminary results on ontology-based open data publishing</article-title>
          .
          <source>In: Proc. of DL 2017</source>
          . CEUR, ceur-ws.org, vol.
          <volume>1879</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cima</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantic technology for open data publishing</article-title>
          .
          <source>In: Proc. of WIMS 2017</source>
          . p. 1:1 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cima</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantic characterization of data services through ontologies</article-title>
          .
          <source>In: Proc. of IJCAI 2019</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Managing data through the lens of an ontology</article-title>
          .
          <source>AI Magazine</source>
          <volume>39</volume>
          (
          <issue>2</issue>
          ),
          <fpage>65</fpage>
          -
          <lpage>74</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabellek</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Query expressibility and verification in ontology-based data access</article-title>
          .
          <source>In: Proc. of KR 2018</source>
          . pp.
          <fpage>389</fpage>
          -
          <lpage>398</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Linking data to ontologies</article-title>
          .
          <source>J. on Data Semantics X</source>
          ,
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          (
          <year>2008</year>
          ).
          doi:
          <pub-id pub-id-type="doi">10.1007/978-3-540-77688-8_5</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology-based data access: A survey</article-title>
          .
          <source>In: Proc. of IJCAI 2018</source>
          . pp.
          <fpage>5511</fpage>
          -
          <lpage>5519</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          :
          <article-title>Service-generated big data and big data-as-a-service: An overview</article-title>
          .
          <source>In: Proc. of the 2013 IEEE Int. Conf. on Big Data</source>
          . pp.
          <fpage>403</fpage>
          -
          <lpage>410</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>