<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology-Mediated Queries for NOSQL Databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marie-Laure Mugnier</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie-Christine Rousset</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Ulliana</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University Grenoble Alpes</institution>
          ,
          <addr-line>LIG, IUF</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Montpellier</institution>
          ,
          <addr-line>LIRMM, INRIA</addr-line>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Ontology-based data access (OBDA) is a well-established paradigm for querying
incomplete datasources while taking into account knowledge provided by a domain
ontology [PLC+08]. Today, the main applications of OBDA can be found in data
integration as well as in querying the Semantic Web. So far, OBDA has been studied for
relational structures and deployed on top of relational databases, and then transposed
to RDF. The database queries used in OBDA are (unions of) conjunctive relational
queries (or basic graph pattern queries), while the ontologies are specified in either a
description logic (e.g., the lightweight DL-Lite [CGL+07][KLT+10], or the expressive
Horn-SHIQ [EOS+12]), or a rule-based fragment of first-order logic (e.g., Datalog
[CGP12] and existential rules [BLMS11]). Decidability and complexity of
ontologymediated query answering within this framework have been extensively studied and
many algorithms have been designed and implemented.</p>
      <p>Whether this paradigm can be used in conjunction with other kinds of query
languages is a still an open question. The naive way to deal with non relational datasources
is to define mappings for translating them into relational structures, and then use the
classic OBDA framework as it is. However, this approach would induce a significant
performance degrade as it would add a step for converting the data using the
mappings and, most importantly, it would make impossible to take advantage of the low
level query optimizations provided by native systems. This can be particularly acute
for NOSQL systems, like key-value (KV) stores, that have been specifically designed to
scale when dealing with very large collections of data.</p>
      <p>Our goal is to study ontology-based data access directly on top of NOSQL systems.
The term NOSQL (NotOnly SQL) defines a broad collection of languages. KV stores
are NOSQL systems adopting the data model of KV records (also called JSON records).
These records are processed on distributed systems, but also increasingly exchanged
on the Web thereby replacing semistructured XML data and many RDF formats (see
JSON-LD [SLK+04]). KV records are non-first normal forms where values are not
only atomic (in contrast with relational databases) and nesting is possible [AHV95].</p>
      <p>This extended abstract summarizes our work on OBDA for KV stores [MRU16].
We introduce NO-RL, a rule language to express lightweight ontologies on top of KV
records, and provide first (un)decidability and data complexity results for
ontologymediated query answering in our setting. The NO-RL rule language operates directly
on KV records and can be compared to existing languages designed for reasoning on
nested structures like F-Logic [KLW95,Kif05], Elog [BFGK01,BCG+03,GK04] and
Active XML [ABM04]. However, up to our knowledge, none of this existing work
follows an OBDA approach.</p>
    </sec>
    <sec id="sec-2">
      <title>Data and Queries</title>
      <p>Our first contribution is a formalization of the data model of KV stores (sets of KV
records) and the associated basic queries. In this abstract, we will just illustrate these
notions by an example. The KV record below contains ten different keys: most of them
are associated with atomic values, except phone associated with a record, professor
and course associated with sequences (of records and of sequences, respectively) and
director associated with an unspecified value null .
f name : “Alice”; reachable : “yes”; boss : “Charles” g
f name : “Bob”; phone : foffice : “5-256”g; speciality : “Logics”g ]
course : [ [“C123”; “Java”] ; [“C310”; “C++”] ]
director : null g</p>
      <p>KV stores feature a limited set of low-level highly parallelizable queries based
on paths, and leave other functionalities (such as joins between records) for the
application accessing the data. An example of query retrieving the names of
professors is get(professor.name). This expression returns the values “Alice” and “Bob”
once evaluated on the example record. Another feature supported by KV stores is
the possibility to check structural properties of a record before selecting some
content. Thus, we further look at queries with a check() construct like the expression
check(director): get(professor.name). The get part of the query is evaluated at
the only condition that the key director also exists in the record. The check()
construct is particularly useful when dealing with incomplete information expressed by null
values, like for the key director in the example.
3</p>
    </sec>
    <sec id="sec-3">
      <title>The NO-RL Rule Language</title>
      <p>To define rules on KV records, we consider atoms of the form (x), where is a (finite)
sequence of keys, called a key-path, and x a variable. NO-RL rules can be of two kinds:
0(x) ! (x) or 0(x) ! 9y (y), as illustrated below:
1. “every phone number is a contact (any value for the key phone is a value for the key
contact) ”
phone(x) ! contact(x)
2. “if there is a reachable field then there is a phone number”
reachable(x) ! 9y: phone(y)
3. “if a professor has a boss, then this last one is a director of the department”
professor.boss(x) ! director(x)
4. “the department defines a teaching specialty for all of its professors”
department(x) ! professor.specialty(x)</p>
      <p>We define three rule fragments: NO-RL(1), composed of rules that employ only
key-paths of length one (like Rules 1 and 2 in the example), NO-RL(2) and NO-RL(3),
which both extend NO-RL(1) by allowing respectively for rule bodies of arbitrary
length (like Rule 3) and for rule heads of arbitrary length (like Rule 4).</p>
      <p>The following record is obtained from the previous one by application of the four
above rules until fixpoint (process called “saturation”).
fname : “Alice”; reachable : “yes”; boss : “Charles”; phone : null; specialty : “CS” g;
fname : “Bob”; phone : foffice : “5-256”g;</p>
      <p>contact : foffice : “5-256”g; specialty : [“CS”; “Logics”]g ]
course : [[“C123”; “Java”]; [“C310”; “C++”]]
director : “Charles” g</p>
      <p>Rule application inserts a path into the existing record, which requires to merge
information (e.g., the application of Rule 4 inserts the path professor.specialty :
“CS”). We show that the inference operator associated with rule application is
confluent (up to a notion of record equality), hence the order in which rules are applied is
irrelevant. Finally, a value is an answer to a query Q over a knowledge base (I; ),
where I is a set of records and a set of NO-RL rules, if it is an answer to Q on a
record inferred from a record in I by the rules in .
4</p>
    </sec>
    <sec id="sec-4">
      <title>Decidability and Complexity Results</title>
      <p>Despite the apparent simplicity of the NO-RL language, answering check/get queries
under NO-RL rules is generally undecidable (which we prove by reduction from the
word problem in a semi-Thue system). Intuitively, this is due to the unrestricted
interaction between NO-RL(2) and NO-RL(3) rules. In particular, decidability can be
recovered by restricting the rule language to NO-RL(2) or NO-RL(3) only.</p>
      <p>A soon as a single NO-RL(3) rule is involved, saturation may be infinite. If we
consider NO-RL(2) rules, saturation is always finite since these rules do not increase
the depth of records. However, with a single NO-RL(1) rule already, saturation can be
exponential in the size of the input record. This observation advocates for query
reformulation techniques in the spirit of the OBDA paradigm. We define a sound and
complete reformulation technique (i.e., a value is an answer to Q over (I; ) iff if it
is an answer to a -reformulation of Q over I). Dually to saturation with NO-RL(2)
rules, query reformulation with NO-RL(3) rules is always finite since these rules do
not increase the depth of records. It follows that the data complexity of (the decision
problem associated with) query answering with NO-RL(3) rules is the same as basic
query answering on KV stores, i.e., it is in the low complexity class AC0. Query
reformulation can be infinite with NO-RL(2) rules, however, given a record of depth d, only
reformulations of depth less or equal to d are useful. There is an exponential number
of useful reformulations, nevertheless each of them can be produced with a polynomial
number of steps ((d + 1) j j), hence query answering over NO-RL(2) rules is in NP
(for both data and combined complexities —the exact lower bounds being open).</p>
      <p>Future work includes further clarifying query answering complexity, identifying
decidable / tractable cases mixing NO-RL(2) and NO-RL(3) rules, as well as designing
algorithms that would exploit the parallelization features of KV stores.
Acknowledgments. This work has been partially supported by projects PAGODA
(ANR12-JS02-007-01) and Labex PERSYVAL-Lab (ANR-11-LABX-0025-01).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [ABM04]
          <string-name>
            <given-names>Serge</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          , Omar Benjelloun, and
          <string-name>
            <given-names>Tova</given-names>
            <surname>Milo</surname>
          </string-name>
          .
          <article-title>Positive Active XML</article-title>
          .
          <string-name>
            <surname>In</surname>
            <given-names>PODS</given-names>
          </string-name>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [AHV95] Serge Abiteboul, Richard Hull, and Victor Vianu, editors.
          <source>Foundations of Databases: The Logical Level. Addison-Wesley Longman Publishing Co., Inc</source>
          ., Boston, MA, USA, 1st edition,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [BCG+03]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ceresna</surname>
          </string-name>
          , Georg Gottlob,
          <string-name>
            <given-names>M</given-names>
            <surname>Herzog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and V.</given-names>
            <surname>Zigo</surname>
          </string-name>
          .
          <article-title>Web information acquisition with lixto suite: a demonstration</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [BFGK01]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          , Sergio Flesca, Georg Gottlob, and
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Koch</surname>
          </string-name>
          .
          <article-title>Visual web information extraction with lixto</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>[BLMS11] J.-F. Baget</surname>
          </string-name>
          , M. Lecle`re,
          <string-name>
            <surname>M.-L. Mugnier</surname>
            , and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Salvat</surname>
          </string-name>
          .
          <article-title>On Rules with Existential Variables: Walking the Decidability Line</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>175</volume>
          (
          <fpage>9</fpage>
          -10):
          <fpage>1620</fpage>
          -
          <lpage>1654</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [CGL+07]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family</article-title>
          .
          <source>J. Autom. Reasoning</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [CGP12]
          <article-title>Andrea Cal`ı, Georg Gottlob</article-title>
          , and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Towards more expressive ontology languages: The query answering problem</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>193</volume>
          :
          <fpage>87</fpage>
          -
          <lpage>128</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [EOS+12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.-K. Tran</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>Query Rewriting for HornSHIQ Plus Rules</article-title>
          . In AAAI.,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [GK04]
          <string-name>
            <given-names>Georg</given-names>
            <surname>Gottlob</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Koch</surname>
          </string-name>
          .
          <article-title>Monadic datalog and the expressive power of languages for web information extraction</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>10</volume>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Kif05]
          <string-name>
            <given-names>Mickael</given-names>
            <surname>Kifer</surname>
          </string-name>
          .
          <article-title>Rules and Ontologies in F-logic</article-title>
          .
          <source>In RW</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [KLT+10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>The Combined Approach to Query Answering in DL-Lite</article-title>
          .
          <source>In KR</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [KLW95]
          <string-name>
            <given-names>Mickael</given-names>
            <surname>Kifer</surname>
          </string-name>
          , Georg Lausen, and
          <string-name>
            <given-names>James</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Logical foundations of object-oriented and frame-based languages</article-title>
          .
          <source>J. ACM</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [MRU16]
          <string-name>
            <surname>Marie-Laure</surname>
            <given-names>Mugnier</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marie-Christine Rousset</surname>
            , and
            <given-names>Federico</given-names>
          </string-name>
          <string-name>
            <surname>Ulliana</surname>
          </string-name>
          .
          <article-title>Ontologymediated queries for nosql databases</article-title>
          .
          <source>In AAAI</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [PLC+08]
          <string-name>
            <surname>Antonella</surname>
            <given-names>Poggi</given-names>
          </string-name>
          , Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and
          <string-name>
            <given-names>Riccardo</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Linking data to ontologies</article-title>
          .
          <source>J. Data Semantics</source>
          ,
          <volume>10</volume>
          :
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [SLK+04]
          <string-name>
            <surname>Manu</surname>
            <given-names>Sporny</given-names>
          </string-name>
          , Dave Longley, Gregg Kellogg, Markus Lanthaler, and
          <string-name>
            <given-names>Niklas</given-names>
            <surname>Lindstrom</surname>
          </string-name>
          .
          <article-title>JSON-LD 1.0, A JSON-based Serialization for Linked Data</article-title>
          .
          <source>Technical Report</source>
          http://www.w3.org/TR/json-ld/, W3C,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>