<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Federico Igne[</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Computing CQ Lower-Bounds over OWL 2 through Approximation to RSA ?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Extended Abstract</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <addr-line>Oxford</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0002</volume>
      <abstract>
        <p>Conjunctive query (CQ) answering over knowledge bases is an important reasoning task. However, with expressive ontology languages such as OWL, query answering is computationally very expensive. The PAGOdA system addresses this issue by using a tractable reasoner to compute lower and upper-bound approximations, falling back to a fullyfledged OWL reasoner only when these bounds don't coincide. The effectiveness of this approach critically depends on the quality of the approximations, and in this paper we explore a technique for computing closer approximations via RSA, an ontology language that subsumes all the OWL 2 profiles while still maintaining tractability of standard reasoning tasks. We present a novel approximation of OWL 2 ontologies into RSA, and an algorithm to compute a closer lower bound approximation using the RSA combined approach. We have implemented these algorithms in our prototype and conducted an extensive evaluation thereof.</p>
      </abstract>
      <kwd-group>
        <kwd>CQ answering</kwd>
        <kwd>combined approach</kwd>
        <kwd>ontology approximation</kwd>
        <kwd>RSA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Conjunctive query (CQ) answering is one of the primary reasoning tasks over
knowledge bases for many applications. However, when considering expressive
? This work was supported by the AIDA project (Alan Turing Institute), the
SIRIUS Centre for Scalable Data Access (Research Council of Norway, project no.:
237889), Samsung Research UK, Siemens AG, and the EPSRC projects AnaLOG
(EP/P025943/1), OASIS (EP/S032347/1) and UK FIRES (EP/S019111/1).
1 Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
description logic languages, query answering is computationally very expensive,
even when considering only complexity w.r.t. the size of the data (data
complexity ). Fully-fledged reasoners oriented towards CQ answering over unrestricted
OWL 2 ontologies exist but, although heavily optimised, they are only effective
on small to medium datasets. In order to achieve tractability and scalability for
the problem, two main approaches are often used: either the expressive power of
the input ontology or the completeness of the computed answers is sacrificed.</p>
      <p>
        Using the first approach, query answering procedures have been developed
for several fragments of OWL 2 for which CQ answering is tractable with respect
to data complexity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Three such fragments have been standardised as OWL 2
profiles, and CQ answering techniques for these fragments have been shown to be
highly scalable at the expense of expressive power [
        <xref ref-type="bibr" rid="ref10 ref12 ref2 ref9">2,9,10,12</xref>
        ]. Using the second
approach, several algorithms have been proposed to compute an approximation
of the set of answers to a given CQ. This usually results in computing a sound
subset of the answers, sacrificing completeness.
      </p>
      <p>
        A particularly interesting approach to CQ answering over unrestricted OWL 2
ontologies is adopted by the reasoner PAGOdA [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Its “pay-as-you-go” approach
allows to use a Datalog reasoner to handle the bulk of the answer computation,
computing lower and upper approximations of the answers to a query, while
relying on a fully-fledged OWL 2 reasoner like HermiT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] only as necessary to
fully answer the query.
      </p>
      <p>
        This work expands on this “pay-as-you-go” technique; it aims to improve the
lower-bound approximation in PAGOdA, tightening the gap between lower and
upper bounds and minimising the use of HermiT. We achieve this by (soundly)
approximating the input ontology into RSA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], an ontology language that
subsumes all the OWL 2 profiles and for which a CQ answering algorithm based on
the combined approach has been proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We present a novel algorithm
for approximating the input ontology into RSA, and an implementation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] of the
combined approach CQ answering algorithm adapted to the use of RDFox [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
as a backend Datalog reasoner; this includes the design of an improved version
of the filtering step for the combined approach, optimised for RDFox.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Workflow</title>
      <p>Figure 1 summarises the workflow of the system: (i) normalisation and
customisable approximation steps approximate an unrestricted OWL 2 ontology to
RSA; (ii) the canonical model is then computed for the resulting ontology; (iii) a
Approximation to RSA</p>
      <p>Canonical model computation
30
25
tecom )(ssdn2105
a co
eR se10
5
0
180
160
140
)120
(s
LBUMssecodn1680000
0 0 0 0 0 0 0 0
10 20 30 40 50 60 70 80</p>
      <p>LUBM size
0 0 0 0 0 0 0 0
10 20 30 40 50 60 70 80</p>
      <p>LUBM size
Datalog filtering program is derived from the input query and is combined with
the canonical model to produce the set of certain answers to the input query
over the approximated ontology. Depending on how the approximation is
performed, the set of answers returned might have different properties (e.g., the
approximation currently provided computes a lower bound of the answers to the
query over the original ontology).</p>
      <p>It is worth noting that, in this scenario, steps (i),(ii) are query independent,
while step (iii) is ontology independent. As such, when multiple queries are
submitted, steps (1-2) can be performed “offline” and only the third step needs to
be performed for each input query. We took advantage of this and streamlined
the execution of the combined approach by factoring out those steps that are
query independent to make answering multiple queries over the same knowledge
base more efficient.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Evaluation</title>
      <p>
        We have carried out an extensive evaluation to assess the effectiveness of the
system and test the scalability and performance of the different steps in the
execution of the combined approach. In Figure 2, we compare the scalability
of approximation and canonical model computation steps over LUBM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
100
80
      </p>
      <p>Q
u
e
r
y
1
100
80
60
40
20
0</p>
      <p>Q
u
e
r
y
2
100
80
60
40
20
0</p>
      <p>Q
u
e
r
y
3</p>
      <p>Reactome2. The two steps are query independent and present a linear growth
w.r.t. the dataset size of both knowledge bases.</p>
      <p>Another interesting aspect is that most of the time the filtering step takes
considerably less time than the canonical model computation. Figure 3 shows
the percentage time distribution of three of the tested queries over Reactome.
Filtering takes consistently less that 20% of the total execution time, when
considering bigger datasets. As mentioned before, we can limit the impact of the
canonical model computation by factoring out the task and computing it
“offline” whenever we found ourselves in a scenario in which we need to perform
query answering over a fixed ontology.</p>
      <p>Overall, our experimental results show that the new technique yields
significant performance improvements in several important application scenarios and
solve some critical problems present in the original PAGOdA implementation.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Ongoing research</title>
      <p>We are already working on additional improvements to the approximation
algorithm to RSA; the current visit of the dependency graph to detect the axioms to
delete might be improved with different heuristics and might in some cases take
into account the input query (deleting axioms that are not necessarily involved
in the computation of the answers). A similar approach could be introduced to
integrate RSA in the upper-bound of the answers to a query, with the ultimate
goal of improving this step in PAGOdA as well.</p>
      <p>On a different note, we hope to obtain additional improvements in
performance in the current implementation of the RSA combined approach by
introducing parallel execution of filtering steps for different input queries, using the
named graph functionality provided by RDFox.
2 https://reactome.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Data complexity of query answering in description logics</article-title>
          .
          <source>In: Proceedings, Tenth International Conference on Principles of Knowledge Representation and Reasoning</source>
          ,
          <source>Lake District of the United Kingdom, June 2-5</source>
          ,
          <year>2006</year>
          . pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          . AAAI Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. Autom. Reasoning</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          (
          <year>2007</year>
          ). https://doi.org/10.1007/s10817-007- 9078-x
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carral</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Pushing the boundaries of tractable ontology reasoning</article-title>
          .
          <source>In: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23</source>
          ,
          <year>2014</year>
          . Proceedings,
          <source>Part II. Lecture Notes in Computer Science</source>
          , vol.
          <volume>8797</volume>
          , pp.
          <fpage>148</fpage>
          -
          <lpage>163</lpage>
          . Springer (
          <year>2014</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -11915-1_
          <fpage>10</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Feier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carral</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stefanoni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>The combined approach to query answering beyond the OWL 2 profiles</article-title>
          .
          <source>In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI</source>
          <year>2015</year>
          ,
          <string-name>
            <given-names>Buenos</given-names>
            <surname>Aires</surname>
          </string-name>
          , Argentina,
          <source>July 25-31</source>
          ,
          <year>2015</year>
          . pp.
          <fpage>2971</fpage>
          -
          <lpage>2977</lpage>
          . AAAI Press (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Hermit: An OWL 2 reasoner</article-title>
          .
          <source>J. Autom. Reasoning</source>
          <volume>53</volume>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>269</lpage>
          (
          <year>2014</year>
          ). https://doi.org/10.1007/s10817-014-9305-1
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>LUBM: A benchmark for OWL knowledge base systems</article-title>
          .
          <source>J. Web Semant</source>
          .
          <volume>3</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>158</fpage>
          -
          <lpage>182</lpage>
          (
          <year>2005</year>
          ). https://doi.org/10.1016/j.websem.
          <year>2005</year>
          .
          <volume>06</volume>
          .005
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Igne</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Germano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Computing CQ lower-bounds over OWL 2 through approximation to RSA</article-title>
          . In: International Semantic Web Conference (ISWC) (
          <year>2021</year>
          ), [forthcoming]
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Igne</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Germano</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
          </string-name>
          , I.:
          <article-title>RSAComb - Combined approach for Conjunctive Query answering in RSA (</article-title>
          <year>Jun 2021</year>
          ). https://doi.org/10.5281/zenodo.5047811
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The combined approach to query answering in dl-lite</article-title>
          .
          <source>In: Principles of Knowledge Representation and Reasoning: Proceedings of the Twelfth International Conference, KR 2010</source>
          , Toronto, Ontario, Canada, May 9-
          <issue>13</issue>
          ,
          <year>2010</year>
          . AAAI Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Conjunctive query answering in the description logic EL using a relational database system</article-title>
          . In: Boutilier,
          <string-name>
            <surname>C</surname>
          </string-name>
          . (ed.)
          <source>IJCAI</source>
          <year>2009</year>
          ,
          <source>Proceedings of the 21st International Joint Conference on Artificial Intelligence</source>
          , Pasadena, California, USA, July
          <volume>11</volume>
          -
          <issue>17</issue>
          ,
          <year>2009</year>
          . pp.
          <fpage>2070</fpage>
          -
          <lpage>2075</lpage>
          (
          <year>2009</year>
          ), http://ijcai. org/Proceedings/09/Papers/341.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nenov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banerjee</surname>
          </string-name>
          , J.:
          <article-title>Rdfox: A highlyscalable RDF store</article-title>
          .
          <source>In: The Semantic Web - ISWC 2015 - 14th International Semantic Web Conference</source>
          , Bethlehem, PA, USA, October
          <volume>11</volume>
          -
          <issue>15</issue>
          ,
          <year>2015</year>
          , Proceedings,
          <source>Part II. Lecture Notes in Computer Science</source>
          , vol.
          <volume>9367</volume>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Stefanoni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Answering conjunctive queries over EL knowledge bases with transitive and reflexive roles</article-title>
          .
          <source>CoRR abs/1411</source>
          .2516 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaminski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Pagoda: Payas-you-go ontology query answering using a datalog reasoner</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>54</volume>
          ,
          <fpage>309</fpage>
          -
          <lpage>367</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>