<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Virtual Knowledge Graph System Ontop (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guohui Xiao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Lanti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Kontchakov</string-name>
          <email>roman@dcs.bbk.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarah Komla-Ebri</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elem Güzel-Kalaycı</string-name>
          <email>elem.guezelkalayci@v2c2.at</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Linfang Ding</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julien Corman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Cogrel</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Calvanese</string-name>
          <email>diego.calvanese@umu.se</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Botoeva</string-name>
          <email>e.botoeva@imperial.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Birkbeck, University of London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Imperial College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Ontopic s.r.l.</institution>
          ,
          <addr-line>Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Umeå University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Virtual Vehicle Research GmbH</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The full paper is published in the Resource Track of ISWC 2020 [<xref ref-type="bibr" rid="ref14">14</xref>].</p>
        <p>VKG. The Virtual Knowledge Graph (VKG) approach, also known in the literature as Ontology-Based Data Access (OBDA) [<xref ref-type="bibr" rid="ref8 ref12">8,12</xref>], has become a popular paradigm for accessing and integrating data sources [<xref ref-type="bibr" rid="ref13">13</xref>]. In this approach, the data sources, which are normally relational databases, are virtualized through a mapping and an ontology and presented as a unified knowledge graph, which end-users can query through a vocabulary they are familiar with. At query time, a VKG system translates user queries over the ontology into SQL queries over the database. This frees end-users from the low-level details of data organization, so that they can concentrate on their high-level tasks. As it gains importance, the VKG paradigm has been implemented in several systems [<xref ref-type="bibr" rid="ref1 ref2 ref9 ref11">1,2,9,11</xref>] and adopted in a wide range of use cases. Here, we present Ontop v4, the latest major release of a popular VKG system.</p>
        <p>Ontop v1. The development of Ontop has spanned the past decade. Developing such a system is highly non-trivial: it requires both a theoretical investigation of the semantics and a strong engineering effort to implement all the required features. Ontop started in 2009, only one year after the first version of SPARQL had been standardized, while OWL 2 QL [<xref ref-type="bibr" rid="ref6">6</xref>] and R2RML [<xref ref-type="bibr" rid="ref4">4</xref>] appeared three years later, in 2012. At that time, VKG research focused on unions of conjunctive queries (UCQs) as the query language. With this target, Ontop v1 relied on non-recursive Datalog as its core data structure [<xref ref-type="bibr" rid="ref10">10</xref>], because it fit the UCQ-based setting perfectly. The development of Ontop was boosted by the EU FP7 project Optique (2013-2016), during which compliance with the relevant W3C recommendations became a priority and significant progress was made in this direction. The last release of Ontop v1 was v1.18 in 2016 [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>New challenges. A natural requirement that emerged during the Optique project was support for the aggregates introduced in SPARQL 1.1 [<xref ref-type="bibr" rid="ref5">5</xref>]. The Ontop development team spent a major effort, internally called Ontop v2, on implementing this
query language feature. However, it became exceedingly clear that the Datalog
representation was not well suited for this implementation. Some prototypes of
Ontop v2 were used in the Optique project for internal purposes, but never
reached the level of a public release. During this development, as Ontop moved
towards supporting the W3C recommendations for SPARQL and R2RML, we
identified the following new challenges:
– In contrast to the usual DL encoding with unary and binary predicates
for classes and properties, in SPARQL, variables in triple patterns can occur in
the positions of class and property names. This means that there are effectively
only two ‘predicates’: triple, for triples in the default graph of the RDF dataset,
and quad, for triples in named graphs.
– More importantly, SPARQL is based on a rich algebra, which goes beyond
the expressivity of CQs. Non-monotonic features such as OPTIONAL and MINUS,
cardinality-sensitive query modifiers (DISTINCT) and aggregation (GROUP
BY with functions such as SUM, AVG and COUNT) are difficult to model even in
extensions of Datalog.
– Even without SPARQL aggregation, cardinalities have to be treated
carefully: the SQL queries in a mapping produce bags of tuples, but the RDF
graphs they induce contain no duplicates and are thus sets of triples; however,
evaluating a SPARQL query over such a graph yields a bag of solution mappings.
These challenges turned out to be difficult to tackle in the Datalog setting.
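</p>
      <p>The first and third challenges can be illustrated with a minimal Python sketch. It assumes a toy setting and is not Ontop code: a hypothetical mapping assertion whose SQL query returns a duplicate row, an RDF graph stored as one ternary relation of (subject, property, object) tuples (since a triple pattern may have a variable in the property position), and the pattern { ?x ?p ?y } evaluated under bag semantics. The names sql_rows, induced_triples and match_spo, as well as all IRIs, are invented for illustration.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical bag of rows returned by the SQL query of a mapping
# assertion (employee id, name); the row (1, "Ann") occurs twice.
sql_rows = [(1, "Ann"), (1, "Ann"), (2, "Bob")]


def induced_triples(rows):
    """Hypothetical mapping: each row yields a class triple and a name triple."""
    for emp_id, name in rows:
        subject = f"http://example.org/emp/{emp_id}"
        yield (subject, "rdf:type", ":Employee")
        yield (subject, ":name", name)


# The induced RDF graph is a SET: duplicate triples disappear. Every triple,
# whatever its property, lives in one ternary 'triple' relation.
graph = set(induced_triples(sql_rows))


def match_spo(g):
    """Evaluate { ?x ?p ?y }: one solution mapping per matching triple (a bag)."""
    return [{"x": s, "p": p, "y": o} for (s, p, o) in sorted(g)]


solutions = match_spo(graph)

# SELECT ?x: projection keeps duplicates, so each subject appears once per
# triple it occurs in (twice here).
projected = Counter(sol["x"] for sol in solutions)

print(len(sql_rows), len(graph), len(solutions))
```
</preformat>
      <p>Running the sketch prints 3 4 4: the three SQL rows induce six triples, of which only four are distinct, and projecting the four solution mappings onto ?x keeps each subject twice, once per matching triple, as bag semantics requires.</p>
      <p>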
Ontop v4. To address the challenges posed by aggregation, and others that
had emerged in the meantime, we started to investigate an alternative core
data structure. The outcome is what we call the intermediate query (IQ),
an algebra-based data structure that unifies SPARQL algebra and relational
algebra. Using IQ, we reimplemented most of the Ontop code base. After two beta
releases in 2017 and 2018, we released the stable version of Ontop v3 in 2019.
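</p>
      <p>To convey what an algebra-based representation looks like, here is a minimal Python sketch. It is an illustration under stated assumptions, not Ontop's IQ, which is implemented in Java and is considerably richer: four hypothetical node types (Scan, Join, UnionAll, Distinct) form a single tree that represents both a SPARQL query unfolded with a mapping and its SQL counterpart, and the hypothetical function to_sql renders that tree as SQL. The tables staff, managers and staff_names and their columns are invented.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical, heavily simplified algebra nodes; NOT Ontop's IQ classes.


@dataclass(frozen=True)
class Scan:
    """Extensional data: the listed columns of a database relation."""
    table: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Join:
    """Inner join on shared (equally named) columns."""
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class UnionAll:
    """Bag union, as in SQL UNION ALL."""
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Distinct:
    """Duplicate elimination (bag to set)."""
    child: "Node"


Node = Union[Scan, Join, UnionAll, Distinct]


def to_sql(n: Node) -> str:
    """Render a tree as a naive, unoptimized SQL string."""
    if isinstance(n, Scan):
        return f"SELECT {', '.join(n.columns)} FROM {n.table}"
    if isinstance(n, Join):
        return (f"SELECT * FROM ({to_sql(n.left)}) AS l "
                f"NATURAL JOIN ({to_sql(n.right)}) AS r")
    if isinstance(n, UnionAll):
        return f"{to_sql(n.left)} UNION ALL {to_sql(n.right)}"
    if isinstance(n, Distinct):
        return f"SELECT DISTINCT * FROM ({to_sql(n.child)}) AS d"
    raise TypeError(f"unknown node: {n!r}")


# Hypothetical unfolding of SELECT ?x ?n WHERE { ?x a :Employee . ?x :name ?n }:
# :Employee is populated by two mapping assertions (hence the union), and
# DISTINCT reflects that the virtual RDF graph is a set of triples.
employee = Distinct(UnionAll(Scan("staff", ("x",)), Scan("managers", ("x",))))
name = Distinct(Scan("staff_names", ("x", "n")))
print(to_sql(Join(employee, name)))
```
</preformat>
      <p>In this toy tree, cardinality-sensitive operators such as UNION ALL and DISTINCT are explicit nodes, so the bag/set distinction listed among the challenges above is represented directly rather than encoded in rules.</p>
      <p>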
Following Ontop v3, development focussed on improving compliance and adding
several major features. In particular, aggregates have been supported since
Ontop v4-beta-1, released in late 2019. The stable version of Ontop v4 was
released in July 2020 (https://github.com/ontop/ontop). The documentation is
available at the official website (https://ontop-vkg.org/).</p>
      <p>Evaluation. Ontop v4 has greatly improved its compliance with the relevant W3C
recommendations and provides good performance in query answering. It
supports almost all features of SPARQL 1.1, R2RML, OWL 2 QL, the SPARQL
entailment regime and the SPARQL 1.1 HTTP Protocol. In particular,
<xref ref-type="table" rid="tab1">Table 1</xref> summarizes the compliance of Ontop v4 with SPARQL 1.1; its
rows correspond to sections of the W3C recommendation. Most of the features
are supported, but some are unsupported or only partially supported. Note that
most of the missing SPARQL functions (Section 17.4) are not particularly challenging
to implement, but they require a considerable engineering effort to carefully define
their translations into SQL. We will continue implementing them gradually and
track the progress in a dedicated issue (https://github.com/ontop/ontop/issues/346).
Recently, two independent evaluations [<xref ref-type="bibr" rid="ref3 ref7">3,7</xref>] of VKG systems have confirmed the robust
performance of Ontop. When all perspectives, such as usability, completeness and
soundness, are taken into account, Ontop clearly stands out among the open-source systems.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Compliance of Ontop v4 with SPARQL 1.1; rows correspond to sections of the W3C recommendation.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Section</th><th>Features</th><th>Coverage</th></tr>
          </thead>
          <tbody>
            <tr><td>…</td><td>…</td><td>4/4</td></tr>
            <tr><td>8. Negation</td><td>MINUS, FILTER [NOT] EXISTS</td><td>1/2</td></tr>
            <tr><td>9. Property Paths</td><td>PredicatePath, InversePath, ZeroOrMorePath, …</td><td>0</td></tr>
            <tr><td>10. Assignment</td><td>BIND, VALUES</td><td>2/2</td></tr>
            <tr><td>11. Aggregates</td><td>COUNT, SUM, MIN, MAX, AVG, GROUP_CONCAT, SAMPLE</td><td>6/6</td></tr>
            <tr><td>12. Subqueries</td><td>Subqueries</td><td>1/1</td></tr>
            <tr><td>13. RDF Dataset</td><td>GRAPH, FROM [NAMED]</td><td>1/2</td></tr>
            <tr><td>14. Basic Federated Query</td><td>SERVICE</td><td>0</td></tr>
            <tr><td>15. Solution Sequences and Modifiers</td><td>ORDER BY, SELECT, DISTINCT, REDUCED, OFFSET, LIMIT</td><td>6/6</td></tr>
            <tr><td>16. Query Forms</td><td>SELECT, CONSTRUCT, ASK, DESCRIBE</td><td>4/4</td></tr>
            <tr><td>17.4.1. Functional Forms</td><td>BOUND, IF, COALESCE, EXISTS, NOT EXISTS, ||, &amp;&amp;, =, sameTerm, IN, NOT IN</td><td>6/11</td></tr>
            <tr><td>17.4.2. Functions on RDF Terms</td><td>isIRI, isBlank, isLiteral, isNumeric, str, lang, datatype, IRI, BNODE, STRDT, STRLANG, UUID, STRUUID</td><td>9/13</td></tr>
            <tr><td>17.4.3. Functions on Strings</td><td>STRLEN, SUBSTR, UCASE, LCASE, STRSTARTS, STRENDS, CONTAINS, STRBEFORE, STRAFTER, ENCODE_FOR_URI, CONCAT, langMatches, REGEX, REPLACE</td><td>14/14</td></tr>
            <tr><td>17.4.4. Functions on Numerics</td><td>abs, round, ceil, floor, RAND</td><td>5/5</td></tr>
            <tr><td>17.4.5. Functions on Dates and Times</td><td>now, year, month, day, hours, minutes, seconds, timezone, tz</td><td>8/9</td></tr>
            <tr><td>17.4.6. Hash Functions</td><td>MD5, SHA1, SHA256, SHA384, SHA512</td><td>5/5</td></tr>
            <tr><td>17.5. XPath Constructor Functions</td><td>casting</td><td>0</td></tr>
            <tr><td>17.6. Extensible Value Testing</td><td>user-defined functions</td><td>0</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Community and Adoption. Ontop is the result of an active developer
community. It has been downloaded more than 30K times from SourceForge. In
addition to the research groups, Ontop is also backed by a commercial company,
Ontopic s.r.l., founded in April 2019. Ontop has been adopted in many academic
and industrial use cases. However, due to its liberal Apache 2 license, it is
essentially impossible to obtain a complete picture of all use cases and adoptions.
Nevertheless, a few significant use cases have been summarized in a recent
survey paper [<xref ref-type="bibr" rid="ref13">13</xref>]. Finally, we mention two recent commercial deployments of
Ontop: UNiCS (http://unics.cloud/) is an open data platform for research and
innovation, and ODH-VKG (https://sparql.opendatahub.bz.it/) is a project
publishing South Tyrolean tourism data as a Knowledge Graph.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Komla-Ebri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rezk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>Ontop: answering SPARQL queries over relational databases</article-title>
          .
          <source>SWJ</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>471</fpage>
          -
          <lpage>487</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
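          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,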
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruzzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Savo</surname>
          </string-name>
          .
          <article-title>The MASTRO system for ontology-based data access</article-title>
          .
          <source>SWJ</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          -
          <lpage>53</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Chaloupka</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Necasky</surname>
          </string-name>
          .
          <article-title>Using Berlin SPARQL benchmark to evaluate relational database virtual SPARQL endpoints</article-title>
          . Submitted to SWJ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>R2RML: RDB to RDF mapping language</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          .
          <article-title>SPARQL 1.1 query language</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          .
          <article-title>OWL 2 Web Ontology Language: Profiles</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Namici</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          .
          <article-title>Comparing query answering in OBDA tools over W3C-compliant specifications</article-title>
          .
          <source>In Proc. DL</source>
          , volume
          <volume>2211</volume>
          .
          CEUR-WS.org,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Linking data to ontologies</article-title>
          .
          <source>J. Data Sem.</source>
          ,
          <volume>10</volume>
          :
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ó.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          .
          <article-title>Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph</article-title>
          .
          <source>In WWW</source>
          , pages
          <fpage>479</fpage>
          -
          <lpage>490</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rezk</surname>
          </string-name>
          .
          <article-title>Efficient SPARQL-to-SQL with R2RML mappings</article-title>
          .
          <source>J. Web Sem.</source>
          ,
          <volume>33</volume>
          :
          <fpage>141</fpage>
          -
          <lpage>169</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miranker</surname>
          </string-name>
          .
          <article-title>Ontology-based data access using views</article-title>
          .
          <source>In Proc. RR</source>
          , volume
          <volume>7497</volume>
          <source>of LNCS</source>
          , pages
          <fpage>262</fpage>
          -
          <lpage>265</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakharyaschev</surname>
          </string-name>
          .
          <article-title>Ontology-based data access: A survey</article-title>
          .
          <source>In Proc. IJCAI</source>
          , pages
          <fpage>5511</fpage>
          -
          <lpage>5519</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          .
          <article-title>Virtual knowledge graphs: An overview of systems and use cases</article-title>
          .
          <source>Data Intelligence</source>
          ,
          <volume>1</volume>
          :
          <fpage>201</fpage>
          -
          <lpage>223</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Komla-Ebri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Güzel-Kalaycı</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Botoeva</surname>
          </string-name>
          .
          <article-title>The virtual knowledge graph system Ontop</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>