<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ForBackBench: From Database to Semantic Web Mappings and Back</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Afnan Alhazmi</string-name>
          <email>a.alhazmi@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaime Osvaldo Salas</string-name>
          <email>j.o.salas@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Konstantinidis</string-name>
          <email>g.konstantinidis@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Iqaros</institution>
          ,
          <addr-line>Graal, Ontop, OntopR, GQR), chase (RDFox, Rulewerk, Llunatic), and hybrid systems</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Query rewriting</institution>
          ,
          <addr-line>Ontology Materialisation, Data integration, OBDA, Dependencies, Benchmark</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Southampton</institution>
          ,
          <addr-line>University Road, Southampton, SO17 1BJ</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>6</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Data Integration/Exchange (DI/DE) has seen great developments both in the Databases (DB) and the Semantic Web (SW) areas. ForBackBench has been the first framework that facilitates the comparison of systems from both areas by introducing datasets, converters and algorithms that allow for integration and interaction of different query rewriting (OBDA) and forward-chaining (Chase) systems. In this paper, we discuss the most recent developments of the framework, which aim to completely bridge the gap between the DB and SW areas in the DI/DE field by integrating RML (and R2RML) ontology materialisation systems. To facilitate translations between mapping languages (R2RML, RML, OBDA) and tuple-generating dependencies (TGDs), we introduce a special language of source-to-source TGDs with built-in functions. We integrate state-of-the-art ontology materialisation systems from SW as well as datasets and scenarios from that area. Our initial experimental results already shed light on the interaction of these areas and the properties of different algorithms and systems, exhibiting the merits of our framework.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Recent years have seen an increase in the demand for systems for data integration and exchange
(DI/DE), a problem extensively studied in both the Databases (DB) and the Semantic Web
(SW) areas and invaluable for a variety of domains, including biological research, industry, data
marketplaces, and others [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Several systems have been introduced to enable data integration.
Efforts such as ChaseBench [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] have compiled a variety of systems that implement the chase
algorithm, one of the best-known classes of algorithms used to infer new data.
      </p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we introduced ForBackBench, a framework that creates a unified
approach to compare several DI/DE systems. ForBackBench comes with a set of common
forward- and backward-chaining systems, common testing scenarios from the literature, and
pre-integrated converter tools that simplify comparison and evaluation, allow automated
testing, and make future extension easier. The query-answering systems that initially
became available on ForBackBench were classified in three groups: query rewriting (Rapid,
Iqaros, Graal, Ontop, OntopR, GQR), chase (RDFox, Rulewerk, Llunatic), and hybrid systems
(ChaseGQR). Through a command-line interface and a web interface, the user of the framework
can run end-to-end experiments on combinations of query-answering systems and scenarios,
build new scenarios by providing ontology or TGD files, and generate data of different sizes.
Additionally, the command-line interface allows the user to run any converter as a stand-alone
tool. The full details of ForBackBench functionalities are available in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Although the backward-chaining systems included as part of ForBackBench came mostly
from the Ontology-Based Data Access (OBDA) area of SW, the forward-chaining systems came
from the state-of-the-art data exchange technologies in the DB domain. In this paper, we close
the loop by integrating in ForBackBench state-of-the-art ontology materialisation engines from
the SW domain. This brings us a step closer to realising our vision of bridging the gap between
the DB and SW areas in DI/DE. Our contributions include the following:
• We present end-to-end translation algorithms and implementation between the most
common SW mapping languages (OBDA mappings, R2RML, and RML) and TGDs.
• We integrate three state-of-the-art ontology materialisation systems into ForBackBench:</p>
      <p>
        Morph-KGC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], RMLMapper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and SDM-RDFizer [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
• We integrate a new scenario, GTFS-Madrid-Bench [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a recent OBDA scenario
with real data that was used in the KGCW 2023 Challenge at the KGC Workshop [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
• We present an initial round of experiments pointing out the different ways algorithms
and engines are used and shedding light on the performance of these systems.
      </p>
      <p>To the best of our knowledge, this is the first framework that includes all relevant scenarios,
systems and algorithms, paving the way for a versatile platform where one can plug different
systems to produce a customised end-to-end solution for a DI/DE task.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Extended ForBackBench Framework</title>
      <p>
        As part of our extension of the ForBackBench Framework, we aimed to include forward-chaining
systems from the SW community, and to provide complete translation between forward- and
backward-chaining formats. Therefore, we extended the functionality of the Data/Mapping
Generator component that we introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to support conversion between TGDs and
mapping languages (R2RML [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and RML [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]).
      </p>
      <p>
        The problem of SW mapping translation was first introduced in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to discuss its importance
and the available mapping translation engines. Our work is complementary to the recent
research [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which focuses on translations between mapping languages within the
SW area, where these languages rely on triple generation. In our work, we focus on the
translation between SW mappings (that use triples) and DB mappings (that use rules).
In order to test these translations, we integrated the GTFS-Madrid-Bench scenario, which comes
with R2RML and RML mappings used to generate TGDs for DB forward-chaining systems.
Mappings to TGDs Translation. In some cases, mapping languages use “Skolem” or
“built-in” functions that cannot be encoded into TGDs in a straightforward manner. For instance,
Example 1 shows an OBDA mapping object that creates new values for the arguments of
ns:EmpInfo by combining multiple columns of the source tables. Therefore, we introduce a
new sub-language of TGDs: Skolem source-to-source TGDs. A Skolem ss-TGD (Σ<sub>ss</sub>) is a TGD of
the form ∀x⃗, y⃗ (φ(x⃗, y⃗) → ψ(f(x⃗), f(y⃗))), where f can be an arbitrary user-defined function (here
we limit ourselves to string concatenation). In the following we focus on relations of arity two
(so they are directly translatable to triples, without reification).
      </p>
      <p>Definition 1. For all TGDs σ of arity at most two, σ is a Skolem source-to-source TGD if (1) φ and
ψ are over the source schema, (2) φ, the body, is an SQL query q, (3) ψ, the head, is a ChaseBench
query, where f, the arguments, are concatenation functions of multiple columns of source tables.</p>
      <p>For ontology materialisation scenarios we implement two phases: the offline phase, where we
convert OBDA, RML, or R2RML mappings to TGDs (see Example 1), and the online phase, where
we generate and load data for the new source tables. The choice of supported mapping languages
was guided by the current scope of our framework, which is mostly limited to relational DBs.
In the future we aim to support mapping languages for non-relational sources as well.</p>
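      <p>As a rough illustration of the offline phase, the following Python sketch turns one OBDA mapping with IRI templates into a Skolem ss-TGD and an accompanying st-TGD, in the spirit of Example 1. It is a minimal sketch under stated assumptions, not the ForBackBench converter: the class ObdaMapping, the functions template_to_concat and mapping_to_tgds, the concat(...) notation, the src_ prefix of the auxiliary predicate, and the variable names ?x and ?y are all hypothetical.</p>
      <preformat>
```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ObdaMapping:
    mapping_id: str   # e.g. "Mapping00643"
    subject: str      # IRI template, e.g. "ns:employee/{eID}/name/{eName}"
    predicate: str    # e.g. "ns:EmpInfo"
    obj: str          # IRI template, e.g. "ns:dept/{dName}"
    source_sql: str   # SQL query whose output columns fill the templates


def template_to_concat(template: str) -> str:
    """Render an IRI template as an explicit string-concatenation term."""
    # Split into literal chunks and {column} placeholders, dropping empty chunks.
    parts = [p for p in re.split(r"(\{[^{}]+\})", template) if p]
    terms = [p[1:-1] if p.startswith("{") else f'"{p}"' for p in parts]
    return "concat(" + ", ".join(terms) + ")"


def mapping_to_tgds(m: ObdaMapping) -> Tuple[str, str]:
    """Return (Skolem ss-TGD, st-TGD) as plain strings."""
    local = m.predicate.split(":")[-1]
    digits = "".join(ch for ch in m.mapping_id if ch.isdigit())
    aux = f"src_{local}_{digits}"  # hypothetical naming scheme for the new source relation
    # ss-TGD: the SQL body feeds an auxiliary binary relation whose arguments
    # are concatenations of source columns.
    ss = (f"{m.source_sql} -> "
          f"{aux}({template_to_concat(m.subject)}, {template_to_concat(m.obj)})")
    # st-TGD: a plain copy rule from the auxiliary relation to the target predicate.
    st = f"{aux}(?x, ?y) -> {local}(?x, ?y) ."
    return ss, st
```
      </preformat>
      <p>Under these assumptions, a mapping with target ns:employee/{eID}/name/{eName} ns:EmpInfo ns:dept/{dName} yields an ss-TGD whose head carries one concatenation term per template, so that the functions stay in the source-to-source rule and the st-TGD remains function-free.</p>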
      <p>TGDs to Mappings Translation. For each source-to-target (st) TGD mapping (we
assume, or transform, TGDs to have a single head atom), we define a triple map (s, p, o, q) where the TGD
head atom translates (in the natural way) into s, p, o and q is a generated SQL query reflecting
the TGD body. Every mapping language (OBDA mappings, R2RML, or RML) contains a “target”
that can be constructed from s, p, o and a logical source in SQL or CSV. Thus we appropriately
translate our triple map to a chosen language. We can also support translating back to a mapping
language from a scenario that includes both st-TGDs and our own Skolem ss-TGDs (thus
reverting Example 1), in which case the SQL query q is obtained from the Skolem ss-TGDs
(since it appears explicitly). If we have Skolem ss-TGDs, we use the namespaces contained in
them; otherwise, we invent an example namespace for the final mapping.</p>
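      <p>The reverse direction can be sketched in the same hedged way. The Python fragment below builds a four-field triple map (subject, predicate, object, and SQL query, named s, p, o, q here) from a single-head st-TGD with a binary head, and prints it in an OBDA-like layout. Everything not stated in the text is an assumption made for illustration: the positional column names c0, c1, ..., the table aliases t0, t1, ..., the invented namespace prefix ex:, the requirement that both head variables occur in the body, and all function and class names.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class TripleMap:
    s: str  # subject template
    p: str  # predicate IRI
    o: str  # object template
    q: str  # SQL query over the source


def body_to_sql(atoms):
    """Turn a conjunction of source atoms into SELECT ingredients.

    atoms: list of (table, [variables]); columns are assumed to be c0, c1, ...
    Repeated variables become equi-join conditions.
    """
    first_seen = {}
    where = []
    for i, (table, variables) in enumerate(atoms):
        for j, var in enumerate(variables):
            col = f"t{i}.c{j}"
            if var in first_seen:
                where.append(f"{first_seen[var]} = {col}")
            else:
                first_seen[var] = col
    from_clause = ", ".join(f"{table} t{i}" for i, (table, _) in enumerate(atoms))
    return first_seen, from_clause, where


def st_tgd_to_triple_map(body, head, ns="ex:"):
    """body: list of atoms; head: (predicate, (var1, var2)) with both vars in the body."""
    pred, (x, y) = head
    cols, from_clause, where = body_to_sql(body)
    sql = f"SELECT {cols[x]} AS s, {cols[y]} AS o FROM {from_clause}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return TripleMap(s=ns + "{s}", p=ns + pred, o=ns + "{o}", q=sql)


def to_obda(tm, mapping_id):
    """Render the triple map in an OBDA-like mappingId/target/source layout."""
    return (f"mappingId {mapping_id}\n"
            f"target    {tm.s} {tm.p} {tm.o} .\n"
            f"source    {tm.q}")
```
      </preformat>
      <p>The same TripleMap value could equally be serialised to R2RML or RML, since each of these languages needs only a target built from s, p, o and a logical source given by q.</p>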
      <p>Example 1.</p>
      <preformat>OBDA Mapping:
mappingId  Mapping00643
target     ns:employee/{eID}/name/{eName} ns:EmpInfo ns:dept/{dName}
source     SELECT ….… as eID, ….… as eName, ….… as dName FROM …
           INNER JOIN … ON ….… = ….…

Translated TGDs:
Σ<sub>ss</sub> = { SELECT ….… as eID, ….… as eName, ….… as dName FROM …
        INNER JOIN … ON ….… = ….… →
        …_…_00643(ns:employee/{eID}/name/{eName}, ns:dept/{dName}) }
Σ<sub>st</sub> = { …_…_00643(?…, ?…) → …(?…, ?…). }</preformat>
    </sec>
    <sec id="sec-4">
      <title>3. Experiments and Results</title>
      <p>We benchmark materialisation systems from DB (RDFox, Rulewerk), materialisation
systems from SW (Morph-KGC, RMLMapper, and SDM-RDFizer), and query rewriting systems
(Rapid, Iqaros, and Ontop). Note that Ontop appears in two versions: OntopR2RML, with R2RML
mappings, and Ontop, with its native mapping language (i.e., OBDA mappings). Each system
and query was run a total of six times, with the first being a “cold run”. We compare loading and
chasing (generation of RDF graphs) times for forward-chaining with loading, rewriting, and
unfolding times (the unfolding time includes the conversion of the query to a valid SQL query)
for backward-chaining. For DB systems we also include query execution times. All experiments
were run on a MacBook Pro (M1 chip) with a 2.30GHz, 8-core processor, 8GB of memory, and a
total of 1TB of disk space.</p>
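      <p>The repetition protocol described above (six runs per system and query, the first one cold) can be sketched as a small Python timing harness. How the cold run enters the reported numbers is not stated here, so the sketch makes an explicit assumption: the cold run is kept apart and the remaining five runs are averaged. The function name benchmark and the keys of the returned dictionary are hypothetical.</p>
      <preformat>
```python
import statistics
import time


def benchmark(run, repetitions=6):
    """Time `run` several times; the first repetition is treated as the cold run."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    cold, warm = timings[0], timings[1:]
    return {
        "cold": cold,                          # first run, reported separately (assumption)
        "warm_mean": statistics.mean(warm),    # average over the remaining runs (assumption)
        "warm_runs": len(warm),
    }
```
      </preformat>
      <p>Here run would wrap one phase of one system, for example loading, chasing, rewriting, or unfolding, so that each phase gets its own timing record.</p>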
      <p>The SW forward-chaining systems (SDM-RDFizer, RMLMapper, and Morph-KGC) do not
have out-of-the-box query answering functionality, so we do not include execution times for
these; we will extend the framework with SPARQL query answering as future work.
Thus, comparing an ontology materialisation system with a query
rewriting system (whose time includes execution) is not always meaningful, especially when the former is faster. Nevertheless,
in the GTFS-Madrid-Bench scenario seen below, the benchmarked SW materialisation systems
are all slower than query rewriting systems even though the latter include execution. This
suggests that virtual Knowledge Graphs, which answer queries online
directly from the sources, may not have been investigated enough.</p>
      <p>
        Figure 1a shows results for the GTFS-Madrid-Bench scenario [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] with its accompanying
mappings and a “small” data size (using the first scale of the VIG generator that comes with the
scenario). We selected the 5 GTFS-Madrid-Bench queries by complexity,
choosing both simple queries (without joins) and more complex ones (up to 8 joins).
Note that a system’s performance might vary by up to a couple of seconds due to device resource
allocation (including memory and CPU). Results show that the DB forward-chaining systems, RDFox
and Rulewerk, are significantly faster in chasing in this scenario, but slower in loading
data. Overall, in this experiment, DB systems are faster than the ontology-materialisation
engines SDM-RDFizer and Morph-KGC, while RMLMapper timed out (the timeout was 2.5h).
      </p>
      <p>Figure 1b shows the results of our experiments in the OWL2Bench scenario, again with a
small data size. Once again, the SW systems show faster loading times but significantly slower
chase times, except RMLMapper, which was slow in loading as well. Query-rewriting systems
(Ontop, Rapid, Iqaros) appear to be the fastest in both figures. However, in the experiment of
Figure 1b, both Rapid and Iqaros timed out after unfolding in Q2 due to a huge rewriting size,
while in Q5 Morph-KGC is faster than both. The results, although very preliminary, suggest
that query rewriting might be penalised for larger or more complex queries, and that forward-chaining
systems from both the SW and DB areas are valuable solutions for large datasets and complex
queries. We are currently running more experiments.</p>
      <p>Figure 1. (a) GTFS-Madrid-Bench Scenario; (b) OWL2Bench Scenario.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>We presented the most recent developments of the ForBackBench framework. We introduced
Skolem source-to-source TGDs, as well as translations between mapping languages (R2RML,
RML, OBDA) and TGDs. Finally, we integrated state-of-the-art SW ontology materialisation
systems into ForBackBench and provided experimental results on a wide range of scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Acknowledgements</title>
      <p>This work was partially funded by the UKRI Horizon Europe guarantee funding scheme for the
Horizon Europe projects RAISE (101058479) and UPCAST (101093216).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <article-title>Principles of data integration</article-title>
          ,
          <source>Elsevier</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          , G. Konstantinidis,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          , E. Tsamoura,
          <article-title>Benchmarking the chase</article-title>
          ,
          <source>in: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhazmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Blount</surname>
          </string-name>
          , G. Konstantinidis,
          <article-title>ForBackBench: A benchmark for chasing vs. query-rewriting</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>1519</fpage>
          -
          <lpage>1532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhazmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
          ,
          <article-title>OBDA vs forward chaining: the ForBackBench framework</article-title>
          ,
          <source>in: International Semantic Web Conference (ISWC) 2022: Posters, Demos, and Industry Track</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Vander</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          , E. Mannens, R. Van de Walle,
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          ,
          <source>LDOW</source>
          <volume>1184</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          , T. De Nies,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          , E. Mannens,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mechant</surname>
          </string-name>
          , R. Van de Walle,
          <article-title>Automated metadata generation for linked data generation and publishing workflows</article-title>
          ,
          <source>in: LDOW2016, CEUR-WS.org</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          , M.-E. Vidal,
          <article-title>SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs</article-title>
          ,
          <source>in: Proceedings of the 29th ACM Intl conference on Information &amp; Knowledge Management</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>3039</fpage>
          -
          <lpage>3046</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimmino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Toledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>65</volume>
          (
          <year>2020</year>
          )
          <fpage>100596</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Şimşek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <source>KGCW 2023 challenge @ ESWC</source>
          <year>2023</year>
          ,
          <year>2023</year>
          . URL: https://zenodo.org/record/7837289.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <article-title>R2RML: RDB to RDF mapping language</article-title>
          ,
          <year>2012</year>
          . URL: http://www.w3.org/TR/r2rml/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <article-title>Towards a new generation of ontology based data access</article-title>
          ,
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>153</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimmino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>García-Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>An ontological approach for representing declarative mapping languages</article-title>
          ,
          <source>Semantic Web</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>