<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Conversion of Transport Data Adopting Declarative Mappings: an Evaluation of Performance and Scalability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Scrocca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Carenini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Comerio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Celino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cefriel</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The transportation domain is characterised by a multitude of different formats to represent data, thus creating a problem of (lack of) interoperability between systems and a need for data conversion. In order to cope with the specific requirements of production systems, special attention should be given to the performance and scalability of conversion solutions. In this paper, we present a thorough evaluation of the Chimera framework for semantic data conversion through declarative mappings, in both dataset and message conversion scenarios. We illustrate the experimental results and we offer our considerations and recommendations for the successful implementation of conversion pipelines exploiting Semantic Web technologies.</p>
      </abstract>
      <kwd-group>
        <kwd>Transport Data</kwd>
        <kwd>Semantic Data Conversion</kwd>
        <kwd>Knowledge Graph Construction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Interoperability in the transportation domain is the main challenge in
providing travellers with multi-modal door-to-door travel solutions involving
different transport service providers. The Shift2Rail initiative, financed by the
European Commission, has been addressing this challenge within the Innovation
Programme 4<sup>1</sup> by defining an Interoperability Framework to enable a seamless
data exchange between different transport stakeholders [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The proposed
approach exploits a reference ontology, representing the shared conceptual model of
the transport domain, to enable interoperability at a semantic level and
any-to-any communications between different actors. In this scenario, each stakeholder
is not forced to adopt a new format or standard and can enter the ecosystem
by defining a set of rules that specify how the currently used data model can be
mapped onto the reference ontology [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. As a result, a technical artifact,
henceforth referred to as the converter, can be configured to translate messages from
a standard A to a standard B, exploiting Semantic Web technologies and the
reference ontology.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <sec id="sec-1-1">
        <title>1 Cf. https://shift2rail.org/research-development/ip4/</title>
        <p>
          The main challenges in the adoption of this approach are the definition of
the mapping rules and the performance and scalability requirements related to
the technical implementation of converters. In the SPRINT project
(Semantics for PerfoRmant and scalable INteroperability of multimodal Transport) we
addressed these two aspects [
          <xref ref-type="bibr" rid="ref11 ref16">11,16</xref>
          ]. This work reports the results obtained in
developing a semantic converter and in assessing its performance and scalability.
        </p>
        <p>The performance and scalability evaluation of a converter should consider the
two main conversion scenarios in the transportation domain: the harmonisation
of static data, like timetables and scheduled transport, and the transformation of
dynamic data, like journey planning messages. The Dataset Conversion (batch
conversion) scenario considers the case where medium-sized to big archives of
transportation data, usually static data, should be converted. Conversions are
required with low frequency, but the conversion procedure should minimise the
resource usage to obtain scalability with respect to the size of the dataset. The
Message Conversion (service mediation) scenario considers the case where a
message, usually dynamic data, should be converted to guarantee communication
between two different systems at runtime. A small amount of data is converted
for each request, but the conversion procedure should minimise the processing
time to avoid introducing overhead and to obtain scalability with respect to a
high request rate.</p>
        <p>
          To address these two scenarios, we designed and developed a generic, modular
and flexible solution, the Chimera framework<sup>2</sup> [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], the converter component of
the Interoperability Framework. The testing activities reported in this
work, besides offering insights on the Chimera implementation, contribute to a
general validation of the involved technologies and tools for the discussed use
case in the transportation domain.
        </p>
        <p>The remainder of the paper is organised as follows: Section 2 deals with
preliminaries and related works; Section 3 describes the testing activities designed
to evaluate converters; Section 4 discusses the results obtained and elicits a set
of recommendations for the performance and scalability of converters; Section 5
draws the conclusions and defines potential future works.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries and Related Works</title>
      <p>
        A semantic data conversion procedure, following the any-to-one centralised
mapping approach [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], transforms data in two steps exploiting the defined rules
to/from the reference ontology: (i) input data are mapped onto the reference
ontology (lifting phase from standard A to RDF), (ii) the obtained triples,
specifying data through the reference ontology, are mapped onto the target data
format (lowering phase from RDF to standard B).
      </p>
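      <p>The two phases can be illustrated with a minimal sketch in plain Python. It is not the Chimera implementation: triples are modelled as tuples, the example.org namespace, the property names and the sample records are all hypothetical, and the lowering uses a dictionary lookup in place of SPARQL queries.</p>
      <preformat>
```python
import csv
import io

# Hypothetical namespace and property names, used only for illustration.
EX = "http://example.org/"


def lift(csv_text):
    """Lifting: map each CSV record (standard A) onto a set of triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = EX + "stop/" + row["stop_id"]
        triples.add((subject, EX + "type", EX + "Stop"))
        triples.add((subject, EX + "name", row["stop_name"]))
    return triples


def lower(triples):
    """Lowering: read the triples back and emit records in standard B."""
    names = {s: o for (s, p, o) in triples if p == EX + "name"}
    return [{"id": s.rsplit("/", 1)[-1], "label": names[s]} for s in sorted(names)]


# Made-up input in "standard A" (a CSV file with two stops).
source = "stop_id,stop_name\nS1,Sol\nS2,Atocha\n"
graph = lift(source)
target = lower(graph)
print(target)
```
      </preformat>
      <p>The intermediate set of triples plays the role of the graph expressed through the reference ontology: any lowering function can be applied to it, regardless of the format that was lifted.</p>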
      <p>A possible implementation of the described conversion procedure is based
on an Object-Relational Mapping (ORM) approach using unmarshalling/marshalling
libraries to obtain an in-memory representation of data as objects, and
then exploiting annotations in the code to map each class and attribute to classes
and properties of the reference ontology. This approach is implemented in
RDFBeans<sup>3</sup> and Empire<sup>4</sup>, and was studied for the definition of the ST4RT converter.
This implementation improved the pre-existing approaches by providing more
flexibility in the definition of the annotations to address complex mappings in both
the lifting and the lowering phase [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, while this method allows using
classical object-oriented programming techniques, its application showed
several drawbacks: (i) an object-oriented representation of the source/target data
format is required, (ii) performance and scalability issues arise for
conversion procedures with complex annotations and/or handling large files. For these
reasons, in the SPRINT project, we implemented and tested an alternative
converter relying on declarative mappings for the lifting and lowering phases (cf.
Figure 1).
      </p>
      <sec id="sec-2-1">
        <title>2 https://github.com/cefriel/chimera</title>
        <p>
          Chimera is an open-source framework based on Apache Camel<sup>5</sup> and adopts
a modular approach to build flexible pipelines for conversions based on
Semantic Web technologies. The main goal of Chimera is to minimise the amount of
code to be written, allowing a converter to be created just by configuring the various
blocks offered by the framework. In particular, this paper evaluates conversion
procedures adopting the following Chimera blocks: (i) a lifting block for
materialisation through a custom version of the RML-Mapper library<sup>6</sup> and employing
RML [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] mappings to generate RDF triples, and (ii) a lowering block based on
Apache Velocity templates to query the RDF graph with SPARQL queries and
to define the logic to place the query results in the proper output format<sup>7</sup>. A
more detailed overview of the Chimera framework is available in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and in the
repository<sup>8</sup>.
        </p>
        <p>3 RDFBeans, cf. https://github.com/cyberborean/rdfbeans</p>
      </sec>
      <sec id="sec-2-2">
        <title>4 Empire, cf. https://github.com/mhgrove/Empire</title>
      </sec>
      <sec id="sec-2-3">
        <title>5 https://camel.apache.org/</title>
      </sec>
      <sec id="sec-2-4">
        <title>6 https://github.com/RMLio/rmlmapper-java</title>
      </sec>
      <sec id="sec-2-5">
        <title>7 https://github.com/cefriel/rdf-lowerer. A demo example of the lowering approach is available in the repository.</title>
        <p>
          RML is a mapping language that extends the R2RML recommendation [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
to support heterogeneous data sources. RML mapping rules are defined on a set
of logical sources representing the input data sources; each rule is represented
using a triple map that defines how to extract a record from the logical source
and generate a set of associated RDF triples; a join condition allows the creation of
triples involving entities generated by two triple maps.
        </p>
        <p>
          A real-world use case and evaluation of the discussed approach for batch data
conversion in the transportation domain is presented in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] exploiting an initial
release of Chimera. This paper describes the results of a broader evaluation of
the converter: we adopt a transportation domain benchmark for the batch data
conversion scenario [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], discuss how different configurations and parameters [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
can affect the performance and scalability of the conversion, and evaluate the
converter also in the service mediation scenario.
        </p>
        <p>
          A comparison of Chimera with other state-of-the-art tools for knowledge
graph materialisation is available in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The advantages of adopting a semantic
data conversion procedure based on lifting and lowering mechanisms in a different
domain, i.e. the Web of Things, are presented in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Test Design</title>
      <p>In this section we present the performance and scalability evaluation, for both
dataset and message conversion, and the testing infrastructure.</p>
      <p>
        <bold>Dataset Conversion.</bold> For the evaluation in the dataset conversion scenario,
we chose the GTFS-Madrid-Bench<sup>9</sup> [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The benchmark is based on GTFS<sup>10</sup>
(General Transit Feed Specification) data on the Madrid city metro published
by the Consorcio Regional de Transportes de Madrid (CRTM<sup>11</sup>). A GTFS feed is
composed of a set of CSV files, each representing a part of the
static transit information. Based on the original data and according to
the benchmark, we generated datasets of increasing scale (1, 5, 10, 50, and 100),
and in different formats (CSV, JSON and XML). It is important to point out
that we generated those datasets for performance testing, but a typical GTFS
feed size is in the order of tens of megabytes and rarely exceeds 100 MB
when unzipped.
      </p>
      <p>
        The planned testing activities considered a roundtrip conversion GTFS →
Linked GTFS [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] → GTFS. The GTFS-to-LinkedGTFS RML lifting mappings
are from the GTFS-Madrid-Bench. For the lowering, we defined a set of custom
Apache Velocity templates. The lifting phase considers different input formats
(CSV, JSON or XML), while the lowering phase always produces CSV files.
In the conversion pipeline we used an in-memory RDF repository to store and
query the materialised RDF graph. However, we also evaluated the impact on
performance of using an external triplestore.
      </p>
      <sec id="sec-3-1">
        <title>8 https://github.com/cefriel/chimera/blob/master/README.md</title>
      </sec>
      <sec id="sec-3-2">
        <title>9 https://github.com/oeg-upm/gtfs-bench 10 https://developers.google.com/transit/gtfs 11 https://www.crtm.es</title>
        <p><bold>Message Conversion.</bold> To evaluate the message conversion, we selected a
realistic journey planning test case involving a deployed web service, converting
a response message from the HaCon VBB journey planning endpoint<sup>12</sup> to the
TRIAS format<sup>13</sup>. During the lifting, a VBB TripList message (representing travel
solutions for a requested itinerary, size 43 KB) is mapped onto the IP4
IT2Rail ontology<sup>14</sup> through RML mappings. The materialised graph is stored
in an in-memory RDF repository. During the lowering, the data modelled
through the IT2Rail ontology are mapped onto a TRIAS TripResponse message
using an Apache Velocity template with specific SPARQL queries. We employed
the JMeter tool<sup>15</sup> to test the performance of the converter with an increasing
number of concurrent requests (number of threads: 10, 50, 100, 150, 500, 1000, 2500,
5000; ramp-up period: 1 second; loop count: 1).</p>
        <p><bold>Testing infrastructure.</bold> All tests were run using a Docker<sup>16</sup> container to
guarantee reproducibility on a machine running CentOS Linux 7, with an Intel Xeon
8-core CPU and 64 GB of memory. We set a memory limit of 24 GB, a timeout of
24 hours and no limits on CPU usage. We ran each test 5 times, averaging the
obtained results. We also monitored the resource usage of the running containers.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Test Evaluation</title>
      <p>
        In this section, we discuss the performance and scalability test results and
illustrate the identified bottlenecks and their possible solutions. Additional details
and data are available in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. We compare the conversion results obtained with
two different releases of Chimera, core and final. The final release implements
a set of optimisations, discussed in this section, that were developed within the
SPRINT project to improve the performance and scalability of the lifting and
lowering components<sup>17</sup>.
      </p>
      <sec id="sec-4-1">
        <title>Dataset Conversion: Performance and Scalability</title>
        <p>
          In Table 1, we provide the complete results for the dataset conversion scenario
considering different sizes and different formats. Execution times are measured
in seconds and averaged over the 5 runs of each conversion; TO stands for
timeout (&gt;24 hours), OoM stands for out-of-memory (&gt;24 GB). We also report the
input size and the number of triples generated at the end of the lifting phase.
        </p>
        <p>12 http://fahrinfo.vbb.de</p>
        <p>13 https://github.com/VDVde/TRIAS</p>
        <p>14 http://www.it2rail.eu/</p>
        <p>15 https://jmeter.apache.org/</p>
        <p>16 https://www.docker.com/</p>
        <p>
          17 A complete report discussing the core and final releases is available in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The
Chimera repository contains a summary of changes: https://github.com/cefriel/chimera/releases.
        </p>
        <p>We also tested different configurations of the pipeline (not fully reported
here). The results in Table 1 refer to the best-performing configuration for each
format and size. As a preliminary comment, we notice that the overall conversion
time is mainly influenced by the lifting phase (as also shown later in Table 2).</p>
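        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Scale</th><th colspan="2">1</th><th colspan="2">5</th><th colspan="2">10</th><th colspan="2">50</th><th colspan="2">100</th></tr>
              <tr><th>Input Size</th><th colspan="2">4.9 MB</th><th colspan="2">10.42 MB</th><th colspan="2">23.64 MB</th><th colspan="2">106.1 MB</th><th colspan="2">247.5 MB</th></tr>
              <tr><th>Num. Triples</th><th colspan="2">565,489</th><th colspan="2">1,800,911</th><th colspan="2">3,663,380</th><th colspan="2">18,009,100</th><th colspan="2">36,633,800</th></tr>
              <tr><th>Release</th><th>core</th><th>final</th><th>core</th><th>final</th><th>core</th><th>final</th><th>core</th><th>final</th><th>core</th><th>final</th></tr>
            </thead>
            <tbody>
              <tr><td>CSV</td><td>22.77</td><td>10.83</td><td>164.95</td><td>55.37</td><td>544.41</td><td>154.13</td><td>11,624.45</td><td>3,441.67</td><td>TO</td><td>OoM</td></tr>
              <tr><td>JSON</td><td>50.41</td><td>30.89</td><td>659.11</td><td>394.21</td><td>2,471.29</td><td>1,467.70</td><td>66,003.36</td><td>34,901.65</td><td>TO</td><td>TO</td></tr>
              <tr><td>XML</td><td>TO</td><td>16.26</td><td>TO</td><td>123.29</td><td>TO</td><td>434.05</td><td>TO</td><td>12,648.65</td><td>TO</td><td>OoM</td></tr>
            </tbody>
          </table>
        </table-wrap>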
        <p>The results show a consistent improvement in the conversion
time of the final release with respect to the core one for CSV (2x),
JSON (1.6x) and XML (&gt;1,000x) data sources. The final version was able to
convert CSV, JSON and XML datasets of up to 100 MB, generating 18 M
triples with the available resources, thus demonstrating its capability to process
even large datasets of static transportation data.</p>
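        <p>The improvement factors can be recomputed from the conversion times in Table 1. The short sketch below only divides the core time by the final time for the scales where both releases completed the conversion; the variable and function names are illustrative.</p>
        <preformat>
```python
# Conversion times in seconds copied from Table 1, as (core, final) per scale.
times = {
    "CSV": {1: (22.77, 10.83), 5: (164.95, 55.37),
            10: (544.41, 154.13), 50: (11624.45, 3441.67)},
    "JSON": {1: (50.41, 30.89), 5: (659.11, 394.21),
             10: (2471.29, 1467.70), 50: (66003.36, 34901.65)},
}


def speedup(core, final):
    """Ratio between the core and the final conversion time."""
    return core / final


speedups = {
    fmt: {scale: round(speedup(core, final), 2)
          for scale, (core, final) in by_scale.items()}
    for fmt, by_scale in times.items()
}

for fmt, by_scale in speedups.items():
    print(fmt, by_scale)
```
        </preformat>
        <p>With these figures the CSV ratio is at least 2 at every scale and the JSON ratio stays between 1.6 and 1.9. XML is left out because the core release timed out at every scale, so only a lower bound on its ratio can be derived.</p>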
        <p>The final version mainly improved with respect to the lifting phase, due to
the adoption of different libraries to access the input data sources, a simplified
mechanism to handle the generated triples (triples are directly written to the
RDF repository of the Chimera pipeline during the lifting procedure) and the
introduction of different concurrency strategies in the RML block (at the record
level and/or triple map level). However, concurrency naturally increases CPU
usage and memory consumption; thus, in some cases, it may be preferable to use
single-threading, with a longer conversion time but lower resource usage.</p>
        <p>
          Additional tests, available in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], show how concurrency in the lifting process
performs better at the triple map level if triple maps associated with the same
logical source are executed within the same thread (preventing different threads
from concurrently accessing the same input source). However, different RML
mappings may lead to different performance results (e.g., depending on the number of triple maps
defined for each logical source and the join conditions among them) under the same
concurrency configuration.
        </p>
        <p>CSV conversion outperforms JSON and XML conversion because of the impact
of the libraries used to access the input data sources in the lifting procedure. Intuitively,
accessing rows in a CSV file and iterating over them is less expensive than
querying a JSON/XML file by resolving a JSONPath/XPath query and iterating
over the retrieved nodes. This aspect affects the execution and worsens with the size
of the file to query, hence the differences in timings. Considering the XML data
format, the core version did not complete the conversion within the 24-hour
timeout even for the 1-xml dataset, while the final converter completed it up to
the 50-scale size, performing even better than the JSON conversion thanks to the
new XML parser. The conversion of the JSON dataset obtained more limited
improvements between the core and final versions, mainly because the
performance of the JSON parser limits the advantages of introducing concurrency.
The results obtained for JSON and XML show not only the importance of
lifting mapping optimisation via concurrency, but also the impact of the parsing
procedure on performance.</p>
        <p>Finally, we checked the impact of join conditions in the RML mappings on
the conversion time. Mappings defined in the GTFS-Madrid-Bench maximise the
number of joins among the different files composing a GTFS feed. As in SQL
queries, a growing number of items (in this case, the scale of the input dataset)
increases non-linearly the number of comparisons needed for a non-optimised
join operation. In the RML mappings it is often possible to avoid the usage of
join conditions by adopting the same IRI generation pattern in different triple
maps. With this approach, we managed to optimise the RML mappings for
the GTFS-Madrid-Bench dataset. Two RML mappings producing exactly the
same knowledge graph were used to compare conversion times in Chimera with
and without join conditions. The results showed that the optimised mappings
reduced the conversion time by up to 2/3 in the case of 50-scale CSV (6205.25 s with
joins, 2269.62 s without joins).</p>
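        <p>The idea of replacing a join condition with a shared IRI generation pattern can be sketched in plain Python. The IRI templates, the property name and the records below are hypothetical and are not taken from the GTFS-Madrid-Bench mappings; the sketch only contrasts the number of comparisons of a non-optimised join with the direct construction of the object IRI.</p>
        <preformat>
```python
# Hypothetical IRI templates and toy records, for illustration only.
STOP_IRI = "http://example.org/stop/{stop_id}"
TRIP_IRI = "http://example.org/trip/{trip_id}"
HAS_STOP = "http://example.org/hasStop"

stops = [{"stop_id": "S1"}, {"stop_id": "S2"}]
stop_times = [{"trip_id": "T1", "stop_id": "S1"},
              {"trip_id": "T1", "stop_id": "S2"}]


def with_join(stop_times, stops):
    """Join condition: compare every child record with every parent record."""
    triples, comparisons = set(), 0
    for st in stop_times:
        for stop in stops:
            comparisons += 1
            if st["stop_id"] == stop["stop_id"]:
                triples.add((TRIP_IRI.format(**st), HAS_STOP,
                             STOP_IRI.format(**stop)))
    return triples, comparisons


def without_join(stop_times):
    """Shared IRI pattern: build the object IRI from the child record alone."""
    return {(TRIP_IRI.format(**st), HAS_STOP,
             STOP_IRI.format(stop_id=st["stop_id"]))
            for st in stop_times}


joined, comparisons = with_join(stop_times, stops)
direct = without_join(stop_times)
print(comparisons, joined == direct)
```
        </preformat>
        <p>Both functions yield the same triples here, but the join performs one comparison per pair of records while the second variant touches each child record once. The equivalence assumes that every referenced stop actually exists in the parent source; without the join, a triple is emitted even for a dangling reference.</p>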
        <p>Table 2 compares, for the 50-scale CSV dataset, the results obtained
considering an in-memory and an external RDF repository (triplestore). It is important
to point out that the performance also strictly depends on the employed
triplestore, in this case GraphDB Free v9.0.0<sup>18</sup>. On the one hand, the usage of an external
repository with incremental writes (final-csv-ext) drastically reduces the memory
consumption (2x reduction) with respect to the same configuration run using an
in-memory repository (final-csv). The conversion time using incremental writes
is higher, but it can be acceptable because it comes with a substantial decrease
in resource usage; the adoption of a triplestore without concurrency limitations<sup>19</sup>
would likely bring an even better time-resource trade-off. In any case, it is
important to take into account that an external repository also implies its own
resource usage.</p>
        <p>
          Table 2 also details the lifting and lowering times for the different
formats of the 50-scale dataset in the final release. As previously noted,
lifting times are influenced by the input data format, whereas the lowering times
are similar since the same lowering mappings are executed on the same
knowledge graph. In general, our results show that lifting through materialisation is a
time-consuming approach for large datasets. However, it is important to
highlight that the good lowering performance is mainly due to the
possibility of querying a materialised knowledge graph. The lowering time is higher
when using an external repository (final-csv-ext) due to the concurrency
limitations for the SPARQL queries in the template. However, complex queries in
the templates applied to large knowledge graphs can effectively benefit from an
external triplestore [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. For example, when lifting and lowering steps have
similar execution times, it may be beneficial to slightly increase the lifting time
by writing triples to an external repository, to speed up the lowering phase thanks
to the read performance of the triplestore.
        </p>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th></th><th>Conversion time (s)</th><th>Lifting time (s)</th><th>Lowering time (s)</th><th>Max Mem (GB)</th><th>Max CPU Usage (%)</th></tr>
            </thead>
            <tbody>
              <tr><td>core-csv</td><td>11,624.45</td><td>11,583.49</td><td>40.97</td><td>18.84</td><td>185.56</td></tr>
              <tr><td>final-csv</td><td>3,441.67</td><td>3,407.39</td><td>34.28</td><td>18.58</td><td>516.51</td></tr>
              <tr><td>final-csv-ext</td><td>3,784.34</td><td>3,659.39</td><td>88.95</td><td>9.63</td><td>314.61</td></tr>
              <tr><td>final-json</td><td>34,901.65</td><td>34,861.97</td><td>39.69</td><td>18.45</td><td>153.97</td></tr>
              <tr><td>final-xml</td><td>12,648.65</td><td>12,614.62</td><td>34.03</td><td>18.44</td><td>540.01</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>18 https://graphdb.ontotext.com/</p>
        <p>19 The free version of GraphDB is limited to two concurrent queries.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Message Conversion: Performance and Scalability</title>
        <p>In Table 3 we report the results of the tests performed for the message conversion
scenario. For the final release, we compare two configurations: final-m-1,
introducing concurrency in lifting only at the record level, and final-m-2, introducing
it also at the triple map level. The best-performing configuration (final-m-1)
resulted in conversion times in the order of 100 ms, which is acceptable
in a runtime scenario. Moreover, a 5x improvement in the conversion time was
obtained from the core to the final release, thanks to the already mentioned
optimisations, but also thanks to a specific configuration for the message conversion
scenario that executes the lowering transformation avoiding input/output
operations on the filesystem. The results obtained for final-m-2 demonstrate that
for small messages it is preferable not to introduce excessive concurrency, since
the initialisation time of the required structures (e.g., threads) is not compensated by an overall
speedup. Finally, it is worth noting the very limited resource usage, especially
memory, in the different tested configurations.</p>
        <p>Table 4 reports the scalability test results with an increasing number of
concurrent requests. A single instance of the converter managed to handle up to
100 concurrent requests per second (at 150 concurrent requests the processing
time exceeds one second), also successfully handling a peak of 2,500
concurrent requests. After 3,000 pending requests the queue mechanism provided by
Apache Camel starts dropping requests. The maximum length of the queue can
be increased; however, a high number of pending requests is associated with
noticeable performance degradation. The low resource usage and the optimisations
resulted in very good scalability for increasing workloads, even considering a
single instance of the converter.</p>
        <p>[Table 4: number of concurrent requests tested: 10, 50, 100, 150, 500, 1,000, 2,500]</p>
      </sec>
      <sec id="sec-4-3">
        <title>Recommendations for Performance and Scalability</title>
        <p>The reported results show the improvements obtained in the development of the
Chimera framework for semantic data conversion pipelines and its
applicability to both the dataset and message conversion scenarios in the transportation
domain. Moreover, from the performed testing activities, we derive a set of
additional recommendations for performance and scalability using the proposed
approach.</p>
        <p>
          <bold>Performance.</bold> Considering the RML-based lifting, the specific RML mappings
defined for a conversion pipeline (join conditions, number of triple maps, number
of logical sources, path to access the records, ...) can influence the performance
of the lifting phase. As a result, the choice of the pipeline configuration should
take into account the trade-off between the conversion time and the resource
usage, for example with different concurrency strategies. In particular, in some
cases the gain in conversion time does not justify the higher resource
usage. Additional recommendations for the RML-based lifting are:
        </p>
        <list list-type="bullet">
          <list-item>
            <p>to efficiently exploit concurrency, it is important to tune the different
parameters, e.g., the number of concurrent threads adopted;</p>
          </list-item>
          <list-item>
            <p>concurrency may cause issues in case of RML mappings generating blank
nodes without specifying a deterministic identifier (each thread may assign
different random identifiers to the same blank node, generating it multiple
times);</p>
          </list-item>
          <list-item>
            <p>
              the presence of many functions in the RML mappings (RML and FnO
integration [
              <xref ref-type="bibr" rid="ref14">14</xref>
              ]) can cause concurrent access to the same data structures,
negatively impacting the conversion time;
            </p>
          </list-item>
          <list-item>
            <p>concurrency strategies can be implemented not only within the lifting
procedure but also in the pipeline, for example, configuring different lifting blocks
in parallel or exploiting concurrent consumers for routes in Chimera.</p>
          </list-item>
        </list>
        <p>With respect to the lowering phase, the queries and the logic of the templates
can heavily affect performance. Therefore, it is recommended to:</p>
        <list list-type="bullet">
          <list-item>
            <p>Optimise queries, by accessing data using simple queries and avoiding
expensive constructs or patterns. It is better to divide complex queries into
sub-queries, if possible.</p>
          </list-item>
          <list-item>
            <p>Optimise template logic, by avoiding nested loops and by using support data
structures (e.g. maps) to efficiently access records in queries' result sets.</p>
          </list-item>
        </list>
        <p>Finally, the stream option to process templates in-memory (avoiding
input/output operations) is recommended only for the runtime data/message conversion
use case. For large batch datasets, this option should be avoided, because the
template engine is able to optimise memory consumption with incremental writes
to the filesystem.
        </p>
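        <p>The recommendation on template logic can be illustrated outside Velocity with a small Python sketch. The two result sets stand in for the bindings returned by two SPARQL queries; the variable names and values are hypothetical. The first function scans one result set for every record of the other, the second builds a support map once and then performs direct lookups.</p>
        <preformat>
```python
# Toy result sets standing in for SPARQL query bindings (hypothetical data).
trips = [{"trip": "T1", "route": "R1"}, {"trip": "T2", "route": "R2"}]
routes = [{"route": "R1", "name": "Line 1"}, {"route": "R2", "name": "Line 2"}]


def render_nested(trips, routes):
    """Nested loops: scans all routes for every trip."""
    rows = []
    for t in trips:
        for r in routes:
            if r["route"] == t["route"]:
                rows.append(t["trip"] + "," + r["name"])
    return rows


def render_with_map(trips, routes):
    """Support map: index routes once, then look each one up directly."""
    name_by_route = {r["route"]: r["name"] for r in routes}
    return [t["trip"] + "," + name_by_route[t["route"]] for t in trips]


print(render_with_map(trips, routes))
```
        </preformat>
        <p>Both functions produce the same CSV rows; the difference is that the work of the nested version grows with the product of the two result set sizes, while the map-based version grows with their sum.</p>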
        <p>
          <bold>Scalability.</bold> In the dataset conversion scenario, the scalability of the solution is
limited by memory consumption due to the materialisation of large knowledge
graphs. A potential alternative could exploit virtualisation techniques for the
lifting phase, but the state-of-the-art tools are still not mature enough to be
employed in a conversion scenario [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. To address the materialisation scalability, the
Chimera framework allows using an external repository to store the materialised
graph. In our tests, we showed that this approach can effectively reduce memory
consumption, but it still has some limits. In particular, the use of an external
repository shifts the bottleneck to the triplestore. For this reason, for very large
datasets, we recommend splitting the conversion as follows, under the
assumption that the materialised graph does not change very often: (i) execution of the
lifting procedure (if required, splitting the mappings into different executions);
(ii) bulk loading of the materialised graph(s) into the triplestore (thus avoiding
incremental indexing issues); (iii) on-demand lowering of the materialised graph
(possibly with a separate Chimera pipeline). This approach also allows
different lifting tools to be selected. Indeed, the RML specification has several implementations
that can be chosen on the basis of the specific scenario requirements [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>In the message conversion scenario, to cope with a higher number of
concurrent requests, it is possible to deploy more than one instance of the converter
exploiting a load balancing mechanism. However, standards in the transportation
domain usually require dealing with a large set of different message types, which
implies defining a high number of converters. In this situation, an efficient
scalability strategy is to deploy (multiple instances of) a universal converter which
is able to dynamically select and execute the relevant mappings with respect to
the input/output message.</p>
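        <p>A universal converter of this kind could rely on a registry that associates each pair of input and output message types with the corresponding lifting and lowering mappings. The sketch below is only an assumption about how such a lookup might be organised: the format identifiers and file names are hypothetical and do not come from Chimera.</p>
        <preformat>
```python
# Hypothetical registry: (source format, target format) mapped to
# (lifting mapping file, lowering template file).
MAPPINGS = {
    ("vbb-triplist", "trias-tripresponse"): ("vbb-lift.rml.ttl", "trias-lower.vm"),
    ("gtfs-csv", "gtfs-csv"): ("gtfs-lift.rml.ttl", "gtfs-lower.vm"),
}


def select_mappings(source_format, target_format):
    """Return the mapping files for a message pair, or raise if unsupported."""
    try:
        return MAPPINGS[(source_format, target_format)]
    except KeyError:
        raise ValueError(
            "no mappings for " + source_format + " -> " + target_format
        ) from None


lifting, lowering = select_mappings("vbb-triplist", "trias-tripresponse")
print(lifting, lowering)
```
        </preformat>
        <p>With this design, supporting a new message type means adding an entry to the registry rather than deploying a dedicated converter, and every instance behind the load balancer can serve any registered pair.</p>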
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>Semantic interoperability in the transportation domain can be addressed
effectively by exploiting Semantic Web technologies to enable the communication
between actors and the definition of new services for travellers. To guarantee the
adoption by relevant stakeholders, it is however extremely important to address
their performance and scalability requirements. Considering both the dataset
and message conversion scenarios, this paper identified two test cases from the
transportation domain and evaluated the performance and scalability of the
semantic data conversion approach using the Chimera framework, with a declarative
approach based on RML mappings for lifting and on Apache Velocity templates
and SPARQL queries for lowering.</p>
      <p>Our analysis showed the potential and flexibility of the Chimera solution: in
the dataset conversion scenario, we managed to generate and handle knowledge
graphs with millions of triples; in the message conversion scenario, we obtained
very low conversion times and proven robustness even with hundreds of
concurrent requests per second. Moreover, on the basis of our tests, we defined a set
of recommendations to improve performance and scalability with the discussed
approach.</p>
      <p>
        As future work, we would like to investigate and implement in Chimera
further optimisations for the lifting procedure, for example adopting the data
structures defined in the SDM-RDFizer tool [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and investigating improvements to the concurrency
strategy. Moreover, we would like to set up a more comprehensive
benchmark for the message conversion scenario (e.g., based on GTFS-RT<sup>20</sup>).
      </p>
      <sec id="sec-5-1">
        <title>Acknowledgments</title>
        <p>The presented research was partially supported by the SPRINT project (Grant
Agreement 826172) and the RIDE2RAIL project (Grant Agreement 881825), co-funded by
the European Commission under the Horizon 2020 Framework Programme.</p>
        <p><sup>20</sup> https://developers.google.com/transit/gtfs-realtime</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arenas-Guerrero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Knowledge graph construction with R2RML and RML: An ETL system-based overview</article-title>
          .
          <source>In: Proceedings of the 2nd International Workshop on Knowledge Graph Construction</source>
          (
          <year>2021</year>
          ), http://ceur-ws.org/Vol-2873/paper11.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bennara</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefrançois</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Messalti</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Interoperability of Semantically-Enabled Web Services on the WoT: Challenges and Prospects</article-title>
          .
          <source>In: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications &amp; Services</source>
          . pp.
          <fpage>149</fpage>
          –
          <lpage>153</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1145/3428757.3429199
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carenini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>ST4RT – Semantic Transformations for Rail Transportation</article-title>
          .
          <source>In: 7th Transport Research Arena (TRA 2018)</source>
          . Zenodo (Apr
          <year>2018</year>
          ). https://doi.org/10.5281/zenodo.1440984
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carenini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>SPRINT project Deliverable D5.6 – Final report on the results of the validation of pilot implementation</article-title>
          (
          <year>2021</year>
          ), http://sprint-transport.eu/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Endris</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>What are the parameters that affect the construction of a knowledge graph?</article-title>
          <source>In: On the Move to Meaningful Internet Systems</source>
          . pp.
          <fpage>695</fpage>
          –
          <lpage>713</lpage>
          . Springer (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-33246-4_43
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>65</volume>
          ,
          <elocation-id>100596</elocation-id>
          (
          <year>2020</year>
          ). https://doi.org/10.1016/j.websem.2020.100596
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Llaves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
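          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van de Walle</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :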
          <article-title>Intermodal public transit routing using linked connections</article-title>
          .
          <source>In: International Semantic Web Conference: Posters and Demos</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>5</lpage>
          (
          <year>2015</year>
          ), http://ceur-ws.org/Vol-1486/paper_28.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Comerio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carenini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scrocca</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celino</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Turn transportation data into EU compliance through semantic web-based solutions</article-title>
          .
          <source>In: 1st International Workshop On Semantics For Transport</source>
          . vol.
          <volume>2447</volume>
          (
          <year>2019</year>
          ), http://ceur-ws.org/Vol-2447/paper6.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>R2RML: RDB to RDF mapping language</article-title>
          .
          <source>W3C recommendation</source>
          , W3C (Sep
          <year>2012</year>
          ), http://www.w3.org/TR/2012/REC-r2rml-20120927/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sande</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Walle</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          :
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          .
          <source>In: Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014)</source>
          . vol.
          <volume>1184</volume>
          . CEUR-WS.org (
          <year>2014</year>
          ), http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hosseini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalwar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sadeghi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automated mapping for semantic-based conversion of transportation data formats</article-title>
          .
          <source>In: 1st International Workshop On Semantics For Transport</source>
          . vol.
          <volume>2447</volume>
          (
          <year>2019</year>
          ), http://ceur-ws.org/Vol-2447/paper7.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Iglesias</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs</article-title>
          .
          <source>In: Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          . pp.
          <fpage>3039</fpage>
          –
          <lpage>3046</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.1145/3340531.3412881
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jurankova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>SPRINT project Deliverable D5.5 – Software release of the proof-of-concept in its technical environment (F-REL)</article-title>
          (
          <year>2020</year>
          ), http://sprint-transport.eu/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Meester</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maroy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>RML and FnO: Shaping DBpedia declaratively</article-title>
          .
          <source>In: The Semantic Web: ESWC 2017 Satellite Events</source>
          . vol.
          <volume>10577</volume>
          , pp.
          <fpage>172</fpage>
          –
          <lpage>177</lpage>
          . Springer (
          <year>2017</year>
          ). https://doi.org/10.1007/978-3-319-70407-4_32
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sadeghi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buchníček</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carenini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gogos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santoro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>SPRINT: Semantics for PerfoRmant and scalable INteroperability of multimodal Transport</article-title>
          .
          <source>In: 8th Transport Research Arena TRA 2020</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>10</lpage>
          (
          <year>2020</year>
          ), http://hdl.handle.net/11311/1132635
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Scrocca</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Comerio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carenini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Celino</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph</article-title>
          .
          <source>In: Proceedings of the 19th International Semantic Web Conference</source>
          . vol.
          <volume>12507</volume>
          , pp.
          <fpage>411</fpage>
          –
          <lpage>429</lpage>
          . Springer (
          <year>2020</year>
          ). https://doi.org/10.1007/978-3-030-62466-8_26
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Vetere</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Models for semantic interoperability in service-oriented architectures</article-title>
          .
          <source>IBM Systems Journal</source>
          <volume>44</volume>
          (
          <issue>4</issue>
          ),
          <fpage>887</fpage>
          –
          <lpage>903</lpage>
          (
          <year>2005</year>
          ). https://doi.org/10.1147/sj.444.0887
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>