Semantic Conversion of Transport Data Adopting Declarative Mappings: an Evaluation of Performance and Scalability

Mario Scrocca, Alessio Carenini, Marco Comerio, and Irene Celino
Cefriel, Milan, Italy
name.surname@cefriel.com

Abstract. The transportation domain is characterised by a multitude of different formats to represent data, thus creating a problem of (lack of) interoperability between systems and a need for data conversion. In order to cope with the specific requirements of production systems, special attention should be given to the performance and scalability of conversion solutions. In this paper, we present a thorough evaluation of the Chimera framework for semantic data conversion through declarative mappings, in both the dataset and the message conversion scenarios. We illustrate the experimental results and we offer our considerations and recommendations for the successful implementation of conversion pipelines exploiting Semantic Web technologies.

Keywords: Transport Data · Semantic Data Conversion · Knowledge Graph Construction

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Interoperability in the transportation domain is the main challenge to provide travellers with multi-modal door-to-door travel solutions involving different transport service providers. The Shift2Rail initiative, financed by the European Commission, has been addressing this challenge within the Innovation Programme 4 (cf. https://shift2rail.org/research-development/ip4/) by defining an Interoperability Framework to enable seamless data exchange between different transport stakeholders [15]. The proposed approach exploits a reference ontology, representing the shared conceptual model of the transport domain, to enable interoperability at a semantic level and any-to-any communications between different actors. In this scenario, each stakeholder is not forced to adopt a new format or standard and can enter the ecosystem by defining a set of rules that specify how the currently used data model can be mapped onto the reference ontology [8]. As a result, a technical artifact, henceforth referred to as the converter, can be configured to translate messages from a standard A to a standard B, exploiting Semantic Web technologies and the reference ontology.

The main challenges in the adoption of this approach are the definition of the mapping rules and the performance and scalability requirements related to the technical implementation of converters. In the SPRINT project (Semantics for PerfoRmant and scalable INteroperability of multimodal Transport) we addressed these two aspects [11,16]. This work reports the results obtained in developing a semantic converter and in assessing its performance and scalability.

The performance and scalability evaluation of a converter should consider the two main conversion scenarios in the transportation domain: the harmonisation of static data, like timetables and scheduled transport, and the transformation of dynamic data, like journey planning messages. The Dataset Conversion (batch conversion) scenario considers the case where medium-sized to large archives of transportation data, usually static data, should be converted. Conversions are required with low frequency, but the conversion procedure should minimise resource usage to obtain scalability with respect to the size of the dataset.
The Message Conversion (service mediation) scenario considers the case where a message, usually dynamic data, should be converted to guarantee communication between two different systems at runtime. A small amount of data is converted for each request, but the conversion procedure should minimise the processing time to avoid introducing overhead and to obtain scalability with respect to high-frequency request rates.

To address these two scenarios, we designed and developed a generic, modular and flexible solution, the Chimera framework (https://github.com/cefriel/chimera) [16], the converter component of the Interoperability Framework. The testing activities reported in this work, besides offering insights on the Chimera implementation, contribute to a general validation of the involved technologies and tools for the discussed use case in the transportation domain.

The remainder of the paper is organised as follows: Section 2 deals with preliminaries and related works; Section 3 describes the testing activities designed to evaluate converters; Section 4 discusses the results obtained and elicits a set of recommendations for the performance and scalability of converters; Section 5 draws the conclusions and outlines potential future work.

2 Preliminaries and Related Works

A semantic data conversion procedure, following the any-to-one centralised mapping approach [17], transforms data in two steps exploiting the rules defined to/from the reference ontology: (i) input data are mapped onto the reference ontology (lifting phase, from standard A to RDF); (ii) the obtained triples, specifying data through the reference ontology, are mapped onto the target data format (lowering phase, from RDF to standard B).

A possible implementation of the described conversion procedure is based on an Object-Relational Mapping (ORM) approach, using unmarshalling/marshalling libraries to obtain an in-memory representation of data as objects, and then exploiting annotations in the code to map each class and attribute to classes and properties of the reference ontology. This approach is implemented in RDFBeans (https://github.com/cyberborean/rdfbeans) and Empire (https://github.com/mhgrove/Empire), and was studied for the definition of the ST4RT converter, which improved the pre-existing approaches by providing more flexibility in the definition of the annotations to address complex mappings in both the lifting and the lowering phase [3]. However, while this method allows using classical object-oriented programming techniques, its application showcased several drawbacks: (i) an object-oriented representation of the source/target data format is required; (ii) performance and scalability issues arise for conversion procedures with complex annotations and/or handling large files. For these reasons, in the SPRINT project, we implemented and tested an alternative converter relying on declarative mappings for the lifting and lowering phases (cf. Figure 1).

Fig. 1: Conversion procedure adopting a two-step approach (lifting and lowering) based on declarative mappings.

Chimera is an open-source framework based on Apache Camel (https://camel.apache.org/) that adopts a modular approach to build flexible pipelines for conversions based on Semantic Web technologies. The main goal of Chimera is to minimise the amount of code to be written, allowing the creation of a converter just by configuring the various blocks offered by the framework.
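As a toy illustration of the two-step procedure of Figure 1 (the ontology terms, IRIs and values below are invented for the example and are not taken from the actual pipelines), a single source record is first lifted to triples of the reference ontology and then lowered to the target format:

@prefix ex: <http://example.org/reference-ontology#> .   # hypothetical reference ontology

# Lifting (standard A -> RDF): a source record, e.g. the CSV line
#   stop_id,stop_name
#   S1,Sol
# is mapped onto the reference ontology as:
<http://example.org/stops/S1> a ex:StopPlace ;
    ex:name "Sol" .

# Lowering (RDF -> standard B): a SPARQL query selects the stop identifier
# and name from the graph above, and a template re-serialises them in the
# target format, e.g. <Stop id="S1"><Name>Sol</Name></Stop>.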
In particular, this paper evaluates conversion procedures adopting the following Chimera blocks: (i) a lifting block for materialisation through a custom version of the RML-Mapper library (https://github.com/RMLio/rmlmapper-java), employing RML [10] mappings to generate RDF triples, and (ii) a lowering block based on Apache Velocity templates to query the RDF graph with SPARQL queries and to define the logic to place the query results in the proper output format (https://github.com/cefriel/rdf-lowerer; a demo example of the lowering approach is available in the repository). A more detailed overview of the Chimera framework is available in [16] and in the repository (https://github.com/cefriel/chimera/blob/master/README.md).

RML is a mapping language that extends the R2RML recommendation [9] to support heterogeneous data sources. RML mapping rules are defined on a set of logical sources representing the input data sources; each rule is represented as a triples map that defines how to extract a record from the logical source and generate a set of associated RDF triples; a join condition allows creating triples involving entities generated by two triples maps.

A real-world use case and evaluation of the discussed approach for batch data conversion in the transportation domain is presented in [16], exploiting an initial release of Chimera. This paper describes the results of a broader evaluation of the converter: we adopt a transportation domain benchmark for the batch data conversion scenario [6], discuss how different configurations and parameters [5] can affect the performance and scalability of the conversion, and evaluate the converter also in the service mediation scenario. A comparison of Chimera with other state-of-the-art tools for knowledge graph materialisation is available in [1]. The advantages of adopting a semantic data conversion procedure based on lifting and lowering mechanisms in a different domain, i.e. the Web of Things, are presented in [2].

3 Test Design

In this section we present the performance and scalability evaluation, for both dataset and message conversion, and the testing infrastructure.

Dataset Conversion. For the evaluation in the dataset conversion scenario, we chose the GTFS-Madrid-Bench (https://github.com/oeg-upm/gtfs-bench) [6]. The benchmark is based on GTFS (General Transit Feed Specification, https://developers.google.com/transit/gtfs) data on the Madrid city metro published by the Consorcio Regional de Transportes de Madrid (CRTM, https://www.crtm.es). A GTFS feed is composed of a set of CSV files, each representing some piece of static transit information. Based on the original data and according to the benchmark, we generated datasets of increasing scale (1, 5, 10, 50, and 100) and in different formats (CSV, JSON and XML). It is important to point out that we generated those datasets for performance testing, but a typical GTFS feed size is in the order of tens of megabytes and rarely exceeds 100 MB when unzipped. The planned testing activities considered a roundtrip conversion GTFS → Linked GTFS [7] → GTFS. The GTFS-to-LinkedGTFS RML lifting mappings are taken from the GTFS-Madrid-Bench. For the lowering, we defined a set of custom Apache Velocity templates.
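To make the lifting configuration concrete, the snippet below sketches a minimal RML triples map in the spirit of the benchmark lifting mappings: a logical source pointing to the stops file of the feed, a subject map building stop IRIs, and one predicate-object map. The file name, IRI template and predicates are illustrative assumptions and do not reproduce the benchmark rules verbatim.

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#StopsMap>
    # Logical source: one record per row of the stops CSV file of the feed
    rml:logicalSource [
        rml:source "data/STOPS.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    # Subject map: an illustrative IRI template built from the stop_id column
    rr:subjectMap [
        rr:template "http://example.org/stops/{stop_id}" ;
        rr:class gtfs:Stop
    ] ;
    # One of the predicate-object maps: the stop name as a literal
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rml:reference "stop_name" ]
    ] .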
The lifting phase considers different input formats (CSV, JSON or XML), while the lowering phase always produces CSV files. In the conversion pipeline we used an in-memory RDF repository to store and query the materialised RDF graph. However, we also evaluated the impact on performance of using an external triplestore.

Message Conversion. To evaluate the message conversion, we selected a realistic journey planning test case involving a deployed web service, converting a response message from the HaCon VBB journey planning endpoint (http://fahrinfo.vbb.de) to the TRIAS format (https://github.com/VDVde/TRIAS). During the lifting, a VBB TripList message (representing travel solutions for a requested itinerary, size 43 KB) is mapped onto the IP4 IT2Rail ontology (http://www.it2rail.eu/) through RML mappings. The materialised graph is stored in an in-memory RDF repository. During the lowering, the data modelled through the IT2Rail ontology are mapped onto a TRIAS TripResponse message using an Apache Velocity template with specific SPARQL queries. We employed the JMeter tool (https://jmeter.apache.org/) to test the performance of the converter with an increasing number of concurrent requests (number of threads: 10, 50, 100, 150, 500, 1000, 2500, 5000; ramp-up period: 1 second; loop count: 1).

Testing infrastructure. All tests were run using a Docker (https://www.docker.com/) container, to guarantee reproducibility, on a machine running CentOS Linux 7 with an Intel Xeon 8-core CPU and 64 GB of memory. We set a memory limit of 24 GB, a timeout of 24 hours and no limits on CPU usage. We ran each test 5 times, averaging the obtained results, and we monitored the resource usage of the containers in execution.

4 Test Evaluation

In this section, we discuss the performance and scalability test results and illustrate the identified bottlenecks and their possible solutions. Additional details and data are available in [4]. We compare the conversion results obtained with two different releases of Chimera, core and final. The final release implements a set of optimisations, discussed in this section, that were developed within the SPRINT project to improve the performance and scalability of the lifting and lowering components (a complete report discussing the core and final releases is available in [13]; the Chimera repository contains a summary of changes at https://github.com/cefriel/chimera/releases).

4.1 Dataset Conversion: Performance and Scalability

In Table 1, we provide the complete results for the dataset conversion scenario considering different sizes and different formats. Execution times are measured in seconds and averaged over the 5 runs of each conversion; TO stands for timeout (>24 hours), OoM stands for out-of-memory (>24 GB). We also report the input size and the number of triples generated at the end of the lifting phase. We also tested different configurations of the pipeline (not fully reported here); the results in Table 1 refer to the best-performing configuration for each format and size. As a preliminary comment, we notice that the overall conversion time is mainly influenced by the lifting phase (as also shown later in Table 2).
Scale                |       1       |       5        |       10        |         50           |    100
Input Size           |    4.9 MB     |   10.42 MB     |    23.64 MB     |      106.1 MB        |  247.5 MB
Num. Triples         |    565,489    |   1,800,911    |    3,663,380    |     18,009,100       | 36,633,800
CSV  (core / final)  | 22.77 / 10.83 | 164.95 / 55.37 | 544.41 / 154.13 | 11,624.45 / 3,441.67 |  TO / OoM
JSON (core / final)  | 50.41 / 30.89 | 659.11 / 394.21| 2,471.29 / 1,467.70 | 66,003.36 / 34,901.65 | TO / TO
XML  (core / final)  |  TO / 16.26   |  TO / 123.29   |   TO / 434.05   |    TO / 12,648.65    |  TO / OoM

Table 1: Dataset conversion execution times (in seconds) for the 1-, 5-, 10-, 50- and 100-scale GTFS datasets, comparing different formats and Chimera releases.

The results show a consistent performance improvement in the conversion time of the final release with respect to the core one: roughly 2x for CSV, 1.6x for JSON and more than 1,000x for XML data sources. The final version was able to convert CSV, JSON and XML datasets of up to 100 MB, generating 18 million triples with the available resources, thus demonstrating its capability to process even large datasets of static transportation data. The final version mainly improved the lifting phase, due to the adoption of different libraries to access the input data sources, a simplified mechanism to handle the generated triples (triples are directly written to the RDF repository of the Chimera pipeline during the lifting procedure) and the introduction of different concurrency strategies in the RML block (at the record level and/or at the triples map level). However, concurrency naturally increases CPU usage and memory consumption; thus, in some cases, it may be preferable to use single-threading, with a longer conversion time but lower resource usage.

Additional tests, available in [4], show that concurrency in the lifting process performs better at the triples map level if triples maps associated with the same logical source are processed within the same thread (avoiding different threads concurrently accessing the same input source). However, different RML mappings may result in different performance figures under the same concurrency configuration (e.g., depending on the number of triples maps defined for each logical source and the join conditions among them).

CSV conversion times outperform the JSON/XML ones because of the impact of the libraries used to access the input data sources in the lifting procedure. Intuitively, accessing rows in a CSV file and iterating over them is less expensive than querying a JSON/XML file by resolving a JSONPath/XPath expression and iterating over the retrieved nodes. This aspect worsens with the size of the file to query, hence the differences in timings. Considering the XML data format, the core version did not complete the conversion within the 24-hour timeout even for the 1-scale XML dataset, while the final converter completed it up to the 50-scale size, performing even better than the JSON conversion thanks to the new XML parser. The conversion of the JSON dataset obtained more limited improvements between the core and the final version, mainly because the performance of the JSON parser limits the advantages of introducing concurrency. The results obtained for JSON and XML show not only the importance of lifting mapping optimisation via concurrency, but also the impact of the parsing procedure on performance.

Finally, we checked the impact of join conditions in the RML mappings on the conversion time. The mappings defined in the GTFS-Madrid-Bench maximise the number of joins among the different files composing a GTFS feed.
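For illustration, the two predicate-object maps sketched below contrast a join-based rule with its join-free equivalent (sources, IRIs and predicates are our own simplification and do not reproduce the benchmark mappings): the first links a stop time to its trip through an explicit join condition, while the second produces exactly the same triples by reusing the trip IRI template directly.

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
@prefix gtfs: <http://vocab.gtfs.org/terms#> .

# Variant 1: explicit join condition between STOP_TIMES.csv and TRIPS.csv.
# <#TripsMap> (not shown) maps TRIPS.csv with subject template
# "http://example.org/trips/{trip_id}".
<#StopTimesMapWithJoin>
    rml:logicalSource [ rml:source "data/STOP_TIMES.csv" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "http://example.org/stoptimes/{trip_id}-{stop_sequence}" ] ;
    rr:predicateObjectMap [
        rr:predicate gtfs:trip ;
        rr:objectMap [
            rr:parentTriplesMap <#TripsMap> ;
            rr:joinCondition [ rr:child "trip_id" ; rr:parent "trip_id" ]
        ]
    ] .

# Variant 2: no join; the object IRI is generated with the same template
# used by the subject map of <#TripsMap>, yielding identical triples.
<#StopTimesMapNoJoin>
    rml:logicalSource [ rml:source "data/STOP_TIMES.csv" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "http://example.org/stoptimes/{trip_id}-{stop_sequence}" ] ;
    rr:predicateObjectMap [
        rr:predicate gtfs:trip ;
        rr:objectMap [ rr:template "http://example.org/trips/{trip_id}" ]
    ] .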
As in SQL queries, a growing number of records (in this case, the scale of the input dataset) increases non-linearly the number of comparisons needed for a non-optimised join operation. In RML mappings it is often possible to avoid join conditions by adopting the same IRI generation pattern in different triples maps. With this approach, we optimised the RML mappings for the GTFS-Madrid-Bench dataset: two RML mappings producing exactly the same knowledge graph were used to compare conversion times in Chimera with and without join conditions. The results showed that the optimised mappings reduced the conversion time by up to two thirds in the case of the 50-scale CSV dataset (6,205.25 s with joins, 2,269.62 s without joins).

Table 2 compares, for the 50-scale CSV dataset, the results obtained with an in-memory and an external RDF repository (triplestore). It is important to point out that the performance also strictly depends on the employed triplestore, in this case GraphDB Free v9.0.0 (https://graphdb.ontotext.com/). The usage of an external repository with incremental writes (final-csv-ext) drastically reduces memory consumption (a 2x reduction) with respect to the same configuration run with an in-memory repository (final-csv). The conversion time using incremental writes is higher, but it can be acceptable because it comes with a substantial decrease in resource usage; the adoption of a triplestore without concurrency limitations (the free version of GraphDB is limited to two concurrent queries) would likely bring an even better time-resource trade-off. In any case, it is important to take into account that an external repository also implies its own resource usage.

              | Conversion time (s) | Lifting time (s) | Lowering time (s) | Max Mem (GB) | Max CPU Usage (%)
core-csv      |      11,624.45      |     11,583.49    |       40.97       |     18.84    |       185.56
final-csv     |       3,441.67      |      3,407.39    |       34.28       |     18.58    |       516.51
final-csv-ext |       3,784.34      |      3,659.39    |       88.95       |      9.63    |       314.61
final-json    |      34,901.65      |     34,861.97    |       39.69       |     18.45    |       153.97
final-xml     |      12,648.65      |     12,614.62    |       34.03       |     18.44    |       540.01

Table 2: Dataset conversion scenario: detailed results for the 50-scale GTFS dataset, comparing different configurations and formats.

Table 2 also details the lifting and lowering times for the different formats of the 50-scale dataset in the final release. As previously commented, lifting times are influenced by the input data format, whereas the lowering times are similar, since the same lowering mappings are executed on the same knowledge graph. In general, our results show that lifting through materialisation is a time-consuming approach in the case of large datasets. However, it is important to highlight that the very good lowering performance is mainly due to the possibility of querying a materialised knowledge graph. The lowering time is higher when using an external repository (final-csv-ext) due to the concurrency limitations for the SPARQL queries in the template. However, complex queries in the templates applied to large knowledge graphs can effectively benefit from an external triplestore [16].
For example, when the lifting and lowering steps have similar execution times, it may be beneficial to slightly increase the lifting time by writing triples to an external repository, in order to speed up the lowering phase thanks to the read performance of the triplestore.

4.2 Message Conversion: Performance and Scalability

In Table 3 we report the results of the tests performed for the message conversion scenario. For the final release, we compare two configurations: final-m-1, introducing concurrency in the lifting only at the record level, and final-m-2, introducing it also at the triples map level. The best-performing configuration (final-m-1) resulted in conversion times in the order of 100 ms, which is perfectly acceptable in a runtime scenario. Moreover, a 5x improvement in the conversion time was obtained from the core to the final release, thanks to the already mentioned optimisations, but also thanks to a specific configuration for the message conversion scenario that executes the lowering transformation avoiding input/output operations on the filesystem. The results obtained for final-m-2 demonstrate that, for small messages, it is preferable not to introduce excessive concurrency, since the structure initialisation time (e.g., for threads) is not compensated by an overall speedup. Finally, it is worth noting the very limited resource usage, especially memory, in all the tested configurations.

          | Conversion time (ms) | Lifting time (ms) | Lowering time (ms) | Max Mem (GB) | Max CPU Usage (%) | Num. Triples
core-m    |         739          |        711        |         28         |     0.09     |        40         |     466
final-m-1 |         138          |        107        |         31         |     0.04     |        25         |     466
final-m-2 |         166          |        125        |         41         |     0.04     |        30         |     466

Table 3: Message conversion scenario: VBB-TRIAS conversion comparing different configurations.

Table 4 reports the scalability test results with an increasing number of concurrent requests. A single instance of the converter managed to handle up to 100 concurrent requests per second (at 150 concurrent requests the processing time exceeds one second), also successfully handling a peak of 2,500 concurrent requests. After 3,000 pending requests, the queue mechanism provided by Apache Camel starts dropping requests. The maximum length of the queue can be increased; however, a high number of pending requests is associated with noticeable performance degradation. The low resource usage and the optimisations resulted in very good scalability for increasing workloads, even considering a single instance of the converter.

Number of concurrent requests N | Avg time of processing requests [ms] | Interval between requests [ms]
       10                       |                  131                 |              100
       50                       |                  219                 |               20
      100                       |                  775                 |               10
      150                       |                1,663                 |              6.7
      500                       |                3,926                 |                2
    1,000                       |                7,567                 |                1
    2,500                       |               21,114                 |              0.4

Table 4: Message conversion scenario: VBB-TRIAS conversion with an increasing number of concurrent requests.

4.3 Recommendations for Performance and Scalability

The reported results show the improvements obtained in the development of the Chimera framework for semantic data conversion pipelines and its applicability to both the dataset and the message conversion scenarios in the transportation domain. Moreover, from the performed testing activities, we derive a set of additional recommendations for performance and scalability using the proposed approach.

Performance. Considering the RML-based lifting, the specific RML mappings defined for a conversion pipeline (join conditions, number of triples maps, number of logical sources, paths to access the records, ...) can influence the performance of the lifting step. As a result, the choice of the pipeline configuration should take into account the trade-off between conversion time and resource usage, for example with different concurrency strategies. In particular, in some cases the gain in conversion time does not justify the higher resource usage.
Additional recommendations for the RML-based lifting are:
– to efficiently exploit concurrency, it is important to tune the different parameters, e.g., the number of concurrent threads adopted;
– concurrency may cause issues in case of RML mappings generating blank nodes without a deterministic identifier (each thread may assign different random identifiers to the same blank node, generating it multiple times);
– the presence of many functions in the RML mappings (RML and FnO integration [14]) can cause concurrent access to the same data structures, negatively impacting the conversion time;
– concurrency strategies can be implemented not only within the lifting procedure but also in the pipeline, for example by configuring different lifting blocks in parallel or by exploiting concurrent consumers for routes in Chimera.

With respect to the lowering phase, the queries and the logic of the templates can heavily affect performance. Therefore, it is recommended to:
– optimise queries, by accessing data using simple queries and avoiding expensive constructs or patterns; it is better to divide complex queries into sub-queries, if possible;
– optimise the template logic, by avoiding nested loops and by using support data structures (e.g., maps) to efficiently access records in the query result sets.

Finally, the stream option to process templates in-memory (avoiding input/output operations) is recommended only for the runtime data/message conversion use case. For large batch datasets, this option should be avoided, because the template engine is able to optimise memory consumption with incremental writes to the filesystem.

Scalability. In the dataset conversion scenario, the scalability of the solution is limited by the memory consumption due to the materialisation of large knowledge graphs. A potential alternative could exploit virtualisation techniques for the lifting phase, but the state-of-the-art tools are still not mature enough to be employed in a conversion scenario [6]. To address the materialisation scalability, the Chimera framework allows using an external repository to store the materialised graph. In our tests, we showed that this approach can effectively reduce memory consumption, but it still has some limits; in particular, the use of an external repository shifts the bottleneck to the triplestore. For this reason, for very large datasets, we recommend splitting the conversion as follows, under the assumption that the materialised graph does not change very often: (i) execution of the lifting procedure (if required, splitting the mappings in different executions); (ii) bulk loading of the materialised graph(s) into the triplestore (thus avoiding incremental indexing issues); (iii) on-demand lowering of the materialised graph (possibly with a separate Chimera pipeline). This approach also allows selecting different lifting tools: the RML specification has several implementations that can be chosen on the basis of the specific scenario requirements [1].

In the message conversion scenario, to cope with a higher number of concurrent requests, it is possible to deploy more than one instance of the converter behind a load balancing mechanism. However, standards in the transportation domain usually require dealing with a large set of different message types, which implies defining a high number of converters.
In this situation, an efficient scalability strategy is to deploy (multiple instances of) a universal converter able to dynamically select and execute the relevant mappings with respect to the input/output message.

5 Conclusions and Future Work

Semantic interoperability in the transportation domain can be addressed effectively by exploiting Semantic Web technologies to enable the communication between actors and the definition of new services for travellers. To guarantee adoption by the relevant stakeholders, however, it is extremely important to address their performance and scalability requirements. Considering both the dataset and the message conversion scenarios, this paper identified two test cases from the transportation domain and evaluated the performance and scalability of the semantic data conversion approach using the Chimera framework, with a declarative approach based on RML mappings for lifting and on Apache Velocity templates and SPARQL queries for lowering.

Our analysis showed the potential and flexibility of the Chimera solution: in the dataset conversion scenario, we managed to generate and handle knowledge graphs with millions of triples; in the message conversion scenario, we obtained very low conversion times and proved the robustness of the solution with hundreds of concurrent requests per second. Moreover, on the basis of our tests, we defined a set of recommendations to improve performance and scalability with the discussed approach.

As future work, we would like to investigate and implement in Chimera further optimisations for the lifting procedure, for example by adopting the data structures defined in the SDM-RDFizer tool [12] and by investigating improvements to the concurrency strategies. Moreover, we would like to set up a more comprehensive benchmark for the message conversion scenario (e.g., based on GTFS-RT, https://developers.google.com/transit/gtfs-realtime).

Acknowledgments

The presented research was partially supported by the SPRINT project (Grant Agreement 826172) and the RIDE2RAIL project (Grant Agreement 881825), co-funded by the European Commission under the Horizon 2020 Framework Programme.

References

1. Arenas-Guerrero, J., et al.: Knowledge graph construction with R2RML and RML: An ETL system-based overview. In: Proceedings of the 2nd International Workshop on Knowledge Graph Construction (2021), http://ceur-ws.org/Vol-2873/paper11.pdf
2. Bennara, M., Zimmermann, A., Lefrançois, M., Messalti, N.: Interoperability of Semantically-Enabled Web Services on the WoT: Challenges and Prospects. In: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services. pp. 149–153 (2020). https://doi.org/10.1145/3428757.3429199
3. Carenini, A., et al.: ST4RT – Semantic Transformations for Rail Transportation. In: 7th Transport Research Arena (TRA 2018). Zenodo (Apr 2018). https://doi.org/10.5281/zenodo.1440984
4. Carenini, A., et al.: SPRINT project Deliverable D5.6 – Final report on the results of the validation of pilot implementation (2021), http://sprint-transport.eu/
5. Chaves-Fraga, D., Endris, K.M., Iglesias, E., Corcho, Ó., Vidal, M.: What are the parameters that affect the construction of a knowledge graph? In: On the Move to Meaningful Internet Systems. pp. 695–713. Springer (2019). https://doi.org/10.1007/978-3-030-33246-4_43
6. Chaves-Fraga, D., et al.: GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain. Journal of Web Semantics 65, 100596 (2020). https://doi.org/10.1016/j.websem.2020.100596
7. Colpaert, P., Llaves, A., Verborgh, R., Corcho, O., Mannens, E., Van de Walle, R.: Intermodal public transit routing using linked connections. In: International Semantic Web Conference: Posters and Demos. pp. 1–5 (2015), http://ceur-ws.org/Vol-1486/paper_28.pdf
8. Comerio, M., Carenini, A., Scrocca, M., Celino, I.: Turn transportation data into EU compliance through semantic web-based solutions. In: 1st International Workshop On Semantics For Transport. vol. 2447 (2019), http://ceur-ws.org/Vol-2447/paper6.pdf
9. Das, S., Sundara, S., Cyganiak, R.: R2RML: RDB to RDF mapping language. W3C recommendation, W3C (Sep 2012), http://www.w3.org/TR/2012/REC-r2rml-20120927/
10. Dimou, A., Sande, M.V., Colpaert, P., Verborgh, R., Mannens, E., de Walle, R.V.: RML: A generic language for integrated RDF mappings of heterogeneous data. In: Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014). vol. 1184. CEUR-WS.org (2014), http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
11. Hosseini, M., Kalwar, S., Rossi, M.G., Sadeghi, M.: Automated mapping for semantic-based conversion of transportation data formats. In: 1st International Workshop On Semantics For Transport. vol. 2447 (2019), http://ceur-ws.org/Vol-2447/paper7.pdf
12. Iglesias, E., et al.: SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. pp. 3039–3046 (2020). https://doi.org/10.1145/3340531.3412881
13. Jurankova, P., et al.: SPRINT project Deliverable D5.5 – Software release of the proof-of-concept in its technical environment (F-REL) (2020), http://sprint-transport.eu/
14. Meester, B.D., Maroy, W., Dimou, A., Verborgh, R., Mannens, E.: RML and FnO: Shaping DBpedia declaratively. In: The Semantic Web: ESWC 2017 Satellite Events. vol. 10577, pp. 172–177. Springer (2017). https://doi.org/10.1007/978-3-319-70407-4_32
15. Sadeghi, M., Buchníček, P., Carenini, A., Corcho, O., Gogos, S., Rossi, M., Santoro, R., et al.: SPRINT: Semantics for PerfoRmant and scalable INteroperability of multimodal Transport. In: 8th Transport Research Arena TRA 2020. pp. 1–10 (2020), http://hdl.handle.net/11311/1132635
16. Scrocca, M., Comerio, M., Carenini, A., Celino, I.: Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph. In: Proceedings of the 19th International Semantic Web Conference. vol. 12507, pp. 411–429. Springer (2020). https://doi.org/10.1007/978-3-030-62466-8_26
17. Vetere, G., Lenzerini, M.: Models for semantic interoperability in service-oriented architectures. IBM Systems Journal 44(4), 887–903 (2005). https://doi.org/10.1147/sj.444.0887