<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.websem.2018.06.003</article-id>
      <title-group>
        <article-title>typhon-rml: Modularised Declarative Knowledge Graph Construction for Flexible Integrations and Performance Optimisation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Grassi</string-name>
          <email>marco.grassi@cefriel.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Scrocca</string-name>
          <email>mario.scrocca@cefriel.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Carenini</string-name>
          <email>alessio.carenini@cefriel.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Celino</string-name>
          <email>irene.celino@cefriel.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cefriel - Politecnico di Milano</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3471</volume>
      <fpage>1</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Adopting declarative approaches for constructing knowledge graphs enhances the maintainability and reusability of schemas and data transformations from diverse data sources. However, a fully declarative description requires the user to encode specific details of the data integration process within the mapping rules, including how to extract the input data from specific data sources and how to load the result into the target ones. This aspect significantly burdens the developers of mapping processors, who must adhere to the mapping language features to transform heterogeneous data formats to RDF, while also facilitating efficient access to various input and output data sources. Additionally, considering the user's point of view, a tightly coupled approach for the declaration and execution of the entire construction process can affect the flexibility of reusing mapping rules for different data integration scenarios. In this paper, we address these challenges and propose a method to modularise the declarative construction of knowledge graphs, allowing for a decoupled processing of input/output data sources and mapping rules. We introduce the typhon-rml library to demonstrate this approach. Focusing on RML mapping rules as input, we showcase how the tool facilitates their reuse and customisation for various integration requirements. A preliminary qualitative evaluation is conducted in the context of a relevant scenario for smart traffic management.</p>
      </abstract>
      <kwd-group>
        <kwd>Declarative Mappings</kwd>
        <kwd>Knowledge Graph Construction</kwd>
        <kwd>Mapping Languages</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A declarative approach for knowledge graph construction is based on a mapping language to specify the
required schema and data transformations from various heterogeneous data sources [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and a mapping
processor capable of their execution. This approach, in contrast with ad-hoc data integration pipelines,
enhances the maintainability and reusability of mapping rules. However, relying on a fully declarative
approach for describing the entire ETL process (extraction from input data sources, transformation
to RDF, loading to target data sources) can impose a growing challenge for developers of mapping
processors who aim to adhere to a specific mapping language. As also emerged from the results of the
latest implementation challenges organised within the Knowledge Graph Construction Workshop [
        <xref ref-type="bibr" rid="ref2 ref3">2,
3</xref>
        ] considering RML [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], there is great heterogeneity in mapping processors regarding not only the
coverage of language features (i.e., modules of the RML specification covered) but also the types of
data sources (e.g., local files, HTTPS services, relational databases, etc.) and formats (CSV, JSON, XML,
etc.) supported. Indeed, a great challenge in implementing mapping processors is that they must
not only support transformation capabilities defined by the mapping language specifications but also
efficient access, parsing and querying of diverse input data sources and formats. Similarly, a declarative
specification of the target data source possibly requires manipulation of the generated RDF (e.g., to
obtain a different serialisation) and interaction with heterogeneous, and possibly remote, data sources
to store the data. These aspects pose a relevant challenge for developers of mapping processors since
they involve selecting, integrating, and maintaining external libraries.
      </p>
      <p>Additionally, from a user’s perspective, there are potential drawbacks in a tightly coupled approach
for knowledge graph construction that expects a mapping processor to support the complete execution
of the ETL process defined by the declarative mapping language. The lack of modularisation makes
it more difficult to efficiently apply the same declarative mappings across various data integration
contexts if specific features are not supported by the selected mapping language or mapping processor.</p>
      <p>This paper addresses these issues and investigates how a modularised approach for declarative
knowledge graph construction can enable a more flexible development of data integration pipelines
and their performance optimisation. The typhon-rml tool is introduced to propose and demonstrate an
approach enabling the compilation of the input RML mapping rules to an intermediate representation
that decouples extract-load operations from the rules specifying the data and schema transformations.
We discuss how this method could support the reuse and customisation of a declarative mapping
document to meet heterogeneous data integration requirements.</p>
      <p>The remainder of the paper is organised as follows. Section 2 introduces relevant concepts and
discusses the challenges considering a concrete scenario in the context of smart traffic management.
Section 3 describes the designed approach and how this is implemented by the typhon-rml tool.
Section 4 provides a preliminary evaluation of the tool, discussing its application to the scenario previously
introduced. Finally, Section 5 draws the conclusions and presents the future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries and Challenges Addressed</title>
      <p>
        Alongside the declarative mapping rules to generate RDF triples, an RML [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] document specifies the
heterogeneous data sources to be considered as input. In an RML mapping, an rml:Source describes the
data source to be accessed and an rml:LogicalSource describes how to extract data from it. The
RML document then specifies, via a set of rml:TriplesMaps, the declarative mapping rules to be
executed on a given rml:LogicalSource. Data access and rule execution are thus tightly coupled in the
document, although the two steps are logically independent and do not need to be intertwined.
      </p>
      <p>
        The W3C Knowledge Graph Construction Community Group is currently working on the RML
specification to improve the decoupling between RML Core and the dedicated RML IO module<sup>1</sup>, e.g., by
limiting the test cases associated with RML Core only to JSON inputs read from local files. Moreover,
the introduction of an RML IO-Registry<sup>2</sup> allows mapping processors to implement support
only for specific data sources and data formats. However, despite the additional modularisation, an
RML mapping processor aiming to fully comply with the specification should support the complete
data integration process. The rmlmapper-java<sup>3</sup> implementation presented by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] showcases how the
need for supporting different input and output data sources required the addition of many external
dependencies that increase the size of the library and, over time, may increase the technical debt
associated with the mapping processor. Moreover, incorporating the input and output data access within the
mapping processor limits users’ ability to adapt and select technologies that may be better suited for
a specific mapping scenario. As shown in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for RML processors but also for other KG construction
engines [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the libraries used to access, parse and query the
input data sources may heavily impact the performance of the overall knowledge graph construction
process.
      </p>
      <p>
        Concerning the reusability of mapping rules, a certain degree of flexibility is needed for deploying
the same mappings supporting different data integration pipelines [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. The need for encoding within
the mapping rules how to access data from a specific data source may complicate its reuse for other
data sources that expose data in the same format and may apply the same rules for KG construction.
For example, if an RML mapping processor only processes inputs from files, and the input data is
provided via a message broker, it may be necessary at runtime to consume the data, save it to a file with a
specific temporary name and refer to the specific file within the mapping rules. Additionally, the
integration within the mapping processor of input/output operations hinders the opportunity to optimise
the execution of mapping rules for specific mapping scenarios. While it is true that different mapping
processors may provide a better solution in terms of performance in different cases, users often stick
with a single mapping processor due, for example, to the choice of a preferred programming
language. As a result, the user is limited to the optimisations and features implemented by the
selected mapping processor. The decoupled approach proposed in this work aims to enable the
potential customisation of the data integration process to be executed given a set of RML mapping rules. By
adopting typhon-rml, the user can support any customisation allowed by Java libraries. However, we
foresee the potential implementation of a similar approach in other programming languages.
      </p>
      <p><sup>1</sup> https://w3id.org/rml/io/spec
<sup>2</sup> https://w3id.org/rml/rml-io-registry/
<sup>3</sup> https://github.com/RMLio/rmlmapper-java</p>
      <p>
        To explain better the challenges addressed, we introduce a motivating example from a use case
within the SmartEdge European project<sup>4</sup> for real-time traffic data applications in Helsinki [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
use case aims to leverage a common interoperable representation of mobility data collected from a
swarm of nodes (e.g., sensors/vehicles) that should collaborate at a given road intersection. To enable
the conversion of incoming data to a shared RDF representation, a set of RML mapping rules has been
developed to support a declarative mapping approach. In this paper, we focus on converting data
about radar observations of the incoming traffic at a given intersection. Each observation provides
a categorisation of the type of vehicle detected by the radar and related metrics such as the speed
measured and the estimated position. While the development of the mapping rules happened by using
local samples of radar data in JSON format, bringing the defined mapping rules into a production
deployment poses the following challenges:
      </p>
      <p>C1 Applying the same mapping rules to an input data source adopting the same data format but
requiring different access mechanisms, e.g., in terms of protocol (HTTPS, WebSocket, etc.) or interaction
paradigm (pull/push, frequency, pagination, etc.). The Logical Source defined to access radar
samples from a file should be modified to support the specification of data access from a WebSocket,
but the mapping processor does not support this type of input source;</p>
      <p>C2 Deploying the same mapping rules for multiple data sources. The same mapping rules should be
deployed in different Road Side Units (RSU) at each intersection and the Logical Source description
should be changed for each deployment according to the parameters of the relevant radar;</p>
      <p>C3 Forwarding the mapping output to specific target data sources. The processed radar data should
be sent to different NATS<sup>5</sup> queues for downstream processing, but the mapping processor does
not support this type of target source;</p>
      <p>C4 Enabling custom performance optimisation dependent on the input data being considered. Adding a
TriplesMap to describe the sensor would cause the generation of the same triple multiple times
(i.e., one time for each observation received by the same radar in a single message). This would
affect performance even in the case of automatic duplicate removal by the mapping processor.
Moreover, checks for non-null fields and IRI validity are performed for every triple generated in
the generic case because the mapping processor can make no assumptions about the incoming
data;</p>
      <p>C5 Customising the KG construction process to support specific requirements for data integration. The
mapping process should be instrumented to collect metrics and traces from each RSU on the
number of messages processed and the conversion latency to RDF. Moreover, the processed data
should be stored in a CSV format for historical data collection.</p>
      <p>In our previous work, we introduced Chimera<sup>6</sup> [11] to configure composable semantic
transformation pipelines for KG construction. Chimera provides a modular and configurable set of building
blocks to construct and manipulate RDF graphs within a data integration pipeline defined through
the Apache Camel<sup>7</sup> integration framework. One of Chimera’s objectives is to decouple the execution
of mapping rules for KG construction from access to input/output data sources. The underlying idea
of the typhon-rml tool presented in this paper is to automate the generation of Chimera pipelines
from existing declarative mapping rules expressed via RML. Such an approach facilitates leveraging
existing adapters provided by the Apache Camel ecosystem while enabling the execution of RML
mapping rules in Chimera by a thinner mapping processor focused on applying transformations.</p>
      <p><sup>4</sup> https://www.smart-edge.eu/2023/07/25/use-case-preventing-rear-end-collisions-by-enhancing-road-intersection-safety/
<sup>5</sup> https://nats.io/
<sup>6</sup> https://github.com/cefriel/chimera</p>
      <p>Additionally, in [12], we proposed an approach for generic knowledge conversions
between data formats derived from declarative approaches for KG construction like RML. The
mapping-template tool<sup>8</sup> implements this approach and leverages a template engine to describe mapping rules
according to the Mapping Template Language (MTL) syntax. The mapping-template tool can process RML
mapping rules by converting them to a template-based representation using MTL, thus offering the
possibility of introducing specific performance optimisations to the mapping rules before executing them
(e.g., reducing the generation of duplicated triples based on specific assumptions about the input data). The
typhon-rml tool explicitly generates an MTL version of the input RML mapping rules to support their
customisation by the user before deployment.</p>
      <p>As a related work, the RMLWeaver-JS [13] mapping processor also leverages an intermediate
representation obtained from RML at compile time. In this case, a mapping plan consisting of algebraic
mapping operators is used for optimising the execution of the mapping rules.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Design and Implementation</title>
      <p>The designed approach aims to modularise the knowledge graph construction process. A first
implementation, built on the Chimera framework, is made available via the typhon-rml tool at
https://github.com/cefriel/typhon-rml.</p>
      <p>The underlying idea of the approach is that starting from an RML mapping, we can separate the
data access and the execution of mapping rules on the accessed data into distinct parts. A dedicated
component can be executed at compile time to transform the RML mapping rules into artefacts that
can be customised, if needed, to address different integration scenarios and to introduce performance
optimisations. At runtime, the generated artefacts are executed to perform the correct KG construction
process described declaratively in the RML file.</p>
      <sec id="sec-3-1">
        <p>The typhon-rml component is a Java-based solution that implements the process shown in Figure
1. The typhon-rml tool takes an RML mapping as input and produces: (i) a Chimera data pipeline
(route.xml) that reads the required input data sources, executes the mapping rules on them and finally
stores or sends the mapping results to a given output data source, and (ii) the mapping rules needed
to perform this mapping operation.</p>
        <p><sup>7</sup> https://camel.apache.org/
<sup>8</sup> https://github.com/cefriel/mapping-template</p>
        <p>Chimera is used to implement the data pipeline that uses the appropriate Camel component needed
to access the data source specified in the RML and write the results to the right target(s). The mapping
execution part is also executed via Chimera through the camel-chimera-mapping-template
component, which integrates the mapping-template library into Chimera. The pipeline is generated from
RML as an Apache Camel route specified declaratively via the Camel DSL in XML. Additionally, a set
of mapping rules using MTL (template.vm) are generated to execute the same transformations defined
by the input RML. At runtime, the typhon-chimera-skeleton component is provided to execute the
data pipeline and the mappings generated, producing an output that is equivalent to the one that would
have been produced by the RML mapping.</p>
        <p>Notably, we decided to apply the mapping-template, our own solution for generic knowledge
conversion between different data formats, to transform the input RML mapping rules at compile time.
Both generated files are produced by applying declarative MTL mappings to the input RML file. In the
former case, a router.vm<sup>9</sup> file is used to produce an XML-serialised Chimera pipeline while, in the
latter, a translator.vm<sup>10</sup> MTL file is used to produce the MTL file equivalent to the RML mapping. The router.vm MTL
mapping generates a Chimera pipeline composed of multiple Apache Camel routes. First, a SPARQL
query extracts the distinct rml:Source resources in the RML file alongside their corresponding
rml:LogicalSource. This information is used to configure the Chimera pipeline, which accesses data
by using the relevant Apache Camel components<sup>11</sup>. For instance, if the RML mapping declares two
distinct sources, typhon-rml generates two different routes to access them. In
the generated Chimera pipeline, all accessed data are then forwarded to the
camel-chimera-mapping-template component, which applies the generated template.vm mapping. In this first version, we
only support the default RML behaviour of saving the generated KG to a local file.</p>
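        <p>As a rough illustration of the one-route-per-source rule described above, the following Python sketch groups logical sources by the source they read from. It is not the router.vm logic used by typhon-rml: the input is a hand-written list standing in for the SPARQL query results, and the names LOGICAL_SOURCES, plan_routes and all ex: identifiers are assumptions made only for this example.</p>
        <preformat>
```python
# Hypothetical stand-in for the SPARQL query results: each entry pairs a
# logical source with the source it reads from (names are illustrative only).
LOGICAL_SOURCES = [
    {"logical_source": "ex:RadarObservations", "source": "data.json"},
    {"logical_source": "ex:RadarMetadata", "source": "data.json"},
    {"logical_source": "ex:Intersections", "source": "intersections.csv"},
]


def plan_routes(logical_sources):
    """Group logical sources by distinct source: one input route per source."""
    routes = {}
    for entry in logical_sources:
        route = routes.setdefault(
            entry["source"],
            {"reads_from": entry["source"], "feeds": []},
        )
        route["feeds"].append(entry["logical_source"])
    # Dicts preserve insertion order, so routes follow first appearance.
    return list(routes.values())


if __name__ == "__main__":
    for route in plan_routes(LOGICAL_SOURCES):
        print(route["reads_from"], "feeds", ", ".join(route["feeds"]))
```
        </preformat>
        <p>With the sample input, two distinct sources yield two routes, and the first route feeds both logical sources defined on data.json.</p>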
        <p>The benefit of this approach is that the generated Chimera route.xml and template.vm MTL
mappings are not executed automatically. Instead, there is an additional stage allowing the user to modify
both files. These modifications might be necessary to adapt the KG construction as described in the
motivating example and the identified related challenges (Section 2). The Chimera route.xml pipeline can
be modified by changing from where data is read, as long as this is done by using an available Apache
Camel component. For example, this might mean reading data relying on a specific binary protocol
instead of reading data from a local file. Similarly, the MTL mapping can be manually changed to
introduce performance optimisations not explicitly definable in RML. An example is an optimised join
condition that relies on external knowledge of the input data sources that cannot be expressed in RML.</p>
        <p>To be executed, the generated Chimera pipeline and accompanying MTL mappings are processed
by the typhon-chimera-skeleton component. This component is a Java template project with the
needed dependencies to run Chimera pipelines. The typhon-chimera-skeleton project can be modified
manually to include additional Java dependencies needed for Chimera route customisations.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Preliminary Evaluation</title>
      <p>We discuss a qualitative evaluation of typhon-rml by applying it to the motivating example introduced
in Section 2. All the artefacts mentioned in this section are made available in the tool repository.</p>
      <sec id="sec-4-1">
        <p>Each radar in the considered environment records the position of detected vehicles and their
estimated speed alongside a timestamp for when the observation was made. In the RML mapping shown
in Listing 1, the SOSA ontology [14] is used to define each radar as a sosa:Sensor, which generates
sosa:Observations related to the vehicle that the radar is detecting. The detected vehicle is classified
by vehicle type and modelled as a sosa:FeatureOfInterest to which the measured
properties of position and speed are attached. The RML mapping assumes that the observations are
available as a local JSON file called data.json.</p>
        <p><sup>9</sup> https://github.com/cefriel/typhon-rml/blob/main/typhon-rml/src/main/resources/router.vm
<sup>10</sup> https://github.com/cefriel/typhon-rml/blob/main/typhon-rml/src/main/resources/typhon-rml-compiler.vm
<sup>11</sup> https://camel.apache.org/components/4.10.x/index.html</p>
        <p>This RML mapping is used as input for typhon-rml, which produces the XML Chimera pipeline
shown in Listing 2 and the MTL mapping equivalent to the input RML mapping. The mapping
illustrates the typical iterative process of mapping development, where representative data from a data
source is stored locally and used for mapping development on that sample. However, to deploy the
mappings in a production environment like the one of our example, the data should be fetched from
the data source itself and the same mapping rules should be applied to different radars in each RSU.
For RML, this means having a copy of the development mapping and changing the LogicalSource to
reflect the actual data source, while checking that the selected mapping processor supports it. Following
the typhon-rml approach, the pipeline can also be configured to rely on external libraries not directly
supported by the mapping processor, and the MTL mapping remains unchanged.</p>
        <sec id="sec-4-1-1">
          <title>Listing 1: Declaration of an RML LogicalSource in an RML mapping</title>
          <preformat>@prefix rml: &lt;http://w3id.org/rml/&gt; .
@prefix sosa: &lt;http://www.w3.org/ns/sosa/&gt; .
@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .
@prefix ex: &lt;http://example.org/ns#&gt; .
@prefix geo: &lt;http://www.w3.org/2003/01/geo/wgs84_pos#&gt; .

&lt;http://example.com/base/LogicalSource&gt; a rml:LogicalSource;
  rml:iterator "$[*]";
  rml:referenceFormulation rml:JSONPath;
  rml:source [ a rml:RelativePathSource;
    rml:root rml:MappingDirectory;
    rml:path "data.json"
  ] .

&lt;http://example.com/base/TriplesMap1&gt; a rml:TriplesMap;
  rml:logicalSource &lt;http://example.com/base/LogicalSource&gt;;
  rml:subjectMap [
    rml:template "http://example.org/observation/{message_id}/{timestamp}" ;
    rml:class sosa:Observation
  ];
  rml:predicateObjectMap [
    rml:predicate sosa:hasFeatureOfInterest ;
    rml:objectMap [
      rml:template "http://example.org/location/{lat}/{lon}" ;
      rml:class sosa:FeatureOfInterest</preformat>
        </sec>
        <sec id="sec-4-1-2">
          <title>Listing 2: Corresponding typhon-rml generated Chimera pipeline</title>
          <p>Challenge C1 is thus addressed by typhon-rml because the user can edit the
generated Chimera pipeline to read data from a WebSocket connection instead of reading from a local
file. To achieve this, the route where the data is read can be modified as shown in the modified
route example<sup>12</sup>. This manual intervention also solves challenge C2 because the radar to
be considered by the Chimera route is specified in the WebSocket connection string and can possibly
be read from environment variables. Challenge C3 is addressed similarly, as the NATS target can
be specified as a step of the Chimera pipeline to which the mapped data is sent. This can be done
by adding the Camel NATS component as the last step of the pipeline, as shown in the modified route
example<sup>13</sup>. These examples show how the adoption of the Apache Camel framework simplifies the
customisation of the pipelines for additional requirements by leveraging already existing components.
Custom components can nevertheless also be developed and integrated within a pipeline.</p>
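          <p>The decoupling behind C1 to C3 can be sketched in a few lines of Python. This is only an analogy for the idea, not Chimera or Apache Camel code: map_observation plays the role of the unchanged mapping rules (its IRI templates follow Listing 1), while run_pipeline, the messages iterable and the sink callable are hypothetical names standing in for the input and output steps of a route.</p>
          <preformat>
```python
import json


def map_observation(obs):
    """Transformation step: one radar observation to a list of triples.

    The field names and IRI templates mirror those in Listing 1.
    """
    subject = "http://example.org/observation/{}/{}".format(
        obs["message_id"], obs["timestamp"]
    )
    location = "http://example.org/location/{}/{}".format(obs["lat"], obs["lon"])
    return [
        (subject, "rdf:type", "sosa:Observation"),
        (subject, "sosa:hasFeatureOfInterest", location),
    ]


def run_pipeline(messages, sink):
    """Extract-load wrapper (hypothetical): `messages` is any iterable of
    JSON strings and `sink` any callable that accepts a triple."""
    for message in messages:
        for obs in json.loads(message):
            for triple in map_observation(obs):
                sink(triple)


if __name__ == "__main__":
    sample = json.dumps(
        [{"message_id": "m1", "timestamp": 1, "lat": 60.17, "lon": 24.94}]
    )
    # Development: messages come from a local sample. In production the same
    # call could receive a generator fed by a WebSocket client and a sink
    # that publishes to a queue, with map_observation left untouched.
    collected = []
    run_pipeline([sample], collected.append)
    print(len(collected), "triples generated")
```
          </preformat>
          <p>Only the arguments passed to run_pipeline change between deployments, which is the property the generated route and template pair is meant to provide.</p>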
          <p>Additionally, the generated MTL file may be leveraged to customise the transformations to be
executed and improve the overall performance. In the considered example, we can remove checks for
non-null values and IRI validity since we know the specific format of the messages received. Moreover,
we may generate RDF triples describing a radar as a sosa:Sensor performing a set of observations only
for distinct values in the same message (C4). The considered examples have a minor performance impact;
however, similar optimisations may have a greater effect, considering, for instance, cases that involve
joins between different data sources and that can be optimised in the MTL file (cf. the shapes.txt case
in the GTFS-Madrid Benchmark [15, 16]).</p>
          <p><sup>12</sup> https://github.com/cefriel/typhon-rml/blob/bf4c513fde22c4b26c4218c21ce279caa95e755d/typhon-rml/example/route-modified.xml#L11
<sup>13</sup> https://github.com/cefriel/typhon-rml/blob/bf4c513fde22c4b26c4218c21ce279caa95e755d/typhon-rml/example/route-modified.xml#L33</p>
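          <p>The C4 optimisation can be illustrated with a small Python sketch. It does not use MTL syntax and is not taken from the typhon-rml repository: the message layout (the radar and id keys) and both function names are assumptions chosen for this example.</p>
          <preformat>
```python
def naive_triples(message):
    """Generic behaviour: describe the sensor once per observation."""
    triples = []
    for obs in message:
        triples.append((obs["radar"], "rdf:type", "sosa:Sensor"))
        triples.append((obs["radar"], "sosa:madeObservation", obs["id"]))
    return triples


def optimised_triples(message):
    """Customised behaviour: describe each distinct sensor once per message."""
    triples = []
    seen = set()
    for obs in message:
        if obs["radar"] not in seen:
            seen.add(obs["radar"])
            triples.append((obs["radar"], "rdf:type", "sosa:Sensor"))
        triples.append((obs["radar"], "sosa:madeObservation", obs["id"]))
    return triples


if __name__ == "__main__":
    # Hypothetical message: three observations made by the same radar.
    message = [{"radar": "ex:radar1", "id": "ex:obs{}".format(i)} for i in range(3)]
    print(len(naive_triples(message)), len(optimised_triples(message)))
```
          </preformat>
          <p>For a message with three observations from one radar, the naive version emits six triples and the optimised one four, while the set of distinct triples is identical, so no duplicate removal is needed downstream.</p>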
          <p>Finally, considering challenge C5, the user may insert a monitoring component in the
Chimera pipeline, which can be used to observe and manage the KG construction process of
each RSU. Moreover, the generated pipeline can be modified according to more specific requirements
for data integration. To collect historical data in CSV format, the MTL file generated from the RML
specification can be modified to generate a CSV output<sup>14</sup> and obtain a ready-to-be-deployed artefact for
this requirement. Similarly, the mappings can be customised to cover features not supported by the RML mapping processor
of choice (e.g., access to specific input data sources).</p>
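          <p>The kind of instrumentation requested by C5 can be sketched as a wrapper around the mapping step. The following Python code is a minimal illustration under stated assumptions, not a Chimera or Apache Camel monitoring component: the class MappingMetrics and its methods are hypothetical names introduced here.</p>
          <preformat>
```python
import time


class MappingMetrics:
    """Counts processed messages and accumulates conversion latency."""

    def __init__(self):
        self.messages = 0
        self.total_seconds = 0.0

    def instrument(self, mapping_step):
        """Wrap a mapping callable so every call is counted and timed."""
        def wrapped(message):
            start = time.perf_counter()
            try:
                return mapping_step(message)
            finally:
                # Recorded even if the mapping step raises an exception.
                self.total_seconds += time.perf_counter() - start
                self.messages += 1
        return wrapped

    def mean_latency(self):
        """Mean seconds per message, or 0.0 before any message is seen."""
        return self.total_seconds / self.messages if self.messages else 0.0


if __name__ == "__main__":
    metrics = MappingMetrics()
    # Placeholder mapping step: a real one would convert the message to RDF.
    to_rdf = metrics.instrument(lambda message: message.upper())
    for msg in ["a", "b", "c"]:
        to_rdf(msg)
    print(metrics.messages, "messages, mean latency", metrics.mean_latency())
```
          </preformat>
          <p>Because the wrapper surrounds the mapping step rather than modifying it, the mapping rules themselves stay unchanged, mirroring how a monitoring step is added to the pipeline instead of to the mapping.</p>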
          <p>The current design of typhon-rml also aims to support processing RML mapping rules within a
Chimera pipeline directly through an RML mapping processor, without requiring a conversion to MTL.
However, this approach requires modifications to the mapping engine to support the
execution of mapping rules on Logical Sources provided dynamically. A related discussion is ongoing
within the W3C Community Group for Knowledge Graph Construction<sup>15</sup> and could enable
typhon-rml to work with existing RML mapping processors.</p>
          <p>Currently, the mapping-template tool provides coverage for the RML Core specification, as
demonstrated by its execution against the test cases for the KGCW Challenge 2024. The typhon-rml tool supports
decoupled execution of the same set of RML Core test cases, thus including reading from CSV, JSON, and
XML local files and from MySQL and Postgres databases. The configuration to run the tool and the
generated Chimera route and MTL file for each test case are made available in the online repository<sup>16</sup>.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The declarative knowledge graph construction process can benefit from a decoupled approach
separating input and output operations from the execution of the mapping rules. In this paper, we
highlighted the challenges of a fully declarative and integrated approach, considering a motivating example
for smart traffic management. The typhon-rml tool is proposed to implement a decoupled approach
leveraging Chimera and the mapping-template tool to enable flexible development and performance
optimisation of RML mapping rules. The advantages of such an approach have been discussed in the
context of the proposed motivating example, showcasing how the same mapping rules can be easily
modified to address different integration requirements and adapted to optimise data access according
to the specific mapping scenario considered.</p>
      <p>As future work, we plan to extend the compatibility of typhon-rml according to the test cases
defined by the RML-IO specification and considering the RML-IO Registry. Moreover, we foresee a
similar solution to support the definition of Chimera pipelines not only considering RML mapping
rules but also from metadata describing a data source within a data catalogue (e.g., using DCAT). In
this case, RML may simply refer to a data source, and typhon-rml may fetch its metadata to compose
the correct pipeline to access the data source. Finally, we believe that the challenges highlighted and
discussed in this paper may be relevant to the Knowledge Graph Construction community, opening
the door to additional research contributions to enable a more decoupled management of input/output
operations for existing mapping processors.</p>
      <p><sup>14</sup> https://github.com/cefriel/typhon-rml/blob/651f7609d04f36bafb9a2b94cbb4777663438b1a/example/template-modified.vm#L14
<sup>15</sup> cf. https://github.com/kg-construct/rml-io-registry/issues/10
<sup>16</sup> cf. https://github.com/cefriel/typhon-rml/tree/main/evaluation</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The presented research was partially supported by the SmartEdge project, funded under the Horizon
Europe RIA Research and Innovation Programme (Grant Agreement 101092908).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly to improve grammar, check
spelling, and reword. After using these tool(s)/service(s), the author(s) reviewed and edited the content
as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
  </back>
</article>