<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>SDM-RDFizer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enrique Iglesias</string-name>
          <email>iglesias@l3s.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria-Esther Vidal</string-name>
          <email>maria.vidal@tib.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Knowledge Graph Creation, Data Integration System, RDF Mapping Languages</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Center</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leibniz University of Hannover</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Portorož'25: Sixth International Workshop on Knowledge Graph Construction</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>TIB Leibniz Information Centre for Science and Technology</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In recent years, knowledge graphs (KGs) have grown in popularity. Major companies have incorporated them into their products to enhance the user experience. Consequently, numerous methods have emerged to address KG construction. The RDF Mapping Language (RML) is an extension of the W3C mapping language recommendation R2RML. RML allows for the declarative definition of KG structures based on unified ontologies and data sources. Furthermore, RML has additional functionalities, such as executing functions, using quoted triples, working with collections, and creating logical views. Thus, RML has become a standalone mapping language separate from R2RML. The KGCW 2025 Challenge defines a dataset comprising a series of test cases that cover all RML functionalities. These test cases evaluate the compliance of state-of-the-art KG creation engines. This paper reports on the conformance evaluation of SDM-RDFizer executing this dataset, highlighting its strengths and areas for improvement to enhance performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge graphs (KGs) have become increasingly commonplace in recent years, driven by the
significant growth in daily data generation. Major companies such as Google, Netflix, Amazon, and Microsoft
use KGs to create relationships between products, concepts, and themes to enhance the user experience
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As a result, multiple methods and mapping languages have been developed to support KG creation.
One such mapping language is the RDF Mapping Language (RML) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. RML was initially introduced
as an extension of R2RML, which focused solely on relational databases. RML expanded support to
additional data source formats, including CSV, JSON, and XML. Both RML and R2RML adhere to the rules
established by the Resource Description Framework (RDF). Over time, RML has incorporated additional
functionalities such as the definition and execution of value transformation functions (RML+FnO [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]),
support for RDF-Star (RML-Star [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), transformation of collections and containers (RML-CC), and
projection and transformation of data sources (RML-LV). The integration of these extensions has
allowed RML to evolve into a stand-alone mapping language with its own specification.
      </p>
      <p>The dataset used in the KGCW 2025 Challenge defines an informative set of test cases that cover core
functionalities of RML, including the execution of joins, duplicate handling, blank nodes, and empty
values. Additionally, the dataset includes test cases for all current RML extensions (e.g., RML-Star,
RML-CC, RML-LV). The challenge aims to evaluate the level of compliance of existing KG creation
engines with the latest RML specification. This report builds upon the results achieved in the KGCW
2024 Challenge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by incorporating evaluations of newly introduced test cases (e.g., RML-LV) and
addressing cases that were previously incomplete. It presents the modifications applied to SDM-RDFizer
to ensure full compliance with the updated specification and provides the outcomes of executing it
against the 2025 challenge dataset.
      </p>
      <p>This paper is organized into three additional sections. Section 2 provides an overview of SDM-RDFizer,
including its techniques, data structures, and physical operators used to optimize KG creation. Section 3
presents the results of the challenge, including a detailed description of the test cases and the required
updates for execution. Finally, Section 4 offers concluding remarks and outlines future directions for
SDM-RDFizer.</p>
    </sec>
    <sec id="sec-2">
      <title>2. SDM-RDFizer</title>
      <p>
        SDM-RDFizer [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] is a KG creation engine capable of transforming structured (i.e., CSV and relational
databases) and semi-structured data (i.e., JSON and XML) into RDF triples, following defined mapping
rules. It supports the latest RML specification.
      </p>
      <p>SDM-RDFizer adopts a two-fold approach to KG creation, consisting of two main modules: Triples
Maps Planning (TMP) and Triples Maps Execution (TME). Each module plays a distinct role in
the KG creation process and employs a set of data structures and operators to handle various aspects
such as join execution and duplicate removal. TMP determines the execution order of RML Triples
Maps (TMs) with the goal of minimizing memory usage. TME then generates the KG according to the
execution plan established by TMP.</p>
      <p>To interpret RML TMs, SDM-RDFizer uses a SPARQL-based parser. This parser employs four SPARQL
queries: the main query extracts core mapping information, such as the logical source, rml:subjectMap,
rml:predicateObjectMap, join conditions, logical dumps, function calls, etc. Meanwhile, the remaining
queries handle nested structures like collections, functions, and fields for logical views.</p>
      <p>To support the transformation of different types of TMs, SDM-RDFizer implements several specialized
operators. The Simple Object Map (SOM) operator executes rml:template and rml:reference; the Object
Reference Map (ORM) operator handles parent triples maps; and the Object Join Map (OJM) operator
executes joins. For duplicate detection, SDM-RDFizer uses hash tables known as Predicate Tuple
Tables (PTTs). Each generated triple is compared against the corresponding PTT; if it already exists, it
is discarded as a duplicate. Otherwise, the triple is added to both the PTT and the KG. A Dictionary
Table (DT) is used to compress the resources stored in PTTs. For join operations, SDM-RDFizer
caches the results in a structure called the Predicate Join Tuple Table (PJTT) to avoid redundant join
computations.</p>
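      <p>The duplicate check and the join cache described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SDM-RDFizer's actual code: the names DictionaryTable, PredicateTupleTables, and build_join_table are hypothetical, RDF terms are plain strings, and one Python set per predicate stands in for a PTT.</p>
      <preformat>
```python
from collections import defaultdict


class DictionaryTable:
    """Hypothetical DT stand-in: maps each RDF term to a small integer id."""

    def __init__(self):
        self._ids = {}

    def encode(self, term):
        # Assign the next free integer the first time a term is seen.
        return self._ids.setdefault(term, len(self._ids))


class PredicateTupleTables:
    """Hypothetical PTT stand-in: one set of (subject, object) ids per predicate."""

    def __init__(self):
        self.dt = DictionaryTable()
        self.tables = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Return True if the triple is new, False if it is a duplicate."""
        key = (self.dt.encode(subject), self.dt.encode(obj))
        table = self.tables[predicate]
        if key in table:
            return False
        table.add(key)
        return True


def build_join_table(parent_rows, join_key, make_subject):
    """PJTT-like cache: join value to parent subjects, built once per join."""
    pjtt = defaultdict(list)
    for row in parent_rows:
        pjtt[row[join_key]].append(make_subject(row))
    return pjtt


ptt = PredicateTupleTables()
kg = []
candidates = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:b"),  # duplicate, discarded by the PTT check
    ("ex:b", "ex:knows", "ex:a"),
]
for s, p, o in candidates:
    if ptt.add(s, p, o):
        kg.append((s, p, o))

parents = [{"id": "1", "name": "x"}, {"id": "2", "name": "y"}]
pjtt = build_join_table(parents, "id", lambda r: "ex:parent/" + r["id"])
```
      </preformat>
      <p>In this sketch the PTT stores pairs of integers rather than full terms, which is the role the text assigns to the Dictionary Table, and the join table is built once so that later lookups by join value do not rescan the parent source.</p>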
      <p>
        For the KGCW 2024 Challenge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], SDM-RDFizer was extended to support the latest RML modules,
including RML-FNML, RML-Star, and RML-IO. In the case of RML-IO, SDM-RDFizer was enhanced to
accept new input types such as compressed files (e.g., ZIP, TAR), SPARQL endpoints, and remote data
sources. It also gained the ability to compress the generated KG into various formats, split triples into
multiple files, and export in different RDF serializations (e.g., JSON-LD, RDF/XML, Turtle).
To support RML-FNML, a new operator was introduced to execute functions during data transformation,
incorporating strategies from FunMap [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. FunMap is a TM translator that replaces TMs containing
functions with equivalent TMs by executing the functions and transforming the input data accordingly.
SDM-RDFizer applies the same principle, enabling real-time data transformation through function
execution.
      </p>
      <p>For RML-Star, a new operator was developed to handle the rml:quotedTriplesMap construct introduced in
this module. Additionally, PJTT was extended to store joins that may occur in either the rml:subjectMap
or the rml:objectMap.</p>
      <p>Finally, SDM-RDFizer was further enhanced to support the RML-LV and RML-CC modules, which
are discussed later in this report. The version of SDM-RDFizer used is 4.7.5.12.2. SDM-RDFizer is
publicly available on GitHub (https://github.com/SDM-TIB/SDM-RDFizer).</p>
    </sec>
    <sec id="sec-3">
      <title>3. KGCW 2025 Challenge Test Cases</title>
      <p>The KGCW 2025 Challenge<sup>6</sup> aims to assess the compliance of existing KG creation engines with the
latest RML specification. The dataset used in this challenge is a refined version of the KGCW 2024
Challenge dataset<sup>7</sup>, with a greater focus on cases that test the core functionality of the updated RML
specification, along with new test cases (e.g., RML-LV). The dataset is composed of seven sets of test
cases:</p>
      <list list-type="bullet">
        <list-item><p>RML-Core: This set includes basic test cases originally defined in the RML test suite<sup>8</sup> to evaluate
the compliance of KG creation engines. While the 2024 dataset used CSV, JSON, and XML files and
relational databases (MySQL and PostgreSQL) as data sources, the 2025 dataset focuses solely
on JSON files. The remaining formats have been redistributed across the RML-IO and
RML-IO-Registry modules. This revision aims to produce a more representative and concise set of test
cases while reducing the requirements for achieving compliance with RML-Core.</p></list-item>
        <list-item><p>RML-FNML: This set includes test cases that apply functions to transform data using a set of
predefined operations<sup>9</sup>.</p></list-item>
        <list-item><p>RML-Star: This set contains test cases for RDF-Star<sup>10</sup>, adapted from the RML-Star test suite<sup>11</sup>
to conform to the updated specification.</p></list-item>
        <list-item><p>RML-IO: Formerly part of a unified module in the 2024 dataset (now divided into RML-IO and
RML-IO-Registry), this set includes a wide range of remote data sources such as
endpoints, compressed files, and JSON and XML files<sup>12</sup>. It also defines output specifications for
various formats, including Turtle, RDF/JSON, JSON-LD, and compressed formats like ZIP and
TAR.</p></list-item>
        <list-item><p>RML-CC: This set includes test cases that cover collections and containers<sup>13</sup>.</p></list-item>
        <list-item><p>RML-LV: This set features test cases where data sources are generated through projection and
joins across multiple sources, including mixtures of different data formats<sup>14</sup>.</p></list-item>
      </list>
      <p>This work presents the results of executing all the modules with SDM-RDFizer. Table 1 shows the total
number of test cases in the dataset and which cases were passed and failed by SDM-RDFizer. The full
results are available on GitHub<sup>15</sup>.</p>
      <sec id="sec-3-1">
        <title>3.1. Results of RML-Core</title>
        <p>RML-Core consists of test cases designed to validate fundamental RML functionalities, including the
definition of classes, the use of rml:template and rml:reference, the execution of parent triples maps and
joins, and the handling of data types, language tags, and named graphs. In the 2024 challenge, this
module utilized a range of data sources such as CSV, JSON, XML, and relational databases. For the 2025
challenge, however, the dataset has been streamlined to focus exclusively on JSON files.
Given this focus, one of the primary challenges lies in correctly navigating nested JSON structures. To
address this, SDM-RDFizer implements a recursive traversal mechanism that locates the relevant data
by descending through the JSON hierarchy according to the specified iterator.
        </p>
        <p>6 https://zenodo.org/records/14970817
7 https://zenodo.org/records/10973433
8 https://kg-construct.github.io/rml-core/test-cases/docs/
9 https://kg-construct.github.io/rml-fnml/test-cases/docs/
10 https://kg-construct.github.io/rml-star/test-cases/docs/
11 https://zenodo.org/records/6518802
12 https://kg-construct.github.io/rml-io/test-cases/docs/
13 https://kg-construct.github.io/rml-cc/test-cases/docs/
14 https://kg-construct.github.io/rml-lv/test-cases/docs/
15 https://github.com/SDM-TIB/SDM-RDFizer/tree/master/kgcw_2025_challenge</p>
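        <p>The recursive descent through a JSON hierarchy mentioned in this section can be illustrated with a short sketch. It is an assumption-laden simplification rather than SDM-RDFizer's implementation: the helper names parse_iterator and iterate are hypothetical, and only dotted keys and the [*] wildcard of a JSONPath-like iterator are handled.</p>
        <preformat>
```python
def parse_iterator(iterator):
    """Split a simplified iterator such as '$.a.b[*]' into steps."""
    path = iterator.lstrip("$").replace("[*]", ".*")
    return [step for step in path.split(".") if step]


def iterate(node, steps):
    """Yield every value reached by following the steps from node."""
    if not steps:
        yield node
        return
    head, rest = steps[0], steps[1:]
    if head == "*":
        # Wildcard: descend into every element of a list.
        if isinstance(node, list):
            for item in node:
                yield from iterate(item, rest)
    elif isinstance(node, dict) and head in node:
        # Named key: descend one level in the hierarchy.
        yield from iterate(node[head], rest)


doc = {"school": {"students": [{"name": "Ana"}, {"name": "Bo"}]}}
rows = list(iterate(doc, parse_iterator("$.school.students[*]")))
names = [row["name"] for row in rows]
```
        </preformat>
        <p>Each yielded row then plays the role of one iteration of the logical source, against which references and templates would be resolved.</p>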
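        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption><p>Test cases per module and SDM-RDFizer results, using the counts reported in Sections 3.1 to 3.6. The failed column is the difference between the total and the passed test cases.</p></caption>
          <table>
            <thead>
              <tr><th>Module</th><th>Test cases</th><th>Passed</th><th>Failed</th></tr>
            </thead>
            <tbody>
              <tr><td>RML-Core</td><td>59</td><td>58</td><td>1</td></tr>
              <tr><td>RML-FNML</td><td>17</td><td>16</td><td>1</td></tr>
              <tr><td>RML-Star</td><td>18</td><td>18</td><td>0</td></tr>
              <tr><td>RML-IO</td><td>73</td><td>64</td><td>9</td></tr>
              <tr><td>RML-IO-Registry</td><td>103</td><td>70</td><td>33</td></tr>
              <tr><td>RML-CC</td><td>35</td><td>35</td><td>0</td></tr>
              <tr><td>RML-LV</td><td>32</td><td>32</td><td>0</td></tr>
              <tr><td>Total</td><td>337</td><td>293</td><td>44</td></tr>
            </tbody>
          </table>
        </table-wrap>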
        <p>SDM-RDFizer employs a parser query to process the input mappings. To comply with the latest RML
specification, the parser has been updated to adopt the new rml namespace, remove references to the
deprecated R2RML namespace, and revise the rml:logicalSource definition. This includes support for
rml:path and rml:root, and replaces rml:query with rml:iterator.</p>
        <p>SDM-RDFizer successfully executed 58 of the 59 test cases in the RML-Core module. RMLTC0019b-JSON
is deemed an incorrect test case because its expected output is empty although the mapping should generate
triples. It overlaps significantly with RMLTC0019a-JSON, whose expected
output does contain triples. This test case is under review.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results of RML-FNML</title>
        <p>
          RML-FNML includes test cases that apply functions for value transformations, such as replacing,
concatenating, or changing the case of strings, based on the RML+FnO specification. SDM-RDFizer
supports these transformations by leveraging FunMap [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a tool that rewrites TMs containing functions
into equivalent ones that reflect the function results. SDM-RDFizer also implements a dedicated operator
that executes these functions on the fly, ensuring that the values incorporated into the KG are those
produced by the function maps.
        </p>
        <p>The parser query is extended to recognize function maps. An additional parser query is used to isolate
function maps in the case of nested functions. This enables proper handling of nested structures, as
each function map is extracted and executed individually. SDM-RDFizer dynamically resolves function
dependencies when one function receives the output of another.</p>
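        <p>Resolving nested function maps, where one function consumes the output of another, can be sketched as a recursive evaluation. The dictionary layout, the evaluate helper, and the two registered functions below are hypothetical choices made for illustration; they are not taken from SDM-RDFizer or from the RML-FNML vocabulary.</p>
        <preformat>
```python
# Hypothetical function registry; real mappings would refer to function IRIs.
FUNCTIONS = {
    "toUpperCase": lambda s: s.upper(),
    "concat": lambda a, b: a + b,
}


def evaluate(term, row):
    """Evaluate a constant, a reference, or a (possibly nested) function map."""
    if "constant" in term:
        return term["constant"]
    if "reference" in term:
        return row[term["reference"]]
    # Function map: resolve every input first, so inner functions run
    # before the function that consumes their output.
    args = [evaluate(arg, row) for arg in term["inputs"]]
    return FUNCTIONS[term["function"]](*args)


function_map = {
    "function": "toUpperCase",
    "inputs": [{
        "function": "concat",
        "inputs": [{"reference": "first"}, {"constant": "-x"}],
    }],
}
result = evaluate(function_map, {"first": "ab"})
```
        </preformat>
        <p>Here concat runs first and its output becomes the single argument of toUpperCase, mirroring the dependency order described above.</p>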
        <p>SDM-RDFizer successfully completes 16 out of 17 test cases in this set. RMLFNMLTC0001-CSV involves
a function that generates a random UUID, making it highly unlikely to match the expected output exactly.
Nevertheless, the test is considered correct if a UUID is generated. In contrast, RMLFNMLTC0041-CSV is
regarded as an incorrect case, as the function used in its triples map cannot produce the value presented
in the expected output.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results of RML-Star</title>
        <p>RML-Star includes test cases based on RDF-Star, an RDF extension that introduces quoted triples, which
allow a triple to be used as the subject or object of another triple. This capability supports the expression
of metadata about statements—such as provenance, certainty, or attribution. To accommodate this
feature, RML-Star introduces the rml:quotedTriplesMap construct for defining quoted triples within a
KG. SDM-RDFizer supports this by implementing a dedicated operator capable of generating quoted
triples, including recursively nested ones.</p>
        <p>A particular challenge addressed in this module is the execution of joins in the rml:subjectMap, in addition
to the more common use in the rml:objectMap. SDM-RDFizer handles both scenarios consistently using
its OJM operator for join processing and PJTT for managing intermediate results. The parser was also
extended to recognize and process rml:quotedTriplesMap.
RML-Star introduces a new type of TM, rml:NonAssertedTriplesMap. This type of TM is designed solely
for generating quoted triples and does not contribute asserted triples to the KG. SDM-RDFizer treats
these mappings as auxiliary, invoking them only when needed to produce quoted triples, and ignoring
them otherwise.</p>
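        <p>The treatment of quoted triples and non-asserted triples maps can be illustrated with nested tuples. This is a sketch under assumptions: the dictionary keys (quoted_subject, non_asserted) and the helpers make_triple and materialize are hypothetical, terms are plain strings, and a quoted triple is modelled as a tuple used in subject position.</p>
        <preformat>
```python
def make_triple(maps, name, row):
    """Build the triple of one triples map; the subject may be a quoted triple."""
    tm = maps[name]
    if "quoted_subject" in tm:
        # Recursion allows quoted triples nested to any depth.
        subject = make_triple(maps, tm["quoted_subject"], row)
    else:
        subject = tm["subject"].format(**row)
    return (subject, tm["predicate"], tm["object"].format(**row))


def materialize(maps, rows):
    """Emit asserted triples only; non-asserted maps are used on demand."""
    kg = []
    for row in rows:
        for name, tm in maps.items():
            if not tm.get("non_asserted", False):
                kg.append(make_triple(maps, name, row))
    return kg


maps = {
    "inner": {"subject": "ex:{id}", "predicate": "ex:age", "object": "{age}",
              "non_asserted": True},
    "outer": {"quoted_subject": "inner", "predicate": "ex:source",
              "object": "ex:{src}"},
}
kg = materialize(maps, [{"id": "p1", "age": "30", "src": "census"}])
```
        </preformat>
        <p>The inner map never contributes a triple of its own to the result; it is only invoked while building the subject of the outer triple.</p>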
        <p>SDM-RDFizer successfully executes all 18 test cases in this module.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Results of RML-IO and RML-IO-Registry</title>
        <p>RML-IO and RML-IO-Registry modules comprise test cases that cover a wide range of input data source
formats, including compressed files, JSON and XML documents, and data extracted from SPARQL
endpoints. These modules also test the ability to generate the KG in various RDF serialization formats,
such as Turtle, RDF/XML, and JSON-LD, and to compress outputs into formats like ZIP and TAR. Some
cases introduce the concept of directing specific triples to designated output files. These modules aim
to assess the capacity of KG creation engines to manage diverse input types and output requirements.
Initially, the 2024 dataset included such test cases under RML-Core, but in the 2025 dataset, they are
handled by these two modules.</p>
        <p>RML-IO-Registry focuses on specialized source specifications and reference formulations, including
datatype mappings, different encodings, and handling inputs containing comments.
To extract data from diverse sources, SDM-RDFizer utilizes several Python libraries: csv for CSV files,
json for JSON, xml for XML, mysql-connector for MySQL, psycopg2 for PostgreSQL, and pyodbc for SQL
Server.</p>
        <p>SDM-RDFizer uses the requests library to retrieve remote data. It employs the SPARQLWrapper library
for SPARQL endpoints to execute queries and format results similarly to CSV. Compressed input files
are downloaded locally and decompressed using appropriate libraries (e.g., zip for ZIP files). Output
files are serialized into RDF formats using the rdflib library.</p>
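        <p>As an illustration of handling a compressed source, the sketch below decompresses a ZIP archive in memory and reads one CSV member. It assumes Python's standard zipfile and csv modules; the helper read_zipped_csv and the file name people.csv are hypothetical, and the archive is built locally so that no network access is needed. In a remote setting the bytes would instead come from an HTTP download, for example the content of a requests response.</p>
        <preformat>
```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes, member):
    """Decompress one CSV member of a ZIP archive and return its rows."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("people.csv", "id,name\n1,Alice\n2,Bob\n")

rows = read_zipped_csv(buffer.getvalue(), "people.csv")
```
        </preformat>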
        <p>Some test cases explore the use of alternative output destinations for specific triples. These outputs
can be defined within various mapping components, such as rml:subjectMap, rml:predicateMap, or
rml:objectMap. Depending on the component, SDM-RDFizer directs the relevant triples to a specified
output file. For example, if an alternate output is declared in the subjectMap, all triples from that
mapping are sent to the alternative file. When defined in the predicateObjectMap, only the triples
relevant to that predicate are redirected. Triples without an alternate destination go to the default output
file set at execution. SDM-RDFizer prioritizes the creation of these auxiliary files and can compress
them when needed.</p>
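        <p>The routing rules above can be sketched as a grouping step. The function route, its parameters, and the file names are hypothetical, and the sketch assumes that a target declared at the subject level takes precedence over predicate-level targets, which the text does not specify.</p>
        <preformat>
```python
from collections import defaultdict


def route(triples, subject_target=None, predicate_targets=None,
          default="output.nt"):
    """Group the triples of one triples map by destination file."""
    predicate_targets = predicate_targets or {}
    files = defaultdict(list)
    for s, p, o in triples:
        if subject_target is not None:
            # Target declared on the subject map: the whole mapping moves.
            target = subject_target
        else:
            # Otherwise only predicates with their own target are redirected.
            target = predicate_targets.get(p, default)
        files[target].append((s, p, o))
    return dict(files)


triples = [("ex:a", "ex:name", "A"), ("ex:a", "ex:age", "3")]
by_predicate = route(triples, predicate_targets={"ex:age": "ages.nt"})
by_subject = route(triples, subject_target="people.nt")
```
        </preformat>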
        <p>SDM-RDFizer successfully executed 64 out of 73 test cases in the RML-IO module.
Test cases RMLSTC0006a and RMLSTC0006b contained blank spaces in the table headers used as data
sources, leading to mismatches between the column names expected in the TMs and those present in
the input<sup>17</sup>. Once these blank spaces were removed, SDM-RDFizer successfully executed both cases.
They are therefore considered passed.</p>
        <p>Test case RMLSTC0009a is expected to raise an error because the CSV input file contains column names
enclosed in quotes, which is syntactically incorrect. However, the engine successfully produces an
output since SDM-RDFizer is implemented in Python, and Python interprets these values without issue.
It remains unclear whether this case should be marked as passed or failed. For now, it is considered
passed until an official clarification is provided.</p>
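        <p>The behaviour reported for RMLSTC0009a can be reproduced with Python's standard csv module, which the text names as the CSV reader. The sample data below is made up for illustration; it only shows that quoted column names are parsed without raising an error.</p>
        <preformat>
```python
import csv
import io

# Header with column names enclosed in quotes, followed by one data row.
raw = '"id","name"\n1,Alice\n'

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)
header = reader.fieldnames  # the surrounding quotes are dropped by the parser
```
        </preformat>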
        <p>Test cases RMLTTC0004f and RMLTTC0004g specify incorrect file extensions for their logical target
dumps<sup>18</sup>. Although SDM-RDFizer produces the correct output format, the wrong file extensions could
cause confusion. These cases are also considered passed.</p>
        <p>Finally, RMLTTC0002f, RMLTTC0002g, RMLTTC0002h, RMLTTC0002i, RMLTTC0002k, RMLTTC0002l,
RMLTTC0002m, RMLTTC0002n, and RMLTTC0002r are considered incorrect and are being reviewed<sup>19</sup>
due to incorrect graph assignments in the expected output.</p>
        <p>17 https://github.com/kg-construct/rml-io/issues/134
18 https://github.com/kg-construct/rml-io/issues/136
19 https://github.com/kg-construct/rml-io/issues/129, https://github.com/kg-construct/rml-io/issues/130, https://github.com/kg-construct/rml-io/issues/131</p>
        <p>SDM-RDFizer successfully executed 70 out of 103 test cases in the RML-IO-Registry module.
Test case RMLIOREGTC0003e failed due to a missing “@” symbol at the beginning of the triples map,
resulting in a syntax error. Once corrected, SDM-RDFizer executes the case successfully, and it is
therefore considered passed.</p>
        <p>RMLIOREGTC0003d is considered incorrect and is under review since the defined namespace is not
reflected correctly in the XML source. Thus, the iterator of the triples map is wrong.
RMLIOREGTC0004k, RMLIOREGTC0004l, RMLIOREGTC0005k, RMLIOREGTC0005l,
RMLIOREGTC0006k, and RMLIOREGTC0006l are considered incorrect since the source query
does not extract all the values needed for the triples map. Additionally, the triples require the “IDs”
column, which does not exist in the database tables. These test cases are under review.
RMLIOREGTC0004o, RMLIOREGTC0004t, RMLIOREGTC0004w, RMLIOREGTC0005o,
RMLIOREGTC0005t, RMLIOREGTC0005w, RMLIOREGTC0006o, RMLIOREGTC0006t, and
RMLIOREGTC0006w are considered incorrect and are under review since the expected outputs express
the values of the property “amount” in scientific notation, while the values in the database are in
standard decimal form. For example, “20.0” is expressed as “2.0E1” in the expected output. It is unclear
whether SDM-RDFizer should convert the values from the database into their corresponding scientific
notation or leave them as they are, since both are equivalent.</p>
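        <p>The equivalence of the two notations, and the reason a purely textual comparison still reports a mismatch, can be checked with Python's decimal module. The variable names are illustrative only.</p>
        <preformat>
```python
from decimal import Decimal

db_value = "20.0"    # value as stored in the database
expected = "2.0E1"   # value as written in the expected output

# Numerically the two lexical forms denote the same number.
same_number = Decimal(db_value) == Decimal(expected)

# A string comparison of the serialized literals does not match.
same_text = db_value == expected
```
        </preformat>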
        <p>Test cases RMLIOREGTC0004y and RMLIOREGTC0006y expect the property “paid” to be represented
as a boolean. However, when the source data is stored in MySQL or SQL Server, boolean values are
expressed as integers “1” or “0”. As a result, the extracted values are interpreted as integers, not
booleans, causing a mismatch.</p>
        <p>RMLIOREGTC0004z, RMLIOREGTC0005z, and RMLIOREGTC0006z are considered incorrect and are
currently under review. These test cases expect byte string outputs; however, the values are encoded
differently when uploaded to the various databases, making it impossible to reproduce the expected
results exactly.</p>
        <p>Test case RMLIOREGTC0011a is considered incorrect because its SPARQL query is incomplete; it lacks
the necessary prefix definitions. Additionally, the prefix declarations in the mapping end with “;”
instead of “.” as Turtle syntax requires. This case is currently under review.</p>
        <p>RMLIOREGTC0012b, RMLIOREGTC0012c, RMLIOREGTC0012d, RMLIOREGTC0012e,
RMLIOREGTC0012f, RMLIOREGTC0012g, RMLIOREGTC0012h, and RMLIOREGTC0012i are incorrect and
are under review since the mappings of all these test cases use RML’s old formulation.
Test cases RMLIOREGTC0009a and RMLIOREGTC0010a require access to Kafka and MQTT streams,
respectively, to extract JSON input. A new version of SDM-RDFizer is currently being developed to
support these streaming sources, but it is not publicly available.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Results of RML-CC</title>
        <p>The RML-CC module contains test cases designed to generate RDF collections and containers. It
introduces the terms rml:gather and rml:gatherAs into RML. The rml:gather term specifies which data
elements should be grouped into a collection, while rml:gatherAs defines the type of RDF container or
collection to be used (e.g., rdf:Alt, rdf:Bag, rdf:List, or rdf:Seq). The gathered data may consist of literals,
IRIs, or results from join operations. In some cases, nested collections are formed—for example, an
rdf:Bag might contain multiple rdf:List instances. The rml:gather term can be used within both the
rml:subjectMap and rml:objectMap. Properly handling blank nodes is essential in this module, as they
are intermediate nodes linking elements within a collection.
SDM-RDFizer extends its parser query to support rml:gather and rml:gatherAs, and introduces additional
logic to retrieve and manage nested collections. A new operator is implemented to collect data and
generate the corresponding triples based on the specified collection type. This operator supports
recursive application, enabling the construction of nested structures. When the collected data originates
from join operations, SDM-RDFizer utilizes the PJTT data structure to store the intermediate results,
which are accessed as needed during collection generation.</p>
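        <p>How gathered values turn into collection or container triples can be sketched as follows. The function gather and its return shape are hypothetical, terms are plain strings with prefixed names, and returning rdf:nil for an empty list is an assumption of this sketch rather than a statement about the RML-CC specification or SDM-RDFizer.</p>
        <preformat>
```python
import itertools


def gather(items, gather_as, bnodes):
    """Return (head, triples) for an RDF collection or container."""
    triples = []
    if gather_as == "rdf:List":
        if not items:
            return "rdf:nil", triples
        # One blank node per element, chained with rdf:first and rdf:rest.
        nodes = [next(bnodes) for _ in items]
        for i, (node, item) in enumerate(zip(nodes, items)):
            rest = nodes[i + 1] if i + 1 != len(nodes) else "rdf:nil"
            triples.append((node, "rdf:first", item))
            triples.append((node, "rdf:rest", rest))
        return nodes[0], triples
    # Containers (rdf:Bag, rdf:Seq, rdf:Alt) use rdf:_1, rdf:_2, and so on.
    head = next(bnodes)
    triples.append((head, "rdf:type", gather_as))
    for position, item in enumerate(items, start=1):
        triples.append((head, "rdf:_" + str(position), item))
    return head, triples


bnodes = ("_:b" + str(n) for n in itertools.count())
list_head, list_triples = gather(['"a"', '"b"'], "rdf:List", bnodes)
# Nesting: the head of the list becomes a member of a bag.
bag_head, bag_triples = gather([list_head], "rdf:Bag", bnodes)
```
        </preformat>
        <p>The blank nodes act as the intermediate nodes mentioned above: each list cell is a blank node, and a nested collection is referenced through the blank node at its head.</p>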
        <p>SDM-RDFizer successfully executed all 35 test cases from this module.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Results of RML-LV</title>
        <p>RML-LV is a set of test cases that apply RML logical views to input data. This module is distinctive in
that it focuses on generating new input data by combining and projecting existing sources and using
various types of joins (e.g., inner and left joins). The output of a logical view is expected to be flat,
meaning that the projected data must not retain any nested structures, even when the original input is a
JSON file. RML-LV extends the definition of a TM’s logicalSource by introducing the terms rml:viewOn
and rml:field. The rml:viewOn term specifies the source of data for the logical view, while rml:field
defines which values are extracted and their corresponding names. The fields of a logical view may
also be nested. RML-LV supports data format intermixing; for example, a JSON structure may contain
values resembling CSV content.</p>
        <p>SDM-RDFizer extends its parser to support logical views and implements a separate mechanism to
extract nested views. A new operator is introduced and executed at the TMP module level, unlike
the other operators executed at the TME module level; it produces raw values instead of RDF triples.
Following the same design philosophy as with other data source formats, SDM-RDFizer ensures that
each logical view is generated only once per execution. This operator can also be executed recursively to
process nested logical views. When executing joins, SDM-RDFizer uses a modified version of the PJTT
data structure, which stores raw values instead of RDF entities. In the case of data format intermixing,
these values are extracted and loaded into memory as independent files of the corresponding type
(e.g., JSON structures within a CSV file). Finally, all projected data is flattened to eliminate any nested
structures.</p>
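        <p>Flattening and joining, the two steps a logical view relies on, can be sketched with plain dictionaries. The helpers flatten and join, the dotted naming of nested fields, and the sample data are assumptions made for illustration and do not reproduce SDM-RDFizer's operator.</p>
        <preformat>
```python
def flatten(record, prefix=""):
    """Expand a nested record into a list of flat rows (dicts of scalars)."""
    rows = [{}]
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            parts = flatten(value, name + ".")
        elif isinstance(value, list):
            # One output row per list element; an empty list yields no rows.
            parts = []
            for item in value:
                if isinstance(item, dict):
                    parts.extend(flatten(item, name + "."))
                else:
                    parts.append({name: item})
        else:
            parts = [{name: value}]
        rows = [dict(row, **part) for row in rows for part in parts]
    return rows


def join(left_rows, right_rows, left_key, right_key, how="inner"):
    """Inner or left join of flat rows on the given key fields."""
    index = {}
    for right in right_rows:
        index.setdefault(right[right_key], []).append(right)
    out = []
    for left in left_rows:
        matches = index.get(left[left_key], [])
        if not matches and how == "left":
            out.append(dict(left))  # keep unmatched left rows
        for right in matches:
            out.append(dict(left, **right))
    return out


people = [{"id": 1, "name": "Ana",
           "items": [{"type": "sword"}, {"type": "shield"}]}]
view = [row for record in people for row in flatten(record)]
cities = [{"pid": 1, "city": "Hannover"}]
joined = join(view, cities, "id", "pid")
```
        </preformat>
        <p>The resulting rows contain only scalar values, so references in a triples map can address them by field name regardless of how deeply the original values were nested.</p>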
        <p>SDM-RDFizer successfully executed all 32 test cases of this module.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>The KGCW 2025 Challenge dataset evaluates the compliance of state-of-the-art engines with the new
RML formulation. It comprises 337 test cases across seven modules: RML-Core, RML-IO,
RML-IO-Registry, RML-FNML, RML-Star, RML-CC, and RML-LV. SDM-RDFizer successfully executed 293 of 337
test cases, covering all proposed modules. The remaining test cases are either under review because
they contain errors that need addressing or, in two cases, will be addressed in a future release.
SDM-RDFizer is fully RML compliant.</p>
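      <p>The totals quoted above follow from the per-module counts given in Sections 3.1 to 3.6. A short check, using only numbers stated in this report (the variable names are illustrative):</p>
      <preformat>
```python
# (passed, total) per module, as reported in Sections 3.1 to 3.6.
results = {
    "RML-Core": (58, 59),
    "RML-FNML": (16, 17),
    "RML-Star": (18, 18),
    "RML-IO": (64, 73),
    "RML-IO-Registry": (70, 103),
    "RML-CC": (35, 35),
    "RML-LV": (32, 32),
}

passed = sum(p for p, _ in results.values())
total = sum(t for _, t in results.values())
pass_rate = round(100 * passed / total, 1)  # percentage of passed test cases
```
      </preformat>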
      <p>To achieve this, SDM-RDFizer introduced a new parsing query and separate queries for the extraction
of nested functions, collections, and views; extensions of existing data structures for the proper
handling of the new operators; an operator for the execution of functions on the fly; an operator for
generating quoted triples; an operator for the generation of RDF collections; and an operator for the
creation of logical views. Moving forward, the authors will extensively test the operators implemented
to comply with RML-CC and RML-LV (especially the flattening of logical views) to verify that all
edge cases are appropriately covered. Additionally, the operator for the execution of RML-Star is
currently being redesigned to improve the handling of intermediate results and remove redundant code.
Furthermore, a new module is being developed to extract data from Kafka and MQTT streams. Finally,
an improved mapping parser is planned to remove the reliance on ever more complicated parsing
queries.</p>
      <p>This work was supported by the “Leibniz Best Minds: Programme for Women Professors”, through
funding of the “TrustKG-Transforming Data in Trustable Insights” project (Grant P99/2020), and by the
Lower Saxony Ministry of Science and Culture (MWK) with funds from the Volkswagen Foundation’s
zukunft.niedersachsen program (CAIMed - Lower Saxony Center for AI and Causal Methods in Medicine;
GA No. ZN4257).</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Industry-scale knowledge graphs: lessons and challenges</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>62</volume>
          (
          <year>2019</year>
          )
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          . URL: https://doi.org/10.1145/3331166. doi:10.1145/3331166.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
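          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Van de Walle</surname>
          </string-name>
          ,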
          <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>
          ,
          in:
          <source>Workshop on Linked Data on the Web</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          ,
          <article-title>An ontology to semantically declare and describe functions</article-title>
          ,
          in:
          <source>European Semantic Web Conference</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Delva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arenas-Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ó.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>RML-star: A declarative mapping language for RDF-star generation</article-title>
          , in:
          <string-name>
            <given-names>O.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice co-located with 20th International Semantic Web Conference (ISWC 2021)</source>
          , Virtual Conference, October 24-28, 2021, volume
          <volume>2980</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          . URL: https://ceur-ws.org/Vol-2980/paper374.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>Results for knowledge graph creation challenge 2024: SDM-RDFizer</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Serles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 5th International Workshop on Knowledge Graph Construction co-located with 21th Extended Semantic Web Conference (ESWC 2024)</source>
          , Hersonissos, Greece, May 27, 2024, volume
          <volume>3718</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>
          . URL: https://ceur-ws.org/Vol-3718/paper12.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs</article-title>
          , in: CIKM,
          <year>2020</year>
          . doi:10.1145/3340531.3412881.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <article-title>Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines</article-title>
          ,
          <source>Semantic Web</source>
          <volume>16</volume>
          (
          <year>2025</year>
          )
          SW243580. URL: https://journals.sagepub.com/doi/abs/10.3233/SW-243580. doi:10.3233/SW-243580.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ó.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>FunMap: Efficient execution of functional mappings for knowledge graph creation</article-title>
          , in:
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A. M.</given-names>
            <surname>Tamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>d'Amato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kagal</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference</source>
          , Athens, Greece, November 2-6, 2020, Proceedings, Part I, volume
          <volume>12506</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>276</fpage>
          -
          <lpage>293</lpage>
          . URL: https://doi.org/10.1007/978-3-030-62419-4_16. doi:10.1007/978-3-030-62419-4_16.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>