<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Join</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Performance Results of FlexRML in the KGCW Challenge 2024</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Freund</string-name>
          <email>michael.freund@iis.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Schmid</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rene Dorsch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Harth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Knowledge Graph Construction, RDF Mapping Language, KGCW Challenge</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Integrated Circuits IIS</institution>
          ,
          <addr-line>Nürnberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Friedrich-Alexander-Universität Erlangen-Nürnberg</institution>
          ,
          <addr-line>Nürnberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>KGCW'24: 5th International Workshop on Knowledge Graph Construction</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>1</lpage>
      <abstract>
        <p>The Knowledge Graph Construction Workshop introduced a challenge to evaluate the performance of different RML interpreters using a set of standardized benchmarks. We participated in the challenge's performance track with our RML interpreter, FlexRML, and report the median execution time and peak memory consumption over five runs on the provided virtual machine using the challenge tool. Through this challenge, we were able to identify weaknesses in FlexRML, such as its current support for CSV data only, lack of support for the latest RML vocabulary, and a crash that occurs when the system executing the mapping runs out of memory. We plan to address these issues in future releases of FlexRML.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>To help practitioners choose the right RML interpreter for their project, the Knowledge Graph
Construction Workshop (KGCW) has introduced the KGCW Challenge to enable fair comparison.
The KGCW Challenge consists of a dataset and corresponding RML mappings divided into two
tracks. The first track covers conformance of the RML interpreters to the new RML specifications,
while the second track focuses on mapping performance by evaluating execution time and peak
memory consumption.</p>
      <p>
        This paper reports on the results of FlexRML [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a new RML interpreter built from the
ground up, in the performance track of the KGCW Challenge 2024. The rest of the paper is
divided into three sections. Section 2 gives a brief overview of FlexRML in its current state and
the technical improvements planned for the next releases. Section 3 reports the performance
metrics of FlexRML in the empirical evaluation and a discussion of the results. Finally, Section 4
concludes the paper and outlines future research.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. FlexRML</title>
      <p>FlexRML is a flexible RML interpreter written in C++ that is designed to be usable across
the entire network architecture. This means that FlexRML is designed to operate in almost
unconstrained cloud environments, moderately constrained industrial PCs and single-board
computers at the network edge, and extremely resource-constrained Internet of Things (IoT)
devices and microcontrollers running real-time operating systems. This flexibility sets FlexRML
apart from other RML interpreters, which typically focus only on almost unconstrained cloud
environments and are therefore written in high-level programming languages such as Java,
JavaScript, or Python. In the following, we provide a brief overview of FlexRML’s architecture,
current supported features, and planned features for future releases.</p>
      <sec id="sec-3-1">
        <title>2.1. Architecture</title>
        <p>When mapping non-RDF data to RDF, FlexRML performs two main steps: the preprocessing
step, which optimizes the RML mappings for speed, and the actual mapping step, which executes
the optimized RML mappings and generates the RDF data.</p>
        <p>Preprocessing Step. FlexRML applies well-established mapping optimization strategies,
such as self-join elimination and mapping normalization, to improve overall mapping performance,
and replaces joins with reference conditions whenever possible. In addition, FlexRML uses
a result size estimation algorithm based on independent Bernoulli sampling. The algorithm
draws a small random sample from the original non-RDF data sources, performs the mapping
on the sample, counts the unique RDF triples generated, and from that count estimates the
number of unique RDF triples that will be generated when all source files are mapped. The
estimate is used to select suitable bit sizes for the hash functions and data structures in the
rest of the mapping process, which saves memory.</p>
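        <p>The estimation idea can be illustrated with a minimal Python sketch. It is not FlexRML's C++ implementation: the row mapping function map_row, the naive scale-up estimator (unique triples in the sample divided by the sampling probability p), and the birthday-bound threshold max_collision_prob are assumptions made purely for illustration.</p>
        <preformat>
```python
import math
import random


def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p."""
    return [row for row in rows if p > rng.random()]


def map_row(row):
    """Hypothetical stand-in for executing an RML mapping on one CSV row."""
    subject = "http://example.org/item/" + row["id"]
    return [(subject, "http://example.org/name", row["name"])]


def estimate_unique_triples(rows, p, rng):
    """Map a Bernoulli sample and scale the unique-triple count up by 1/p.

    The scale-up is a naive assumption; it tends to overestimate when the
    source contains many duplicates.
    """
    unique = set()
    for row in bernoulli_sample(rows, p, rng):
        unique.update(map_row(row))
    return math.ceil(len(unique) / p)


def choose_hash_bits(estimated_triples, max_collision_prob=1e-6):
    """Pick the smallest of 32, 64, or 128 bits that keeps the birthday
    approximation n^2 / 2^(bits + 1) at or below the chosen threshold."""
    for bits in (32, 64, 128):
        collision_prob = estimated_triples ** 2 / 2 ** (bits + 1)
        if max_collision_prob >= collision_prob:
            return bits
    return 128


if __name__ == "__main__":
    rng = random.Random(42)
    rows = [{"id": str(i), "name": "item"} for i in range(10000)]
    estimate = estimate_unique_triples(rows, 0.05, rng)
    print(estimate, choose_hash_bits(estimate))
```
        </preformat>
        <p>Under these assumptions, an estimate of one million unique triples selects 64-bit hashes, because the birthday approximation for 32-bit hashes would make collisions all but certain at that size.</p>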
        <p>Mapping Step. The mapping process itself can take advantage of multiple cores, if available,
by using the producer-consumer design pattern. In the mapping process, FlexRML generates all
triples, hashes each generated triple with a hash function of 32, 64, or 128 bits depending on
the result of the estimation process, and compares the hash to a hash set containing the hashes
of all RDF triples generated up to that point to remove duplicates. If the generated triple
is a duplicate, it is discarded; otherwise, the triple is written to the output and its hash
is added to the hash set. This duplicate removal approach is memory efficient, but carries the
risk of hash collisions, which can result in missing RDF triples in the output data. Choosing
appropriate bit sizes minimizes this risk, and no missing output RDF triples have occurred
on the KGCW Challenge dataset.</p>
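        <p>The duplicate removal described above can be sketched in Python as follows. This is an illustration only: the use of BLAKE2b truncated to the requested width, the tab separator, and the function names triple_hash and deduplicate are assumptions and are not taken from FlexRML's source code.</p>
        <preformat>
```python
import hashlib


def triple_hash(triple, bits):
    """Hash a (subject, predicate, object) tuple to an integer of `bits` width.

    BLAKE2b is used here only because it supports variable digest sizes;
    the hash function used by FlexRML is not specified in the text.
    """
    data = "\t".join(triple).encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def deduplicate(triples, bits=64):
    """Yield each triple the first time its hash is seen.

    Only the hashes are kept in memory, not the triples themselves, so a
    hash collision would silently drop a distinct triple.
    """
    seen = set()
    for triple in triples:
        h = triple_hash(triple, bits)
        if h in seen:
            continue  # duplicate (or collision): discard
        seen.add(h)
        yield triple


if __name__ == "__main__":
    triples = [
        ("ex:a", "ex:name", "Alice"),
        ("ex:b", "ex:name", "Bob"),
        ("ex:a", "ex:name", "Alice"),
    ]
    for s, p, o in deduplicate(triples, bits=32):
        print(s, p, o)
```
        </preformat>
        <p>Storing fixed-width hashes instead of full triples is what keeps the memory footprint small; the width passed as bits is the value that the preprocessing step would supply.</p>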
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Current State and Planned Features</title>
        <p>The current release, FlexRML 1.0, is available as easy-to-use, pre-built binaries for Debian-based
64-bit systems and Arm-based 64-bit systems such as Raspberry Pis<sup>2</sup>, and as a GitHub repository<sup>3</sup>
usable on microcontrollers such as the Arduino Nano 33 IoT and ESP32 via the Arduino IDE.
Because the code is open source, FlexRML can also be built locally, giving users access to the
latest development features.</p>
        <p>
          The main drawback of FlexRML is that currently only the mapping of CSV source data to
RDF is fully supported. This is because we want to be able to run FlexRML across the entire
network architecture, and some microcontrollers only support a subset of the C++ standard
library, which limits our ability to reuse existing libraries for input file handling. This forces
us to implement the file handling logic ourselves, which is complicated and time-consuming.
We are making progress, however: the current development build already partially supports
mapping JSON data with a subset of JSONPath expressions. In addition, we currently do not
fully support the new RML vocabulary terms [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], only those of the RML-IO specification<sup>4</sup>.
        </p>
        <p>In the next releases of FlexRML, we will fully integrate JSON by enhancing our implementation
of a JSONPath parser to cover all features. Additionally, we plan to make FlexRML available in
web browsers using WebAssembly, which will allow us to extend FlexRML beyond the cloud and
directly into user applications. We are also aware that the performance of joins that cannot be
substituted by reference conditions is not optimal, and we plan to apply optimization strategies
to address this. A full list of planned features and the implementation progress can be found on
our roadmap on GitHub<sup>5</sup>.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Empirical Evaluation</title>
      <p>In the following, we describe the hardware and software used, report the results of the empirical
evaluation, and discuss them.</p>
      <sec id="sec-4-1">
        <title>3.1. Experimental Setup</title>
        <p>For the empirical evaluation, we used the virtual machine provided by the organizers of the
KGCW Challenge. To execute the mappings, we used the recommended challenge tool, for
which we included a Dockerfile for FlexRML. The non-RDF data to be transformed into RDF was
available in CSV format, and in the default process, it is loaded into a database from which the
mapping is performed. Since FlexRML does not support mapping from databases, we adjusted
the pipeline to directly map the CSV data. Additionally, FlexRML does not support the newest
vocabulary terms, so we adjusted the RML mapping rules accordingly. The challenge tool with
FlexRML integrated, the adjusted metadata.json files describing the new pipeline, and the
adjusted RML mappings used for the evaluation can be found on GitHub<sup>6</sup>. To allow for easy
reproducibility, we also included a simple shell script that copies all the adjusted data into the
correct directories and needs to be run once the benchmark data has been downloaded. We used
a simple Python script to verify the correctness of FlexRML’s output against the reference output
provided<sup>7</sup>. All performance metrics reported in the following are collected by the challenge
tool over five runs.</p>
        <p>2 https://github.com/wintechis/flex-rml/releases
3 https://github.com/wintechis/flex-rml-esp32/tree/main
4 https://github.com/kg-construct/rml-io
5 https://github.com/wintechis/flex-rml?tab=readme-ov-file#planned-features-for-flexrml</p>
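        <p>The output check and the median-over-five-runs reporting used in this section can be sketched as follows. The actual verification script is not reproduced in this paper, so the code below is a hypothetical reconstruction: it assumes that both outputs are N-Triples documents whose triples are serialized identically, so that a line-by-line set comparison is sufficient, and the function names are illustrative.</p>
        <preformat>
```python
import statistics


def load_ntriples(text):
    """Return the set of non-empty, non-comment lines of an N-Triples text."""
    lines = set()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            lines.add(stripped)
    return lines


def compare_outputs(produced, reference):
    """Report triples missing from, or unexpected in, the produced output.

    Assumption: identical triples are serialized identically in both files
    (no blank nodes, same literal escaping), so string equality suffices.
    """
    produced_set = load_ntriples(produced)
    reference_set = load_ntriples(reference)
    return {
        "missing": reference_set - produced_set,
        "unexpected": produced_set - reference_set,
    }


def median_of_runs(values):
    """Median of the per-run measurements (e.g. five execution times)."""
    return statistics.median(values)


if __name__ == "__main__":
    reference = "ex:a ex:p ex:b .\nex:c ex:p ex:d .\n"
    produced = "ex:c ex:p ex:d .\nex:a ex:p ex:b .\n"
    print(compare_outputs(produced, reference))
    # Made-up timings, not measurements from the challenge.
    print(median_of_runs([3.0, 1.0, 2.0, 5.0, 4.0]))
```
        </preformat>
        <p>Comparing sets rather than ordered lines makes the check independent of the order in which an interpreter emits triples.</p>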
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Results</title>
        <p>The challenge tool evaluated the performance of FlexRML on the Duplicated Values, Empty Values,
GTFS-Madrid-Bench, Joins, Mappings, Properties, and Records datasets.</p>
        <p>Duplicated Values. The Duplicated Values dataset comes in five variants with different
percentages of duplicates, ranging from 0 percent to 100 percent in steps of 25 percent. The median
runtime and peak memory usage are reported in Table 1. All outputs of FlexRML match the
expected reference output.</p>
        <sec id="sec-4-2-7">
          <table-wrap id="tab-1">
            <label>Table 1</label>
            <caption>
              <p>Median execution time and peak memory usage for the Duplicated Values dataset.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Dataset</th>
                  <th>Execution Time (sec)</th>
                  <th>Peak Memory (MiB)</th>
                </tr>
              </thead>
              <tbody>
                <tr><td>Duplicated Values 0%</td><td>8.59</td><td>454.48</td></tr>
                <tr><td>Duplicated Values 25%</td><td>7.46</td><td>456.40</td></tr>
                <tr><td>Duplicated Values 50%</td><td>6.75</td><td>417.48</td></tr>
                <tr><td>Duplicated Values 75%</td><td>6.23</td><td>406.35</td></tr>
                <tr><td>Duplicated Values 100%</td><td>5.73</td><td>393.68</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The results show that FlexRML’s runtime during the mapping process decreases continuously
as the number of duplicates increases, and that peak memory usage decreases overall, apart from
a marginal rise between 0 and 25 percent duplicates. This is because the number of
unique output RDF triples also decreases, resulting in fewer write operations to disk and thus a
reduction in runtime. In addition, fewer hashes need to be stored in the hash set, resulting in
lower memory consumption.</p>
          <p>Empty Values. The Empty Values dataset is available in the same variations as the previous
dataset, containing empty values ranging from 0 percent to 100 percent of the total dataset size
in steps of 25 percent. The median runtime and peak memory consumption are reported in
Table 2. The output produced by FlexRML again matches the expected reference output.</p>
          <table-wrap id="tab-2">
            <label>Table 2</label>
            <caption>
              <p>Variants of the Empty Values dataset.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Dataset</th>
                </tr>
              </thead>
              <tbody>
                <tr><td>Empty Values 0%</td></tr>
                <tr><td>Empty Values 25%</td></tr>
                <tr><td>Empty Values 50%</td></tr>
                <tr><td>Empty Values 75%</td></tr>
                <tr><td>Empty Values 100%</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The results of the Empty Values dataset mirror those of the Duplicated Values dataset. As the
overall number of empty values increases, the required disk writes are reduced, resulting in
faster execution times. In addition, the size of the internal hash set decreases, resulting in less
memory consumption.</p>
          <p>6 https://github.com/FreuMi/challenge-tool
7 https://zenodo.org/records/10973433</p>
        </sec>
        <sec id="sec-4-2-13">
          <p>
            GTFS-Madrid-Benchmark. The GTFS-Madrid-Benchmark [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] is used to evaluate the mapping
performance with increasing dataset sizes. The benchmark is available in four scales, scale 1,
scale 10, scale 100, and scale 1000. The benchmarks testing mixed content mapping, i.e., mapping
JSON and XML data, could not be run because FlexRML currently only supports mapping CSV
data. The median runtime and peak memory consumption are reported in Table 3.
          </p>
        </sec>
        <sec id="sec-4-2-19">
          <table-wrap id="tab-3">
            <label>Table 3</label>
            <caption>
              <p>Median execution time and peak memory consumption for the GTFS-Madrid-Benchmark.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Dataset</th>
                  <th>Execution Time (sec)</th>
                  <th>Peak Memory (MiB)</th>
                </tr>
              </thead>
              <tbody>
                <tr><td>GTFS-Madrid Scale 1</td><td>2.97</td><td>414.10</td></tr>
                <tr><td>GTFS-Madrid Scale 10</td><td>24.14</td><td>599.30</td></tr>
                <tr><td>GTFS-Madrid Scale 100</td><td>251.88</td><td>2332.76</td></tr>
                <tr><td>GTFS-Madrid Scale 1000</td><td>-</td><td>-</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>While running the GTFS-Madrid-Benchmark with a scale factor of 1000, FlexRML crashed
because the virtual machine doing the mapping ran out of memory. We plan to address this issue
in future releases of FlexRML, as it requires us to change the way we store the hashes of generated
RDF triples. Specifically, we need to monitor the system’s RAM and, when approaching the
maximum available RAM, store the additional hashes on disk to avoid a crash.</p>
          <p>
            Additionally, we noticed that the provided reference data does not match the output generated
by FlexRML. A closer examination of the data revealed that most of the numerical values are
set to 999.999, and additional data types were expected but not declared in the mappings. This
issue was also identified in the KGCW Challenge 2023 [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. When using the mappings and data
sources with multiple other RML interpreters, the results match those of FlexRML and show
the same mismatches with the reference.
          </p>
          <p>Otherwise, the performance results are as expected. As the size of the source data increases,
both execution time and peak memory usage increase.</p>
          <p>Joins. The Join dataset evaluates the performance of RML interpreters when executing mapping
definitions containing different types of joins, namely 1-to-1 joins, 1-to-N joins, N-to-1 joins, and N-to-M
joins. The Join dataset contains 33 different variations, varying in the number of joined datasets
and the percentage of data in each dataset that needs to be joined. Due to the large number of
datasets, we report only selected results in Table 4. All outputs are in line with the reference.</p>
          <p>The performance results for the Join dataset show that FlexRML handles different types of
joins with fairly consistent execution times and peak memory usage. For datasets with 50% of
the data to be joined, the execution times range from 14.11 seconds to 18.21 seconds, indicating
that the join type does not drastically affect the processing time. Peak memory usage shows a
small variation, from 489.24 MiB to 543.96 MiB, indicating that memory use is relatively stable
across different join variants.</p>
          <p>Mappings. The Mappings dataset is used to evaluate the impact of different structures in the
RML mapping rules. Specifically, the mappings vary the number of TriplesMaps (TMs) and
PredicateObjectMaps (POMs). TMs specify rules for transforming the input data into RDF
triples and consist of POMs, which define how the predicate and object of the predefined subject
must be generated. The Mappings dataset consists of mappings with variants of 1 TM with
15 POMs, 15 TMs with 1 POM each, 3 TMs with 5 POMs each, and 5 TMs with 3 POMs each.
The resulting median execution time and peak memory consumption are shown in Table 5. All
outputs of FlexRML match the reference output data.</p>
        </sec>
        <sec id="sec-4-2-25">
          <table-wrap id="tab-5">
            <label>Table 5</label>
            <caption>
              <p>Median execution time and peak memory consumption for the Mappings dataset.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Dataset</th>
                  <th>Execution Time (sec)</th>
                  <th>Peak Memory (MiB)</th>
                </tr>
              </thead>
              <tbody>
                <tr><td>Mappings 1TM / 15POM</td><td>6.54</td><td>444.65</td></tr>
                <tr><td>Mappings 15TM / 1POM</td><td>6.34</td><td>434.35</td></tr>
                <tr><td>Mappings 3TM / 5POM</td><td>3.79</td><td>486.92</td></tr>
                <tr><td>Mappings 5TM / 3POM</td><td>4.60</td><td>453.81</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The performance results for the Mappings dataset reveal that the variant with fewer TMs but
more POMs per TM (1 TM with 15 POMs) has a slightly higher execution time and memory
usage than the variant with more TMs but fewer POMs per TM (15 TMs with 1 POM
each). The variant with 3 TMs and 5 POMs each achieves the lowest execution time of 3.79
seconds but the highest peak memory usage of 486.92 MiB. This is due to the way multithreading
is implemented in FlexRML, where each TM is mapped in a separate thread. This increases
memory consumption due to threading overhead but also reduces execution time.</p>
          <p>Properties. The Properties dataset increases the number of columns while keeping the number
of rows constant. This means that the Properties dataset evaluates RML interpreters for their
ability to handle horizontally scaled datasets. The Properties dataset is available in the
variants 1M rows with 1 column, 1M rows with 10 columns, 1M rows with 20 columns, and 1M
rows with 30 columns. The results are shown in Table 6. All outputs of FlexRML are verified to
match the expected output.</p>
        </sec>
        <sec id="sec-4-2-30">
          <table-wrap id="tab-6">
            <label>Table 6</label>
            <caption>
              <p>Variants of the Properties dataset.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Dataset</th>
                </tr>
              </thead>
              <tbody>
                <tr><td>Properties 1M rows / 1 column</td></tr>
                <tr><td>Properties 1M rows / 10 columns</td></tr>
                <tr><td>Properties 1M rows / 20 columns</td></tr>
                <tr><td>Properties 1M rows / 30 columns</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The performance results for the Properties dataset show that both FlexRML’s execution time
and peak memory consumption increase as the number of columns in the dataset increases,
while the number of rows stays constant at 1 million. The execution time increases from
5.63 seconds for a dataset with 1 column to 137.30 seconds for a dataset with 30 columns, an
increase by a factor of roughly 24. Similarly, the peak memory usage increases from 403.58
MiB to 1658.16 MiB over the same range, an increase by a factor of about 4. These results show
that FlexRML’s mapping performance is directly affected by horizontally scaled data, with more
columns leading to higher computational requirements, more hashes to compute and more RDF
triples to write to the disk, and increased memory usage due to a larger in-memory hash set.
However, the scaling is sublinear, as a 30-fold increase in the number of columns results in only
a roughly 24-fold increase in execution time and a 4-fold increase in memory consumption.</p>
          <p>Records. The Records dataset is complementary to the Properties dataset, as it evaluates the
performance of RML interpreters in handling vertically scaled data. The Records dataset keeps
the number of columns constant while systematically increasing the number of rows with each
variant. The dataset consists of the variants 10k rows with 20 columns, 100k rows with 20
columns, 1M rows with 20 columns, and 10M rows with 20 columns, as shown in Table 7. All
outputs of FlexRML again match the expected reference output.</p>
        </sec>
        <sec id="sec-4-2-36">
          <table-wrap id="tab-7">
            <label>Table 7</label>
            <caption>
              <p>Median execution time and peak memory usage for the Records dataset.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th>Dataset</th>
                  <th>Execution Time (sec)</th>
                  <th>Peak Memory (MiB)</th>
                </tr>
              </thead>
              <tbody>
                <tr><td>Records 10k rows / 20 columns</td><td>1.23</td><td>381.51</td></tr>
                <tr><td>Records 100k rows / 20 columns</td><td>8.28</td><td>434.05</td></tr>
                <tr><td>Records 1M rows / 20 columns</td><td>85.79</td><td>1154.07</td></tr>
                <tr><td>Records 10M rows / 20 columns</td><td>943.34</td><td>11275.53</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The performance results for the Records dataset show that both the execution time and peak
memory usage of FlexRML increase as the number of rows in the dataset increases, while the
number of columns remains constant at 20. The execution time increases from 1.23 seconds for
10k rows to 943.34 seconds for 10M rows, and the peak memory usage increases from 381.51
MiB to 11275.53 MiB over the same range. While these increases are significant, they are not
proportional, with the execution time increasing by a factor of about 767 and memory usage
increasing by a factor of about 30 for a 1000-fold increase in dataset size. This indicates that
FlexRML scales more efficiently than a direct linear relationship would suggest, especially in
terms of memory usage.</p>
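          <p>The growth factors quoted for the Records dataset can be recomputed from the reported endpoints. The short sketch below does this and, as an additional illustration that is not part of the original analysis, derives an empirical scaling exponent k such that the measured factor equals the size factor raised to the power k; a value of k below 1 indicates sublinear growth. The function names are illustrative.</p>
          <preformat>
```python
import math


def growth_factor(first, last):
    """Ratio between the largest and the smallest variant."""
    return last / first


def scaling_exponent(size_factor, value_factor):
    """Exponent k with value_factor == size_factor ** k (k below 1: sublinear)."""
    return math.log(value_factor) / math.log(size_factor)


# Endpoints reported for the Records dataset (10k rows vs. 10M rows).
SIZE_FACTOR = 1000
time_factor = growth_factor(1.23, 943.34)        # about 767
memory_factor = growth_factor(381.51, 11275.53)  # about 30

if __name__ == "__main__":
    print(round(time_factor), round(memory_factor))
    print(scaling_exponent(SIZE_FACTOR, time_factor))
    print(scaling_exponent(SIZE_FACTOR, memory_factor))
```
          </preformat>
          <p>Both exponents come out below 1, with the memory exponent markedly smaller than the time exponent, which matches the observation that memory usage grows much more slowly than execution time.</p>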
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>The results of FlexRML in the KGCW Challenge 2024 show that the performance metrics
for handling the Duplicated Values dataset, the Empty Values dataset, and the
GTFS-Madrid-Benchmark are as expected. The number of unique output RDF triples mainly affects memory
consumption due to the duplicate removal hash set and execution time due to disk write
operations. The Join dataset shows stable performance regardless of join type. The Mappings
dataset shows that multithreading increases memory consumption due to overhead, but reduces
execution times. The Properties and Records datasets show that FlexRML’s execution time and
memory consumption increase with larger dataset sizes, but only sublinearly, with memory
consumption growing much more slowly than execution time and both growing less than the
dataset size. Overall, FlexRML’s performance was the best of the RML interpreters participating
in the Challenge, and FlexRML received the Performance Award in the KGCW 2024 Challenge.</p>
      <p>Future research plans focus on combining our result size estimation algorithm with mapping
partitioning to further reduce memory consumption.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was funded by the German Federal Ministry for Economic Affairs and Climate Action
(BMWK) through the Antrieb 4.0 project (Grant No. 13IK015B).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Heterogeneous data and big data analytics</article-title>
          ,
          <source>Automatic Control and Information Sciences</source>
          <volume>3</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          , et al.,
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          ,
          <source>7th Workshop on Linked Data on the Web</source>
          <volume>1184</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>U.</given-names>
            <surname>Şimşek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kärle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <article-title>RocketRML - A NodeJS implementation of a use-case specific RML mapper</article-title>
          , arXiv preprint arXiv:1903.04969 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          , et al.,
          <article-title>SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs</article-title>
          ,
          <source>in: Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dorsch</surname>
          </string-name>
          , et al.,
          <article-title>FlexRML: A Flexible and Memory Efficient Knowledge Graph Materializer</article-title>
          , in: Extended Semantic Web Conference, Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Van Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arenas-Guerrero</surname>
          </string-name>
          , et al.,
          <article-title>The RML ontology: A community-driven modular redesign after a decade of experience in mapping heterogeneous data to RDF</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2023</year>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          , et al.,
          <article-title>GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>65</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <source>KGCW2023 Challenge Report RDFProcessingToolkit / Sansa</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>