<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Performance Results of FlexRML in the KGCW Challenge 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Michael</forename><surname>Freund</surname></persName>
							<email>michael.freund@iis.fraunhofer.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Fraunhofer Institute for Integrated Circuits IIS</orgName>
								<address>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sebastian</forename><surname>Schmid</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg</orgName>
								<address>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rene</forename><surname>Dorsch</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Fraunhofer Institute for Integrated Circuits IIS</orgName>
								<address>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andreas</forename><surname>Harth</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Fraunhofer Institute for Integrated Circuits IIS</orgName>
								<address>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Friedrich-Alexander-Universität Erlangen-Nürnberg</orgName>
								<address>
									<settlement>Nürnberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Performance Results of FlexRML in the KGCW Challenge 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">369D860F2CE559AA82F836967F775658</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:22+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph Construction</term>
					<term>RDF Mapping Language</term>
					<term>KGCW Challenge</term>
					<term>ORCID 0000-0003-1601-9331 (M. Freund)</term>
					<term>ORCID 0000-0002-5836-3029 (S. Schmid)</term>
					<term>ORCID 0000-0001-6857-7314 (R. Dorsch)</term>
					<term>ORCID 0000-0002-0702-510X (A. Harth)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Knowledge Graph Construction Workshop introduced a challenge to evaluate the performance metrics of different RML interpreters using a set of standardized benchmarks. We participated in the challenge's performance track with our RML interpreter, FlexRML, and report the median execution time and peak memory consumption over five runs on the provided virtual machine using the challenge tool. Through this challenge, we were able to identify weaknesses in FlexRML, such as its current support for CSV data only, lack of support for the latest RML vocabulary, and a crash that occurs when the system executing the mapping runs out of memory. These are issues that we plan to address in future releases of FlexRML.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The construction of Knowledge Graphs (KGs) by creating Resource Description Framework (RDF) data from various sources such as CSV, JSON, or relational databases is becoming increasingly important. This growth is driven by the overall increase in data volumes and the need to integrate heterogeneous data sources <ref type="bibr" target="#b1">[1]</ref>. The most common method of mapping non-RDF data to RDF is a declarative approach using the RDF Mapping Language (RML) <ref type="bibr" target="#b2">[2]</ref>. RML mappings specify the input data sources, their encodings, the ontologies to be used, and the overall structure of the output RDF triples. Using these mappings, RML interpreters then transform the data sources into the specified RDF. The RML method is widely used due to the availability of many well-maintained RML interpreters, including the RMLMapper, RocketRML <ref type="bibr" target="#b3">[3]</ref>, and SDM-RDFizer <ref type="bibr" target="#b4">[4]</ref>.</p><p>Depending on the use case, the constraints of the machine performing the mapping, and the structure and parameters of the source data, such as overall size, number of duplicates, and complexity of the RML mappings, some RML interpreters may be more suitable than others. The suitability of RML interpreters is influenced by factors such as the programming language used for implementation and the internal mapping and data transformation algorithms. To help practitioners choose the right RML interpreter for their project, the Knowledge Graph Construction Workshop (KGCW) has introduced the KGCW Challenge to enable fair comparison. The KGCW Challenge consists of a dataset and corresponding RML mappings divided into two tracks. 
The first track covers conformance of the RML interpreters to the new RML specifications, while the second track focuses on mapping performance by evaluating execution time and peak memory consumption.</p><p>This paper reports on the results of FlexRML <ref type="bibr" target="#b5">[5]</ref>, a new RML interpreter built from the ground up, in the performance track of the KGCW Challenge 2024. The rest of the paper is divided into three sections. Section 2 gives a brief overview of FlexRML in its current state and the technical improvements planned for the next releases. Section 3 reports the performance metrics of FlexRML in the empirical evaluation and discusses the results. Finally, Section 4 concludes the paper and outlines future research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">FlexRML</head><p>FlexRML is a flexible RML interpreter written in C++ that is designed to be usable across the entire network architecture. This means that FlexRML is designed to operate in almost unconstrained cloud environments, on moderately constrained industrial PCs and single-board computers at the network edge, and on extremely resource-constrained Internet of Things (IoT) devices and microcontrollers running real-time operating systems. This flexibility sets FlexRML apart from other RML interpreters, which typically focus only on almost unconstrained cloud environments and are therefore written in high-level programming languages such as Java, JavaScript, or Python. In the following, we provide a brief overview of FlexRML's architecture, currently supported features, and planned features for future releases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Architecture</head><p>When mapping non-RDF data to RDF, FlexRML performs two main steps: the preprocessing step, which optimizes the RML mappings for speed, and the actual mapping step, which executes the optimized RML mappings and generates the RDF data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Preprocessing Step</head><p>FlexRML applies well-established mapping optimization strategies to improve overall mapping performance, such as self-join elimination or mapping normalization, and replaces joins with reference conditions whenever possible. In addition, FlexRML uses a result size estimation algorithm based on independent Bernoulli sampling. The algorithm generates a small sample from the original non-RDF data sources using a simple random sampling approach, performs the mapping on the sample, enumerates all unique RDF triples generated, and, based on the result, estimates the number of unique RDF triples that will be generated when all source files are mapped. The estimated number of unique RDF triples is used to select suitable bit sizes for the hash functions and data structures in the rest of the mapping process, which saves memory.</p></div>
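<div xmlns="http://www.tei-c.org/ns/1.0"><p>How an estimate can drive the choice of hash width may be illustrated with a minimal Python sketch. It is not FlexRML's implementation (FlexRML is written in C++, and neither its estimator nor its selection rule is given here): the function names, the collision threshold of 1e-6, and the use of the birthday-bound approximation are assumptions made purely for illustration. The sketch draws an independent Bernoulli sample of input rows and, given an estimated number of unique triples, picks the smallest of the three hash widths whose approximate collision probability stays below the threshold. The step that turns the mapped sample into an estimate is deliberately left out.</p>
<p><code lang="python"><![CDATA[
```python
import math
import random


def bernoulli_sample(rows, p, seed=None):
    """Keep each row independently with probability p (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


def collision_probability(n_items, bits):
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    if n_items < 2:
        return 0.0
    exponent = -(n_items * (n_items - 1)) / float(2 ** (bits + 1))
    return -math.expm1(exponent)


def choose_hash_bits(estimated_unique_triples, max_collision_prob=1e-6,
                     candidates=(32, 64, 128)):
    """Return the smallest candidate width that meets the assumed threshold."""
    for bits in candidates:
        if collision_probability(estimated_unique_triples, bits) <= max_collision_prob:
            return bits
    return candidates[-1]  # fall back to the widest hash


if __name__ == "__main__":
    for n in (10, 1_000_000, 1_000_000_000):
        print(n, choose_hash_bits(n))
```
]]></code></p>
<p>Under these assumed parameters, a narrower hash is only selected for small outputs, which is where the memory saving of a smaller hash set matters most on constrained devices. A different threshold would shift the boundaries between the three widths.</p></div>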
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mapping Step</head><p>The mapping process itself can take advantage of multiple cores, if available, by using the producer-consumer design pattern. In the mapping process, FlexRML generates all triples, hashes each generated triple with a hash function of 32, 64, or 128 bits depending on the result of the estimation process, and compares the hash to a hash set containing the hashes of all RDF triples generated up to that point to remove duplicates. If the generated triple is a duplicate, it is discarded; otherwise the triple is written to the output and its hash is added to the hash set. This duplicate removal approach is memory efficient, but carries the risk of hash collisions, which can result in missing RDF triples in the output data. Choosing appropriate bit sizes minimizes this risk, and no RDF triples were lost to collisions on the KGCW Challenge datasets.</p></div>
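<div xmlns="http://www.tei-c.org/ns/1.0"><p>The duplicate removal described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FlexRML's C++ code: BLAKE2b from Python's standard <code>hashlib</code> module is used as a stand-in for whichever hash function FlexRML applies, the helper names are hypothetical, and the sketch is single-threaded rather than producer-consumer. Only the fixed-size hash of each triple is kept in memory, never the triple itself.</p>
<p><code lang="python"><![CDATA[
```python
import hashlib


def triple_hash(triple, bits):
    """Hash a (subject, predicate, object) tuple to an integer of `bits` bits."""
    # Join the terms with a separator that is unlikely to occur inside a term.
    data = "\x1f".join(triple).encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def deduplicate(triples, bits=64):
    """Yield each triple whose hash has not been seen before."""
    seen = set()  # stores hashes only, not the triples themselves
    for triple in triples:
        h = triple_hash(triple, bits)
        if h in seen:
            continue  # duplicate, or (rarely) a hash collision: triple is dropped
        seen.add(h)
        yield triple


if __name__ == "__main__":
    a = ("ex:s", "ex:p", "ex:o1")
    b = ("ex:s", "ex:p", "ex:o2")
    for t in deduplicate([a, b, a]):
        print(" ".join(t), ".")
```
]]></code></p>
<p>The comment on the <code>continue</code> branch marks the trade-off: a collision between two distinct triples is indistinguishable from a true duplicate, so the second triple would be silently dropped. This is why the hash width has to be matched to the expected number of unique triples.</p></div>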
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Current State and Planned Features</head><p>The current release, FlexRML 1.0, is available as easy-to-use, pre-built binaries for Debian-based 64-bit systems and ARM-based 64-bit systems such as Raspberry Pis<ref type="foot" target="#foot_0">2</ref>, and as a GitHub repository<ref type="foot" target="#foot_1">3</ref> usable on microcontrollers such as the Arduino Nano 33 IoT and ESP32 via the Arduino IDE. Because the code is open source, FlexRML can also be built locally, giving users access to the latest development features.</p><p>The main drawback of FlexRML is that currently only the mapping of CSV source data to RDF is fully supported. This is because we want to be able to run FlexRML across the entire network architecture, and some microcontrollers only support a subset of the C++ standard library, which limits our ability to reuse existing libraries for input file handling. This forces us to implement the file handling logic ourselves, which is complicated and time-consuming. We are making progress, however: the current development build already partially supports mapping JSON data with a subset of JSONPath expressions. In addition, we currently do not fully support the new RML vocabulary terms <ref type="bibr" target="#b6">[6]</ref>, only those of the RML-IO specification<ref type="foot" target="#foot_2">4</ref>.</p><p>In the next releases of FlexRML, we will fully integrate JSON by enhancing our implementation of a JSONPath parser to cover all features. Additionally, we plan to make FlexRML available in web browsers using WebAssembly, which will allow us to extend FlexRML beyond the cloud and directly into user applications. We are also aware that the performance of joins that cannot be substituted by reference conditions is not optimal, and we plan to apply optimization strategies to address this. 
A full list of planned features and the implementation progress can be found on our roadmap on GitHub<ref type="foot" target="#foot_3">5</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Empirical Evaluation</head><p>In the following, we describe the hardware and software used, and then report and discuss the results of the empirical evaluation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Experimental Setup</head><p>For the empirical evaluation, we used the virtual machine provided by the organizers of the KGCW Challenge. To execute the mappings, we used the recommended challenge tool, for which we included a Dockerfile for FlexRML. The non-RDF data to be transformed into RDF was available in CSV format, and in the default process, it is loaded into a database from which the mapping is performed. Since FlexRML does not support mapping from databases, we adjusted the pipeline to directly map the CSV data. Additionally, FlexRML does not support the newest vocabulary terms, so we adjusted the RML mapping rules accordingly. The challenge tool with FlexRML integrated, the adjusted metadata.json files describing the new pipeline, and the adjusted RML mappings used for the evaluation can be found on GitHub<ref type="foot" target="#foot_4">6</ref> . To allow for easy reproducibility, we also included a simple shell script that copies all the adjusted data into the correct directories and needs to be run once the benchmark data has been downloaded. We used a simple Python script to verify the correctness of FlexRML's output against the reference output provided <ref type="foot" target="#foot_5">7</ref> . All performance metrics reported in the following are collected by the challenge tool over five runs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Results</head><p>The challenge tool evaluated the performance of FlexRML on seven benchmark groups: Duplicated Values, Empty Values, the GTFS-Madrid-Benchmark, Joins, Mappings, Properties, and Records.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Duplicated Values</head><p>The Duplicated Values dataset comes in five variants with different percentages of duplicates, ranging from 0 percent to 100 percent in steps of 25 percent. The median runtime and peak memory usage are reported in Table <ref type="table" target="#tab_0">1</ref>. All outputs of FlexRML match the expected reference output. The results show that FlexRML's runtime decreases steadily as the number of duplicates increases, and that peak memory usage also decreases overall, apart from a marginal rise between the 0 percent and 25 percent variants. This is because the number of unique output RDF triples also decreases, resulting in fewer write operations to disk and thus a reduction in runtime. In addition, fewer hashes need to be stored in the hash set, resulting in lower memory consumption.</p></div>
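<div xmlns="http://www.tei-c.org/ns/1.0"><head>Empty Values</head><p>The Empty Values dataset is available in the same variations as the previous dataset, containing empty values ranging from 0 percent to 100 percent of the total dataset size in steps of 25 percent. The median runtime and peak memory consumption are reported in Table <ref type="table" target="#tab_1">2</ref>. The output produced by FlexRML again matches the expected reference output. The results of the Empty Values dataset mirror those of the Duplicated Values dataset. As the overall number of empty values increases, the required disk writes are reduced, resulting in faster execution times. In addition, the size of the internal hash set decreases, resulting in less memory consumption.</p></div>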
<div xmlns="http://www.tei-c.org/ns/1.0"><head>GTFS-Madrid-Benchmark</head><p>The GTFS-Madrid-Benchmark <ref type="bibr" target="#b7">[7]</ref> is used to evaluate the mapping performance with increasing dataset sizes. The benchmark is available in four scales, scale 1, scale 10, scale 100, and scale 1000. The benchmarks testing mixed content mapping, i.e., mapping JSON and XML data, could not be run because FlexRML currently only supports mapping CSV data. The median runtime and peak memory consumption are reported in Table <ref type="table" target="#tab_3">3</ref>. While running the GTFS-Madrid-Benchmark with a scale factor of 1000, FlexRML crashed because the virtual machine doing the mapping ran out of memory. We plan to address this issue in future releases of FlexRML, as it requires us to change the way we store the hashes of generated RDF triples. Specifically, we need to monitor the system's RAM and, when approaching the maximum available RAM, store the additional hashes on disk to avoid a crash.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Additionally, we noticed that the provided reference data for the GTFS-Madrid-Benchmark does not match the output generated by FlexRML. A closer examination of the data revealed that most of the numerical values are set to 999.999, and that additional data types were expected but not declared in the mappings. This issue was also identified in the KGCW Challenge 2023 <ref type="bibr" target="#b8">[8]</ref>. When using the mappings and data sources with multiple other RML interpreters, the results match those of FlexRML and show the same mismatches with the reference.</p><p>Otherwise, the performance results are as expected. As the size of the source data increases, both execution time and peak memory usage increase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Joins</head><p>The Join dataset evaluates the performance of RML interpreters when mapping definitions containing different types of joins, namely 1-to-1 joins, 1-to-N joins, N-to-1 joins, and N-to-M joins. The Join dataset contains 33 different variations, varying in the number of joined datasets and the percentage of data in each dataset that needs to be joined. Due to the large number of datasets, we report only selected results in Table <ref type="table" target="#tab_4">4</ref>. All outputs are in line with the reference. The performance results for the Join dataset show that FlexRML handles different types of joins with fairly consistent execution times and peak memory usage. For datasets with 50% of the data to be joined, the execution times range from 14.11 seconds to 18.21 seconds, indicating that the join type does not drastically affect the processing time. Peak memory usage shows a small variation, from 489.24 MiB to 543.96 MiB, indicating that memory use is relatively stable across different join variants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mappings</head><p>The Mappings dataset is used to evaluate the impact of different structures in the RML mapping rules. Specifically, the mappings vary the number of TriplesMaps (TMs) and PredicateObjectMaps (POMs). TMs specify rules for transforming the input data into RDF triples and consist of POMs, which define how the predicate and object for the TM's subject are generated. The Mappings dataset consists of mappings with variants of 1 TM with 15 POMs, 15 TMs with 1 POM each, 3 TMs with 5 POMs each, and 5 TMs with 3 POMs each. The resulting median execution time and peak memory consumption are shown in Table <ref type="table" target="#tab_5">5</ref>. All outputs of FlexRML match the reference output data. The performance results for the Mappings dataset reveal that mappings with fewer TMs but more POMs per TM (1 TM with 15 POMs) have slightly higher execution times and memory usage compared to variants with more TMs but fewer POMs per TM (15 TMs with 1 POM each). The variant with 3 TMs and 5 POMs each achieves the lowest execution time of 3.79 seconds but the highest peak memory usage of 486.92 MiB. This is due to the way multithreading is implemented in FlexRML, where each TM is mapped in a separate thread. This increases memory consumption due to threading overhead but also reduces execution time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Properties</head><p>The Properties dataset increases the number of columns while keeping the number of rows constant. This means that the Properties dataset evaluates RML interpreters for their ability to handle horizontally scaled dataset sizes. The Properties dataset is available in the variants 1M rows with 1 column, 1M rows with 10 columns, 1M rows with 20 columns, and 1M rows with 30 columns. The results are shown in Table <ref type="table" target="#tab_6">6</ref>. All outputs of FlexRML are verified to match the expected output. The performance results for the Properties dataset show that both FlexRML's execution time and peak memory consumption increase as the number of columns in the dataset increases, while the number of rows stays constant at 1 million. The execution time increases from 5.63 seconds for a dataset with 1 column to 137.30 seconds for a dataset with 30 columns, an increase by a factor of roughly 24. Similarly, the peak memory usage increases from 403.58 MiB to 1658.16 MiB over the same range, an increase by a factor of roughly 4. These results show that FlexRML's mapping performance is directly affected by horizontally scaled data: more columns lead to higher computational requirements, with more hashes to compute and more RDF triples to write to disk, and to increased memory usage due to a larger in-memory hash set. However, the scaling is sublinear, as a 30-fold increase in the number of columns results in only a roughly 24-fold increase in execution time and a 4-fold increase in memory consumption.</p></div>
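<div xmlns="http://www.tei-c.org/ns/1.0"><p>The growth factors for the Properties dataset can be recomputed directly from the first and last rows of Table 6. The short Python sketch below does only this arithmetic; the variable and function names are chosen for illustration. The ratios come out at roughly 24.4 for execution time and 4.1 for peak memory, both below the 30-fold increase in the number of columns.</p>
<p><code lang="python"><![CDATA[
```python
def growth_factor(first, last):
    """Ratio between the last and the first measurement."""
    return last / first


# (execution time in seconds, peak memory in MiB) from Table 6
PROPERTIES = {
    1: (5.63, 403.58),      # 1M rows / 1 column
    30: (137.30, 1658.16),  # 1M rows / 30 columns
}

column_factor = growth_factor(1, 30)
time_factor = growth_factor(PROPERTIES[1][0], PROPERTIES[30][0])
memory_factor = growth_factor(PROPERTIES[1][1], PROPERTIES[30][1])

if __name__ == "__main__":
    print(f"columns: x{column_factor:.1f}")
    print(f"time:    x{time_factor:.1f}")
    print(f"memory:  x{memory_factor:.1f}")
```
]]></code></p></div>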
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Records</head><p>The Records dataset is complementary to the Properties dataset, as it evaluates the performance of RML interpreters in handling vertically scaled data. The Records dataset keeps the number of columns constant while systematically increasing the number of rows with each variant. The dataset consists of the variants 10k rows with 20 columns, 100k rows with 20 columns, 1M rows with 20 columns, and 10M rows with 20 columns, as shown in Table <ref type="table" target="#tab_7">7</ref>. All outputs of FlexRML again match the expected reference output. The performance results for the Records dataset show that both the execution time and peak memory usage of FlexRML increase as the number of rows in the dataset increases, while the number of columns remains constant at 20. The execution time increases from 1.23 seconds for 10k rows to 943.34 seconds for 10M rows, and the peak memory usage increases from 381.51 MiB to 11275.53 MiB over the same range. While these increases are significant, they are not proportional, with the execution time increasing by a factor of about 767 and memory usage increasing by a factor of about 30 for a 1000-fold increase in dataset size. This indicates that FlexRML scales sublinearly with the number of rows, with memory consumption growing considerably more slowly than execution time.</p></div>
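<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to quantify how close the Records results are to proportional scaling is the slope between the two reported endpoints on a log-log scale: a slope of 1 means proportional growth, a slope below 1 means sublinear growth. The Python sketch below computes this slope from the 10k-row and 10M-row figures quoted above. It is an illustrative calculation added here, not part of the challenge tool, and because it uses only two data points it is a rough indicator rather than a fitted model.</p>
<p><code lang="python"><![CDATA[
```python
import math


def scaling_exponent(x_small, x_large, y_small, y_large):
    """Slope of the line through two points on a log-log plot."""
    return math.log(y_large / y_small) / math.log(x_large / x_small)


ROWS = (10_000, 10_000_000)      # smallest and largest Records variant
TIME_SEC = (1.23, 943.34)        # median execution time
MEMORY_MIB = (381.51, 11275.53)  # peak memory

time_exp = scaling_exponent(*ROWS, *TIME_SEC)
memory_exp = scaling_exponent(*ROWS, *MEMORY_MIB)

if __name__ == "__main__":
    print(f"time exponent:   {time_exp:.2f}")
    print(f"memory exponent: {memory_exp:.2f}")
```
]]></code></p>
<p>With these inputs the execution-time exponent is close to, but below, 1, while the memory exponent is roughly half of that, which matches the observation that memory grows much more slowly than execution time.</p></div>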
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>The results of FlexRML in the KGCW Challenge 2024 show that the performance metrics for handling the Duplicated Values dataset, the Empty Values dataset, and the GTFS-Madrid-Benchmark are as expected. The number of unique output RDF triples mainly affects memory consumption due to the duplicate removal hash set and execution time due to disk write operations. The Join dataset shows stable performance regardless of join type. The Mappings dataset shows that multithreading increases memory consumption due to overhead, but reduces execution times. The Properties and Records datasets show that FlexRML's execution time and memory consumption increase with larger dataset sizes, but only sublinearly, with memory consumption growing much more slowly than execution time and both growing less than the dataset size. Overall, FlexRML's performance was the best of the RML interpreters participating in the Challenge, and FlexRML received the Performance Award in the KGCW 2024 Challenge.</p><p>Future research plans focus on combining our result size estimation algorithm with mapping partitioning to further reduce memory consumption.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Performance metrics of FlexRML mapping the Duplicated Values dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>Duplicated Values 0%</cell><cell>8.59</cell><cell>454.48</cell></row><row><cell>Duplicated Values 25%</cell><cell>7.46</cell><cell>456.40</cell></row><row><cell>Duplicated Values 50%</cell><cell>6.75</cell><cell>417.48</cell></row><row><cell>Duplicated Values 75%</cell><cell>6.23</cell><cell>406.35</cell></row><row><cell>Duplicated Values 100%</cell><cell>5.73</cell><cell>393.68</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Performance metrics of FlexRML mapping the Empty Values dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>Empty Values 0%</cell><cell>7.98</cell><cell>458.34</cell></row><row><cell>Empty Values 25%</cell><cell>7.06</cell><cell>462.36</cell></row><row><cell>Empty Values 50%</cell><cell>5.83</cell><cell>436.75</cell></row><row><cell>Empty Values 75%</cell><cell>4.91</cell><cell>415.30</cell></row><row><cell>Empty Values 100%</cell><cell>3.80</cell><cell>405.78</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Performance metrics of FlexRML mapping the GTFS-Madrid-Benchmark dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>GTFS-Madrid Scale 1</cell><cell>2.97</cell><cell>414.10</cell></row><row><cell>GTFS-Madrid Scale 10</cell><cell>24.14</cell><cell>599.30</cell></row><row><cell>GTFS-Madrid Scale 100</cell><cell>251.88</cell><cell>2332.76</cell></row><row><cell>GTFS-Madrid Scale 1000</cell><cell>-</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Performance metrics of FlexRML mapping selected elements of the Join dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>Join 1-1 50%</cell><cell>14.52</cell><cell>526.20</cell></row><row><cell>Join 1-10 50%</cell><cell>14.11</cell><cell>489.24</cell></row><row><cell>Join 10-1 50%</cell><cell>14.42</cell><cell>514.92</cell></row><row><cell>Join 5-10 50%</cell><cell>18.20</cell><cell>526.59</cell></row><row><cell>Join 10-5 50%</cell><cell>18.21</cell><cell>543.96</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5</head><label>5</label><figDesc>Performance metrics of FlexRML mapping the Mappings dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>Mappings 1TM / 15POM</cell><cell>6.54</cell><cell>444.65</cell></row><row><cell>Mappings 15TM / 1POM</cell><cell>6.34</cell><cell>434.35</cell></row><row><cell>Mappings 3TM / 5POM</cell><cell>3.79</cell><cell>486.92</cell></row><row><cell>Mappings 5TM / 3POM</cell><cell>4.60</cell><cell>453.81</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6</head><label>6</label><figDesc>Performance metrics of FlexRML mapping the Properties dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>Properties 1M rows / 1 column</cell><cell>5.63</cell><cell>403.58</cell></row><row><cell>Properties 1M rows / 10 columns</cell><cell>43.66</cell><cell>750.66</cell></row><row><cell>Properties 1M rows / 20 columns</cell><cell>87.01</cell><cell>1141.77</cell></row><row><cell>Properties 1M rows / 30 columns</cell><cell>137.30</cell><cell>1658.16</cell></row></table></figure>
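<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 7</head><label>7</label><figDesc>Performance metrics of FlexRML mapping the Records dataset.</figDesc><table><row><cell>Dataset</cell><cell>Execution Time (sec)</cell><cell>Peak Memory (MiB)</cell></row><row><cell>Records 10k rows / 20 columns</cell><cell>1.23</cell><cell>381.51</cell></row><row><cell>Records 100k rows / 20 columns</cell><cell/><cell/></row><row><cell>Records 1M rows / 20 columns</cell><cell/><cell/></row><row><cell>Records 10M rows / 20 columns</cell><cell>943.34</cell><cell>11275.53</cell></row></table></figure>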
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://github.com/wintechis/flex-rml/releases</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://github.com/wintechis/flex-rml-esp32/tree/main</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://github.com/kg-construct/rml-io</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://github.com/wintechis/flex-rml?tab=readme-ov-file#planned-features-for-flexrml</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://github.com/FreuMi/challenge-tool</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://zenodo.org/records/10973433</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) through the Antrieb 4.0 project (Grant No. 13IK015B).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Heterogeneous data and big data analytics</title>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Automatic Control and Information Sciences</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">RML: A generic language for integrated RDF mappings of heterogeneous data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">7th Workshop on Linked Data on the Web</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">1184</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">U</forename><surname>Şimşek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kärle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fensel</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1903.04969</idno>
		<title level="m">RocketRML - A NodeJS implementation of a use-case specific RML mapper</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SDM-RDFizer: An RML interpreter for the efficient creation of RDF knowledge graphs</title>
		<author>
			<persName><forename type="first">E</forename><surname>Iglesias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jozashoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</title>
				<meeting>the 29th ACM International Conference on Information &amp; Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">FlexRML: A Flexible and Memory Efficient Knowledge Graph Materializer</title>
		<author>
			<persName><forename type="first">M</forename><surname>Freund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schmid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dorsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Extended Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The RML ontology: A community-driven modular redesign after a decade of experience in mapping heterogeneous data to RDF</title>
		<author>
			<persName><forename type="first">A</forename><surname>Iglesias-Molina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Van Assche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Arenas-Guerrero</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="152" to="175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Priyatna</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Bin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Stadler</surname></persName>
		</author>
		<title level="m">KGCW2023 Challenge Report RDFProcessingToolkit / Sansa</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
