<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Results for Knowledge Graph Creation Challenge 2024: SDM-RDFizer</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Enrique</forename><surname>Iglesias</surname></persName>
							<email>iglesias@l3s.de</email>
							<affiliation key="aff0">
								<orgName type="department">L3S Research Center</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Leibniz University of Hannover</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maria-Esther</forename><surname>Vidal</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">L3S Research Center</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Leibniz University of Hannover</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">TIB Leibniz Information Centre for Science and Technology</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Results for Knowledge Graph Creation Challenge 2024: SDM-RDFizer</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6036908930D955A2DEE31C92C86FBB29</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:22+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph Creation</term>
					<term>Data Integration System</term>
					<term>RDF Mapping Languages</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The volume of data generated in recent years has increased drastically, necessitating a unified schema to integrate multiple data sources into a single format. The RDF Mapping Language (RML) was developed to define the structure of knowledge graphs (KGs). Over time, various extensions have been introduced to enhance RML's functionality, creating a need for a new specification that consolidates these extensions. The Track 1 dataset of the KGCW 2024 Challenge addresses this need by providing a comprehensive set of test cases to ensure that knowledge graph creation engines comply with the updated RML specification. This paper reports on the conformance evaluation of SDM-RDFizer using this dataset, highlighting its capabilities and areas for improvement in achieving full RML compliance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The substantial surge in data volume has led to the increasing use of knowledge graphs (KGs) to integrate multiple data sources in different formats. Consequently, various mapping languages have emerged to define KGs; among the best known are R2RML and its extension, the RDF Mapping Language (RML) <ref type="bibr" target="#b0">[1]</ref>. Both languages adhere to the rules established by the Resource Description Framework (RDF). Over time, new extensions for RML have been developed, adding functionalities such as the execution of functions for value transformation (RML+FnO <ref type="bibr" target="#b1">[2]</ref>) and the use of RDF-Star (RML-Star <ref type="bibr" target="#b2">[3]</ref>). A new RML specification has been defined to incorporate all these extensions and formally remove references to R2RML. The Track 1 dataset of the KGCW 2024 Challenge covers many test cases, including basic cases, functions, RML-Star, remote sources, and specific outputs. These test cases ensure that existing KG creation tools can incorporate the new RML specification and achieve full compliance. This report presents the updates needed to integrate the new specification into SDM-RDFizer and evaluates the results using the Track 1 dataset.</p><p>The remainder of this paper is organized into three sections. Section 2 provides an overview of SDM-RDFizer, including its techniques, data structures, and physical operators for optimizing KG creation. Section 3 details the results of the challenge, including the dataset definition and the improvements needed to meet the test cases. Finally, Section 4 presents the conclusions and future steps for SDM-RDFizer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SDM-RDFizer</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Test Cases of the Knowledge Graph Creation Challenge</head><p>Track 1 of the KGCW 2024 Challenge<ref type="foot" target="#foot_1">4</ref> aims to inspire new methods and techniques for incorporating the new RML specification into existing KG creation engines. This dataset comprises five sets of test cases.</p><p>RML-Core: This set includes basic test cases originally defined in the RML test cases<ref type="foot" target="#foot_2">5</ref> to test the compliance of KG creation engines. These cases have been updated to reflect the new specification and use CSV, JSON, and XML files, as well as relational databases (MySQL and Postgres), as data sources.</p><p>RML-FNML: This set contains test cases that use functions to transform data, employing a series of pre-defined functions to execute these transformations.</p><p>RML-Star: This set incorporates RDF-Star<ref type="foot" target="#foot_3">6</ref> test cases, generated from the RML-Star test cases<ref type="foot" target="#foot_4">7</ref> in accordance with the new specification.</p><p>RML-IO: This set includes a wide variety of remote data sources such as endpoints, compressed files, and JSON and XML files. It also defines outputs for specific properties in various formats like Turtle, RDF/JSON, and JSON-LD, and in multiple compressed formats like Zip and Tar. The name reflects its focus on Input and Output.</p><p>RML-CC: This set covers collections and containers.</p><p>This work presents the results of executing RML-Core, RML-FNML, RML-Star, and RML-IO with SDM-RDFizer. RML-CC will be incorporated at a later date.</p><p>To parse the new specification, the parser query is updated to replace the rml prefix with its new namespace, replace all mentions of the R2RML namespace with the RML namespace, and update the definition of rml:logicalSource to include the use of the rml:path and rml:root clauses, replacing rml:query with rml:iterator. The rdflib library is used to execute the parser query. SDM-RDFizer successfully performs all 238 test cases from the RML-Core set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Results of RML-FNML</head><p>RML-FNML consists of test cases that use functions to transform values, including tasks like replacing strings, transforming strings to lower and upper case, concatenating strings, and more. These test cases demonstrate the use of RML+FnO in the new specification. SDM-RDFizer incorporates strategies from FunMap <ref type="bibr" target="#b4">[5]</ref> to execute functions. FunMap is a triples map (TM) translator that converts TMs containing functions, together with their corresponding data sources, into function-free TMs whose data sources reflect the results of executing the functions.</p><p>To parse TMs with functions, a new parser query is defined specifically for extracting the functions from the TMs. This allows for proper handling of nested functions, as each function is extracted individually, and nested functions are called from the function's parameters. SDM-RDFizer successfully performs all 14 test cases from this set.</p></div>
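<div xmlns="http://www.tei-c.org/ns/1.0"><p>The handling of nested functions described above can be illustrated with a minimal Python sketch. It is not the implementation used by SDM-RDFizer or FunMap: the dictionary layout of a function call, the FUNCTIONS registry, and the function names are assumptions made for illustration and do not correspond to the FNML vocabulary. Every parameter is evaluated before the function that consumes it, so an inner call is resolved first.</p>
<p><code lang="python"><![CDATA[
```python
from typing import Any, Callable, Dict

# Hypothetical registry of pre-defined functions (illustrative names only).
FUNCTIONS: Dict[str, Callable[..., str]] = {
    "toUpperCase": lambda value: value.upper(),
    "toLowerCase": lambda value: value.lower(),
    "replace": lambda value, find, new: value.replace(find, new),
    "concat": lambda left, right: left + right,
}


def evaluate(term: Any, row: Dict[str, str]) -> str:
    """Evaluate a parameter: a constant, a column reference, or a nested call."""
    if isinstance(term, dict) and "function" in term:
        # Nested call: evaluate every parameter first, then apply the function.
        args = [evaluate(param, row) for param in term["params"]]
        return FUNCTIONS[term["function"]](*args)
    if isinstance(term, dict) and "reference" in term:
        # Column reference: look the value up in the current source row.
        return row[term["reference"]]
    return str(term)  # plain constant


# toUpperCase(replace(name, " ", "_")) applied to one source row.
call = {
    "function": "toUpperCase",
    "params": [
        {"function": "replace", "params": [{"reference": "name"}, " ", "_"]},
    ],
}
result = evaluate(call, {"name": "Venus Williams"})
print(result)
```
]]></code></p>
<p>In this sketch the inner replace call is resolved from the parameters of toUpperCase, mirroring the idea that nested functions are called from the enclosing function's parameters.</p></div>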
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Results of RML-Star</head><p>RML-Star comprises test cases that use RDF-Star, an extension of RDF that introduces a new term, the quoted triple, which can be used as either the subject or the object of a triple. Accordingly, RML-Star introduces rml:quotedTriplesMap, enabling the definition of quoted triples in a KG. SDM-RDFizer implements a new operator to generate quoted triples; the operator can be applied recursively, since quoted triples can contain other quoted triples. Another challenge in these test cases was the use of joins in the rml:subjectMap. These cases were handled in the same way as joins in the rml:objectMap, using the OJM operator for join execution and the PJTT for storing the results. The parser query was expanded to recognize rml:quotedTriplesMap. SDM-RDFizer successfully performs all 18 test cases from this set.</p></div>
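<div xmlns="http://www.tei-c.org/ns/1.0"><p>The recursive nature of quoted triples can be sketched in a few lines of Python. This is an illustrative sketch and not the operator implemented in SDM-RDFizer: triples are modelled as plain tuples, the example IRIs are invented, and the function names serialize and to_ntriples_star are hypothetical. A term that is itself a tuple is serialized as a quoted triple, which in turn may contain further quoted triples.</p>
<p><code lang="python"><![CDATA[
```python
from typing import Tuple, Union

# A term is either an already serialized IRI/literal or a nested (s, p, o) tuple.
Term = Union[str, tuple]


def serialize(term: Term) -> str:
    """Serialize a term; tuples become quoted triples, recursively."""
    if isinstance(term, tuple):
        subject, predicate, obj = term
        return f"<< {serialize(subject)} {serialize(predicate)} {serialize(obj)} >>"
    return term


def to_ntriples_star(triple: Tuple[Term, Term, Term]) -> str:
    """Serialize an asserted triple whose subject or object may be quoted."""
    subject, predicate, obj = triple
    return f"{serialize(subject)} {serialize(predicate)} {serialize(obj)} ."


# Invented example data: a statement about a statement about a statement.
inner = ("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>")
outer = (inner, "<http://example.org/certainty>", '"0.9"')
nested = (outer, "<http://example.org/source>", "<http://example.org/survey>")

print(to_ntriples_star(outer))
print(to_ntriples_star(nested))
```
]]></code></p>
<p>Because serialize calls itself on the subject and the object, the nesting depth is not fixed in advance, which is the property the quoted-triple operator needs.</p></div>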
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Results of RML-IO</head><p>RML-IO consists of test cases that cover a wide range of remote data sources, including compressed files, JSON and XML files, and data extracted from SPARQL endpoints. Some cases introduce the concept of writing certain triples to specific output files, which may need to be compressed or serialized in different formats, such as JSON-LD or Turtle.</p><p>SDM-RDFizer uses the requests library to collect data from remote sources. For SPARQL endpoints, the SPARQLWrapper library connects to the endpoint and executes SPARQL queries, and the results are converted to a format similar to CSV. When dealing with compressed files, SDM-RDFizer downloads the file locally and decompresses it using the appropriate library for the format, such as the zip library for Zip files.</p><p>These test cases also allow an alternate output to be defined within the triples map (TM), in the rml:subjectMap, rml:predicateMap, rml:objectMap, rml:languageMap, rml:datatypeMap, or rml:graphMap. The location of the definition determines which triples are written to the alternate output file. If it is defined in the rml:subjectMap, all valid triples generated from this TM are sent to that output. If it is defined elsewhere, only triples with that specific property are written there. Any remaining triples not destined for an alternate output are sent to the original output file, which is defined at the start of the SDM-RDFizer execution. When handling these alternate output files, SDM-RDFizer prioritizes generating them over the standard output and can compress them. Finally, SDM-RDFizer converts the output files into various RDF formats, such as RDF/XML and JSON-LD, using the rdflib library.</p><p>SDM-RDFizer successfully executes 65 of the 67 test cases. The two failures occurred because SDM-RDFizer cannot upload the generated triples to a SPARQL endpoint.</p></div>
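<div xmlns="http://www.tei-c.org/ns/1.0"><p>The routing rule for alternate outputs can be summarized with a small Python sketch. It is a simplification under stated assumptions, not SDM-RDFizer's code: the function route_triples and its parameters are hypothetical, alternate outputs defined outside the subject map are keyed by predicate only, and a triple that matches both a subject-level and a predicate-level output is written to both, which is an assumption of this sketch.</p>
<p><code lang="python"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def route_triples(
    triples: Iterable[Triple],
    subject_output: Optional[str],
    predicate_outputs: Dict[str, str],
    default_output: str,
) -> Dict[str, List[Triple]]:
    """Assign each triple of one triples map to its output file(s)."""
    routed: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        predicate = triple[1]
        targets: List[str] = []
        # An output on the subject map receives every triple of this map.
        if subject_output is not None:
            targets.append(subject_output)
        # An output defined elsewhere only receives triples with that property.
        alternate = predicate_outputs.get(predicate)
        if alternate is not None and alternate not in targets:
            targets.append(alternate)
        # Triples without any alternate output go to the default output file.
        if not targets:
            targets.append(default_output)
        for target in targets:
            routed[target].append(triple)
    return dict(routed)


triples = [
    ("ex:venus", "ex:name", '"Venus"'),
    ("ex:venus", "ex:age", '"43"'),
]
by_property = route_triples(triples, None, {"ex:age": "ages.nt"}, "output.nt")
by_subject = route_triples(triples, "people.nt", {"ex:age": "ages.nt"}, "output.nt")
print(by_property)
print(by_subject)
```
]]></code></p>
<p>In the first call only the ex:age triple leaves the default file; in the second, the subject-level output receives both triples and the default file receives none.</p></div>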
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>The KGCW 2024 Challenge Track 1 dataset evaluates the compliance of state-of-the-art engines with the new formulation of RML. It comprises 366 test cases across five categories: RML-Core, RML-IO, RML-FNML, RML-Star, and RML-CC. SDM-RDFizer successfully executes 335 of these test cases, fully covering RML-Core, RML-FNML, and RML-Star, and all but two cases from RML-IO.</p><p>To achieve this, SDM-RDFizer introduced a new parsing query, strategies from FunMap, and an operator for generating inner triples for RML-Star. Moving forward, SDM-RDFizer aims to address the remaining RML-CC cases by incorporating new methods. For this purpose, a new operator will be defined, which will behave differently depending on whether it is transforming a list, bag, or sequence. Additionally, a new data structure will be developed for intermediate results in RML-Star to avoid repeated inner triple generation, and an optimized parser will be implemented to manage the increasing complexity. With these improvements, SDM-RDFizer is set to become a fully compliant RML engine.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>SDM-RDFizer</head><label></label><figDesc>SDM-RDFizer<ref type="bibr" target="#b3">[4]</ref> is an RML-compliant KG creation engine. It comprises two modules: Triples Maps Planning (TMP) and Triples Maps Execution (TME). Each module has data structures that optimize different aspects of the KG creation process. TMP determines the execution order of the triples maps (TMs) to keep memory usage to a minimum. TME generates the KG following the order established by TMP. Multiple novel operators are defined to transform different types of TMs: the Simple Object Map (SOM) operator executes rml:template and rml:reference, the Object Reference Map (ORM) operator executes parent triples maps, and the Object Join Map (OJM) operator executes joins. Each generated triple is compared against the corresponding Predicate Tuple Table (PTT) to determine whether it is a duplicate, and the Dictionary Table (DT) compresses the resources stored in the PTT. The Predicate Join Tuple Table (PJTT) stores the results of executing joins. SDM-RDFizer is publicly available on GitHub<ref type="foot" target="#foot_0">3</ref>.</figDesc><table /></figure>
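<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interplay between the Predicate Tuple Table and the Dictionary Table can be sketched as follows. This is a minimal sketch of the idea stated above, not SDM-RDFizer's actual data structures: the class and method names are hypothetical, the dictionary simply assigns consecutive integers to resources, and each predicate keeps a set of encoded subject and object pairs against which new triples are checked.</p>
<p><code lang="python"><![CDATA[
```python
from typing import Dict, Set, Tuple


class DictionaryTable:
    """Maps each resource string to a compact integer identifier."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, resource: str) -> int:
        # A resource seen for the first time receives the next free identifier.
        return self._ids.setdefault(resource, len(self._ids))


class PredicateTupleTables:
    """Keeps one set of encoded (subject, object) pairs per predicate."""

    def __init__(self) -> None:
        self._dictionary = DictionaryTable()
        self._tables: Dict[str, Set[Tuple[int, int]]] = {}

    def add(self, subject: str, predicate: str, obj: str) -> bool:
        """Return True if the triple is new and False if it is a duplicate."""
        key = (self._dictionary.encode(subject), self._dictionary.encode(obj))
        table = self._tables.setdefault(predicate, set())
        if key in table:
            return False
        table.add(key)
        return True


candidates = [
    ("ex:venus", "ex:plays", "ex:tennis"),
    ("ex:venus", "ex:plays", "ex:tennis"),  # duplicate, must be dropped
    ("ex:serena", "ex:plays", "ex:tennis"),
]
ptt = PredicateTupleTables()
emitted = [triple for triple in candidates if ptt.add(*triple)]
print(emitted)
```
]]></code></p>
<p>Storing integer pairs instead of full strings is what keeps the per-predicate tables small when the same resources recur across many triples.</p></div>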
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Test Cases of the KGCW 2024 Challenge Track 1 dataset. Table 1 shows the total number of test cases in the dataset and how many cases SDM-RDFizer passed and failed. The full results are available on GitHub<ref type="foot" target="#foot_5">8</ref>.</figDesc><table><row><cell>Set</cell><cell># of Test Cases</cell><cell># of Passed Cases</cell><cell># of Failed Cases</cell></row><row><cell>RML-Core</cell><cell>238</cell><cell>238</cell><cell>0</cell></row><row><cell>RML-FNML</cell><cell>14</cell><cell>14</cell><cell>0</cell></row><row><cell>RML-Star</cell><cell>18</cell><cell>18</cell><cell>0</cell></row><row><cell>RML-IO</cell><cell>67</cell><cell>65</cell><cell>2</cell></row><row><cell>RML-CC</cell><cell>29</cell><cell>0</cell><cell>29</cell></row><row><cell>Total</cell><cell>366</cell><cell>335</cell><cell>31</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Results of RML-Core</head><p>RML-Core comprises test cases covering the fundamentals of RML, such as the definition of classes, the use of rml:template and rml:reference, the execution of parent triples maps and joins, and the definition of data types, languages, and graphs. SDM-RDFizer implements three operators to execute different types of mappings. To extract data from various data sources, SDM-RDFizer employs several Python libraries: csv for CSV files, json for JSON files, xml for XML files, mysql-connector for connecting to MySQL databases, and psycopg2 for connecting to Postgres databases.</p></div>
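<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how rows read with the csv and json libraries can feed an rml:template, the following Python sketch loads records from both formats into uniform dictionaries and expands a template. It is a sketch under assumptions rather than SDM-RDFizer's code: the helper names, the example data, and the example template are invented, the JSON input is assumed to be a flat array of objects (no iterator is evaluated), and placeholder values are percent-encoded with urllib.parse.quote.</p>
<p><code lang="python"><![CDATA[
```python
import csv
import io
import json
import re
from typing import Dict, Iterator
from urllib.parse import quote


def rows_from_csv(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per CSV record, keyed by the header names."""
    yield from csv.DictReader(io.StringIO(text))


def rows_from_json(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per object of a flat JSON array."""
    for record in json.loads(text):
        yield {key: str(value) for key, value in record.items()}


def expand_template(template: str, row: Dict[str, str]) -> str:
    """Replace each {column} placeholder with the percent-encoded row value."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda match: quote(row[match.group(1)], safe=""),
        template,
    )


csv_rows = list(rows_from_csv("id,name\n1,Venus Williams\n"))
json_rows = list(rows_from_json('[{"id": 2, "name": "Serena"}]'))
template = "http://example.org/player/{id}/{name}"
iris = [expand_template(template, row) for row in csv_rows + json_rows]
print(iris)
```
]]></code></p>
<p>Normalizing every source to dictionaries of strings lets the same template expansion run regardless of the input format.</p></div>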
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://github.com/SDM-TIB/SDM-RDFizer</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://zenodo.org/records/10973433</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">https://rml.io/test-cases/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">https://www.w3.org/2021/12/rdf-star.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_4">https://zenodo.org/records/6518802</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_5">https://github.com/SDM-TIB/SDM-RDFizer/tree/master/kgcw_2024_challenge</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by the Federal Ministry for Economic Affairs and Energy of Germany (BMWK) in the project CoyPu (project number 01MK21007[A-L]). The Leibniz Association partially funds Maria-Esther Vidal through the "Leibniz Best Minds: Programme for Women Professors", project TrustKG - Transforming Data in Trustable Insights, with grant P99/2020.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on Linked Data on the Web</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An ontology to semantically declare and describe functions</title>
		<author>
			<persName><forename type="first">B</forename><surname>De Meester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="46" to="49" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">RML-star: A declarative mapping language for RDF-star generation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Delva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Arenas-Guerrero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Iglesias-Molina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ó</forename><surname>Corcho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-2980/paper374.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ISWC 2021 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice co-located with 20th International Semantic Web Conference (ISWC 2021), Virtual Conference</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">O</forename><surname>Seneviratne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Pesquita</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Sequeda</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Etcheverry</surname></persName>
		</editor>
		<meeting>the ISWC 2021 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice co-located with 20th International Semantic Web Conference (ISWC 2021), Virtual Conference</meeting>
		<imprint>
			<date type="published" when="2021">October 24-28, 2021</date>
			<biblScope unit="volume">2980</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs</title>
		<author>
			<persName><forename type="first">E</forename><surname>Iglesias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jozashoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Collarana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-E</forename><surname>Vidal</surname></persName>
		</author>
		<idno type="DOI">10.1145/3340531.3412881</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>CIKM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">FunMap: Efficient execution of functional mappings for knowledge graph creation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jozashoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Iglesias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-E</forename><surname>Vidal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ó</forename><surname>Corcho</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-62419-4_16</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><forename type="middle">A M</forename><surname>Tamma</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Janowicz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Fu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Seneviratne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename></persName>
		</editor>
		<meeting><address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">November 2-6, 2020</date>
			<biblScope unit="volume">12506</biblScope>
			<biblScope unit="page" from="276" to="293" />
		</imprint>
	</monogr>
	<note>Proceedings, Part I</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
