<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Conformance of an RML Processor Built from Scratch to Validate Specifications and Test Cases</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Christophe</forename><surname>Debruyne</surname></persName>
							<email>c.debruyne@uliege.be</email>
							<affiliation key="aff0">
								<orgName type="department">Montefiore Institute</orgName>
								<orgName type="institution">University of Liège</orgName>
								<address>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dylan</forename><surname>Van Assche</surname></persName>
							<email>dylan.vanassche@ugent.be</email>
							<affiliation key="aff1">
								<orgName type="department">Dept. Electronics &amp; Information Systems</orgName>
								<orgName type="laboratory">IDLab</orgName>
								<orgName type="institution">Ghent University - imec</orgName>
								<address>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Conformance of an RML Processor Built from Scratch to Validate Specifications and Test Cases</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7DF41C0B51C12906DBB86A5F8844EC2D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:22+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>RML</term>
					<term>RML Conformance Checking</term>
					<term>Knowledge Graph Generation</term>
					<term>BURP</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Knowledge Graph Construction community has worked on the new RML specification for the past few years, consolidating and refining various [R2]RML extensions to support various use cases. This new specification involved scholars and practitioners in one or more of RML's modules. These modules were independently specified (vocabulary, SHACL shapes, and test cases). Moreover, participants in the Knowledge Graph Construction Workshop Challenge usually adapted their existing implementations, which had been developed (often to support research projects) to address specific problems (e.g., rewriting mappings and distributed processing). Rather than starting from existing implementations, which come with an inherent bias, we propose developing an RML Processor from scratch. This engine aims to support the new RML specification while not being influenced by prior [R2]RML implementations. We report on implementing the Basic and Unassuming RML Processor (BURP) and on its current state of RML compliance. While the impact of BURP has been reported in more detail elsewhere, we hope that BURP will become a reference implementation for other RML Processors.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Over the past few years, the Knowledge Graph Construction community has dedicated significant effort to consolidating and refining various [R2]RML extensions into a new, comprehensive RML specification <ref type="bibr" target="#b0">[1]</ref>. This endeavor involved collaboration between scholars and practitioners across multiple RML modules (vocabulary, SHACL shapes, and test cases) and aimed to address a broader range of use cases. Notably, participants often adapted existing implementations developed for specific research projects to address unique challenges (e.g., optimizing the RDF generation process by rewriting mappings or by distributing the processing). Developers often rewrite or extend their implementations while aiming to preserve some form of backward compatibility. RMLMapper, for instance, now supports R2RML <ref type="bibr" target="#b1">[2]</ref>, the original RML <ref type="bibr" target="#b2">[3]</ref>, its various extensions, and the new RML-Core specification. One could argue that this complicates things.</p><p>In contrast to this approach, we propose the development of an RML Processor from scratch, unburdened by the inherent biases<ref type="foot" target="#foot_0">2</ref> of prior [R2]RML implementations. The Basic and Unassuming RML Processor (BURP) is designed to support the new RML specification. While the impact of BURP has been documented elsewhere <ref type="bibr" target="#b3">[4]</ref>, we anticipate that it will serve as a baseline for future RML implementations. This paper reports on the implementation of BURP and assesses its current compliance with the RML specification, which was covered by Track 1 of the KGC Workshop's Challenge. <ref type="foot" target="#foot_1">3</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Implementation</head><p>The development of BURP was initially driven by the lack of an implementation of RML-CC, the module that consolidates and expands ideas presented in <ref type="bibr" target="#b4">[5]</ref> and <ref type="bibr" target="#b5">[6]</ref>. Implementing RML-CC, which allows values to be aggregated into RDF Containers and Collections within and across iterations, required drastic changes to the codebases of existing RML implementations. Moreover, we discovered inconsistencies within and across RML modules. Starting an implementation from scratch therefore seemed an adequate way to avoid these inconsistencies and to evaluate the community's decisions in the new RML modules.</p><p>"Basic" refers to the implementation following steps inspired by R2RML's reference algorithm: it uses simple data structures, relies on nested loops, does not rewrite mappings, and does not use concurrent or distributed programming techniques. BURP assumes that all data to be transformed fits in the machine's memory. The algorithm is deliberately kept simple, as introducing the aforementioned techniques may come at a cost. For example, the aggregation of results in a distributed environment relies on associative monoids, which affect the results yielded by Gather Maps. "Unassuming" means that the specification, the shapes, and the test cases solely drive the development of this processor, and no other assumptions about the mappings and data are made to optimize the RDF generation process.</p><p>BURP follows simple steps to generate RDF, similar to the R2RML reference algorithm. The code and the RDF generation algorithm are deliberately kept simple to help developers of RML Processors implement the new RML specifications. BURP uses simple data structures to store all data in memory. Moreover, BURP does not try to recover from or correct errors. When an error occurs, BURP merely outputs an error message indicating what went wrong and, if possible, where it went wrong. Then, it exits with a non-zero exit code. BURP does not attempt to generate partial results. Partial results are arguably desirable in industry settings, as one would only need to rerun the failed mappings. Still, we deem this beyond the scope of a reference implementation.</p></div>
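<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate the "basic" and fail-fast behavior described above, the following Python sketch generates triples with plain nested loops over in-memory data and stops at the first error. It is not BURP's code: the dictionary layout of the mappings, the names MappingError, expand, generate, and main, and the example data are all assumptions made for this illustration, and real RML mappings are RDF documents rather than Python dictionaries.</p>
<p>
```python
import sys


class MappingError(Exception):
    """Hypothetical error type for a record that cannot be mapped."""


def expand(template, record):
    # Fill each {column} placeholder; a missing column is an error.
    try:
        return template.format(**record)
    except KeyError as missing:
        raise MappingError(f"unknown reference {missing} in {template!r}")


def generate(triples_maps, sources):
    # Nested loops over maps, records, and predicate-object pairs.
    # Everything is kept in memory in a plain list.
    triples = []
    for tmap in triples_maps:
        for record in sources[tmap["source"]]:
            subject = expand(tmap["subject"], record)
            for predicate, object_template in tmap["predicate_objects"]:
                triples.append((subject, predicate, expand(object_template, record)))
    return triples


def main(triples_maps, sources):
    # No recovery and no partial output: report the error, return non-zero.
    try:
        triples = generate(triples_maps, sources)
    except MappingError as error:
        print(f"error: {error}", file=sys.stderr)
        return 1
    for subject, predicate, obj in triples:
        print(subject, predicate, obj)
    return 0


# Hypothetical in-memory source and mapping, for illustration only.
SOURCES = {"people": [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Alan"}]}
MAPS = [{
    "source": "people",
    "subject": "http://example.org/person/{id}",
    "predicate_objects": [("http://xmlns.com/foaf/0.1/name", "{name}")],
}]
```
</p>
<p>Calling main(MAPS, SOURCES) prints one triple per record and returns 0. If a template refers to a column that does not exist, nothing is printed to standard output and the return value is 1, which a command-line wrapper could pass to sys.exit.</p></div>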
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Compliance</head><p>BURP currently fully supports RML-Core, RML-CC, and RML-FNML. It also supports some functionality of RML-IO<ref type="foot">4</ref>. Complete support for RML-IO and RML-Star is planned for the latter part of 2024.</p><p>BURP passes 100% of the RML-Core and RML-CC test cases and 92% of the RML-FNML test cases. The only failed test is RMLFNOTC0000-CSV6. This test relies on generating a UUID via a function that takes no input. Generating the same UUID twice is highly unlikely, so the test always fails. Given that this test is ill-conceived, we consider BURP to cover the RML-FNML specification as far as the test cases are concerned.</p><p>We noticed issues when running the RML-IO test cases: some tests relied on outdated RML, and others contained mistakes. RMLSTC0006c, for instance, relied on an endpoint that was not configured properly. As such, BURP allowed us to determine what went wrong. BURP passes 78% of the RML-IO source test cases; the failing ones relate to the different vocabularies for downloading data and to CSVW dialects. Only one of the RML-IO target test cases passes (2%).</p></div>
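<div xmlns="http://www.tei-c.org/ns/1.0"><p>The problem with a test case built on a zero-argument UUID function can be reproduced in a few lines of Python. The sketch assumes that the test compares the generated output with a fixed expected output; the function names are hypothetical, and shape_match is only one conceivable weaker check, not something the RML test cases prescribe.</p>
<p>
```python
import uuid


def generate_id():
    # Hypothetical stand-in for a mapping function that takes no input.
    return str(uuid.uuid4())


def exact_match(actual, expected):
    # Comparison against a fixed expected value, as assumed above.
    return actual == expected


def shape_match(actual):
    # Weaker, hypothetical check: the value parses as a version 4 UUID.
    try:
        return uuid.UUID(actual).version == 4
    except ValueError:
        return False


expected = generate_id()  # value frozen when the expected output was written
actual = generate_id()    # value produced during a later test run
```
</p>
<p>Here exact_match(actual, expected) is false for all practical purposes, because a version 4 UUID carries 122 random bits, whereas shape_match(actual) holds on every run.</p></div>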
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion: Are We Sufficiently Conservative?</head><p>It is important to remember that the specifications and test cases mostly drive the development of BURP. This has led to the various RML modules becoming more integrated (e.g., the SHACL shapes must consider the various RML Term Maps and Expression Maps).</p><p>Several questions can be raised concerning the coverage of our test cases. For example: 1) How can we ensure that we have covered most (if not all) combinations of the various modules? 2) RML-FNML uses rml:inputValueMap to link an input with a Term Map. Some Term Maps have a Graph Map (e.g., Subject Maps); how does that impact the Predicate Object Maps with Graph Maps? 3) Quoted triples can be included in RDF Containers and Collections, but what is the expected behavior when RML Quoted Triples Maps are also used as a Gather Map?</p><p>We recognize that some of these corner cases seem far-fetched, but they require documentation. The behavior can be left to the implementation, but we propose documenting the expected behavior via notes and test cases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>We reported on the implementation of BURP, a simple RML Processor that was initially driven by a need for an RML-CC-compliant processor and turned out to be an exercise to test the validity of various RML modules throughout the Knowledge Graph Construction Workshop Challenge.</p><p>BURP is not (necessarily) intended as a production-ready tool, as everything is processed in memory. No effort was spent on optimizing the RDF generation process (no mapping rewriting, no parallel processing, etc.). We hope that BURP will become the community's reference implementation and sandbox for further research.</p></div>			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">I.e., relying on codebases developed for (or starting from) different mapping languages, [R2]RML dialects, and/or developed for specific use cases.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://kg-construct.github.io/workshop/2024/challenge.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">RML Logical Sources are supported, and RML Logical Target is under development.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">https://www.ugent.be/en/research/funding/bof/overview.htm</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Dylan Van Assche is supported by the Special Research Fund of Ghent University<ref type="foot" target="#foot_2">5</ref> under grant BOF20/DOC/132. The collaboration of Dylan Van Assche and Christophe Debruyne is stimulated by the KG4DI FWO scientific research network (W001222N).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The RML ontology: A community-driven modular redesign after a decade of experience in mapping heterogeneous data to RDF</title>
		<author>
			<persName><forename type="first">A</forename><surname>Iglesias-Molina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Van Assche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Arenas-Guerrero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>De Meester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Debruyne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jozashoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Maria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2023 - 22nd International Semantic Web Conference</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">November 6-10, 2023</date>
			<biblScope unit="volume">14266</biblScope>
			<biblScope unit="page" from="152" to="175" />
		</imprint>
	</monogr>
	<note>Proceedings, Part II</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sundara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<ptr target="http://www.w3.org/TR/r2rml/" />
		<title level="m">R2RML: RDB to RDF Mapping Language, Working Group Recommendation, World Wide Web Consortium (W3C)</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">RML: A generic language for integrated RDF mappings of heterogeneous data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014)<address><addrLine>Seoul, Korea</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-04-08">April 8, 2014</date>
			<biblScope unit="volume">1184</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">BURPing Through RML Test Cases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Van Assche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Debruyne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Workshop on Knowledge Graph Construction (KGCW 2024) co-located with 19th Extended Semantic Web Conference (ESWC 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Serles</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Van Assche</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Iglesias-Molina</surname></persName>
		</editor>
		<meeting>the 5th International Workshop on Knowledge Graph Construction (KGCW 2024) co-located with 19th Extended Semantic Web Conference (ESWC 2024)<address><addrLine>Hersonissos, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2024-05-27">May 27, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Extending R2RML with support for RDF collections and containers to generate MADS-RDF datasets</title>
		<author>
			<persName><forename type="first">C</forename><surname>Debruyne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>McKenna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>O'Sullivan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research and Advanced Technology for Digital Libraries - 21st International Conference on Theory and Practice of Digital Libraries, TPDL 2017</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Thessaloniki, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">September 18-21, 2017</date>
			<biblScope unit="volume">10450</biblScope>
			<biblScope unit="page" from="531" to="536" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Translation of Relational and Nonrelational Databases into RDF with xR2RML</title>
		<author>
			<persName><forename type="first">F</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Djimenou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Faron-Zucker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Montagnat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WEBIST 2015 - Proceedings of the 11th International Conference on Web Information Systems and Technologies</title>
		<meeting><address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<publisher>SciTePress</publisher>
			<date type="published" when="2015-05-22">20-22 May, 2015</date>
			<biblScope unit="page" from="443" to="454" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
