<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Extending R2RML to a source-independent mapping language for RDF</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Anastasia</forename><surname>Dimou</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ghent University - iMinds - Multimedia Lab</orgName>
								<address>
									<addrLine>Gaston Crommenlaan 8, bus 201</addrLine>
									<postCode>B-9050</postCode>
									<settlement>Ledeberg-Ghent</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Miel</forename><surname>Vander Sande</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ghent University - iMinds - Multimedia Lab</orgName>
								<address>
									<addrLine>Gaston Crommenlaan 8, bus 201</addrLine>
									<postCode>B-9050</postCode>
									<settlement>Ledeberg-Ghent</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pieter</forename><surname>Colpaert</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ghent University - iMinds - Multimedia Lab</orgName>
								<address>
									<addrLine>Gaston Crommenlaan 8, bus 201</addrLine>
									<postCode>B-9050</postCode>
									<settlement>Ledeberg-Ghent</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Erik</forename><surname>Mannens</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ghent University - iMinds - Multimedia Lab</orgName>
								<address>
									<addrLine>Gaston Crommenlaan 8, bus 201</addrLine>
									<postCode>B-9050</postCode>
									<settlement>Ledeberg-Ghent</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rik</forename><surname>Van De Walle</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Ghent University - iMinds - Multimedia Lab</orgName>
								<address>
									<addrLine>Gaston Crommenlaan 8, bus 201</addrLine>
									<postCode>B-9050</postCode>
									<settlement>Ledeberg-Ghent</settlement>
									<country key="BE">Belgium</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Extending R2RML to a source-independent mapping language for RDF</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">53F41895596D007260493486ADC48690</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T06:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Although reaching the fifth star of the Open Data deployment scheme demands that data be represented in RDF and linked, a generic and standard mapping procedure for deploying raw data as RDF has not been established so far. Only the R2RML mapping language has been standardized, but its applicability is limited to mappings from relational databases to RDF. We propose extending R2RML to also support mappings of data sources in other structured formats (indicatively CSV, TSV, XML and JSON). To broaden its scope further, the focus is put on the mappings and their optimal reuse. The language becomes source-agnostic, and resources are integrated and interlinked at a primary stage.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Today, the idea of (Linked) Open Data is widespread and widely adopted. However, while reaching the fourth star of the Open Data deployment scheme<ref type="foot" target="#foot_0">1</ref> is easily attainable, achieving the fifth demands a well-considered approach and significantly greater effort. Current solutions are either highly customized to each case's specific needs or follow a schematic and/or syntactic mapping approach. The latter fails to fully capture the semantics, as it remains tied to the source file's structure. So far, only R2RML<ref type="foot" target="#foot_1">2</ref> has become a W3C recommendation, and it aims to formalize the mappings from relational databases to RDF (RDB2RDF). In practice, though, one publishes data available in different source formats, which in turn requires a more generic approach.</p><p>A generic language that maps the data independently of the source structure (schema-agnostic) and puts the focus on the mappings would be a significant advancement. With such a language, one deals with all different source files in a uniform way, in contrast with other languages that handle the mappings of different source formats separately. Therefore, the initial learning costs remain limited and the potential for reusing custom-defined mappings increases. As a result, the per-file mapping model followed so far is surpassed, which opens the way to data integration and interlinking at a primary stage. In this paper, we propose an extension of R2RML that broadens its scope to also cover mappings from different structured data formats (CSV, TSV, XML and JSON files) to RDF.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">State of the art</head><p>Beyond R2RML, which already has several implementations<ref type="foot">3</ref>, other RDB2RDF mapping languages have been defined <ref type="bibr" target="#b0">[1]</ref>. In the same context, there are corresponding languages to support CSV-to-RDF mappings (CSV2RDF), e.g., the XLWrap mapping language <ref type="bibr" target="#b1">[2]</ref>, Mapping Master's M2 <ref type="bibr" target="#b2">[3]</ref> and Vertere<ref type="foot">4</ref>. On the other hand, in the case of mappings from XML to RDF (XML2RDF), the different tools rely mostly on existing XML solutions. To be more precise, XSLT-based approaches were explored, such as the Krextor <ref type="bibr" target="#b3">[4]</ref> and AstroGrid-D<ref type="foot">5</ref> mapping tools, while other implementations deploy mappings using XPath and XQuery, e.g., Tripliser<ref type="foot">6</ref> and XSPARQL <ref type="bibr" target="#b4">[5]</ref>. These solutions for XML sources lead to mappings on the syntactic level rather than on the semantic level, or fail to provide a solution applicable to a broader domain. Beyond the standard Extract-Map-Load (EML) mappings, dynamic query translation was also explored, e.g., in the case of Tarql<ref type="foot">7</ref> (CSV2RDF) and XSPARQL (mapping and integration of XML, RDB and RDF resources).</p><p>In general, most tools deploy mappings from a certain source format to RDF (source-centric approaches). Only a few tools provide mappings from various source formats to RDF (DataLift <ref type="bibr" target="#b5">[6]</ref>, the DataTank <ref type="bibr" target="#b6">[7]</ref>, Karma <ref type="bibr" target="#b7">[8]</ref>, Open Refine<ref type="foot">8</ref> and Virtuoso Sponger<ref type="foot">9</ref> are the best known), but only the DataTank uses a mapping language. For the latter's needs, Vertere was extended to cover not only CSV2RDF mappings but also mappings from other structured data sources, namely databases, XML and JSON.
Since R2RML became a W3C standard and is analogous in nature to Vertere, extending R2RML is considered a promising solution, and its applicability is considered verified.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Extending R2RML for a more generic use</head><p>An extension of the R2RML language is proposed, aiming to broaden its scope beyond RDB2RDF mappings, to cover every structured data format (a Global-As-View approach), and to address the limitations of existing languages. R2RML's RDF graphs are used to express mappings independently of the source format. Therefore, the same custom mappings are reused whether or not the source files are in the same format: only the references to the source values to be mapped are redefined, while the custom mapping definitions remain the same. The vocabulary extending R2RML is available at http://mmlab.be/users/andimou/rml.ttl. The expansion is achieved as follows.</p><p>Extending resources' mapping. In the same context, term maps are extended to generate RDF terms from any logical resource, whether it is a table row, an XML element or a JSON object. The column-valued term map is extended to cover every resource term map. Therefore, R2RML's rr:column property becomes a sub-property of rml:resource, whose value is a valid column name for relational databases and CSV files, a valid XPath expression giving an XML node's or attribute's absolute path, or a valid path pattern in JavaScript syntax for objects in JSON source files, as in the aforementioned example.</p><p>Multiple entities per row. Most mapping languages (including R2RML) follow the entity-per-row model and assume that all RDF triples of a row are mapped to the same subject. In its extended version, R2RML can map sets of columns to different subjects, which are then related to each other with a predicate-object triples map. For example, a row may have several columns with information about an event, a few of which refer to its location, e.g., latitude and longitude.
Using this single row, one triples map may be defined for the event and another for the location where the event takes place (the latter mapping definition might be reused for mapping other locations), and the two are related with a predicate-object triples map.</p><p>Extending the logical sources. By analogy with rr:sqlQuery, rml:xmlQuery is introduced, and both are sub-properties of rml:query, which specifies a query against a source file. In the same context, rml:queryLanguage is defined to determine which language is used (indicatively, a W3C standard in the case of XML).</p><p>Integrated mapping. By extending the referencing object map, one can use the subjects of another triples map as the objects generated by a predicate-object map. Since the triples maps may be based on different logical sources, the potential to create triples based on integrated sources emerges. In the aforementioned event example, an element node may refer to the number of the bus going to the event location, while the bus names are associated with the bus numbers in a separate table that is mapped by another triples map. The mappings of both are defined, and a predicate-object terms map may be used to relate them.</p></div>
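<div xmlns="http://www.tei-c.org/ns/1.0"><p>The multiple-entities-per-row and integrated-mapping ideas described above can be illustrated with a minimal Python sketch. It is neither the proposed language nor an implementation of it: plain functions stand in for triples maps, and the namespace, predicate names, field names and sample data are all hypothetical. One row yields an event subject and a location subject that are related to each other, and a separate bus source is joined on the bus number so that the bus subject becomes the object of a triple about the event.</p>
<p><![CDATA[
```python
EX = "http://example.com/"  # hypothetical namespace, not taken from the paper


def event_triples(row):
    """Map one row to two subjects (event and location) and relate them."""
    event = EX + "event/" + row["id"]
    location = EX + "location/" + row["lat"] + "," + row["long"]
    return [
        (event, EX + "name", row["name"]),
        (location, EX + "lat", row["lat"]),
        (location, EX + "long", row["long"]),
        # The triple that relates the two entities taken from the same row.
        (event, EX + "location", location),
    ]


def bus_subject(bus_row):
    """Subject generated by the (hypothetical) triples map of the bus source."""
    return EX + "bus/" + bus_row["number"]


def bus_triples(bus_row):
    return [(bus_subject(bus_row), EX + "name", bus_row["name"])]


def link_events_to_buses(event_rows, bus_rows):
    """Use the subjects of the bus mapping as objects of event triples."""
    subjects_by_number = {b["number"]: bus_subject(b) for b in bus_rows}
    return [
        (EX + "event/" + e["id"], EX + "reachedBy", subjects_by_number[e["bus"]])
        for e in event_rows
        if e["bus"] in subjects_by_number
    ]


# Made-up sample data standing in for two different logical sources.
events = [{"id": "e1", "name": "Concert", "lat": "51.05", "long": "3.72", "bus": "5"}]
buses = [{"number": "5", "name": "Line 5"}, {"number": "9", "name": "Line 9"}]

triples = []
for e in events:
    triples.extend(event_triples(e))
for b in buses:
    triples.extend(bus_triples(b))
triples.extend(link_events_to_buses(events, buses))

for t in triples:
    print(t)
```
]]></p>
<p>In this sketch the location mapping could be reused for any other source that exposes latitude and longitude, and the join is the only place where the two sources meet. This is the separation that the extended referencing object map is meant to express declaratively.</p></div>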
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions and Future Work</head><p>A generic mapping language is proposed to handle the mappings from different source formats to RDF. The ultimate goal of such an extension is to keep the focus on the mappings to be expressed rather than on the data and their original structure. With this work, we open the discussion on its feasibility, possible barriers and aspects that should be taken into consideration. In the future, the resulting generic mapping language will be used in the DataTank, instead of Vertere, to cover mappings from different source formats to RDF and, at the same time, to comply with the standard mapping language for RDB2RDF mappings.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Extending RDF triples mapping. The triples map is extended to map not only each row in the logical table, but each resource in the logical source. To this end, rr:logicalTable and rr:tableName become sub-properties of rml:logicalSource and rml:sourceName respectively, while rml:elementName for XML sources and rml:objectName for JSON sources are introduced. In the example, books is a logical table's, a JSON object's or an XML element's name.</figDesc><table><row><cell>&lt;#RDB_CSV_map&gt; rml:logicalSource [ rr:tableName "BOOKS" ];</cell></row><row><cell>rr:subjectMap [ rr:template "http://data.example.com/books/{ISBN}" ];</cell></row><row><cell>rr:predicateObjectMap [ rr:predicate ex:id; rr:objectMap [ rml:resource "ID" ] ].</cell></row><row><cell>&lt;#XML_map&gt; rml:logicalSource [ rml:elementName "/books" ];</cell></row><row><cell>rr:subjectMap [ rr:template "http://data.example.com/books/{book/ISBN}" ];</cell></row><row><cell>rr:predicateObjectMap [ rr:predicate ex:id; rr:objectMap [ rml:resource "book/ISBN@id" ] ].</cell></row><row><cell>&lt;#JSON_map&gt; rml:logicalSource [ rml:objectName "books" ];</cell></row><row><cell>rr:subjectMap [ rr:template "http://data.example.com/books/{book.ISBN}" ];</cell></row><row><cell>rr:predicateObjectMap [ rr:predicate ex:id; rr:objectMap [ rml:resource "book.id" ] ].</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.w3.org/2001/sw/rdb2rdf/wiki/Implementations</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://github.com/knudmoeller/Vertere-RDF</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://www.gac-grid.de/project-products/Software/XML2RDF.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://daverog.github.io/tripliser/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://github.com/cygri/tarql</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://openrefine.org/</note>
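<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three triples maps in the table differ only in how a reference (the value of rml:resource, or a placeholder inside rr:template) is resolved against one logical resource. The following Python sketch illustrates that idea under simplifying assumptions. The function names resolve and expand_template and the sample data are hypothetical, XML references are handled with the limited path support of Python's standard ElementTree module plus a trailing @attribute, and JSON references are treated as dot-separated keys rather than full JavaScript path patterns.</p>
<p><![CDATA[
```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET


def resolve(reference, record):
    """Return the value a reference selects from one logical resource, or None."""
    if isinstance(record, ET.Element):
        # XML: a relative path, optionally followed by @attribute.
        path, _, attribute = reference.partition("@")
        node = record.find(path) if path else record
        if node is None:
            return None
        return node.get(attribute) if attribute else node.text
    # CSV row or JSON object: walk dot-separated keys.
    value = record
    for key in reference.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def expand_template(template, record):
    """Replace every {reference} in the template by its resolved value."""
    def substitute(match):
        value = resolve(match.group(1), record)
        if value is None:
            raise ValueError("unresolved reference: " + match.group(1))
        return str(value)
    return re.sub(r"\{([^{}]+)\}", substitute, template)


# Made-up sample resources, one per source format.
csv_row = next(csv.DictReader(io.StringIO("ISBN,ID\n0001,42\n")))
xml_root = ET.fromstring('<books><book><ISBN id="9">0003</ISBN></book></books>')
json_obj = json.loads('{"book": {"ISBN": "0002", "id": "7"}}')

# The same template mechanism serves all three formats; only the
# references inside the braces change.
subjects = [
    expand_template("http://data.example.com/books/{ISBN}", csv_row),
    expand_template("http://data.example.com/books/{book/ISBN}", xml_root),
    expand_template("http://data.example.com/books/{book.ISBN}", json_obj),
]
for subject in subjects:
    print(subject)
```
]]></p>
<p>Dispatching on the type of the record keeps the mapping logic identical across formats. A real processor would more likely select the resolver from the declared logical source or query language than from the runtime type.</p></div>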
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://5stardata.info</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://www.w3.org/TR/r2rml</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A comparison of RDB-to-RDF mapping languages</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Reif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">C</forename><surname>Gall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International Conference on Semantic Systems. I-Semantics &apos;11</title>
				<meeting>the 7th International Conference on Semantic Systems. I-Semantics &apos;11<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="25" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">XLWrap -Querying and Integrating Arbitrary Spreadsheets with SPARQL</title>
		<author>
			<persName><forename type="first">A</forename><surname>Langegger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wöß</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Semantic Web Conference. ISWC &apos;09</title>
				<meeting>the 8th International Semantic Web Conference. ISWC &apos;09<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="359" to="374" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Mapping Master: a flexible approach for mapping spreadsheets to OWL</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>O'Connor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Halaschek-Wiener</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Musen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Semantic Web Conference on The Semantic Web -Volume Part II. ISWC&apos;10</title>
				<meeting>the 9th International Semantic Web Conference on The Semantic Web -Volume Part II. ISWC&apos;10<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="194" to="208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Krextor -an extensible framework for contributing content math to the Web of Data</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lange</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th Calculemus and 10th international conference on Intelligent computer mathematics. MKM&apos;11</title>
				<meeting>the 18th Calculemus and 10th international conference on Intelligent computer mathematics. MKM&apos;11<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="304" to="306" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Mapping between RDF and XML with XSPARQL</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bischof</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Krennwallner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Polleres</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal on Data Semantics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="147" to="185" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Enabling Linked Data publication with the Datalift platform</title>
		<author>
			<persName><forename type="first">F</forename><surname>Scharffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Atemezing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gandon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Villata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hamdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bihanic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Képéklian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cotton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Euzenat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">Y</forename><surname>Vandenbussche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vatant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. AAAI workshop on semantic cities</title>
				<meeting>AAAI workshop on semantic cities<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The DataTank: an open data adapter with semantic output</title>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Van Deursen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">21st International Conference on World Wide Web, Proceedings</title>
				<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Karma: A system for mapping structured sources into the Semantic Web</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Knoblock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Goel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taheriyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Muslea</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">9th Extended Semantic Web Conference (ESWC2012)</title>
				<imprint>
			<date type="published" when="2012-05">May 2012</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
