<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ONETT: Systematic Knowledge Graph Generation for National Access Points</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">David</forename><surname>Chaves-Fraga</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Ontology Engineering Group</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adolfo</forename><surname>Antón</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Ontology Engineering Group</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jhon</forename><surname>Toledo</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Ontology Engineering Group</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oscar</forename><surname>Corcho</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Ontology Engineering Group</orgName>
								<orgName type="institution">Universidad Politécnica de Madrid</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ONETT: Systematic Knowledge Graph Generation for National Access Points</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B0BE32784BBEF543DC84B01D15EB07DD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T19:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Transmodel</term>
					<term>GTFS</term>
					<term>NAP</term>
					<term>RML</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we describe our implemented approach for the usage and exploitation of declarative mappings for the publication of open transport data from transport authorities and operators into an ontology based on Transmodel. This allows a homogeneous representation of transport data across EU transport-related organisations and minimises the need to understand ad-hoc heterogeneous representation formats for transport data as currently published by them. We show how we create and use RML mappings for the specific case of transforming GTFS data into a Transmodel-based ontology. In the future, such data may be further transformed into other formats such as NeTEx.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Transport data is currently published by transport authorities and operators in many different formats. Some are well-known de-facto standards, such as the General Transit Feed Specification (GTFS); others are ad-hoc data formats whose structure is decided by the data publisher (e.g., the datasets and APIs published by Empresa Municipal de Transportes de Madrid in its open data portal<ref type="foot" target="#foot_0">1</ref>, tram information in Zaragoza<ref type="foot" target="#foot_1">2</ref>, etc.).</p><p>All of these datasets are similar because they describe overlapping sets of information (schedules, stops, vehicles, lines, etc.). They are also commonly made available in tabular data formats. For example, GTFS feeds are essentially zip-compressed sets of CSV files that follow the GTFS specification, and the other data sources mentioned above provide their data in either CSV or JSON.</p><p>Having all this data available in a homogeneous manner would reduce the total cost of reusing data sources, especially across operators/authorities and cities/regions. That is, developers would be able to build one application that could be deployed in any city in the world with minor adaptations. This is already happening with GTFS, which is used not only by Google Maps to provide data about transport infrastructure and route planning, but also by other route planners, such as Navitia.io and OpenTripPlanner.</p><p>To achieve this homogeneity, there are several options that may be followed:</p><p>- Transport authorities and operators may agree on using the same data format and hence publish according to such data format. They know well the type of data that they handle, the quality properties of such data, etc., so they should be able to provide this data easily. 
Transformations may be done programmatically (that is, with ad-hoc code) or declaratively (using mappings in existing languages like R2RML <ref type="bibr" target="#b1">[2]</ref> or RML <ref type="bibr" target="#b2">[3]</ref>).</p><p>In this paper, we present our work on using declarative mappings to transform transport data published by transport authorities and operators into a homogeneous representation based on Transmodel (the reference data model for public transport at the European level, which is further described in Section 2). This data can then be transformed into NeTEx so as to comply with the EU regulations for the publication of transport-related data in National Access Points.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Transmodel Ontology and GTFS</head><p>In its drive to foster interoperability across Europe, the EU requires each Member State to provide access to transportation data via a National Access Point (NAP). According to EU Regulation 2017/1926, all transportation authorities, transport operators and infrastructure managers must provide static and dynamic data in specific data formats (e.g., NeTEx, SIRI). The Regulation applies to different transportation modes, including air, train, road vehicle, bus, ferry, metro, tram, shuttlebus, car-sharing, car-pooling and bike-sharing.</p><p>Transmodel is the European Reference Data Model for Public Transport. It provides a conceptual model of common public transport concepts and data structures that can be used to build many different kinds of public transport information systems, such as timetabling, fares, operational management, realtime data and journey planning. It is divided into eight different sections or Parts: Common Concepts (CC), Public Transport Network Topology (NT), Network Description (ND), Operations Monitoring &amp; Control (OM), Fare Management (FM), Passenger Information (PI), Driver Management (DM) and Management Information &amp; Statistics (MI). These Parts are usually implemented by different standards or specific data formats. One of the most relevant implementations is NeTEx, which partially covers some features of Parts CC, NT, ND, FM and PI. In EU Regulation 2017/1926 (May 2017), the European Commission recognised NeTEx as a strategic standard for the cross-border exchange of data. The first step must be taken before December 2019, when every European country must make data available in NeTEx format at its National Access Point to allow EU-wide multi-modal travel information services.</p><p>The General Transit Feed Specification (GTFS) is a de-facto standard for representing public transport data: a collection of at least five required, two conditionally required and up to fifteen CSV files (with extension .txt and preferably encoded as UTF-8), contained within a compressed file, that describes the scheduled operations of a transit system. The aim of GTFS is to provide at least trip-planning functionality. 
It defines the headers and a set of rules that must be taken into account when the dataset is created. Each file, as well as each of its headers, can be mandatory or optional, and the files are related to one another. The specification supports the representation of several public transport features such as trips, routes, stops, times, fares and calendars.</p><p>To provide a better GTFS-to-NeTEx conversion and, further, full data interoperability, we have started to build a Transmodel ontology. The development takes place in a GitHub repository<ref type="foot" target="#foot_2">3</ref>, where all the material generated during the development activities (e.g., use cases, user stories, glossary of terms) is uploaded. Based on the Transmodel base URI proposed by the CEN Transmodel working group model<ref type="foot" target="#foot_3">4</ref> and its documentation<ref type="foot" target="#foot_4">5</ref>, we develop the corresponding ontology following the NeOn methodology <ref type="bibr" target="#b5">[6]</ref>. Before performing the transformation from GTFS to the ontology-based format of Transmodel, we analyse the relationship between the two standards. For example, Table <ref type="table" target="#tab_1">1</ref> shows how the properties of Agency in the GTFS model relate to the corresponding properties of Authority in Transmodel.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The ONETT Demo</head><p>The Open NEtwork of public Transport application (ONETT)<ref type="foot" target="#foot_5">6</ref> uses Semantic Web technologies to perform systematic knowledge graph generation in the transport domain. More specifically, ONETT applies the concept of Ontology-Based Data Access (OBDA) <ref type="bibr" target="#b4">[5]</ref>, which aims to provide a unified view of, and common access to, a set of data sources using ontologies and mappings. 
In this specific case, we create a general mapping between the full GTFS specification<ref type="foot" target="#foot_6">7</ref> and the Transmodel-based ontology, using the RML specification in its YARRRML <ref type="bibr" target="#b3">[4]</ref> serialisation. Before running the transformation, we perform a mapping translation <ref type="bibr" target="#b0">[1]</ref> process to adapt the general mapping to the input data, because GTFS feeds do not always have the same structure and number of files (files and headers can be optional). Thanks to the simplicity of the YARRRML serialisation, this translation is simple and efficient. The workflow of the application is shown in Figure <ref type="figure" target="#fig_1">1</ref>: the SDM-RDFizer<ref type="foot" target="#foot_7">8</ref> engine for RML mappings is integrated in the application to transform the input CSV data into RDF. More specifically, ONETT follows three steps to generate the desired RDF knowledge graph, based on the Transmodel ontology, from a GTFS feed: (1) Analyse the input data: it decompresses and analyses the input GTFS feed to understand the files and the structure of each file (headers). (2) Mapping translation: it takes the general GTFS YARRRML mapping that represents the full specification and generates a new mapping corresponding to the input data. (3) Knowledge graph generation: it runs the SDM-RDFizer engine to transform the raw data to RDF.</p><p>These steps are a black box for the transport authorities that want to obtain the knowledge graph from their GTFS feeds. Using the web application, the user only has to upload the compressed feed or provide a URL, and ONETT automatically generates the corresponding knowledge graph. With this approach, we provide a useful tool to generate National Access Point compliant data from a very popular de-facto standard data format in a systematic manner.</p></div>
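<div xmlns="http://www.tei-c.org/ns/1.0"><p>The first two steps of the ONETT workflow, analysing the input feed and translating the general mapping, can be illustrated with a minimal Python sketch. It is not the ONETT implementation: the general mapping is reduced here to a hypothetical dictionary (GENERAL_MAPPING) from GTFS file names to the columns it refers to, whereas the real general mapping is a YARRRML document, and the function names are illustrative assumptions. The sketch only shows the idea of keeping the parts of a general mapping that match the files and headers actually present in a feed.</p>
<p>
```python
import csv
import io
import zipfile

# Hypothetical, heavily reduced stand-in for the general GTFS mapping:
# for each GTFS file, the columns that the general mapping refers to.
GENERAL_MAPPING = {
    "agency.txt": ["agency_id", "agency_name", "agency_url", "agency_timezone", "agency_lang"],
    "stops.txt": ["stop_id", "stop_name", "stop_lat", "stop_lon"],
    "fare_attributes.txt": ["fare_id", "price", "currency_type"],
}


def analyse_feed(feed_bytes):
    """Step 1: return {file name: header columns} for every file in the zipped feed."""
    headers = {}
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as feed:
        for name in feed.namelist():
            with feed.open(name) as raw:
                # utf-8-sig drops a leading byte-order mark if one is present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers


def translate_mapping(general_mapping, headers):
    """Step 2: keep only the files and columns that the input feed contains."""
    translated = {}
    for name, columns in general_mapping.items():
        if name not in headers:
            continue
        present = [column for column in columns if column in headers[name]]
        if present:
            translated[name] = present
    return translated


def build_example_feed():
    """Build a tiny in-memory feed with two files, used only for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as feed:
        feed.writestr(
            "agency.txt",
            "agency_name,agency_url,agency_timezone\nEMT,https://example.org,Europe/Madrid\n",
        )
        feed.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n1,Sol,40.4169,-3.7035\n",
        )
    return buffer.getvalue()


feed_headers = analyse_feed(build_example_feed())
specific_mapping = translate_mapping(GENERAL_MAPPING, feed_headers)
print(specific_mapping)
```
</p>
<p>In this example the feed has no fare_attributes.txt file and its agency.txt lacks two optional columns, so the translated mapping keeps only the two files and the columns that are present. In the actual workflow, the translated mapping would then be passed to an RML engine for the third step.</p></div>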
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions and Future Work</head><p>The availability of homogeneous transport data from worldwide transport authorities and operators gives us the possibility of creating new types of applications related to transport (trip planners, fare calculators, ticket recommenders, etc.) that can be deployed easily in different regions or cities. In this paper, we have shown our approach to create such homogeneous transport data based on declarative mappings that can be used to generate transport knowledge graphs for any region or city in the world that is currently publishing data in GTFS. The mappings allow transforming GTFS data into RDF according to a Transmodel-based ontology. Such data can be queried in a homogeneous manner so that the aforementioned applications can be created more easily.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>realtime data, journey planning. It is divided into eight different sections or Parts: Common Concepts (CC), Public Transport Network Topology (NT), Network Description (ND), Operations Monitoring &amp; Control (OM), Fare Management (FM), Passenger Information (PI), Driver Management (DM), Management Information &amp; Statistics (MI).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>1 .</head><label>1</label><figDesc>Analyse the input data: It decompresses and analyses the input GTFS feed to understand the files and the structure of each file (headers). 2. Mapping translation: It takes the general GTFS YARRRML mapping that represents the full specification and generates a new mapping corresponding to the input data. 3. Knowledge Graph Generation: It runs the SDM-RDFizer engine to transform the raw data to RDF.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 1.</head><label>1</label><figDesc>The ONETT workflow for the systematic generation of knowledge graphs following Transmodel from GTFS feeds.</figDesc><graphic coords="4,186.64,418.10,242.08,141.62" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1.</head><label>1</label><figDesc>Example of the relation between GTFS properties and the Transmodel ontology</figDesc><table><row><cell>GTFS</cell><cell>Transmodel (Ontology)</cell></row><row><cell>Agency name</cell><cell>https://w3id.org/transmodel/terms#authorityName</cell></row><row><cell>Agency url</cell><cell>https://w3id.org/transmodel/terms#authorityUrl</cell></row><row><cell>Agency timeZone</cell><cell>https://w3id.org/transmodel/terms#authorityTimezone</cell></row><row><cell>Agency lang</cell><cell>https://w3id.org/transmodel/terms#authorityLang</cell></row><row><cell cols="2">3 The ONETT Demo</cell></row></table></figure>
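<div xmlns="http://www.tei-c.org/ns/1.0"><p>The correspondence in Table 1 can be read as a lookup from GTFS agency columns to Transmodel properties. The following minimal Python sketch applies that lookup to the rows of an agency.txt file and produces (subject, predicate, object) triples. It is only an illustration and not the YARRRML mapping used by ONETT: the property IRIs are taken from Table 1 and the column names from the GTFS specification, while the subject base IRI (SUBJECT_BASE), the function name and the example row are assumptions introduced here.</p>
<p>
```python
import csv
import io

# Property IRIs copied from Table 1.
TRANSMODEL = "https://w3id.org/transmodel/terms#"
AGENCY_TO_AUTHORITY = {
    "agency_name": TRANSMODEL + "authorityName",
    "agency_url": TRANSMODEL + "authorityUrl",
    "agency_timezone": TRANSMODEL + "authorityTimezone",
    "agency_lang": TRANSMODEL + "authorityLang",
}

# Assumption: subject IRIs are minted under this illustrative base.
SUBJECT_BASE = "http://example.org/authority/"


def agency_rows_to_triples(agency_csv):
    """Return one (subject, predicate, object) tuple per non-empty mapped column."""
    triples = []
    for row in csv.DictReader(io.StringIO(agency_csv)):
        subject = SUBJECT_BASE + row["agency_id"]
        for column, predicate in AGENCY_TO_AUTHORITY.items():
            value = row.get(column)
            if value:
                triples.append((subject, predicate, value))
    return triples


# Invented example content of an agency.txt file.
EXAMPLE_AGENCY = (
    "agency_id,agency_name,agency_url,agency_timezone,agency_lang\n"
    "emt,EMT Madrid,https://example.org/emt,Europe/Madrid,es\n"
)

example_triples = agency_rows_to_triples(EXAMPLE_AGENCY)
for triple in example_triples:
    print(triple)
```
</p>
<p>Keeping the correspondence in a data structure rather than in transformation code mirrors, in a very reduced form, the declarative approach: changing the target property only requires editing the lookup.</p></div>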
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://opendata.emtmadrid.es/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.zaragoza.es/sede/servicio/catalogo/327</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/oeg-upm/transmodel-ontology</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://w3id.org/transmodel/terms#</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://www.transmodel-cen.eu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://osoc-es.github.io/onett/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://github.com/osoc-es/onett-back/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://github.com/SDM-TIB/SDM-RDFizer</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work is partially supported by EIT Digital under Grant Agreement "No. EIT/EIT DIGITAL/SGA2019/1 through action: SNAP", by the Spanish Ministerio de Economía, Industria y Competitividad and EU FEDER funds under the DATOS 4.0: RETOS Y SOLUCIONES - UPM Spanish national project (TIN2016-78011-C4-4-R), and by an FPI grant (BES-2017-082511). Thank you to our open Summer of Code 2019 (https://2019.summerofcode.es/) students: Luis Pozo, Pablo Castellanos and Marta Retana.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Towards a New Generation of Ontology Based Data Access</title>
		<author>
			<persName><forename type="first">O</forename><surname>Corcho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Priyatna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chaves-Fraga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web Journal</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">R2RML: RDB to RDF Mapping Language</title>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sundara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<ptr target="https://www.w3.org/TR/r2rml/" />
	</analytic>
	<monogr>
		<title level="j">W3C Recommendation</title>
		<imprint>
			<date type="published" when="2012-09-27">27 September 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>LDOW</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Declarative Rules for Linked Data Generation at your Fingertips!</title>
		<author>
			<persName><forename type="first">P</forename><surname>Heyvaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>De Meester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th ESWC: Posters and Demos</title>
				<meeting>the 15th ESWC: Posters and Demos</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Linking data to ontologies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Giacomo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenzerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Journal on data semantics X</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="133" to="173" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The NeOn Methodology for ontology engineering</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Suárez-Figueroa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gómez-Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fernández-López</surname></persName>
		</author>
		<ptr target="https://2019.summerofcode.es/" />
	</analytic>
	<monogr>
		<title level="m">Ontology engineering in a networked world</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page">9</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
