=Paper=
{{Paper
|id=Vol-2939/paper6
|storemode=property
|title=Enabling Cross-Border Travel Offers Through National Access Point Federation via Metadata Harmonisation
|pdfUrl=https://ceur-ws.org/Vol-2939/paper6.pdf
|volume=Vol-2939
|authors=Alessio Carenini,Andrea Fiano,Mario Scrocca,Marco Comerio,Irene Celino
|dblpUrl=https://dblp.org/rec/conf/i-semantics/CareniniFSCC21
}}
==Enabling Cross-Border Travel Offers Through National Access Point Federation via Metadata Harmonisation==
Alessio Carenini, Andrea Fiano, Mario Scrocca, Marco Comerio, and Irene Celino

Cefriel, Milan, Italy. name.surname@cefriel.com

'''Abstract.''' Planning cross-border transportation offers requires gathering data from multiple transport operators within and outside a country. European legislation requires each member state to set up a National Access Point (NAP) for multimodal transport information. Nevertheless, interoperability in accessing data from different NAPs is far from being achieved. In this paper, we describe and validate our approach to consolidating metadata coming from different sources using Semantic Web technologies. The presented solution implements an automated ingestion pipeline that harmonises metadata from three different European NAPs into a single metadata catalog.

'''Keywords:''' Metadata Harmonisation · Cross-border Travel Offers · National Access Points

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1 Introduction===
In the transportation domain, several data and metadata catalogs coexist. Each is maintained by a different initiative or mandated by a specific EU directive or national law. According to the EU Delegated Regulations 2017/1926, 885/2013, 886/2013 and 2015/962, each EU member state has to implement a National Access Point (NAP) to make national transport data discoverable. A NAP is an intermediary digital platform that provides access to traffic and mobility data, and it plays a crucial role in mobility data exchange in Europe. From the point of view of a transport operator looking for mobility-related information, NAPs are trusted sources of data and metadata. Their content can reliably be used inside the operator's own information systems.

A NAP is a web-based portal handling data concerning Safe and Secure Truck Parking (SSTP), Real-Time Traffic Information (road) (RTTI), Safety-Related Traffic Information (road) (SRTI) and Multimodal Travel Information (MMTIS) (all modes, such as train, bus, metro and cycling). EU regulations mandate the use of Transmodel-based specifications for data exchange between transport operators and their reference NAP, with the aim of achieving data interoperability. Nevertheless, the regulations do not specify which metadata should be used to describe datasets, nor how a NAP should be implemented. As a result, each member state is implementing its own National Access Point using different metadata schemas and exposing its functionalities via custom APIs [2,5].

This paper describes how we extended a metadata catalog, named Asset Manager, to seamlessly support access to two kinds of digital assets. The first are local assets, added directly by users of the Asset Manager. The second are remote assets from multiple National Access Points. Our scenario is based on the real requirements of Trenitalia<sup>1</sup>, which wants to create mobility packages to be sold to tour operators bringing tourists to the Milano-Cortina Winter Olympics in 2026<sup>2</sup>. Creating such mobility packages means locating and accessing the timetables of multiple transport operators. Even where National Access Points are available, this task is time-consuming and requires checking multiple sources. Consolidating in a single catalog the metadata coming from multiple NAPs, together with the metadata provided by Trenitalia, makes it possible to perform the task more efficiently. It also helps to better fulfil the mobility needs of tourists heading to the Winter Olympics.
Such a scenario requires mapping different NAP metadata schemas onto a single schema and creating multiple metadata ingestion pipelines. We implemented the following steps:
* i) Metadata schema mapping: the different NAP metadata schemas were conceptually mapped onto a single schema, which is also used to describe the local assets.
* ii) RML transformation rules: the conceptual mappings from the specific NAP schemas to the Asset Manager metadata schema were implemented in RML [3].
* iii) (Meta)data ingestion pipelines: the RML mappings were integrated into data ingestion and transformation pipelines, executed with the Chimera tool<sup>3</sup> [6]. The resulting RDF triples, defining metadata for remote assets, are added to the RDF repository used by the Asset Manager.
* iv) Exploration API creation: to ease integration in the user interface, Exploration APIs were created to wrap the execution of SPARQL queries as APIs. These Exploration APIs return the list of assets belonging to a specific type, together with their metadata. In this way, we harmonised access to both local and remote assets.
* v) User interface: the Asset Manager web interfaces were updated to show the integrated list of assets and their metadata.

To demonstrate the implemented approach to NAP metadata harmonisation, we selected three different NAPs, from France, Belgium and the Netherlands. Since the approach is generic, this implementation makes it possible to use the Asset Manager as an aggregator of multiple trusted metadata sources. Examples include open data portals, multimodal National Access Points, and other instances of the Asset Manager.

:<sup>1</sup> https://www.trenitalia.com/
:<sup>2</sup> A video describing the scenario and the implemented solution is available at https://www.youtube.com/watch?v=SoOLheMv1wQ
:<sup>3</sup> https://github.com/cefriel/chimera
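Step iv) describes Exploration APIs that wrap SPARQL queries. The following is a minimal sketch of that idea, not the Asset Manager's actual code. It rests on several assumptions: assets are stored as `dcat:Dataset` resources, their asset type is recorded with `dct:type`, and local assets are also kept in a named graph. Both the query template and the asset type IRI used below are hypothetical. Because the query matches `GRAPH ?g`, it returns local and remote assets together, whichever named graph they were loaded into.

```python
import re

# Hypothetical query template; the actual Asset Manager query is not reproduced here.
# Doubled braces are literal braces for str.format.
ASSET_LIST_TEMPLATE = """PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?g ?asset ?title WHERE {{
  GRAPH ?g {{
    ?asset a dcat:Dataset ;
           dct:type <{asset_type}> ;
           dct:title ?title .
  }}
}}
ORDER BY ?title"""

# Characters that SPARQL does not allow inside an IRI reference (<...>).
_FORBIDDEN_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def build_asset_list_query(asset_type_iri: str) -> str:
    """Return a query listing the assets of one (hypothetical) asset type in all named graphs."""
    if not asset_type_iri.startswith(("http://", "https://")):
        raise ValueError("asset type must be an absolute http(s) IRI")
    if _FORBIDDEN_IRI_CHARS.search(asset_type_iri):
        raise ValueError("asset type IRI contains characters not allowed in SPARQL")
    return ASSET_LIST_TEMPLATE.format(asset_type=asset_type_iri)
```

For example, `build_asset_list_query("https://example.org/asset-type/journey-planning")` (a hypothetical IRI) yields a query that an API layer could send to the repository's SPARQL endpoint. The function validates the IRI before substituting it, so callers cannot inject extra SPARQL through the parameter.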
In the following sections, we describe each step of the approach and report insights from the implemented solution.

===2 Metadata schemas for National Access Points===
The main focus of the NAP regulations is to promote the use of a specific set of Transmodel-based standards across Europe, in order to improve transport data interoperability. The regulation defines the role of the NAP as a dataset catalog clearly, but each member state is free to define its own implementation. This freedom has led to different metadata vocabularies, and thus to the need for interoperability between the metadata schemas adopted by different NAPs. In our scenario, we analysed in detail the National Access Points provided by France, Belgium and the Netherlands.

The Belgian NAP is built on CKAN, so its API<sup>4</sup> allows searching for datasets by specific types or features. Its metadata schema is quite rich. It contains multilingual documentation, geographical coverage, a contact person and the responsible transport operator.

The French NAP features a rich API<sup>5</sup> and metadata schema, containing many details about datasets. This National Access Point supports the NeTEx representation of static transport data (leveraging Chouette [4]). These representations are made available as Community resources, i.e., alternative representations of the same main information described in the asset. Spatial information about the covered area is also provided, allowing geographical queries. Finally, an asset has only a responsible organization, and the metadata does not mandate a contact person.

The Netherlands NAP has no clear API for obtaining metadata. The actual endpoint<sup>6</sup> was found by analysing the JavaScript sources of the NAP website. The metadata schema mandates both a person responsible for the dataset publication and an owner transport operator company.
The referenced dataset is given by the attribute publicationURL, and no geographical coverage is present (unlike the French metadata schema).

In summary, the selected NAPs all use different metadata schemas. Even basic information, such as who is responsible for the asset, is not represented in the same way.

A working group composed of representatives from the Netherlands, Germany, Austria and Sweden started working on common metadata definitions for the various NAPs in Europe. The goal is to increase interoperability and ease the creation of multi-country solutions. The outcome of this group is called the Coordinated Metadata Catalogue<sup>7</sup>. It defines a minimum set of metadata which, according to its authors, should be supported by all NAP implementations. The Coordinated Metadata Catalogue schema can represent all the basic information contained in the assets from the three NAPs, so it can serve as a unified schema onto which those NAP schemas are harmonised.

:<sup>4</sup> https://www.transportdata.be/api/3
:<sup>5</sup> https://transport.data.gouv.fr/swaggerui
:<sup>6</sup> https://nt.ndw.nu/services-spoa/rest/v1/ui/multimodaal
:<sup>7</sup> https://www.its-platform.eu/highlights/harmonised-metadata-national-access-points

===3 Automating Metadata Aggregation from National Access Points===
The Asset Manager is an RDF-based metadata catalog developed in the context of the Shift2Rail Innovation Programme 4<sup>8</sup>. We show that it can be used as an aggregator of metadata coming from multiple trusted sources. The objective is to let companies access domain-specific knowledge coherently, using a single tool. We defined and validated an approach based on Semantic Web technologies to perform metadata ingestion and to define and execute mappings onto a single metadata schema.
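The first harmonisation step maps each NAP schema onto a common intermediate model. It can be sketched in Python as follows. This is an illustration only: in the described solution the mappings are written in RML, not Python. `CommonRecord` is a simplified, hypothetical stand-in for the Coordinated Metadata Catalogue, not its official field list. The input field names are also hypothetical; only `publicationURL` comes from the Dutch NAP description above. The adapters reflect the differences noted in Section 2: French records lack a mandatory contact person, and Dutch records lack geographical coverage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CommonRecord:
    """Hypothetical intermediate record in the spirit of the Coordinated Metadata Catalogue."""
    title: str
    publisher: str                 # organisation responsible for the asset
    access_url: str
    contact: Optional[str] = None  # not every NAP provides a contact person
    spatial: Optional[str] = None  # not every NAP provides geographical coverage


def from_belgium(r: dict) -> CommonRecord:
    # CKAN-style record (hypothetical fields): both contact and operator are present.
    return CommonRecord(
        title=r["title"],
        publisher=r["organization"]["title"],
        access_url=r["resources"][0]["url"],
        contact=r.get("maintainer_email"),
        spatial=r.get("spatial"),
    )


def from_france(r: dict) -> CommonRecord:
    # Only a responsible organisation; a contact person is not mandated.
    return CommonRecord(
        title=r["title"],
        publisher=r["publisher"]["name"],
        access_url=r["page_url"],
        spatial=r.get("covered_area"),
    )


def from_netherlands(r: dict) -> CommonRecord:
    # Responsible person plus owner company; no geographical coverage.
    return CommonRecord(
        title=r["title"],
        publisher=r["owner"],
        access_url=r["publicationURL"],
        contact=r.get("responsible"),
    )


# One adapter per NAP; the downstream mapping onto DCAT-AP sees only CommonRecord.
ADAPTERS: Dict[str, Callable[[dict], CommonRecord]] = {
    "BE": from_belgium,
    "FR": from_france,
    "NL": from_netherlands,
}


def harmonise(country: str, raw: dict) -> CommonRecord:
    return ADAPTERS[country](raw)
```

With this split, adding a fourth NAP only requires one more adapter (in practice, one more set of RML rules). The second mapping step does not change.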
Following the European guidelines for representing metadata in data catalogs, the DCAT Application Profile v2.0.1 [7] was selected as the metadata schema for the Asset Manager. Our solution leverages the Asset Manager and the Chimera tool to: (i) connect to each NAP, (ii) fetch the metadata of its assets, (iii) convert such metadata into a coherent RDF representation that can easily be queried via SPARQL, (iv) store the resulting triples in the RDF repositories, and (v) show both local and remote assets in the Asset Manager.

:<sup>8</sup> cf. https://shift2rail.org/research-development/ip4/

====3.1 Configuring metadata ingestion====
''Fig. 1: Conceptual mappings defined for the harmonisation of the different National Access Point metadata schemas.''

''Fig. 2: Overall architecture of the implemented solution to integrate metadata from the NAPs (France, Belgium and the Netherlands).''

The first and most important step in configuring the NAP metadata ingestion process is to understand the metadata schemas and identify which attributes and data structures are found in all the different NAPs. We decided to use the Coordinated Metadata Catalogue as an intermediate model to ease the definition of mappings between the metadata schemas. Indeed, the Coordinated Metadata Catalogue specification acknowledges the existence of other vocabularies. It already provides an alignment to DCAT-AP, which is the schema adopted by the Asset Manager. We exploited this alignment to define a two-step conceptual mapping, as depicted in Figure 1. The first step maps the specific NAP metadata schema onto the Coordinated Metadata Catalogue, and the second maps that schema onto DCAT-AP. These conceptual mappings guided the coding of the actual mapping rules in RML<sup>9</sup>. We then assembled a metadata ingestion service exposed through a Chimera pipeline<sup>10</sup>.
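The second step maps the intermediate model onto DCAT-AP. It can be sketched as a function that emits N-Triples. The sketch uses a small subset of real DCAT, Dublin Core, FOAF and vCard terms (`dcat:Dataset`, `dct:title`, `dct:publisher`, `dcat:distribution`, `dcat:accessURL`, `dcat:contactPoint`, `vcard:fn`). However, it makes several simplifications. The IRI minting scheme (the `/publisher`, `/distribution` and `/contact` suffixes) is a hypothetical choice. The output is not validated against the DCAT-AP profile. Finally, the described solution performs this step with RML rules rather than Python.

```python
from typing import List, Optional

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
VCARD = "http://www.w3.org/2006/vcard/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def _iri(value: str) -> str:
    # Assumes the value is already a valid absolute IRI.
    return f"<{value}>"


def _literal(value: str) -> str:
    # N-Triples string literal: escape backslash, double quote and line breaks.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_dcat_ap(dataset_iri: str, title: str, publisher: str,
               access_url: str, contact: Optional[str] = None) -> List[str]:
    """Map a harmonised record onto a small subset of DCAT-AP, as N-Triples lines."""
    publisher_iri = dataset_iri + "/publisher"        # hypothetical IRI scheme
    distribution_iri = dataset_iri + "/distribution"  # hypothetical IRI scheme
    triples = [
        (_iri(dataset_iri), _iri(RDF_TYPE), _iri(DCAT + "Dataset")),
        (_iri(dataset_iri), _iri(DCT + "title"), _literal(title)),
        (_iri(dataset_iri), _iri(DCT + "publisher"), _iri(publisher_iri)),
        (_iri(publisher_iri), _iri(RDF_TYPE), _iri(FOAF + "Agent")),
        (_iri(publisher_iri), _iri(FOAF + "name"), _literal(publisher)),
        (_iri(dataset_iri), _iri(DCAT + "distribution"), _iri(distribution_iri)),
        (_iri(distribution_iri), _iri(RDF_TYPE), _iri(DCAT + "Distribution")),
        (_iri(distribution_iri), _iri(DCAT + "accessURL"), _iri(access_url)),
    ]
    if contact:
        contact_iri = dataset_iri + "/contact"
        triples += [
            (_iri(dataset_iri), _iri(DCAT + "contactPoint"), _iri(contact_iri)),
            (_iri(contact_iri), _iri(RDF_TYPE), _iri(VCARD + "Kind")),
            (_iri(contact_iri), _iri(VCARD + "fn"), _literal(contact)),
        ]
    return [f"{s} {p} {o} ." for s, p, o in triples]
```

For instance, `to_dcat_ap("https://example.org/asset/nl-1", "Bus timetables", "Example BV", "https://example.org/nl.zip")` returns eight N-Triples lines. Passing `contact` adds three more.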
As shown in Figure 2, calling the ingestion service triggers the following actions for each country:
* (i) the NAP API endpoint is called to obtain JSON metadata;
* (ii) lifting is performed on the resulting JSON metadata using the appropriate RML mapping rules, producing an RDF representation compliant with the DCAT-AP profile;
* (iii) the resulting RDF triples are written to the RDF repository used by the Asset Manager as a separate RDF graph.

:<sup>9</sup> The developed RML mappings are available at https://github.com/cefriel/nap-harmonisation/tree/main/rml
:<sup>10</sup> The configuration of the ingestion service is available at https://github.com/cefriel/nap-harmonisation/blob/main/chimera-route/camel-context.xml

====3.2 Accessing metadata from the Asset Manager====
The Asset Manager arranges assets in categories according to so-called asset types. In the considered scenario, we mapped the items coming from the NAP metadata ingestion pipelines to the journey planning asset type. This type can describe either datasets containing timetables or services providing timetables. Whenever a user asks to view the list of journey planning assets, the Asset Manager performs a single SPARQL query<sup>11</sup> to retrieve the basic information about each published asset.

:<sup>11</sup> The query is available at https://github.com/cefriel/nap-harmonisation/blob/main/asset-manager/query-visualise-assets.sparql

''Fig. 3: Visualisation of both local and remote NAP assets in the Asset Manager.''

As shown in Figure 3, once the NAP metadata ingestion pipeline is activated, the Asset Manager shows both local assets and assets coming from National Access Points. Users can then browse the consolidated list of assets and search for the most relevant ones. Moreover, information retrieval can be automated through the exposed Exploration API.
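The three ingestion actions and the consolidated listing can be sketched with an in-memory stand-in for the RDF repository. Everything here is hypothetical. The real pipeline is a Chimera route (configured through Apache Camel). The lifting is done by RML rather than by the `lift` callable. `GRAPH_BASE` is an invented naming scheme. The sketch illustrates why one named graph per country is convenient. Re-running the ingestion replaces that country's graph instead of duplicating triples, and a listing over all graphs still returns local and remote assets together.

```python
import json
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical in-memory stand-in for the RDF repository: graph IRI -> triples.
Store = Dict[str, Set[Triple]]

GRAPH_BASE = "https://example.org/graph/nap/"  # hypothetical naming scheme


def ingest_country(country: str,
                   fetch_json: Callable[[str], str],
                   lift: Callable[[dict], Iterable[Triple]],
                   store: Store) -> str:
    """(i) fetch JSON metadata, (ii) lift it to RDF, (iii) write it to its own graph."""
    payload = json.loads(fetch_json(country))       # step (i): a list of records
    triples: Set[Triple] = set()
    for record in payload:
        triples.update(lift(record))                # step (ii)
    graph = GRAPH_BASE + country.lower()
    store[graph] = triples  # step (iii): replace the graph so re-runs do not duplicate
    return graph


def list_assets(store: Store, rdf_type: str, asset_class: str) -> List[Tuple[str, str]]:
    """Query-like view over every graph, so local and remote assets appear together."""
    return sorted((graph, s) for graph, triples in store.items()
                  for (s, p, o) in triples if p == rdf_type and o == asset_class)
```

In a deployment, `fetch_json` would call the NAP endpoint and `store` would be the triple store. Keeping both injectable is what allows the sketch to run without network access.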
As a result, this solution can encourage and facilitate the creation of multimodal mobility packages by providing standardised access to information coming from several metadata sources.

===4 Conclusions and Future Works===
The general availability of National Access Points throughout Europe will improve interoperability in the transportation domain, as it will force all actors to provide data according to the Transmodel-based specifications dictated by the regulators. We demonstrated that converging to a common set of metadata makes it possible to treat the entire network of NAPs as a source of trusted data and metadata. The set proposed by the Coordinated Metadata Catalogue initiative is one example. This network can in turn facilitate the planning of cross-border travel offers.

The integration of remote metadata providers (such as the NAPs) in the IT systems of a transport operator must carefully follow the data quality assurance and information lifecycle processes defined inside the company. As future work, we will investigate how to integrate the detection of changes in the metadata acquired from NAPs into the lifecycle processes of other assets managed by the Asset Manager. NAPs will become the authoritative source of information in the transportation domain. It is therefore important to promptly detect new versions of a remote asset used internally by the company (through the Asset Manager). The owners of the dependent applications should then be notified, so that they can check their functionalities and prevent errors.

Although our solution is based on a declarative approach, it relies on an external integration engine to actually call the APIs provided by the NAPs. We will investigate the recently introduced RML support for Web APIs [1] as an alternative way to define mappings for different NAP endpoints in a fully declarative way.
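The change detection mentioned as future work is not part of the presented solution. Purely as a hypothetical starting point, two snapshots of harvested NAP metadata, keyed by asset identifier, could be compared through hashes of their canonical JSON form. The owners of applications that depend on changed or removed assets could then be looked up in a dependency map. The snapshot format and the `dependants` map are assumptions.

```python
import hashlib
import json
from typing import Dict, List, Set


def fingerprint(metadata: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so key order does not look like a change.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(old: Dict[str, dict], new: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Classify asset identifiers as added, removed or changed between two harvests."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {a for a in set(old) & set(new)
               if fingerprint(old[a]) != fingerprint(new[a])}
    return {"added": added, "removed": removed, "changed": changed}


def owners_to_notify(changes: Dict[str, Set[str]],
                     dependants: Dict[str, List[str]]) -> Set[str]:
    """Owners of applications that depend on a changed or removed remote asset."""
    affected = changes["changed"] | changes["removed"]
    return {owner for asset in affected for owner in dependants.get(asset, [])}
```

Hashing whole records flags any metadata change. A real lifecycle process would probably restrict the comparison to the fields that matter to dependent applications.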
===Acknowledgments===
The presented research was partially supported by the SPRINT project (Grant Agreement 826172) and the RIDE2RAIL project (Grant Agreement 881825), co-funded by the European Commission under the Horizon 2020 Framework Programme.

===References===
# Assche, D.V., Haesendonck, G., Mulder, G.D., Delva, T., Heyvaert, P., Meester, B.D., Dimou, A.: Leveraging web of things W3C recommendations for knowledge graphs generation. In: Web Engineering - 21st International Conference, ICWE 2021, Proceedings. vol. 12706, pp. 337–352. Springer (2021). https://doi.org/10.1007/978-3-030-74296-6_26
# Carenini, A., et al.: SPRINT project Deliverable D2.3 – Requirements for an IF architectural design (F-REL) (2020), http://sprint-transport.eu/
# Dimou, A., Sande, M.V., Colpaert, P., Verborgh, R., Mannens, E., de Walle, R.V.: RML: A generic language for integrated RDF mappings of heterogeneous data. In: Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014. CEUR Workshop Proceedings, vol. 1184. CEUR-WS.org (2014), http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
# Gendre, P., Denis, Y., Duquesne, C., Bouziane, Z., Bouree, K., Dezou, L., Lemettais, O.: CHOUETTE an open source software for PT reference data exchange. In: 8th European ITS Congress, Lyon (2011)
# Mylonas, C., Mitsakis, E., Dolianitis, A., Aifadopoulou, G.: A review of European national access points for intelligent transport systems data. In: 23rd IEEE International Conference on Intelligent Transportation Systems, ITSC 2020, Rhodes, Greece, September 20-23, 2020. pp. 1–8. IEEE (2020). https://doi.org/10.1109/ITSC45102.2020.9294463
# Scrocca, M., Comerio, M., Carenini, A., Celino, I.: Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph. In: Proceedings of the 19th International Semantic Web Conference. vol. 12507, pp. 411–429. Springer (2020). https://doi.org/10.1007/978-3-030-62466-8_26
# Van Nuffelen, B.: DCAT Application Profile for data portals in Europe (DCAT-AP) v2.0.1. Tech. rep., SEMIC (2020), https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe/release/201-0