A SKOS Taxonomy of the UN Global Geospatial Information Management Data Themes? Beyza Yaman,1 Kevin Thompson2 and Rob Brennan1 1 ADAPT Centre, Dublin City University, Dublin, Ireland 2 Ordnance Survey Ireland, Dublin, Ireland {beyza.yaman,rob.brennan}@adaptcentre.ie, kevin.thompson@osi.ie Abstract. Complex data domains increase the difficulty of structuring, sharing, discovering and governing information. For the geospatial do- main common models such as INSPIRE have been established in the European Union. The United Nations initiative on Global Geospatial Information Management (UN-GGIM) draws together national and re- gional capacities. Interoperability is the main principle behind these ini- tiatives. Nonetheless there is a lack of published research to date on map- ping agency geospatial linked data leveraging the UN-GGIM taxonomy of information management data themes. Thus, we have identified use cases and defined a Simple Knowledge Organization System (SKOS)[3] taxonomy expressing the UN-GGIM data themes for national spatial in- frastructure. This has been applied in a metadata generation and report- ing tool for Ordnance Survey Ireland (OSi) which underpinned improved governance and reporting infrastructure in OSi. This demonstrated the contribution of Semantic Web technology to spatial data governance as well as its importance for data publishing. This paper presents a doc- umented open license SKOS taxonomy for the UN-GGIM data themes that follows Linked Data best practices. It provides a set of three use cases, an overview of UN-GGIM theme definitions and an example ap- plication of the taxonomy for deployment in OSi for DCAT metadata generation and data publishing pipeline reporting. 1 Introduction Geospatial data is essential in part due to its importance in social, economic, and environmental policy formation and decision making. In the geospatial data domain, large organizations -especially, ones working at a National level- have to confront high data heterogeneity due to the need to collect, analyze and share information within national, regional and global policy frameworks. Interopera- ble aggregation and reporting of this geospatial statistical information is key to avoid data quality problems and to streamline data transfers and management. Interoperability requires common vocabularies, models, meta-data and interfaces for creating, reporting on and curating geospatial statistical data. ? Copyright © 2021 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 B. Yaman et al. In these circumstances, developing standards and norms becomes crucial to providing a coherent service infrastructure and to create meaningful relations among data integrated from multiple organizations. In order to boost countries’ geospatial data activities, the United Nations has developed a framework called the Global Geospatial Information Management (UN-GGIM) Data Themes [4]. These themes are a minimum set of concepts to be used to label datasets in order to enable interoperability between national mapping agencies. They form a foundation to support global geospatial information management and an in- tegrated geospatial information framework to strengthen geospatial information sharing among other global initiatives[4]. They are similar in concept to the EU INSPIRE data themes but with global scope. However, to date it was seen that no paper was published establishing a geo Linked Data approach to supporting UN-GGIM metadata and no documented UN-GGIM data themes use cases for Semantic Web technology. Thus this paper explores the research question of to what extent a geo Linked Data approach could increase the interoperability of geospatial data by modelling UN-GGIM concepts and using standardized tools to report on national geospa- tial data production in terms of UN-GGIM data themes in an understandable manner. In order to solve this problem, we created a taxonomy for the UN- GGIM Data Themes to classify the data and provide a meaningful relationship among the data. The contribution of this paper is a new SKOS vocabulary [3] that can be used to enhance geospatial datasets with the UN-GGIM data themes concepts to increase the interpretability of the data. The remainder of this paper is structured as follows: Section 2 discusses Related Work and Section 3 describes the Use Case for Ordnance Survey Ire- land (OSi) deployment. Section 4 introduces the UN-GGIM Data Themes and Section 5 demonstrates the UN-GGIM taxonomy development and data theme solution for the OSi National Map dataset (Prime2). Finally, the paper provides conclusions in Section 6. 2 Related Work The INSPIRE Data Themes3 are the first set of classification themes imple- mented for geospatial data within the scope of the INSPIRE Directive in the EU. An RDF model version has been developed for the description of the data themes and provided to end-users such as national mapping agencies so that ap- propriate descriptive metadata could be generated for the datasets. The model consists of 34 INSPIRE data themes which makes the model more detailed than UN-GGIM themes. The UN-GGIM Secretariat has provided an interactive web interface im- plemented by the ArcGIS tool employing the UN-GGIM data themes4 . The interface demonstrates the use case examples for statistical reporting based on UN-GGIM data themes and how they can be used in the context. The use cases 3 https://inspire.ec.europa.eu/theme 4 https://www.arcgis.com/apps/Cascade/index.html?appid=4741ad51ff7a463d833d18cbcec29fff UN-GGIM Data Themes 3 include population distribution, transport network maps integrating the stan- dardised, fundamental data from national mapping agencies, statistical offices and other institutions. The W3C’s data catalog vocabulary standard, DCAT [1], allows the descrip- tion of data themes and they are proposed as recommended properties by the DCAT application profile (DCAT-AP) for describing a resource (dataset, data service or dataset distribution). The DCAT vocabulary provides two different properties for describing data theme information: the dcat:themeTaxonomy property for the schema to be used for data themes, and the dcat:theme prop- erty for the specific “theme/category” from the available theme options5 . Thus a DCAT record can be used to describe a dataset and associate it with one or more data themes in one or more data theme taxonomies. 3 Use Cases Ordnance Survey Ireland (OSi) is the national mapping agency of Ireland that produces geospatial data. The data is first captured through surveying the land using aeroplanes and then adjusted by surveyors if necessary. This digital data is typically in image or point cloud formats and later converted and stored in an Oracle Spatial and Graph database using the Prime2 model. Prime2 is the object-oriented spatial model of over 50 million spatial objects tracked in time followed by conversion for printing as cartographic products or data sales and distribution by OSi. After going through transformation phases the data is provided in different formats either with open access or with a cost to private end-users, stakeholders, or other governmental institutions. Acting as a source of spatial data, OSi has to comply with several statisti- cal regulations such as National, European Directives like INSPIRE and global statistical reporting like UN agencies. OSi reporting use cases must take into account three different aspects: i) Classification of the data: The end-users need a common understanding of key characteristics of the data they are consum- ing. For instance, an insurance company is interested in subdatasets related to flood risk management, or a post office is interested in addresses. ii) Report- ing to stakeholders: Institutions like the United Nations and the Organisation for Economic Co-operation and Development (OECD) demand the standardiza- tion of geospatial statistical data according to the field of geospatial information management. It is also mandatory for INSPIRE Directive states, to share the country spatial data and metadata created for spatial data sets and services corresponding to the themes through interoperable infrastructures. iii) Dataset metadata generation: Internally the departments need to catalog their data and provide information about suitability for intended uses. This allows high level monitoring of the data including quality scores or provenance. The lack of pro- viding the data and metadata in an interoperable and standardized way can result in failures in the system, reduced co-operation with partners and fines or other penalties for non-compliance. 5 https://www.w3.org/TR/vocab-dcat-2/#Property:catalog themes 4 B. Yaman et al. 4 UN-GGIM Data Themes The UN-GGIM created strategic pathways6 proposing implementation and cus- todianship guidelines for best practices in collection and management of inte- grated geospatial information to establish a global geospatial data framework. This way UN-GGIM supports and guides the geospatial data infrastructure of the member countries through the approach, content, rationale, options and con- siderations, principles that align with actions, and sample outcomes for compar- ison [4]. UN-GGIM Data Themes are the set of prioritized national data themes, aligned to the globally endorsed fundamental geospatial data themes. 14 Global Fundamental Geospatial Data Themes are proposed to be used by the strategic pathways to classify geospatial data. The 14 data themes from UN-GGIM are as follows [UN-GGIM]: Global Geodetic Reference Frame (GGRF), Addresses, Buildings and Settlements, Elevation and Depth, Functional Areas, Geograph- ical Names, Geology and Soils, Land Cover and Land Use, Land Parcels, Or- thoimagery, Physical Infrastructure, Population, Distribution, Transport Net- works, and Water. Theme descriptions can be found in the Global Fundamental Geospatial Data Themes document[2]. Data practitioners can follow these strategic plans and data themes to design, develop, maintain a high standards, high quality and sustainable geospatial data infrastructure. On the other hand, the data themes also allow the data to be analysed and provide statistical results w.r.t. the specific data theme such as population distribution in a specific area, e.g. for policy development. They are specifically important for organizing a country’s geospatial, statistical and other information. Fundamental data themes (e.g. transportation) are required for a broad range of decision-making applications, or application data themes (e.g. flood models) required for specific studies; and socio-economic data themes that provide demographic information, such as census and population data 7 . Integrating data themes information into a data catalogue allows end-users to decide the suitability of the dataset for their purpose. 5 Taxonomy Development This section describes the implementation of the taxonomy for the OSi use case. UN-GGIM Data Themes Taxonomy was designed and implemented as a Linked Data controlled vocabulary to provide common and standardized definitions for managing the data. SKOS concept taxonomy was selected as the most appro- priate modelling language for this task and it corresponds to the DCAT data theme property requirements. A 3 step methodology was followed to create this vocabulary. First, UN-GGIM Data Themes were encoded to RDF to generate the taxonomy in Linked Data format. Second, error checking process was per- formed to validate the data. Third, the generated vocabulary was published on the web. 6 https://ggim.un.org/IGIF/part2.cshtml 7 https://ggim.un.org/IGIF/documents/SP4-Data 10Jan2020 GLOBAL CONSULTATION.pdf UN-GGIM Data Themes 5 unggim−dt : B u i l d i n g s −S e t t l e m e n t s a s k o s : Concept ; skos : broader unggim−dt : DataTheme ; skos : prefLabel ” B u i l d i n g s and S e t t l e m e n t s ”@en ; s k o s : n o t e ”A B u i l d i n g r e f e r s t o any r o o f e d s t r u c t u r e p e r m a n e n t l y c o n s t r u c t e d o r e r e c t e d on i t s s i t e , f o r t h e p r o t e c t i o n o f humans , a n i m a l s , things , or the pr od uc ti on o f economic goods . S e t t l e m e n t s ar e c o l l e c t i o n s o f b u i l d i n g s and a s s o c i a t e d f e a t u r e s where a community c a r r i e s o u t s o c i o −e c o n o m i c a c t i v i t i e s . ” . Listing 1.1. Example Data Theme Snippet Each proposed theme in the UN-GGIM data themes was defined as a SKOS concept class. All the classes in the data theme vocabulary were collected under a generic Data Theme class which was defined as a top concept of the scheme and the themes were described as the narrower concepts of this concept. The created vocabulary is available online and it can be seen partially in Listing 1.1. Since there is no direct relation between concepts they have been classified under the top concept of data theme8 . OSi have identified the UN-GGIM data themes as an important framework for reporting to their stakeholders while using Prime2 dataset. A key issue was how to map or present the contents of the 50 million spatial objects captured in Prime2 as data themes. This would enable adding additional meaning that not only humans but also machines could use and interpret the data. In turn this would enable Prime2 data quality reports to be generated for a specific data theme or group of data themes. At first glance it seems that Prime2 (as a universal model) spans all UN-GGIM data themes. However within Prime2 there are different types of spatial classes totaling to 34, e.g. Building and Locale datasets are spatial object classes in the Prime2 dataset so each class can be considered to form a sub-dataset in Prime2. Our approach has been to create a mapping between Prime2 spatial objects and UN-GGIM data themes. Then a DCAT record can be created for each sub-dataset and this in turn enables it to be assocated with one or more data themes. This enhanced data catalog for OSi data assets enables us to write queries for parts of Prime2 that correspond to specific data themes and to associate the quality data for those spatial objects with the data theme. It was important to take into consideration what type of processes will be performed upon these datasets and decide the relevant themes for each class. The created vocabulary is used to organize Prime2 dataset. The datasets (spatial classes) were enriched by one or more than one label using the UN-GGIM themes. The preliminary version of the assignments are performed as in Table 1. There are 3 options for a relationship between a Prime 2 spatial object class and a data theme category: partial match, full match (yes in table), no match (no in table). A full match means all the instances of a class are relevant to the data theme. A partial match means some instances of the class are relevant to the data theme. No match means it does not have any instances related to that data theme. 8 https://linkeddataops.adaptcentre.ie/vocabularies/unggim-data-themes 6 B. Yaman et al. Table 1. Example OSi Prime2 Sub-Datasets to UN-GGIM Data Theme Mapping Buildings and Geographical Transport Addresses Water Settlements Names Networks Boundary Area partial partial yes no no Building yes yes partial no no Locale yes yes yes no no Site partial yes yes no no Water no no partial yes yes Way yes no partial yes no A DCAT catalog description (Listing 1.2) was created for each datasets - including subdatasets- in the pipeline. Listing 1.2 presents this description sam- ple of Building dataset which was enhanced by Buildings-Settlements theme relation. This enabled us using a standard model to describe the metadata features from multiple vocabularies and having different values throughout the time. This description was used as a metadata repository to enable effective data governance controls to store and track the data practically. Beside the advan- tages adding to the data practitioners, the themes add value to the data in the international level. For instance, the data with transport network theme allows to track the transportation ways around Europe and between the continents. The dashboard (Fig. 1) presents the reporting features based on the OSi data catalog which includes data theme, provenance and data quality metadata about each dataset and subdataset. This is a part of the LinkedDataOps project which aims to manage the data in a useful way [5]. The dashboard page has several filters on the left side of the page which allows users to click interactively. Users can pose various queries with different views by clicking on the filters and visualize the dataset relations semantically. The description of the data themes allowed an easy classification and exploration process of the datasets in the pipeline. Thus, while the classification of subdatasets helps OSi to better understand, classify and provide the data, it also helps the end users to consume the parts of data they need. This approach brings convenience not only to the data producer but also to the end user of the data. a dcat : Dataset ; a prov : E n t i t y ; dc : t i t l e ” B u i l d i n g ”@en ; dc : d e s c r i p t i o n ” B u i l d i n g i s a permanent r o o f e d c o n s t r u c t i o n , c u r r e n t l y o r f o r m e r l y u s e d o r i n t e n d e d f o r s h e l t e r . The c o n s t r u c t i o n must have permanent f o u n d a t i o n s . A work under c o n s t r u c t i o n , i s i n c l u d e d a s a B u i l d i n g , i f i t i s a p p a r e n t t h a t , on c o m p l e t i o n , i t w i l l meet t h e d e f i n i t i o n f o r B u i l d i n g . A s t r u c t u r e that i s i d e n t i f i a b l e as having once been a B u i l d i n g but which no l o n g e r has a r o o f , i s i n c l u d e d a s a B u i l d i n g . ”@en ; d c t : c r e a t e d ”2019−01−09” ˆˆ xsd : d a t e ; d c t : m o d i f i e d ”2020−01−09” ˆˆ xsd : d a t e ; d c a t : theme unggim−dt : A d d r e s s e s , unggim−dt : B u i l d i n g s −S e t t l e m e n t s . Listing 1.2. Part of OSi Data Catalog UN-GGIM Data Themes 7 Fig. 1. OSi Dasboard for Data Catalog 6 Conclusions/Future Work A UN-GGIM data theme taxonomy was implemented in order to increase inter- operability, facilitate the integration of non-centralized data, and improve search facilities with specific data themes. It is anticipated that using this taxonomy will be used as a means of integrating geospatial data in the national and in- ternational level and will address global challenges and help build international awareness. As a future work, we would like to map the UN-GGIM data themes to INSPIRE data themes in order to improve the interoperability further. Acknowledgement This research received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement No. 801522, by Science Foundation Ireland and co-funded by the European Regional Development Fund through the ADAPT Centre for Digital Content Technology [grant number 13/RC/2106] and Ordnance Survey Ireland. References 1. R. Albertoni, D. Browning, S. Cox, A. G. Beltran, A. Perego, and P. Winstanley. Data catalog vocabulary (dcat) - version 2. World Wide Web Consortium, 2020. 2. C. Hadley. The global fundamental geospatial data themes journey. United Nations Committee of Experts on Global Geospatial Information Management, 2018. 3. A. Miles and S. Bechhofer. Skos simple knowledge organization system reference. W3C recommendation, 2009. 4. UN-GGIM. The Global Fundamental Geospatial Data Themes. https://ggim.un.org/documents/Fundamental%20Data%20Publication.pdf, 2019. Accessed on 05.12.2020. 5. B. Yaman and R. Brennan. Linkeddataops: linked data operations based on quality process cycle. Proceedings of the EKAW 2020 Posters and Demonstrations Session, Globally online & Bozen-Bolzano, Italy, September 17, 2020.