AGILE PhD school 2017

Geospatial data integration and visualisation using Linked Data

Weiming Huang, Ali Mansourian, Lars Harrie
GIS Centre, Department of Physical Geography and Ecosystem Science, Lund University
Sölvegatan 12, 223 62 Lund, Sweden
weiming.huang@nateko.lu.se, ali.mansourian@nateko.lu.se, lars.harrie@nateko.lu.se

Abstract

Geospatial data are increasingly available, which leads to more analysis and visualisation of geospatial data from several sources. To enable this, we need homogeneous data as well as proper integration methods. Geospatial data integration has been a research topic for decades, and this paper discusses how the Linked Data technology stack can be used to ease geospatial data integration, particularly in a multi-scale context. Furthermore, this paper discusses the possibilities of incorporating symbolisation information in Linked Data, along with the integrated linked geospatial data, for visualisation.

Keywords: geospatial data integration; multi-scale; Linked Data; visualisation; symbolisation.

1 Introduction

The rapid development of the Internet, together with the incentives from legislation, commerce, and the open data trend, has improved the availability of geospatial data, including both the authoritative geospatial data accessible from governmental Spatial Data Infrastructures (SDIs) and the prevalent Volunteered Geographic Information (VGI). For example, in Europe, the INSPIRE¹ directive stipulates that, within a few years, several authorities that are responsible for creating and maintaining geospatial data are obliged to set up download services to facilitate access to and sharing of geospatial data. The substantial improvement of data availability will enable cross-data set analysis and visualisation, in which the integration of geospatial data from different sources is indispensable.

Geospatial data from different sources are generally produced in isolation, and this causes syntactic and semantic heterogeneity, two significant obstacles for geospatial data integration. Furthermore, links between relevant multi-source geospatial data are often lacking. The absence of links between data sets impedes the integration of geospatial data for visualisation and analysis, and this impediment is especially significant in a multi-scale environment. For example, in a map mashup (a common form of web map), the thematic data are usually simply overlaid on top of a base map without explicit links and integration. However, the scales of the thematic data and the base map are generally not synchronised because, unlike the thematic layer, the base map is usually a multi-scale map from an authoritative mapping agency and has multiple representations (for details, see Huang et al., 2016).

In this context, the Semantic Web technologies, particularly the ones concerning Linked Data, provide a promising technical framework to ease the integration and linking of geospatial data. "Linked Data" is the term for the collection of design principles and technologies centred around a paradigm to publish, retrieve, reuse, and integrate data on the Web (Kuhn et al., 2014). The adoption and application of Linked Data in the geospatial community have developed considerably in recent years. A number of geospatial data sets have been released as Linked Data, and some of them make up an indispensable portion of the Linking Open Data (LOD) cloud (Abele et al., 2017; Figure 1). On the other hand, the visualisation and symbolisation of linked geospatial data have rarely been explored, and they are even trickier in a multi-scale context. Hence, this project mainly concentrates on investigating the integration and visualisation of multi-source geospatial data utilising the Linked Data technology stack, in particular in a multi-scale context.

Figure 1. The central part of the LOD cloud of November, 2017

The following research questions will be addressed within the work:
• How to organise geospatial data at different scales in Linked Data; the design of Uniform Resource Identifiers (URIs) and ontologies is important for linking the multiple representations of each geographic object;
• How to establish the links between different geospatial Linked Data sets, particularly in a multi-scale context;
• How the links between data sets can be utilised for the synchronisation of scales between multi-source geospatial data sets;
• How the linked geospatial data sets should be visualised and symbolised, namely how the symbolisation information should be defined and organised, and on which level (feature level, feature collection level, etc.) it should be defined;
• How the linked geospatial data sets would benefit the SDI.

2 Related work

2.1 Geospatial data integration using Linked Data

Data integration is a long-standing research theme in the geospatial domain, where geometric, topological as well as semantic information is used (see e.g., Walter and Fritsch, 1999; Du et al., 2012; Yang et al., 2014). With a few exceptions (e.g., Mustière and Devogele, 2008), these studies have concentrated on the integration of data of similar levels of detail.

In the abovementioned environment of map mashups, in which multi-source geospatial data are generally simply overlaid without any links established between them, the integration usually concerns multi-scale data sets. Stern and Sester (2013) studied mashups of natural protected areas on top of a base map, where the protected areas often have common geometries with the base map. To overcome the problem of inconsistencies in the multi-scale representation, they argued that the base map should act as a constraint for generalising the thematic data. Toomanian et al. (2013) used Semantic Web technologies to integrate multi-source data in map mashups. They used ontologies to define the semantic relationships between feature types in the thematic data and the base map. These semantic relationships were then used to enable real-time adjustment of the thematic features to the base map.

Linked Data technology has been adopted to facilitate geospatial data integration in some other studies. For instance, Wiemann and Bernard (2016) investigated possibilities for integrating the SDI and Linked Data paradigms in terms of spatial data integration. They implemented a prototype system in which the spatial relationships were explored by the OGC Web Processing Service (WPS) and then explicitly and separately stored using Linked Data, including information on the involved features, the relationship types and the relationship measurements conducted. Lutz et al. (2009) proposed a hybrid ontology-based solution for overcoming the semantic heterogeneity in SDI. They designed a shared vocabulary on top of which the application ontologies were designed, and then used an ontology reasoner (DL query) to identify the subsumption relationships between concepts, so that the corresponding concepts in different classification systems were recognised. They also used semantic annotations to label the data services, enabling the data requestor to use a tailored language to retrieve data. The tailored language was then translated into a DL query, and subsequently the WFS requests were invoked.

In the framework of Linked Data technology, some techniques have been extended in order to improve the handling of linked geospatial data. For example, SPARQL, the query language for RDF, has a standardised geospatial extension, GeoSPARQL (Perry and Herring, 2012). GeoSPARQL also provides an ontology as a standardised exchange basis for geospatial RDF data (Battle and Kolas, 2012), and this has been adopted in several studies in which geospatial data sets are published as Linked Data and linked to other data sets. For example, Patroumpas et al. (2015) exposed INSPIRE-compliant data and metadata as Linked Data by transforming them into the data model of the Resource Description Framework (RDF) using XSLT transformations and then exposing them through (Geo)SPARQL endpoints; they adopted the GeoSPARQL ontology for the geometric representation of their RDF data sets. These technical advances enable geospatial data to be linked and referenced. However, the linking of multi-scale geospatial data sets has rarely been explored, and this is the focus of this project.

2.2 Visualisation of geospatial linked data

Linked geospatial data are situated at rather central places in the LOD cloud because geospatial and location data often serve as nexus and linkage between different data items and sets (Janowicz, 2012). However, the portrayal and symbolisation of linked geospatial data have seldom been discussed. When it comes to the visualisation of linked geospatial data, the providers of such data generally use an external styling service or hard-coded symbolisation parameters. The LinkedGeoData (LGD) project, which released OpenStreetMap (OSM) data as Linked Data, used a separate renderer service in which the symbolisation rules are defined to render the LGD data (Stadler et al., 2012). GeoNames² has an online portal in which the entities can be shown on top of either a digital base map or satellite images; the entities are simply shown as labels with numerical signs or bounding boxes with uniform symbology. In these cases, the portrayal information is not explicit and can hardly be reused by users or other organisations that are interested in the geospatial data in RDF and in the visualisation of the data.

There have been some studies using ontologies to organise and semantically annotate the symbology information in Linked Data. For example, the OGC (Open Geospatial Consortium) explored semantic mediation of portrayal information of geospatial data using ontologies in its Testbeds 11 and 12 (Fellah, 2015; 2017). Symbology ontologies were designed during the testbeds: the ontologies in Testbed 11 were more inclined to the ISO 19117 standard (Kresse and Fadaie, 2004), while the ontology in Testbed 12 was better aligned with Symbology Encoding (SE; Müller, 2006) and Styled Layer Descriptor (SLD; Lupp, 2007). In outline, the ontologies were modularised to avoid an oversized ontology and to foster reusability; specifically, the vocabulary was modularised into a style ontology, a symbol ontology, a symbolizer ontology and a graphic ontology. However, there are still very few studies concerning how the symbolisation information should be associated with geospatial information in the LOD cloud, and how the multi-scale symbolisation should be arranged if the data are at several different levels of detail.

3 Method

Linked Data technology will be leveraged in this project. Specifically, the thematic data will be constructed upon their connections with the reference data sets. For example, natural protected areas are generally defined by their connections with other geographic objects (e.g., river, lake, road, etc.). Assuming that the reference geospatial data containing the topographic and cadastral objects are released as Linked Data, the natural protected areas can be defined upon their relations with the objects in the base map, and the scales of the reference data and of the thematic data built upon them can be automatically synchronised. Several case studies will be performed to verify the feasibility of the approach. In addition, the symbolisation information of both thematic and reference data will be incorporated into the Linked Data sets to enable tailored visualisation. The symbolisation of thematic data can also depend on the styling or other information in the reference data.

To realise this idea, we need:
• Multiple representation databases that are released as Linked Data to serve as reference data sets; GeoSPARQL can be employed as the vocabulary for geometries, while the design of URIs still needs to be explored;
• Ontologies that define the formal semantics of the relations between thematic and reference data; these can be extended from GeoSPARQL;
• Ontologies that define the styling information of linked geospatial data; some concepts from SE and SLD can serve as reference;
• A mechanism for generating thematic data from the relations with reference data for visualisation and analysis;
• A prototypical system that can automatically generate thematic data from the above data modelling and visualise them according to the tailored symbolisation information.

Acknowledgement

The PhD study of Weiming Huang at the GIS Centre, Lund University, is jointly funded by the China Scholarship Council (CSC) and Lund University.

Notes

1. https://inspire.ec.europa.eu/
2. http://www.geonames.org/

References

Abele, A., McCrae, J.P., Buitelaar, P., Jentzsch, A. and Cyganiak, R. (2017) Linking open data cloud diagram 2017 [online]. Available from: http://lod-cloud.net/ [Accessed 15 February 2017]

Battle, R. and Kolas, D. (2012) Enabling the geospatial semantic web with Parliament and GeoSPARQL. Semantic Web, 3(4), 355-370.

Du, H., Anand, S., Alechina, N., Morley, J., Hart, G., Leibovici, D., Jackson, M. and Ware, M. (2012) Geospatial information integration for authoritative and crowd sourced road vector data. Transactions in GIS, 16(4), 455-476.

Fellah, S. (ed.) (2015) OGC Testbed-11 Symbology Mediation Engineering Report. Open Geospatial Consortium.

Fellah, S. (ed.) (2017) Testbed-12 Semantic Portrayal, Registry and Mediation Engineering Report. Open Geospatial Consortium.

Huang, W., Harrie, L. and Mansourian, A. (2016) On-demand mapping and integration of thematic data. In ICA Workshop on Generalisation and Multiple Representation, 14 June, Helsinki, Finland.

Janowicz, K. (2012) Place and Location on the Web of Linked Data. WWW document, http://stko.geog.ucsb.edu/location_linked_data

Kresse, W. and Fadaie, K. (2004) ISO standards for geographic information. Springer Science & Business Media.

Kuhn, W., Kauppinen, T. and Janowicz, K. (2014) Linked Data - a paradigm shift for geographic information science. In: Duckham, M., Pebesma, E., Stewart, K. and Frank, A. (eds.), Geographic information science. Berlin: Springer, 173-186.

Lupp, M. (2007) Styled layer descriptor profile of the web map service implementation specification. Open Geospatial Consortium.

Lutz, M., Sprado, J., Klien, E., Schubert, C. and Christ, I. (2009) Overcoming semantic heterogeneity in spatial data infrastructures. Computers & Geosciences, 35(4), 739-752.

Mustière, S. and Devogele, T. (2008) Matching networks with different levels of detail. Geoinformatica, 12(4), 435-453.

Müller, M. (2006) Symbology Encoding Implementation Specification. Open Geospatial Consortium.

Patroumpas, K., Georgomanolis, N., Stratiotis, T., Alexakis, M. and Athanasiou, S. (2015) Exposing INSPIRE on the Semantic Web. Web Semantics: Science, Services and Agents on the World Wide Web, 35, 53-62.

Perry, M. and Herring, J. (2012) OGC GeoSPARQL - a geographic query language for RDF data [online]. Technical report, Open Geospatial Consortium. Available from: https://portal.opengeospatial.org/files/?artifact_id=47664 [Accessed 28 August 2016]

Stadler, C., Lehmann, J., Höffner, K. and Auer, S. (2012) LinkedGeoData: A core for a web of spatial open data. Semantic Web, 3(4), 333-354.

Stern, C. and Sester, M. (2013) Deriving constraints for the integration and generalization of detailed environmental spatial data in maps of small scales. In ICA Workshop on Generalisation and Multiple Representation, 23-24 August, Dresden, Germany.

Toomanian, A., Harrie, L., Mansourian, A. and Pilesjo, P. (2013) Automatic integration of spatial data in viewing services. Journal of Spatial Information Science, 2013(6), 43-58.

Walter, V. and Fritsch, D. (1999) Matching spatial data sets: a statistical approach. International Journal of Geographical Information Science, 13(5), 445-473.

Wiemann, S. and Bernard, L. (2016) Spatial data fusion in Spatial Data Infrastructures using Linked Data. International Journal of Geographical Information Science, 30(4), 613-636.

Yang, B., Zhang, Y. and Lu, F. (2014) Geometric-based approach for integrating VGI POIs and road networks. International Journal of Geographical Information Science, 28(1), 126-147.
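
The data model outlined in Section 3 (Method) can be illustrated with a minimal sketch, which is not the project's implementation. It assumes that each reference object links to several scale-specific representations and that a thematic object (a protected area) is defined only through its relations to reference objects. The example.org namespace, the predicates hasRepresentation, scaleDenominator and boundedBy, and all URIs and coordinates are hypothetical. Only geo:asWKT is taken from the GeoSPARQL vocabulary, and it is attached directly to the representation node for brevity. Triples are held in a plain Python set rather than a triple store.

```python
GEO = "http://www.opengis.net/ont/geosparql#"  # GeoSPARQL namespace
EX = "http://example.org/"                     # hypothetical namespace

# Hypothetical predicates, introduced only for this illustration.
HAS_REP = EX + "hasRepresentation"
SCALE = EX + "scaleDenominator"
BOUNDED_BY = EX + "boundedBy"
AS_WKT = GEO + "asWKT"  # GeoSPARQL property carrying a WKT literal

# Reference objects with two representations each, plus one thematic
# object defined solely by its relations to the reference objects.
triples = {
    (EX + "river/42", HAS_REP, EX + "river/42/rep/10k"),
    (EX + "river/42/rep/10k", SCALE, 10_000),
    (EX + "river/42/rep/10k", AS_WKT, "LINESTRING (0 0, 1 1, 2 1, 3 2)"),
    (EX + "river/42", HAS_REP, EX + "river/42/rep/100k"),
    (EX + "river/42/rep/100k", SCALE, 100_000),
    (EX + "river/42/rep/100k", AS_WKT, "LINESTRING (0 0, 3 2)"),
    (EX + "road/3", HAS_REP, EX + "road/3/rep/10k"),
    (EX + "road/3/rep/10k", SCALE, 10_000),
    (EX + "road/3/rep/10k", AS_WKT, "LINESTRING (3 2, 3 1, 2 0, 0 0)"),
    (EX + "road/3", HAS_REP, EX + "road/3/rep/100k"),
    (EX + "road/3/rep/100k", SCALE, 100_000),
    (EX + "road/3/rep/100k", AS_WKT, "LINESTRING (3 2, 0 0)"),
    (EX + "protectedArea/7", BOUNDED_BY, EX + "river/42"),
    (EX + "protectedArea/7", BOUNDED_BY, EX + "road/3"),
}


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def representation_at(feature, scale):
    """Pick the representation whose scale denominator is closest to `scale`."""
    reps = objects(feature, HAS_REP)
    return min(reps, key=lambda rep: abs(objects(rep, SCALE)[0] - scale))


def boundary_parts(thematic, scale):
    """Resolve a thematic object to reference geometries at the given scale."""
    return {
        ref: objects(representation_at(ref, scale), AS_WKT)[0]
        for ref in sorted(objects(thematic, BOUNDED_BY))
    }


if __name__ == "__main__":
    for ref, wkt in boundary_parts(EX + "protectedArea/7", 80_000).items():
        print(ref, wkt)
```

Because the protected area stores no geometry of its own, requesting it at a different scale automatically yields boundary parts taken from the matching reference representations, which is the scale synchronisation the method aims for. In a real deployment the lookups would be GeoSPARQL queries against an endpoint instead of set scans.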