=Paper=
{{Paper
|id=Vol-1111/om2013_poster11
|storemode=property
|title=Matching geospatial instances
|pdfUrl=https://ceur-ws.org/Vol-1111/om2013_poster11.pdf
|volume=Vol-1111
|dblpUrl=https://dblp.org/rec/conf/semweb/DuAJH13
}}
==Matching geospatial instances==
Heshan Du¹, Natasha Alechina¹, Michael Jackson¹, Glen Hart²
¹ University of Nottingham; ² Ordnance Survey of Great Britain

The work presented in this paper extends our work on matching formal and informal geospatial ontologies [1], which aims to enable the synergistic use of authoritative and crowd-sourced geospatial information. A geospatial instance is an object that has a definite and verifiable location (geometry, topographic footprint) as well as a meaningful label (for example, Victoria Shopping Centre in Nottingham, UK). The examples in this paper are drawn from OpenStreetMap (OSM) [2] and the Ordnance Survey of Great Britain (OSGB) [3].

Different geospatial instances may have the same purely lexical information and the same classification in terms of an ontology, but different locations. For example, there may be several restaurants called 'Prezzo Ristorante' in the same city. Location information is therefore essential when matching geospatial instances. However, few existing ontology matching or data interlinking methods can match spatial instances effectively.

Objects such as shopping centres can also be represented either as a collection of parts or as a single instance. For example, Victoria Centre is represented as a collection of shops and other businesses in OSGB, but as a single instance in OSM. To produce meaningful correspondences between instances in OSGB and in OSM, we propose to use a 'partOf' relation (mereological partOf in geometry, combined with similar labels). If 'partOf' relations hold between two instances a and b in both directions, we generate the hypothesis that a and b are related by 'sameAs'.

Here we propose a new method for establishing 'sameAs' and 'partOf' relations between geospatial instances from different ontologies. Crowd-sourced data is more prone to measurement error and tends to simplify the shapes of buildings.
For this reason, our method uses buffers to compare the geometries of instances. The size of the buffer (intuitively, 'the margin of error' σ) is a parameter that can be determined experimentally or, within reason, set arbitrarily. Intuitively, the optimal value of σ corresponds to the maximal deviation between two representations of the same object in two different data sets. For the OSGB and OSM data for Nottingham, it was determined experimentally to be 20 m.

The new method has four steps.

Step 1: Extracting geometry sets. An ABox contains facts (geometry, lexical and semantic classification information) about geospatial instances. From all the spatial instances in each ABox Aᵢ (i = 1, 2), we extract a set of geometries Gᵢ.

Step 2: Matching geometry sets. Given two sets of geometries G₁ and G₂, a level of tolerance α and a tolerance for the second best β, we generate the best two candidate matches in G₂ (if they exist) for each geometry in G₁, and the best two candidate matches in G₁ (if they exist) for each geometry in G₂. The candidates are selected by comparing minimal buffers. The buffer of a geometry g is
:<math>\mathit{buffer}(g, \sigma) = \{\, p : \exists p' \in g.\ \mathit{distance}(p, p') \le \sigma \,\}, \qquad \sigma > 0.</math>
For two geometries g and h, the minimal buffer of h containing g is buffer(h, σ) such that
:<math>g \subseteq \mathit{buffer}(h, \sigma) \quad\text{and}\quad g \not\subseteq \mathit{buffer}(h, \sigma') \ \text{for all } \sigma' < \sigma.</math>
For any geometry g, the minimal buffer α (α ≤ σ) of its best candidate o₁, i.e. g ⊆ buffer(o₁, α), is the smallest among the minimal buffers of all the candidates. We generate a 'buffered part of' (BPT) relation between each geometry and its candidates, i.e. (g, o₁) ∈ BPT(σ).

Step 3: Comparing labels. We use string comparison, including equality, inclusion, abbreviation and edit distance, to check whether the labels of two instances, such as names or addresses, are similar. If no pair of labels of spatial instances s₁ and s₂ is similar, then their lexical information is incompatible, (s₁, s₂) ∈ LF.
Otherwise, their lexical information is compatible, (s₁, s₂) ∈ LT. For every pair of spatial instances s₁, s₂, if (g₁, g₂) ∈ BPT(α) (where gᵢ is the geometry of sᵢ, i = 1, 2) and (s₁, s₂) ∈ LT, then (s₁, s₂) ∈ partOf possibly holds, and we add the pair to the initial instance mapping M.

Step 4: Verifying the initial instance mapping M using semantic classification information. This is part of the ontology (ABox and TBox) matching process presented in [1].

We implement the method described above as part of GeoMap [1]. From the studied area (2 km sq) of Nottingham city centre, 713 geospatial individuals of 47 types are added to the OSGB Buildings and Places ontology from the OSGB Address Layer 2 and the OSGB Topology Layer [3], and 253 geospatial individuals of 39 types are added automatically to the OSM ontology from the building layer of the OSM data. The ground-truth instance mapping was obtained by manually matching all the instances in the two ontologies. It contains 286 'partOf' relations, from which 73 'sameAs' relations can be inferred. The data used is available at http://www.cs.nott.ac.uk/~hxd/GeoMap.html.

We compare the performance of our method with LogMap [4] and KnoFuss [5]. The precision of the mappings produced by GeoMap, LogMap and KnoFuss is 1, 0.24 and 0.18 respectively, and the recall is 0.95, 0.38 and 0.25 respectively. GeoMap's precision and recall are much higher, mainly because LogMap and KnoFuss cannot make effective use of location information.

References

1. Du, H., Alechina, N., Jackson, M., Hart, G.: Matching Formal and Informal Geospatial Ontologies. In: Geographic Information Science at the Heart of Europe. Lecture Notes in Geoinformation and Cartography. Springer (2013) 155–171

2. OpenStreetMap: http://www.openstreetmap.org (2012)

3. Ordnance Survey: http://www.ordnancesurvey.co.uk/oswebsite (2012)

4. Jiménez-Ruiz, E., Grau, B.C.: LogMap: Logic-Based and Scalable Ontology Matching. In: International Semantic Web Conference (1). (2011) 273–288

5. Nikolov, A., Uren, V., Motta, E.: KnoFuss: a Comprehensive Architecture for Knowledge Fusion. In: Proceedings of the 4th International Conference on Knowledge Capture. (2007) 185–186
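The buffer-based matching in Steps 2 and 3 and the bidirectional 'sameAs' hypothesis can be illustrated with a minimal Python sketch. This is not the GeoMap implementation. It rests on our own assumptions: each geometry is a simple polygon given as a list of (x, y) vertices in metres, an ABox is a dictionary from instance name to (polygon, labels), and the minimal buffer of h containing g is evaluated only at the vertices of g (exact when h is convex, an approximation otherwise). A candidate is kept when its minimal buffer does not exceed σ. The function names, the edit-distance threshold max_edits=2, the abbreviation rule and the toy data are all hypothetical; the second-best tolerance β and the semantic verification of Step 4 are not modelled.

```python
import math
from itertools import product

# Hypothetical data model (not GeoMap's): a geometry is a simple polygon given
# as a list of (x, y) vertices in metres; an ABox maps an instance name to a
# pair (polygon, list of labels such as names or addresses).

def _edges(poly):
    """Consecutive vertex pairs, closing the ring."""
    return zip(poly, poly[1:] + poly[:1])

def _inside(p, poly):
    """Ray-casting point-in-polygon test."""
    x, y = p
    result = False
    for (x1, y1), (x2, y2) in _edges(poly):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result

def _seg_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - ax - t * dx, py - ay - t * dy)

def _dist_to_polygon(p, poly):
    """Zero inside the polygon, otherwise distance to the nearest edge."""
    if _inside(p, poly):
        return 0.0
    return min(_seg_dist(p, a, b) for a, b in _edges(poly))

def min_buffer(g, h):
    """Smallest sigma with g inside buffer(h, sigma), checked at g's vertices.

    Exact when h is convex (distance to a convex set is a convex function, so
    its maximum over g is reached at a vertex); an approximation otherwise.
    """
    return max(_dist_to_polygon(v, h) for v in g)

def _edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def _abbreviates(short, long):
    """True if `short` spells the initials of the words of `long`."""
    return short.replace(".", "").replace(" ", "") == "".join(w[0] for w in long.split())

def labels_similar(a, b, max_edits=2):
    """Equality, inclusion, abbreviation or small edit distance (hypothetical threshold)."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a or not b:
        return False
    return (a == b or a in b or b in a or _abbreviates(a, b) or _abbreviates(b, a)
            or _edit_distance(a, b) <= max_edits)

def part_of(a1, a2, sigma):
    """Pairs (x, y) where instance x of a1 is possibly partOf instance y of a2."""
    pairs = set()
    for x, (gx, labels_x) in a1.items():
        # Best two candidates in a2, ranked by the minimal buffer that contains gx.
        ranked = sorted((min_buffer(gx, gy), y) for y, (gy, _) in a2.items())[:2]
        for buf, y in ranked:
            labels_y = a2[y][1]
            if buf <= sigma and any(labels_similar(s, t) for s, t in product(labels_x, labels_y)):
                pairs.add((x, y))
    return pairs

def match(a1, a2, sigma=20.0):
    """partOf in both directions; sameAs hypotheses where both directions hold."""
    p12 = part_of(a1, a2, sigma)  # (x, y): x in a1 possibly partOf y in a2
    p21 = part_of(a2, a1, sigma)  # (y, x): y in a2 possibly partOf x in a1
    same_as = {(x, y) for x, y in p12 if (y, x) in p21}
    return p12, p21, same_as

def square(x0, y0, x1, y1):
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

# Made-up toy data, not taken from the Nottingham data set.
A1 = {"osgb:prezzo": (square(0, 0, 10, 10), ["Prezzo Ristorante"]),
      "osgb:boots": (square(100, 100, 120, 120), ["Boots", "Victoria Centre"])}
A2 = {"osm:prezzo": (square(2, 1, 12, 11), ["Prezzo"]),
      "osm:victoria": (square(90, 90, 200, 200), ["Victoria Centre"])}

part_12, part_21, same_as = match(A1, A2, sigma=20.0)
print(same_as)
```

With σ = 20 m, the toy OSGB-style shop lies inside the OSM-style Victoria Centre polygon, so 'partOf' holds in one direction only, mirroring the collection-versus-single-instance case. The two Prezzo footprints each lie within the other's 20 m buffer and have similar labels, so only they yield a 'sameAs' hypothesis.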