=Paper= {{Paper |id=Vol-1111/om2013_poster11 |storemode=property |title=Matching geospatial instances |pdfUrl=https://ceur-ws.org/Vol-1111/om2013_poster11.pdf |volume=Vol-1111 |dblpUrl=https://dblp.org/rec/conf/semweb/DuAJH13 }} ==Matching geospatial instances== https://ceur-ws.org/Vol-1111/om2013_poster11.pdf
               Matching Geospatial Instances

        Heshan Du¹, Natasha Alechina¹, Michael Jackson¹, Glen Hart²
                             ¹ University of Nottingham
                             ² Ordnance Survey of Great Britain

    The work presented in this paper extends our work on matching formal and
informal geospatial ontologies [1], which aims to enable the synergistic use of
authoritative and crowd-sourced geospatial information. A geospatial instance is
an object which has a definite and verifiable location (geometry, topographic
footprint) as well as a meaningful label (for example, Victoria Shopping Centre
in Nottingham, UK). The examples in this paper are drawn from OpenStreetMap
(OSM) [2] and the Ordnance Survey of Great Britain (OSGB) [3].
    Different geospatial instances may have the same purely lexical information
and the same classification in terms of an ontology, but different locations. For
example, there may be several restaurants called ‘Prezzo Ristorante’ in the same
city. Therefore, when matching geospatial instances, it is essential to use location
information. However, few existing ontology matching or data interlinking
methods can match spatial instances effectively.
    There is also a choice between representing objects such as shopping centres
as a collection of parts or as a single instance. For example, Victoria Centre is
represented as a collection of shops and other businesses in OSGB and as a single
instance in OSM. In order to produce a meaningful correspondence between
instances in OSGB and in OSM, we propose to use a ‘partOf’ relation (mereological
partOf in geometry together with similar labels). If for two instances a and b we
obtain ‘partOf’ relations in both directions, we generate the hypothesis that a
and b are related by ‘sameAs’.
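The ‘sameAs’ hypothesis can be sketched as a search for ‘partOf’ pairs that hold in both directions. This is a minimal Python sketch, assuming the ‘partOf’ relation is stored as a set of (a, b) identifier pairs; the identifiers used (`osgb:b1`, `osm:w1`, and so on) are hypothetical.

```python
def infer_same_as(part_of: set[tuple[str, str]]) -> set[frozenset[str]]:
    """Hypothesise sameAs for every pair related by partOf in both directions.

    part_of holds (a, b) pairs meaning "a partOf b". Each hypothesis is
    returned as an unordered pair, because sameAs is symmetric.
    """
    return {
        frozenset((a, b))
        for (a, b) in part_of
        if a != b and (b, a) in part_of
    }


# Hypothetical identifiers: osgb:b1 and osm:w1 are mutual parts, while
# osgb:shop12 is only one part of the larger OSM object osm:centre.
part_of = {
    ("osgb:b1", "osm:w1"),
    ("osm:w1", "osgb:b1"),
    ("osgb:shop12", "osm:centre"),
}
same_as = infer_same_as(part_of)
```

Returning `frozenset` pairs avoids reporting the same hypothesis twice, once per direction.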
    Here we propose a new method for establishing ‘sameAs’ and ‘partOf’ relations
between geospatial instances from different ontologies. In crowd-sourced data,
there is an increased possibility of measurement error and a tendency to simplify
the shapes of buildings. For this reason, our method uses buffers to compare the
geometries of instances. The size of the buffer (intuitively, ‘the margin of
error’ σ) is a parameter which can be determined experimentally or, within reason,
set arbitrarily. Intuitively, the optimal value of σ corresponds to the maximal
deviation between two representations of the same object in two different data
sets. For the OSGB and OSM data for Nottingham, it was determined experimentally
to be 20 m. The new method has four steps.
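To make the role of σ concrete, here is a small Python sketch of a buffer membership test. It assumes, as a simplification that is not part of the published method, that each geometry is approximated by a finite set of sample points in a projected coordinate system measured in metres. The buffer is then the union of discs of radius σ around those points. The names `in_buffer` and `SIGMA` are hypothetical.

```python
import math

Point = tuple[float, float]  # (easting, northing) in metres; assumed projection

SIGMA = 20.0  # margin of error reported for OSGB vs OSM data in Nottingham


def in_buffer(p: Point, g: list[Point], sigma: float = SIGMA) -> bool:
    """Return True if p lies in buffer(g, sigma).

    g is approximated by sample points, so buffer(g, sigma) is the set of
    points within distance sigma of at least one sample point of g.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return any(math.dist(p, q) <= sigma for q in g)


# Corners of a hypothetical 30 m x 20 m building footprint.
footprint = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]
print(in_buffer((0.0, -15.0), footprint))  # True: 15 m from a corner
print(in_buffer((0.0, -25.0), footprint))  # False: 25 m from the nearest corner
```

A production implementation would buffer real polygons with a spatial library rather than sample points.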
    Step 1: Extracting geometry sets. An ABox contains facts (geometry, lexical
and semantic classification information) about geospatial instances. We extract
a set of geometries, $G_i$, from all the spatial instances in each ABox $A_i$, $i = 1, 2$.
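Step 1 can be sketched in Python as a filter over the instances of one ABox. This sketch assumes a hypothetical in-memory representation in which each instance is a dictionary with `id`, `labels`, `type` and an optional `geometry` (a list of sample points). The geometry set is keyed by instance id so that later steps can trace a matched geometry back to its instance.

```python
from typing import Any

Point = tuple[float, float]


def extract_geometries(abox: list[dict[str, Any]]) -> dict[str, list[Point]]:
    """Step 1: collect the geometries of all spatial instances in one ABox.

    Instances without a geometry (non-spatial facts) are skipped. The result
    maps each spatial instance id to its geometry.
    """
    return {
        inst["id"]: inst["geometry"]
        for inst in abox
        if inst.get("geometry")
    }


# A hypothetical ABox fragment; the field names are illustrative only.
abox_osm = [
    {"id": "osm:w1", "labels": ["Prezzo Ristorante"], "type": "Restaurant",
     "geometry": [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0)]},
    {"id": "osm:n2", "labels": ["Prezzo"], "type": "Brand"},  # no geometry
]
G2 = extract_geometries(abox_osm)
```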
    Step 2: Matching geometry sets. Given two sets of geometries $G_1$, $G_2$, a
level of tolerance $\alpha$ and a tolerance $\beta$ for the second-best candidate,
we generate the best two candidate matches in $G_2$ (if they exist) for each
geometry in $G_1$, and the best two candidate matches in $G_1$ (if they exist) for
each geometry in $G_2$. The candidates are
selected by comparing minimal buffers. The buffer of a geometry $g$ is

$$\mathit{buffer}(g, \sigma) = \{\, p : \exists p' \in g.\ \mathit{distance}(p, p') \le \sigma \,\}, \qquad \sigma > 0.$$

For two geometries $g$ and $h$, the minimal buffer of $h$ containing $g$ is
$\mathit{buffer}(h, \sigma)$ such that $g \subseteq \mathit{buffer}(h, \sigma)$ and
$g \not\subseteq \mathit{buffer}(h, \sigma')$ for all $\sigma' < \sigma$. For any geometry $g$,
the minimal buffer $\alpha$ ($\alpha \le \sigma$) of its best candidate $o_1$
($g \subseteq \mathit{buffer}(o_1, \alpha)$) is the smallest among those of all the
candidates. We generate a ‘buffered part of’ ($\mathit{BPT}$) relation between each
geometry and its candidates, i.e. $(g, o_1) \in \mathit{BPT}(\sigma)$.
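Under the same simplifying assumption of a finite point sample per geometry, the radius of the minimal buffer of h containing g is the largest distance from a point of g to its nearest point of h (the directed Hausdorff distance from g to h). The Python sketch below uses this value to rank candidates and build BPT(σ). It keeps up to two candidates whose minimal buffer is at most σ. How the second-best tolerance β is applied is not specified in the text, so the sketch omits it. The function names are our own, not GeoMap's.

```python
import math

Point = tuple[float, float]
Geometries = dict[str, list[Point]]  # instance id -> non-empty point sample


def minimal_buffer(g: list[Point], h: list[Point]) -> float:
    """Smallest sigma with g ⊆ buffer(h, sigma), for point-sampled geometries.

    Every point of g must lie within sigma of some point of h, so the answer
    is the largest nearest-neighbour distance from g to h.
    """
    return max(min(math.dist(p, q) for q in h) for p in g)


def best_two(g: list[Point], others: Geometries,
             sigma: float) -> list[tuple[str, float]]:
    """Up to two (id, radius) candidates for g, best first, radius <= sigma."""
    scored = sorted((minimal_buffer(g, h), oid) for oid, h in others.items())
    return [(oid, r) for r, oid in scored if r <= sigma][:2]


def bpt(G1: Geometries, G2: Geometries, sigma: float) -> set[tuple[str, str]]:
    """(g, o) pairs where o is one of g's best two candidates, both directions."""
    rel = set()
    for source, target in ((G1, G2), (G2, G1)):
        for gid, g in source.items():
            for oid, _radius in best_two(g, target, sigma):
                rel.add((gid, oid))
    return rel


# Hypothetical geometries: osm:x lies 5 m from osgb:a, osm:y is far away.
G1 = {"osgb:a": [(0.0, 0.0), (10.0, 0.0)]}
G2 = {"osm:x": [(0.0, 5.0), (10.0, 5.0)], "osm:y": [(100.0, 0.0)]}
print(best_two(G1["osgb:a"], G2, 20.0))  # [('osm:x', 5.0)]
print(bpt(G1, G2, 20.0))
```

Sorting on (radius, id) tuples puts the candidate with the smallest minimal buffer first. Ties are broken deterministically by id.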
     Step 3: Comparing labels. We use string comparison, including equality,
inclusion, abbreviation and edit distance, to check whether the labels (such as
names or addresses) of two instances are similar. If no pair of labels of spatial
instances $s_1$, $s_2$ is similar, then their lexical information is incompatible,
$(s_1, s_2) \in LF$. Otherwise, their lexical information is compatible,
$(s_1, s_2) \in LT$.
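A label comparison of this kind might look like the Python sketch below, which combines the four tests named above. The normalisation, the initials-based abbreviation test and the 20% edit-distance threshold (`max_ratio`) are illustrative assumptions, not the settings used in GeoMap.

```python
def normalise(label: str) -> str:
    """Lower-case and collapse whitespace."""
    return " ".join(label.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_abbreviation(short: str, long: str) -> bool:
    """True if short (dots removed) equals the initials of the words in long."""
    return short.replace(".", "") == "".join(w[0] for w in long.split())


def labels_similar(a: str, b: str, max_ratio: float = 0.2) -> bool:
    a, b = normalise(a), normalise(b)
    if not a or not b:
        return False
    if a == b or a in b or b in a:                      # equality, inclusion
        return True
    if is_abbreviation(a, b) or is_abbreviation(b, a):  # abbreviation
        return True
    return edit_distance(a, b) <= max_ratio * max(len(a), len(b))


def lexically_compatible(labels1: list[str], labels2: list[str]) -> bool:
    """(s1, s2) in LT if some pair of labels is similar, else (s1, s2) in LF."""
    return any(labels_similar(x, y) for x in labels1 for y in labels2)


print(labels_similar("Victoria Centre", "Victoria Center"))   # True (distance 2)
print(labels_similar("NTU", "Nottingham Trent University"))   # True (initials)
print(lexically_compatible(["Boots"], ["Prezzo Ristorante"]))  # False
```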
     For every pair of spatial instances $s_1$, $s_2$, if $(g_1, g_2) \in \mathit{BPT}(\alpha)$
(where $g_i$ is the geometry of $s_i$, $i = 1, 2$) and $(s_1, s_2) \in LT$, then
$(s_1, s_2) \in \mathit{partOf}$ possibly holds, and we will add it to the initial instance mapping $M$.
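Combining the two tests, the construction of M can be sketched in Python as a filter over BPT(α). The sketch makes three assumptions. Instance identifiers are unique across both ontologies. Each instance has a single geometry identified by the same id. A deliberately simple case-insensitive label test (`exact_labels`, hypothetical) stands in for the Step 3 comparison.

```python
from collections.abc import Callable

Labels = list[str]


def exact_labels(l1: Labels, l2: Labels) -> bool:
    """Stand-in for the Step 3 test: some labels are equal ignoring case."""
    return any(a.casefold() == b.casefold() for a in l1 for b in l2)


def initial_mapping(
    labels: dict[str, Labels],
    bpt_alpha: set[tuple[str, str]],
    compatible: Callable[[Labels, Labels], bool] = exact_labels,
) -> set[tuple[str, str]]:
    """Candidate partOf pairs: (g1, g2) in BPT(alpha) and (s1, s2) in LT."""
    return {
        (s1, s2)
        for (s1, s2) in bpt_alpha
        if compatible(labels[s1], labels[s2])
    }


# Hypothetical instances: both OSGB shops fall inside the OSM footprint,
# but only one of them has a compatible label.
labels = {
    "osgb:shop1": ["Prezzo Ristorante"],
    "osgb:shop2": ["Boots"],
    "osm:w1": ["prezzo ristorante"],
}
bpt_alpha = {("osgb:shop1", "osm:w1"), ("osgb:shop2", "osm:w1")}
M = initial_mapping(labels, bpt_alpha)
```

Passing the label test as a parameter lets a fuller comparison, such as the edit-distance check of Step 3, be swapped in without changing the filter.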
     Step 4: Verifying the initial instance mapping $M$ using semantic
classification information. This is part of the ontology (ABox and TBox)
matching process presented in [1].
     We implement the method described above as part of GeoMap [1]. From the
study area (2 km sq) of Nottingham city centre, 713 geospatial individuals of 47
types are added to the OSGB Buildings and Places ontology from the OSGB Address
Layer 2 and the OSGB Topology Layer [3], and 253 geospatial individuals of 39
types are added automatically to the OSM ontology from the building layer of the
OSM data. The ground-truth instance mapping is obtained by manually matching all
the instances in the two ontologies. It contains 286 ‘partOf’ relations, from
which 73 ‘sameAs’ relations can be inferred. The data used is available at
http://www.cs.nott.ac.uk/~hxd/GeoMap.html. We compare the performance of our
method with LogMap [4] and KnoFuss [5]. The precision of the mappings produced
by GeoMap, LogMap and KnoFuss is 1, 0.24 and 0.18 respectively, and their recall
is 0.95, 0.38 and 0.25 respectively. The precision and recall of GeoMap are much
higher, mainly because LogMap and KnoFuss cannot make effective use of location
information.
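For reference, the precision and recall of a produced mapping against a ground-truth mapping can be computed as below. This is a generic Python sketch on a toy mapping, not the evaluation script behind the figures above. Returning 0.0 for an empty mapping is our own convention.

```python
def precision_recall(produced: set, reference: set) -> tuple[float, float]:
    """Precision = correct / |produced|, recall = correct / |reference|."""
    correct = len(produced & reference)
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy mappings: 2 of the 3 produced pairs are in the 4-pair ground truth.
produced = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
p, r = precision_recall(produced, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```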

References
1. Du, H., Alechina, N., Jackson, M., Hart, G.: Matching Formal and Informal Geospatial
   Ontologies. In: Geographic Information Science at the Heart of Europe. Lecture
   Notes in Geoinformation and Cartography. Springer (2013) 155–171
2. OpenStreetMap: http://www.openstreetmap.org (2012)
3. Ordnance Survey: http://www.ordnancesurvey.co.uk/oswebsite (2012)
4. Jiménez-Ruiz, E., Grau, B.C.: LogMap: Logic-Based and Scalable Ontology Matching.
   In: International Semantic Web Conference (1). (2011) 273–288
5. Nikolov, A., Uren, V., Motta, E.: KnoFuss: a Comprehensive Architecture for
   Knowledge Fusion. In: Proceedings of the 4th International Conference on
   Knowledge Capture. (2007) 185–186