<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Finding spatial equivalences across multiple RDF datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Martín Salas</string-name>
          <email>juan.salas@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Harth</string-name>
          <email>harth@kit.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FRLP, Universidad Tecnológica Nacional</institution>
          ,
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute AIFB, Karlsruhe Institute of Technology (KIT)</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The importance of geospatial information is reflected in the growing number of spatial datasets on the Semantic Web. However, the high variability of the data presents challenges for integration. In this paper, we address the problem of finding spatial equivalences between geospatial RDF datasets. First, we present mappings between our NeoGeo vocabulary and the vocabularies used by some well-known spatial RDF datasets. Second, we describe a method to find spatially co-located features across spatial RDF datasets. To find equivalences, we rely on analyzing the Hausdorff distance distribution in the compared datasets, with the objective of finding a sensible criterion that aids the recognition of equivalent regions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Geospatial data is ubiquitous in information management, supporting scientific,
industrial or just everyday activities. The relevance of geospatial data is reflected
by the growing number of geospatial datasets on the Web.</p>
      <p>A feature is an abstraction of a real world phenomenon (e.g. a building, a
mountain or an administrative region). A geographic feature is a feature associated
with a location relative to the Earth, which is usually represented by a certain
geometric shape (e.g. a point, a curve or a polygon). Features that are spatially
co-located (i.e. share the same location) are not necessarily the same. However,
finding spatially co-located regions is a powerful measure of similarity between
features.</p>
      <p>Factors like rounding effects, different scales and different formats present a
challenge when attempting to elicit equivalences between geospatial resources. We
define a method for obtaining a criterion that best fits the differences between the
datasets being merged.</p>
      <p>This work was successfully applied to integrate our RDF representations of
the NUTS nomenclature of the European Union<sup>3</sup> and of the GADM project<sup>4</sup>
with other datasets describing spatial information on the Semantic Web, and also
with each other.</p>
      <p><sup>3</sup> http://nuts.geovocab.org/</p>
      <p><sup>4</sup> http://gadm.geovocab.org/</p>
      <p>Our contributions are as follows:</p>
      <list list-type="bullet">
        <list-item><p>Representation and modeling of datasets: we survey the representation of
existing geospatial datasets, and distill an integration vocabulary which covers the
core set of classes and properties in existing data. We also integrate existing
vocabularies and publish two geospatial datasets (Section 2).</p></list-item>
        <list-item><p>Integration and mapping of multiple datasets: we develop an algorithm for
finding equivalences for geometric shapes across multiple datasets (Section 3).</p></list-item>
        <list-item><p>Evaluation of the presented approach: we conduct experiments in which we
evaluate the accuracy of the results (Section 4).</p></list-item>
      </list>
      <p>We discuss the related work in Section 5. Finally, we identify areas for future
work and conclude in Section 6.</p>
      <p>2 Representing geospatial data on the web</p>
      <sec id="sec-1-1">
        <title>Analyzed datasets</title>
        <p>We start by providing a brief summary of the analyzed datasets.</p>
        <list list-type="bullet">
          <list-item><p>UN FAO Geopolitical Ontology<sup>5</sup>: The Food and Agriculture Organization of
the United Nations (FAO) is a specialized agency of the UN. The UN FAO
Geopolitical Ontology provides the FAO and its associated partners with a
master reference for geopolitical information.</p></list-item>
          <list-item><p>OS OpenData<sup>6</sup> [<xref ref-type="bibr" rid="ref8">8</xref>]: The Ordnance Survey (OS) is the national mapping agency
for Great Britain. OS has released a number of its products as Linked Data.</p></list-item>
          <list-item><p>GeoLinkedData.es [<xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>]: The initiative provides geospatial information about
the national territory of Spain. The information provided as RDF at their
website is gathered from different national sources. However, the integration
process is based on string matching.</p></list-item>
          <list-item><p>LinkedGeoData.org [<xref ref-type="bibr" rid="ref22">22</xref>]: The project provides data from OpenStreetMap<sup>7</sup> as
Linked Data.</p></list-item>
          <list-item><p>GeoNames.org: GeoNames is a geographical database that covers all countries
and provides Linked Data under a Creative Commons attribution license.</p></list-item>
          <list-item><p>Uberblic.org: Uberblic provides an integration service that includes data from
GeoNames, Wikipedia, MusicBrainz, Freebase, Last.fm and Foursquare.</p></list-item>
          <list-item><p>RAMON NUTS<sup>8</sup>: The Nomenclature of Units for Territorial Statistics (NUTS)
is a geocoding standard for referencing the subdivisions of countries for
statistical purposes, developed by the European Union and published as Linked
Data.</p></list-item>
          <list-item><p>DBpedia.org: The community effort extracts structured information from
Wikipedia for publication as Linked Data.</p></list-item>
          <list-item><p>NeoGeo: We provide an integration vocabulary, described in more detail in
Sections 2.4 and 2.5.</p></list-item>
        </list>
        <p><sup>5</sup> http://www.fao.org/countryprofiles/geoinfo/geopolitical/resource/</p>
        <p><sup>6</sup> http://data.ordnancesurvey.co.uk/</p>
        <p><sup>7</sup> http://openstreetmap.org/</p>
        <p><sup>8</sup> http://rdfdata.eionet.europa.eu/ramon/nuts2008/</p>
      </sec>
      <sec id="sec-1-2">
        <title>Representing location</title>
        <p>The analyzed spatial datasets represent the location of features in different ways.
We identified five main kinds of representation: point, bounding box, points in
lists, points using a single property and literals. Geometric shapes are not only
described using different vocabularies, but these vocabularies are also based on
different structures, which increases the difficulty of working with geodata across
datasets.</p>
        <list list-type="bullet">
          <list-item><p>Point: The location of objects is represented merely by a geographic point. The
most common vocabulary to do so is W3C Geo [<xref ref-type="bibr" rid="ref24">24</xref>], sometimes complemented
with a GeoRSS representation [<xref ref-type="bibr" rid="ref21">21</xref>], as is the case for the UK Ordnance Survey,
even though GeoRSS is not a proper RDF vocabulary but an XML Schema. In some
cases, neither W3C Geo nor GeoRSS is used, but a custom vocabulary, as is the
case for the Uberblic Ontology, which uses its own "latitude", "longitude" and
"altitude" predicates.</p></list-item>
          <list-item><p>Bounding box: The location is represented by two points or four line
segments forming a georeferenced rectangle (on cylindrical projections). This is
the case for the FAO Geopolitical Ontology, which uses four predicates
(hasMinLongitude, hasMinLatitude, hasMaxLongitude, hasMaxLatitude) to represent
a rectangle. The rectangle is represented by line segments, which should be
tangential to the region at some point.</p></list-item>
          <list-item><p>Points in lists: The geometric shape of a region is represented by a
collection of points, each being described as a single RDF node. The whole
collection of points is then linked together using either an RDF Collection or an
RDF Container. LinkedGeoData.org represents geometric shapes by using a
"hasNodes" object property, which links to an rdf:Seq container. The rdf:Seq
container describes the nodes of a shape, which are represented using the W3C
Geo Vocabulary.</p></list-item>
          <list-item><p>Points using a single property: In the GeoLinkedData.es ontology, rivers
are represented by a group of "Curva" (curve) RDF resources (similar to a
GML LineString). "Curva" resources use a single "formadoPor" object
property to link each of their nodes, which in turn contain the WGS-84
coordinates (represented with the W3C Geo Ontology) and an "orden" (order)
value property, defining the position of each node within the geometric shape.</p></list-item>
          <list-item><p>Literals: Both the Ordnance Survey and GeoLinkedData.es (for rivers only)
ontologies include a predicate that allows a GML representation of the
geometric data to be included, which is encoded in RDF as a literal. A "geometry:extent"
property links a feature to its geometric representation.</p></list-item>
        </list>
      </sec>
      <sec id="sec-1-3">
        <title>Representing spatial relations</title>
        <p>A spatial relation states the location of an object in relation to another. We created
a set of vocabulary mappings to the NeoGeo vocabulary using the rdfs:subPropertyOf
predicate. Table 1 shows which predicates are used in each dataset to describe
spatial relations.</p>
        <p>Table 1: Predicates used in each dataset to describe spatial relations. Relations (columns) include: Disjoint, Touches, Overlaps, Within. Datasets (rows): UN FAO, Ordnance Survey, GeoLinkedData.es, LinkedGeoData.org, GeoNames.org, Uberblic.org, RAMON NUTS, DBpedia.org, NeoGeo. Predicates listed: hasBorder, isInGroup, disjoint, touches, partiallyOverlaps, within, contains, equals, formaParteDe, formadoPor, parentFeature, childrenFeatures, neighbour / neighbouringFeatures, nearby / nearbyFeatures, adjoininglocation, containinglocation, partOf, locatedInArea, DC, EC, PO, PP, PC, EQ.</p>
        <p>Given the lack of a standardized spatial vocabulary, we developed our own set of
spatial ontologies, which we call NeoGeo<sup>9</sup>. We manually created a set of mappings
between our vocabularies and the vocabularies used by some well-known spatial
datasets like the Ordnance Survey and LinkedGeoData.org. Both the GADM and
NUTS datasets use the NeoGeo vocabularies.</p>
        <p>The Geometry Vocabulary<sup>10</sup> is an RDF vocabulary for the description of
georeferenced geometric shapes. It is based on the Core Profile of the Spatial Schema
[<xref ref-type="bibr" rid="ref12">12</xref>] and the General Feature Model [<xref ref-type="bibr" rid="ref11">11</xref>]. The lack of a standardized RDF
vocabulary in this domain will probably be addressed by GeoSPARQL [<xref ref-type="bibr" rid="ref16">16</xref>] shortly.
For experimentation purposes, the Geometry vocabulary allows geometric shapes to be
encoded either in a representation fully based on RDF or as a WKT representation [<xref ref-type="bibr" rid="ref10">10</xref>]
embedded in an XMLLiteral.</p>
        <p>The Spatial Ontology<sup>11</sup> provides a vocabulary for the representation of the
spatial relations used in the Region Connection Calculus (RCC) [<xref ref-type="bibr" rid="ref19">19</xref>]. It also provides
monotonic reasoning by mapping most of the semantics of RCC into OWL.</p>
      </sec>
      <sec id="sec-1-4">
        <title>NeoGeo datasets</title>
        <p>We provide two datasets, NUTS and GADM, containing geospatial information as
Linked Data.</p>
        <p><sup>9</sup> http://geovocab.org/</p>
        <p><sup>10</sup> http://geovocab.org/geometry</p>
        <p><sup>11</sup> http://geovocab.org/spatial</p>
        <p>The Nomenclature of Territorial Units for Statistics (NUTS) is a classification
defined by the Eurostat office of the European Union. It is intended to divide the
administrative regions of the European Union in such a way that the resulting regions
are demographically equivalent.</p>
        <p>The RDF representation of the NUTS nomenclature contains a 1:60,000,000
geospatial representation of the NUTS statistical units mapped to RDF. The
resources representing NUTS regions in our dataset include (among others) links to
resources in DBpedia, FAO Geopolitical Ontology, GeoNames, Ordnance Survey
and GeoLinkedData.es.</p>
        <p>The Global Administrative Areas (GADM)<sup>12</sup> project seeks to become
a collaborative effort to build a spatial database containing information about
all of the administrative regions in the world. GADM aims to provide high
resolution mappings for all administrative areas in the world, along with additional
information about them. The latest version of GADM (0.9) maps 226,439
administrative areas. The information can be downloaded from their website in the following
formats: Shapefile, ESRI geodatabase, RData and KMZ.</p>
        <p>Given the value of the GADM project, we have created an RDF representation
of the information contained in the original GADM project, which we seek to enrich
with additional capabilities like the materialization of spatial relations, mappings
to other RDF datasets and SPARQL querying support.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Instance mappings</title>
      <p>The alignment of the vocabularies is only the first step for the integration of the
datasets. The second step in the process is to find matchings between the features.</p>
      <p>We can classify the features into three general categories, according to how
their location is represented in the datasets. First are the resources that present
no quantitative spatial information at all. Second are the features that approximate
their location by using only a single point. And finally, there are features which present
rich information about their location (i.e. include a description of their geometric
shape).</p>
      <sec id="sec-2-1">
        <title>Resources with no quantitative geospatial information</title>
        <p>Resources which include no quantitative information about their location can be
integrated by relying either on text matching [<xref ref-type="bibr" rid="ref14">14</xref>] or on object property matching [<xref ref-type="bibr" rid="ref23">23</xref>]
[<xref ref-type="bibr" rid="ref4">4</xref>]. These techniques are also suitable for the disambiguation of spatially obtained
mappings [<xref ref-type="bibr" rid="ref9">9</xref>]. This kind of resource is not the topic of this work.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Resources with poor quantitative geospatial information</title>
        <p>Sometimes the location of a feature is approximated by using a single point (e.g.
using the W3C Geo vocabulary) instead of representing its actual extent (i.e. the
geometric shape). Examples of this kind of representation are DBpedia,
LinkedGeoData.org, GeoNames and GeoLinkedData.es.</p>
        <p><sup>12</sup> http://gadm.org/</p>
        <p>This kind of representation can lead to false assertions when performing
comparisons against a spatial index if these features are not given special consideration. For
example, DBpedia uses the W3C Geo Vocabulary to describe the latitude and
longitude coordinates of features as points. The resource for Germany in DBpedia
(http://dbpedia.org/resource/Germany) is spatially represented by a point with
latitude 52.516666 and longitude 13.383333. If we intended to obtain the
containment relations for this resource by comparing it with a spatial index, the result
would be that Germany is part of Berlin, which is false. Therefore, even though
these relations can be obtained using the coordinates represented in DBpedia, it is
first necessary to ensure that the process will not return such false statements.
The false matches can be avoided, for example, by filtering the features to be
compared by their class, in a way that ensures that each feature can be properly
contained in the features it is compared to (e.g. cities in provinces or
restaurants in cities).</p>
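        <p>The class-based filtering described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class names, the ALLOWED_CONTAINERS table, the rough rectangle standing in for Berlin and the restaurant coordinates are all hypothetical, and containment is tested with a plain ray-casting check on (longitude, latitude) pairs.</p>
        <preformat><![CDATA[
```python
# Hypothetical sketch: only test a point feature against regions whose class
# can plausibly contain it, so that Germany is never reported inside Berlin.

# Assumed mapping from a feature class to the classes allowed to contain it.
ALLOWED_CONTAINERS = {
    "Restaurant": {"City"},
    "City": {"Province", "Country"},
    "Country": set(),  # countries are not looked up inside smaller regions
}


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containers(feature, regions):
    """Names of the regions that contain the feature and pass the class filter."""
    allowed = ALLOWED_CONTAINERS.get(feature["class"], set())
    return [
        region["name"]
        for region in regions
        if region["class"] in allowed
        and point_in_polygon(feature["point"], region["shape"])
    ]


# Made-up rectangle around Berlin as (longitude, latitude) pairs.
berlin = {
    "name": "Berlin",
    "class": "City",
    "shape": [(13.09, 52.34), (13.76, 52.34), (13.76, 52.68), (13.09, 52.68)],
}
# The DBpedia point for Germany quoted in the text.
germany = {"class": "Country", "point": (13.383333, 52.516666)}
restaurant = {"class": "Restaurant", "point": (13.40, 52.52)}

germany_containers = containers(germany, [berlin])
restaurant_containers = containers(restaurant, [berlin])
```
]]></preformat>
        <p>Without the class filter, the purely geometric test places the point for Germany inside the Berlin rectangle; with it, only the restaurant is reported as contained.</p>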
      </sec>
      <sec id="sec-2-3">
        <title>Resources with rich quantitative geospatial information</title>
        <p>Resources that include an accurate description of their extent as a geometric shape
can be compared using this information. We will focus on obtaining links
between spatially co-located regions (we use the spatial:EQ predicate). Whether
owl:sameAs links can be deduced from the obtained links depends on the
modeling of the datasets (e.g. the class the resource belongs to).</p>
        <p>To perform the comparison, we adopt a Plate Carrée projection for both of
the compared datasets. Since this projection is equirectangular, we can treat latitude
and longitude coordinates as if they were Cartesian. Therefore, the units will be
presented in decimal degrees.</p>
        <p>The benefit of using an equirectangular projection is that it simplifies the
calculations by avoiding local reprojections (e.g. to UTM), and also allows the use of a
global spatial index, improving the performance of the process. In our approach
it is not important if the projection distorts the real size or the actual
geometric shapes on the surface of the Earth, as long as the geometric data is equally
distorted for all datasets.</p>
        <p>Due to a series of factors like rounding effects and different scales, there is no
guarantee that both geometric shapes will be identical vertex by vertex. Figure 1
exemplifies these differences by showing the boundaries of Luxembourg as they
are represented in the GADM and NUTS datasets.</p>
        <p>An effective method of determining how similar two geometric shapes are is
to compute the Hausdorff distance between them. The Hausdorff distance is the
"maximum distance of a set to the nearest point in the other set" [<xref ref-type="bibr" rid="ref20">20</xref>]. More
formally, given two sets of points <inline-formula><tex-math>A = \{a_1, a_2, \ldots, a_n\}</tex-math></inline-formula> and <inline-formula><tex-math>B = \{b_1, b_2, \ldots, b_n\}</tex-math></inline-formula>, the
Hausdorff distance is defined by:
<disp-formula><tex-math>d_H(A, B) = \max\left\{ \max_{a \in A} \min_{b \in B} d(a, b),\ \max_{b \in B} \min_{a \in A} d(a, b) \right\}</tex-math></disp-formula></p>
        <p>It can be deduced from the formula that in the particular case of calculating the
Hausdorff distance between two single points, the Hausdorff distance matches the Euclidean
distance d(a, b).</p>
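        <p>As a small illustration of the definition, the Hausdorff distance between two finite point sets can be computed by brute force. The function names below are ours, and the sketch works on vertex sets only, so it is closer to a discrete Hausdorff distance than to the distance between continuous shapes.</p>
        <preformat><![CDATA[
```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two single points: the result is their Euclidean distance.
single = hausdorff([(0.0, 0.0)], [(3.0, 4.0)])

# The extra vertex (1, 3) of the second set determines the result.
shapes = hausdorff(
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)],
)
```
]]></preformat>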
        <p>Fig. 1: Incongruency of the geometric data (GADM: blue, NUTS: violet) due to
differences in resolution.</p>
        <p>Figure 2 shows the values of correct and wrong guesses for similar regions in
both datasets. In order to better appreciate the variability of the values, only
small areas are plotted in the chart. From the figure it can be deduced that smaller
regions (e.g. boroughs) require greater precision than larger regions (e.g. countries)
in order to differentiate them from each other. Therefore, the Hausdorff distance
margins allowed for regions which are suspected to be spatially co-located must
differ depending on the area size of the regions being compared.</p>
        <p>To address this issue, it is desirable to obtain a function that gives a Hausdorff
distance threshold for a given area size. In order to do this, we first calculate the
midpoint between the lowest and second lowest Hausdorff distance values for a
representative set of features in both datasets. Afterwards we perform a quadratic
regression on the midpoint Hausdorff distance values. This produces a formula
that determines the maximum Hausdorff distance allowed between two
regions in order to consider them similar. The resulting function has the following
form:</p>
        <p><disp-formula><tex-math>\mathit{MaxHDist}(x) = A x^2 + B x + C</tex-math></disp-formula></p>
        <p>Here, MaxHDist is the maximum Hausdorff distance allowed between two
regions in order for them to be considered similar. The x variable is the area of
the region. The quadratic function gives more precision for small regions while
allowing a greater margin for large regions. The constants A, B and C are tunable
parameters for the integration procedure.</p>
        <p>A still unresolved issue with using a quadratic function is that the samples must
include values up to the approximate maximum area for which the integration will
be performed. This is because the values of the function will tend to decrease
after reaching a maximum value. We are performing experiments with logarithmic
functions to solve this issue.</p>
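        <p>The threshold fitting can be sketched with an ordinary least-squares quadratic fit. The sample values below are invented for illustration (they are not measurements from NUTS or GADM), the function names are ours, and the normal equations are solved with a small Gauss-Jordan elimination so that the sketch needs no external library.</p>
        <preformat><![CDATA[
```python
def midpoint(lowest, second_lowest):
    """Midpoint between the lowest and second lowest Hausdorff distances."""
    return (lowest + second_lowest) / 2.0


def fit_quadratic(xs, ys):
    """Least-squares fit of y = A*x^2 + B*x + C; returns (A, B, C)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal equations for the unknowns (A, B, C).
    rows = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return tuple(rows[i][3] / rows[i][i] for i in range(3))


def max_hdist(area, coeffs):
    """Evaluate MaxHDist(x) = A*x^2 + B*x + C for a given area."""
    a, b, c = coeffs
    return a * area * area + b * area + c


# Hypothetical samples: (area, lowest distance, second lowest distance).
samples = [
    (0.5, 0.02, 0.30),
    (2.0, 0.05, 0.75),
    (6.0, 0.10, 1.50),
    (20.0, 0.20, 3.00),
    (60.0, 0.40, 5.00),
]
areas = [sample[0] for sample in samples]
midpoints = [midpoint(sample[1], sample[2]) for sample in samples]
coeffs = fit_quadratic(areas, midpoints)
threshold_for_small_region = max_hdist(1.0, coeffs)
```
]]></preformat>
        <p>If the fitted A is negative, the parabola peaks at x = -B / (2A) and decreases beyond it, which is why the samples need to reach the largest area that will be compared.</p>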
        <p>Table 2a shows sample execution times and Hausdorff distance values between
features in the NUTS and GADM datasets.</p>
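        <table-wrap>
          <label>Table 2a</label>
          <table>
            <thead>
              <tr><th>Region Name</th><th>NUTS Id</th><th>Area</th><th>Hausdorff Distance</th><th>Time (ms)</th></tr>
            </thead>
            <tbody>
              <tr><td>Finland</td><td>FI</td><td>62.2835</td><td/><td/></tr>
              <tr><td>Iceland</td><td>IS</td><td>19.3357</td><td/><td/></tr>
              <tr><td>Croatia</td><td>HR</td><td>6.2139</td><td/><td/></tr>
              <tr><td>Schleswig-Holstein</td><td>DEF</td><td>2.1126</td><td/><td/></tr>
              <tr><td>Karlsruhe</td><td>DE12</td><td>0.8433</td><td/><td/></tr>
              <tr><td>Seine-Saint-Denis</td><td>FR106</td><td>0.0358</td><td/><td/></tr>
            </tbody>
          </table>
        </table-wrap>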
        <p>Calculating the Hausdorff distance between the original geometric data is quite
expensive, especially for large regions. In order to increase the performance of the
process, as an optional step, we chose to simplify the geometric shapes using the
Ramer-Douglas-Peucker algorithm [<xref ref-type="bibr" rid="ref18">18</xref>] [<xref ref-type="bibr" rid="ref5">5</xref>] prior to the calculation of the Hausdorff
distances.</p>
        <p>The Ramer-Douglas-Peucker algorithm starts by considering the line segment
between the first and last points of the line. Then, it finds the point furthest from
this segment. If that point is closer to the segment than a predefined distance ε,
all intermediate points can be discarded. If it is farther than ε, the point is kept
in the solution, and the algorithm calls itself recursively on the two parts of the
line on either side of it (from the first point to the found point, and from the
found point to the last point).</p>
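        <p>A compact version of the simplification step is sketched below. It follows the usual formulation of Ramer-Douglas-Peucker, which recurses on both sides of the furthest point; the function names and the sample line are ours, and PostGIS is not involved.</p>
        <preformat><![CDATA[
```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment between a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every intermediate point is within epsilon: keep the end points only.
        return [first, last]
    # Keep the furthest point and simplify both sides of it.
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
fine = simplify(line, 0.1)
coarse = simplify(line, 5.0)
```
]]></preformat>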
        <p>Tables 2b and 2c show the Hausdorff distance between the NUTS regions and
their matching GADM region, as well as execution times for different levels of
simplification. As can be seen, execution times are dramatically reduced, especially
for large regions.</p>
        <p>A further refinement of the process is to calculate the simplification distance
for the Ramer-Douglas-Peucker algorithm depending on the Hausdorff distance
threshold, and therefore on the area of the regions. This is based on the same
principle applied to the Hausdorff distance, where small areas require greater
precision than large areas.</p>
        <p>Given two spatial datasets A and B, the algorithm can be summarized as
Algorithm 1.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <sec id="sec-3-1">
        <title>Implementation</title>
        <p>We implemented the algorithm presented in Section 3.3 using the PostGIS 1.5.2
extension running on PostgreSQL 8.4.8. The computer is running on Ubuntu 10.04
on an Intel SU7300 processor with 4GB DDR3 RAM.</p>
        <p>PostGIS includes the ST_HausdorffDistance function, which implements an
approximation to the original algorithm. This approximation can be thought of as
the "Discrete Hausdorff distance", which is the Hausdorff distance restricted to
discrete points for one of the geometric shapes. If more precision is needed, the
function also accepts an optional "densifyFrac" parameter, which performs a
segment densification before computing the discrete Hausdorff distance.</p>
        <p>Since we are not concerned with the actual Hausdorff distance values, but just
use them as a measure to determine if two regions are similar enough to be considered
spatially co-located, this approximation is sufficient.</p>
        <p>For the simplification of geometric data we use the ST_SimplifyPreserveTopology
function included in PostGIS. This function is a refined version of ST_Simplify,
which is based on the Ramer-Douglas-Peucker algorithm [<xref ref-type="bibr" rid="ref18">18</xref>] [<xref ref-type="bibr" rid="ref5">5</xref>].</p>
        <preformat>Algorithm 1: Matching algorithm
input : Datasets A, B
output: Equivalent regions from A and B
Convert the compared resources to a shared coordinate reference system.
Project the data into an equirectangular projection.
Obtain a representative set of regions in dataset A which intersect regions in
  dataset B and have a maximum arbitrary Hausdorff distance between each other.
foreach region a of a representative set of regions in dataset A do
    Get the minimum Hausdorff distance to a region in dataset B.
    Get the second minimum Hausdorff distance to a region in dataset B.
    Calculate the midpoint between the minimum and second minimum Hausdorff
      distances.
end
Perform a regression on the midpoints between the Hausdorff distances to
  calculate the Hausdorff threshold function.
foreach region a in A do
    foreach region b in B do
        if a intersects b then
            Calculate the Hausdorff distance between a and b.
            if Hausdorff distance between a and b is lower than the threshold for
              the area of a then
                a and b can be considered as spatially co-located.
            else
                a and b cannot be considered as spatially co-located.
            end
        end
    end
end</preformat>
        <p>The query used with PostGIS to find regions which are supposed to be spatially
co-located is very similar to the one presented below. To avoid having to perform
the same calculations repeatedly, the values of the maximum Hausdorff distance
function are cached in the "max_hausdorff_dist" column. The "geometry" column
in both tables belongs to the "Geometry" datatype provided by PostGIS.</p>
        <preformat>SELECT g.gadm_level, g.gadm_id, n.nuts_id
FROM nuts n INNER JOIN gadm g ON (n.geometry &amp;&amp; g.geometry)
WHERE
  n.shape_area BETWEEN (g.shape_area*0.9) AND (g.shape_area*1.1)
  AND ST_HausdorffDistance(ST_SimplifyPreserveTopology(n.geometry, 0.5),
                           ST_SimplifyPreserveTopology(g.geometry, 0.5)) &lt; g.max_hausdorff_dist;</preformat>
        <p>This query selects the identifiers for the GADM region (level and id)
and for the NUTS region (id). The &amp;&amp; operator matches an intersection between
the bounding boxes of the regions. Since similar regions will also have a
similar area size, the first condition in the "where" clause keeps only pairs of regions
whose area sizes differ by at most 10%. The second condition checks if the discrete
Hausdorff distance between the simplified geometric shapes is within the limit
calculated by the function presented in Section 3.3.</p>
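        <p>The three filters applied by the query (bounding-box intersection, area within 10%, Hausdorff distance below the threshold) can be mirrored outside the database as in the sketch below. The Region class, the function names, the constant threshold and the toy squares are assumptions made for illustration; shapes are reduced to vertex lists and no simplification step is applied.</p>
        <preformat><![CDATA[
```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    ident: str
    points: list  # vertices as (x, y) pairs
    area: float


def bbox(points):
    """Bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(p, q):
    """True if two bounding boxes overlap or touch."""
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two vertex lists."""
    def directed(u, v):
        return max(min(math.dist(p, q) for q in v) for p in u)
    return max(directed(a, b), directed(b, a))


def match(regions_n, regions_g, max_hdist, tolerance=0.1):
    """Pairs (n, g) that pass the bounding-box, area and distance filters."""
    pairs = []
    for n in regions_n:
        for g in regions_g:
            if not boxes_intersect(bbox(n.points), bbox(g.points)):
                continue
            low, high = g.area * (1 - tolerance), g.area * (1 + tolerance)
            if not low <= n.area <= high:
                continue
            if hausdorff(n.points, g.points) < max_hdist(g.area):
                pairs.append((n.ident, g.ident))
    return pairs


# Toy data: g1 is a slightly wider copy of n1, g2 is a shifted square.
n1 = Region("n1", [(0, 0), (1, 0), (1, 1), (0, 1)], 1.0)
g1 = Region("g1", [(0, 0), (1.02, 0), (1.02, 1), (0, 1)], 1.02)
g2 = Region("g2", [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)], 1.0)

# Hypothetical constant threshold standing in for the fitted function.
result = match([n1], [g1, g2], max_hdist=lambda area: 0.1)
```
]]></preformat>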
      </sec>
      <sec id="sec-3-2">
        <title>Evaluation</title>
        <p>We can analyze the effectiveness of the method by looking at the results of the
process of finding spatial equivalences between the NUTS and GADM datasets.</p>
        <p>The NUTS dataset codes the geometric shapes fully in RDF using the NeoGeo
vocabulary, and the coordinate system used is WGS-84. The data is retrieved by
using a Construct SPARQL query and then converted into WKT using XSLT.</p>
        <p>After retrieving the geometric data, it is merged with the GADM dataset using
the method presented in Section 3.3.</p>
        <p>Not all NUTS regions are expected to match a GADM region, since many
NUTS regions represent parts or aggregations of administrative boundaries. Also,
a GADM administrative region at a certain level should be able to match different
NUTS regions at different levels, and vice versa.</p>
        <p>Of the 1,671 NUTS regions of the 2008 nomenclature that were
included in the comparison, the algorithm detected 965 matches, of which 13
were false positives, as Table 3 shows.</p>
        <p>Table 3 (columns: Area, Hausdorff Distance).</p>
        <p>These false positives are due to the fact that the threshold is set too high for
very small and very large areas. It is desirable to produce a larger gradient for
small areas and a smaller Hausdorff distance threshold for large areas. This is still
a matter for further research.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Related work</title>
      <p>The problem of aligning spatial datasets is not new in the Semantic Web
community and much work has been put into finding sensible solutions both at T-Box and
A-Box level.</p>
      <p>
        Ontology alignment is a heavily researched topic. Proposed solutions have been
based on the terminological [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], structural [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], semantic [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and extensional [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
aspects of the aligned ontologies. The last two works consider the alignment of
spatial T-Boxes in particular.
      </p>
      <p>
        Algorithms have also been proposed for feature matching across spatial datasets.
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] presents a series of algorithms for the integration of features, for which the
location is approximated by a single point. These approaches have the problems
exposed in Section 3.2 and therefore, are not suitable for all cases. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] considers
feature matching as an assignment problem based on a minimization of the
Hausdor distance between the geometric shapes. However, being this a case of Linear
Programming, the method can only be applied for all the geometric shapes of both
datasets at the same time, making it more di cult to integrate to live crawling.
6
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have presented a generic method that can be used to map multiple spatial
datasets. We also showed its functioning by describing the integration between our
two spatial datasets and analyzed its results.</p>
      <p>Although the method has been used successfully to align the GADM and NUTS
datasets, the false positive rate can still be improved when analyzing regions
covering a wide spectrum of area sizes. However, the presented method has proven to
be usable in Semantic Web applications.</p>
      <p>Since the first experiments showed promising results, we are developing a tool
that automates the whole mapping process. We are also further refining the
algorithm to improve its precision and performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The authors acknowledge the support of the European Commission's Seventh
Framework Programme FP7/2007-2013 (PlanetData, Grant 257641).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Beeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kanza</surname>
          </string-name>
          , E. Safra, and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sagiv</surname>
          </string-name>
          .
          <article-title>Object fusion in geographic information systems</article-title>
          .
          <source>In Proceedings of the thirtieth international conference on very large data bases</source>
          . Volume
          <volume>30</volume>
          , pages
          <fpage>816</fpage>
          –
          <lpage>827</lpage>
          . VLDB Endowment,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>L.M.V.</given-names>
            <surname>Blazquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Villazon-Terrazas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Saquicela</surname>
          </string-name>
          , A. de Leon,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          .
          <article-title>Geolinked data and INSPIRE through an application case</article-title>
          .
          <source>ACM SIGSPATIAL GIS</source>
          , pages
          <fpage>446</fpage>
          –
          <lpage>449</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>de Leon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Saquicela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.M.</given-names>
            <surname>Vilches</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Villazon-Terrazas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          .
          <article-title>Geographical Linked Data: a Spanish use case</article-title>
          .
          <source>In Proceedings of the 6th International Conference on Semantic Systems</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>3</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          .
          <article-title>Reference reconciliation in complex information spaces</article-title>
          .
          <source>In Proceedings of the 2005 ACM SIGMOD international conference on management of data</source>
          , pages
          <fpage>85</fpage>
          –
          <lpage>96</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.H.</given-names>
            <surname>Douglas</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.K.</given-names>
            <surname>Peucker</surname>
          </string-name>
          .
          <article-title>Algorithms for the reduction of the number of points required to represent a digitized line or its caricature</article-title>
          .
          <source>Cartographica</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>112</fpage>
          –
          <lpage>122</lpage>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Dube</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Egenhofer</surname>
          </string-name>
          .
          <article-title>Establishing similarity across multi-granular topological-relation ontologies</article-title>
          .
          <source>Quality of Context</source>
          , pages
          <fpage>98</fpage>
          –
          <lpage>108</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>An API for ontology alignment</article-title>
          .
          <source>The Semantic Web – ISWC 2004</source>
          , pages
          <fpage>698</fpage>
          –
          <lpage>712</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Goodwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dolbear</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hart</surname>
          </string-name>
          .
          <article-title>Geographical Linked Data: The administrative geography of Great Britain on the Semantic Web</article-title>
          .
          <source>Transactions in GIS</source>
          ,
          <volume>12</volume>
          :
          <fpage>19</fpage>
          –
          <lpage>30</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>F.</given-names>
            <surname>Hakimpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aleman-Meza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Spatiotemporal-thematic data processing for the Semantic Web</article-title>
          .
          <source>The Geospatial Web</source>
          , pages
          <fpage>79</fpage>
          –
          <lpage>89</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Herring</surname>
          </string-name>
          .
          <article-title>OpenGIS® Implementation Specification for Geographic Information - Simple Feature Access - Part 1: Common Architecture</article-title>
          .
          <source>Open Geospatial Consortium</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. International Organization for Standardization.
          <source>ISO 19101. Geographic information - Reference model</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. International Organization for Standardization.
          <source>ISO 19137</source>
          .
          <article-title>Geographic information - Core profile of the spatial schema</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.F.</given-names>
            <surname>Goodchild</surname>
          </string-name>
          .
          <article-title>Optimized feature matching in conflation</article-title>
          .
          <source>In Geographic Information Science: 6th International Conference, GIScience 2010, Zurich, Switzerland, September 14-17, 2010. Proceedings</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Morie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          .
          <article-title>Semantic integration in text: From ambiguous names to identifiable entities</article-title>
          .
          <source>AI magazine</source>
          ,
          <volume>26</volume>
          (
          <issue>1</issue>
          ):
          <fpage>45</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>S.</given-names>
            <surname>Melnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Similarity flooding: A versatile graph matching algorithm and its application to schema matching</article-title>
          .
          <source>In Proceedings of the IEEE CS International Conference on Data Engineering</source>
          , pages
          <fpage>117</fpage>
          –
          <lpage>140</lpage>
          . IEEE,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Open Geospatial Consortium Inc.
          <article-title>GeoSPARQL - A geographic query language for RDF data</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
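          <string-name>
            <given-names>Rahul</given-names>
            <surname>Parundekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Craig A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jose Luis</given-names>
            <surname>Ambite</surname>
          </string-name>
          .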
          <article-title>Aligning Ontologies of Geospatial Linked Data</article-title>
          .
          <source>In Proceedings of the Workshop on Linked Spatiotemporal Data</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>U.</given-names>
            <surname>Ramer</surname>
          </string-name>
          .
          <article-title>An iterative procedure for the polygonal approximation of plane curves</article-title>
          .
          <source>Computer Graphics and Image Processing</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>244</fpage>
          –
          <lpage>256</lpage>
          ,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>D.A.</given-names>
            <surname>Randell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          .
          <article-title>A spatial logic based on regions and connection</article-title>
          .
          <source>KR</source>
          ,
          <volume>92</volume>
          :
          <fpage>165</fpage>
          –
          <fpage>176</fpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Günter</given-names>
            <surname>Rote</surname>
          </string-name>
          .
          <article-title>Computing the minimum Hausdorff distance between two point sets on a line under translation</article-title>
          .
          <source>Information Processing Letters</source>
          ,
          <volume>38</volume>
          :
          <fpage>123</fpage>
          –
          <fpage>127</fpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
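          <string-name>
            <given-names>Raj</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ron</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Josh</given-names>
            <surname>Liberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mikel</given-names>
            <surname>Maron</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Carl</given-names>
            <surname>Reed</surname>
          </string-name>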
          .
          <article-title>An Introduction to GeoRSS: A Standards Based Approach for Geo-enabling RSS feeds</article-title>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
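          <string-name>
            <given-names>Claus</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jens</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Konrad</given-names>
            <surname>Höffner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sören</given-names>
            <surname>Auer</surname>
          </string-name>
          .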
          <article-title>LinkedGeoData: A Core for a Web of Spatial Open Data</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>S.</given-names>
            <surname>Tejada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Minton</surname>
          </string-name>
          .
          <article-title>Learning object identification rules for information integration</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>26</volume>
          (
          <issue>8</issue>
          ):
          <fpage>607</fpage>
          –
          <lpage>633</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          World Wide Web Consortium (W3C) - Semantic Web Interest Group.
          <article-title>W3C Geo Vocabulary</article-title>
          . http://www.w3.org/2003/01/geo/,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>