=Paper=
{{Paper
|id=None
|storemode=property
|title=Geographical Service: a compass for the Web of Data
|pdfUrl=https://ceur-ws.org/Vol-628/ldow2010_paper15.pdf
|volume=Vol-628
|dblpUrl=https://dblp.org/rec/conf/www/CorrendoSYGS10
}}
==Geographical Service: a compass for the Web of Data==
Gianluca Correndo, Manuel Salvadores, Yang Yang, Nicholas Gibbins, Nigel Shadbolt
Intelligence, Agents, Multimedia (IAM) Group
School of Electronics and Computer Science
Southampton, UK
{gc3, ms8, yy1402, nmg, nrs}@ecs.soton.ac.uk
ABSTRACT
This paper describes a Linked Data service that supports the navigation and retrieval of geographical entities for the UK territory. Geographical entities, within the scope of this paper, are linked data resources that describe objects that have a geographical extension. The service presented in this paper allows the querying of resources that contain or are contained by a given entity URI. The recent publication of UK Public Sector Information (PSI) data sets has brought to the attention of the community the redundant presence of location based context. At the same time it stresses the inadequacy of current Linked Data services for exploiting the semantics of such contextual dimensions for easing entity retrieval and browsing. We present an approach for a geography based service that helps in querying qualitative spatial relations for the UK geography (proper containment so far). We also provide an exploitation scenario based on a backlinking service and PSI Open Linked Data, published within the EnAKTing project.
Categories and Subject Descriptors
H.3.4 [Systems and Software]: Distributed systems; H.5.4 [Web]: Navigation; H.3.5 [Online Information Services]: Web-based services
General Terms
Linked Data, geographical services
Keywords
Linked Data, geographical reasoning, Web of Data
Copyright is held by the author/owner(s). LDOW2010, April 27, 2010, Raleigh, USA.
1. INTRODUCTION
The Linked Data Initiative represents the first collaborative effort to create a Web of Data (WoD henceforth) of considerable scale, providing a few simple guidelines for publishing content using well established standards [3]. Such guidelines and standards are leading the way to a new paradigm of interaction between government and citizens in the UK. In order to pursue better access for citizens to information held by local as well as national public organisations, the UK government has recently launched¹ a public initiative for publishing Public Sector Information (PSI), adopting Linked Data tenets as future best practices. Data sets recently delivered to the public include: government expenses, NHS trusts’ performances, public transportation, and a whole set of statistics about crime, mortality, census, environment, school and social indicators. Some of the data sets mentioned have been published already in Linked Data format, others have been translated within the EnAKTing project, and many others are waiting to be freed in the LOD cloud.
¹ Public access to the site http://data.gov.uk was granted on the 19th of January, 2010.
Such a prolific inflow of Linked Data poses new questions and challenges to the community of researchers and developers: how is it possible to integrate such different information into a meaningful schema? How is it possible to exploit the little semantics that goes a long way? How do we choreograph the publishing activity of separate organizations from the public sector? A common trait of PSI seems to be its locality: local and national public organisations are in fact mainly concerned with the collection of data about their territory, and the distribution of their resources.
In the WoD vision, links between resources from different publishers are particularly important since they are the ones that allow new data to be discovered and integrated into the current discourse. It is frequently the case that different URIs are used to refer to the same things, motivating the use of co-reference services for the resolution of instance equivalences. Knowledge of this type of relationship increases the potential for reuse since information from previously unknown sources is now accessible, and makes the problem of co-reference resolution of primary importance [9]. In any case, we can expect more and more of this linking data to be made available as the number of Linked Data publishers increases.
The publication of an authoritative geography of the UK (its regions, counties, districts and their connections) by Ordnance Survey (the national mapping agency for Great Britain, OS henceforth) as Linked Data has opened interesting scenarios for exploiting semantics in contextualising the information sources published on data.gov.uk. The geographical dimensions in PSI data sets are already represented, but their semantics may be lost if they are not exploited for creating new collections of data, browsing related resources, and making connections.
In this paper, we present a service for querying spatial relationships for the UK (extensible to other countries when authoritative knowledge bases are available). We start in Section 2 where the available knowledge bases are described along with an introduction of the qualitative spatial reasoning supported. Section 3 provides a rationale for the
developing of such a service in support of Linked Data browsing and retrieval. In Section 4 the implementation of the geographical service and its APIs are described. The paper then concludes with a description of an evaluation of the presented service using public sector information from the UK government in Section 5 and some concluding notes in Section 6.
2. BACKGROUND
The World Wide Web and the WoD can both be understood as hypertext systems, where the general purpose of the hypertext system is information discovery by navigation. Providing reasoning over hyperlinks for the purpose of navigation can benefit information discovery. In 1990, Nanard brought the concept of “semantic network” from Artificial Intelligence [16] into the hypertext field by creating a Conceptual Hypertext System [13], in which a hyperlink can be reasoned about by using a domain model classification. In the above system, typed links and typed chunks are used to define relationships between types in order to incorporate knowledge into a hypertext. This domain model classification is used to classify the documents and documents that share metadata, and which are deemed to be similar in some way. The Conceptual Open Hypermedia Service (COHSE) project [4] later took this approach forward by providing ontological reasoning based on links of services to bridge the navigation gap between the Web and Linked Data, where the link services provided a mapping between concepts and the lexical labels on the web page.
Many of the PSI data sets published so far can be plotted within a spatial and temporal dimension; in other words, all data can be linked together by its spatial and temporal indexes. Within this context, the need to provide services to reason about the spatial and temporal aspects of the linked data is of key importance. This is unsurprising: spatial and temporal reasoning have always been considered to be an important part of common-sense reasoning in Artificial Intelligence. In this section, we will mainly focus on qualitative spatial representation and reasoning. There are two major approaches to qualitative spatial representation: point based and region based [6]. Region based approaches, such as Topology [7], which describe relationships between spatial regions, are more intuitive than point based approaches. The commonly known approaches for formalizing topological properties of spatial regions are based on work from Whitehead [17] and Clarke [5], who axiomatized mereotopologies (a theory that combines mereology and topology) using a single primitive relation and binary connectivity relationships. By using these primitive relations, other relations can be defined. The Region Connection Calculus (RCC8) proposed by Randell, Cui and Cohn [14] defines a set of jointly exhaustive and pairwise disjoint relations DC, EC, PO, EQ, TPP, NTPP, TPPi and NTPPi, as illustrated in Figure 1, and is the most well-known approach in the domain. Since the RCC Calculus is expressed in first-order predicate calculus, a wide range of theorem provers can be used for reasoning. For instance, given a fixed vocabulary of relations Ri, and given R1(x,y) and R2(y,z), one can answer questions about the possible relations (from the set Ri) that can hold between x and z by looking up the composition table [8]. Although general first-order theorem proving is too inefficient to be useful for many purposes [11], it is relatively simple to implement, and particularly useful for reasoning in our case are the geographic location relationships.
Figure 1: RCC Eight Jointly Exhaustive and Pairwise Disjoint Relations
Within the Linked Data context, there are several services providing resolvable URIs for geographic locations. Geonames², for example, is a community based service that provides geographical representation of geographical entities covering all countries worldwide and manages eight million URIs for geographical resources. As a further example, the national mapping agency of Great Britain, Ordnance Survey, maintains a continuously updated database of the topography of Great Britain³ and is responsible for surveying the boundaries of the administrative areas.
² http://www.geonames.org last accessed 10/02/2010
³ With the exception of Northern Ireland, which is covered by a different agency, the Land and Property Services Northern Ireland.
In this paper, we exploited the Administrative Geography ontology provided by Ordnance Survey as an authoritative knowledge base for querying the UK geographical structure [10]. This ontology explicitly represents the mereological relationships within the administrative hierarchy, as well as topologically representing the boundary information between administrative units at the same hierarchical level. The following depicts the class hierarchy created in the administrative ontology from Ordnance Survey:
• CivilAdministrativeArea
– EuropeanRegion
– Country
– UnitaryAuthority
– MetropolitanDistrict
– GreaterLondonAuthority
– LondonBorough
– District
– CivilParish
– Community
• Country
The topological relations adopted by this ontology were taken from the RCC8 and correspond to the properties NTPPi, TPPi, EC and EQ respectively. The topology of the administrative geography of Great Britain contains no overlapping regions; therefore, the PO relation was not required. Later versions of the ontology reported overlapping entities as well. The property of spatial containment used in the OS ontology (equivalent to the NTPP(i) and TPP(i) relations in Figure 1) implies a mereological relationship. For instance, if Hampshire spatially contains Fareham, then Fareham is a part of Hampshire.
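The composition-table lookup described in this section can be illustrated with a minimal sketch. The table below is deliberately partial: it lists only the containment-related compositions (and the trivial EQ cases), and any pair that is not listed falls back to the full set of eight relations, which is uninformative but always sound. The names `RCC8`, `COMPOSE` and `compose` are illustrative assumptions and are not part of the service described in this paper.

```python
# Partial RCC8 composition table (an illustrative sketch, not the service's code).
# COMPOSE[(r1, r2)] is the set of relations that may hold between x and z
# when r1(x, y) and r2(y, z) hold.
RCC8 = frozenset({"DC", "EC", "PO", "EQ", "TPP", "NTPP", "TPPi", "NTPPi"})

COMPOSE = {
    ("TPP", "TPP"): {"TPP", "NTPP"},
    ("TPP", "NTPP"): {"NTPP"},
    ("NTPP", "TPP"): {"NTPP"},
    ("NTPP", "NTPP"): {"NTPP"},
    ("TPPi", "TPPi"): {"TPPi", "NTPPi"},
    ("TPPi", "NTPPi"): {"NTPPi"},
    ("NTPPi", "TPPi"): {"NTPPi"},
    ("NTPPi", "NTPPi"): {"NTPPi"},
}


def compose(r1, r2):
    """Return the possible relations between x and z given r1(x, y) and r2(y, z)."""
    if r1 not in RCC8 or r2 not in RCC8:
        raise ValueError("unknown RCC8 relation")
    # EQ is the identity of composition.
    if r1 == "EQ":
        return {r2}
    if r2 == "EQ":
        return {r1}
    # Pairs missing from this partial table yield the uninformative full set.
    return set(COMPOSE.get((r1, r2), RCC8))
```

For example, if region x is a non-tangential proper part of y and y is a tangential proper part of z, `compose("NTPP", "TPP")` leaves NTPP as the only possible relation between x and z, which is the kind of containment chaining the service relies on.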
Dereferenceable URIs adopted by the Linked Data community inherit the same properties as hyperlinks in the Web hypertext system, among which is uni-directionality. The problem with this kind of link is that it is not possible to navigate back to the original resource by using the dereferencing mechanism only. This problem becomes even more relevant when URIs from previous authoritative data sets are reused in order to provide context and meaning to new data. It is in fact possible to browse from the new data to the old one, but not the other way around. The back-linking service⁴ we have implemented for UK public sector information supports the discovery of back-links between datasets. The benefit of a back-linking service is that it enables users to discover, from a single dataset, other datasets which reference back to it, therefore creating data linkage opportunities between datasets, increasing the recall of valuable data sources, and doubling the network effect [15], which increases even more when co-reference systems are employed.
⁴ http://backlinks.psi.enakting.org
In this paper, we will mainly focus on exploring the possibility of exploiting semantics from authoritative knowledge bases to provide support for consuming Linked Data resources. The service provided will allow users to retrieve contained (and container) entity URIs from popular data sets by exploiting a co-reference service. Moreover, a back-linking service which we previously created in the EnAKTing project⁵ will allow us to retrieve the information resources that addressed such URIs. Far from trying to provide a general purpose reasoner for geographical entities, the aim of the service described in the following sections is to exploit the semantically rich knowledge base for UK geography in order to ease users’ navigation through the published PSI data sets. Similar capabilities were already provided by DBpedia Mobile [2], an application that retrieved DBpedia entries mashed up on a map based on users’ geographical coordinates. The results provided by our service, however, are based on a spatial subdivision of the territory, a subdivision that is already used by public sector organizations to classify their data (e.g. crime statistics are based on a police based subdivision of the territory, while MPs’ activities are related to the constituency they were voted in).
3. MOTIVATION
The Linked Data principles [3] promote a Web of Data whose architecture is inherently decentralised, relying on data already published (when available) in order to give semantics and context to new data. The growth the WoD has experienced over recent years relies on the simplicity of publishing and linking data. However, up to now a semantically coherent orchestration of data publishing is still a mirage. Nevertheless, relying purely on data linkage for the discovery and browsing of linked data resources would lead to a serious knot to untie in the near future. The use of ontologies and powerful ontology languages in publishing Linked Data will be an effort that must be justified against a scenario where such explicit semantics are rarely exploited.
In publishing UK Public Sector Information (UK PSI), we have identified an issue concerning data accessibility and navigability that addresses in particular the missing exploitation of semantics (in this case about qualitative spatial description of geographical entities). In this paper we present a solution to overcome this issue that soundly enhances data retrieval and browsing when geographical dimensions are involved.
The issue is about the usage of geographical entities for contextualising local information (i.e. information that is related to a particular geographical location, for example the population of a region, the MPs of a constituency, or various statistical data based on territory). In publishing this kind of information, we provided alignments of our data (at least for the geographical dimensions represented in the data) to authoritative knowledge bases using co-reference systems [9]. The problem we have to deal with originates from the fact that, since the public sector information published was originated by different sectors of UK government, the kinds of spatial classification used were highly heterogeneous, ranging from local parishes to counties and up to European regions (e.g. South East of England). The different granularities used to classify the data mean, in Linked Data terms, that related information sources link to different URIs. Some data may in fact be relevant for constituencies, while others may use a different granularity (by county for example), and the URI of a county is obviously different from the set of URIs of all its constituencies. Available knowledge bases about the geographical or administrative subdivision of a territory can be exploited to cover such a gap in data granularity.
[Figure 2 shows the County of Hampshire (os:7000000000017765) together with: items from http://mortality.psi.enakting.org (mortality:ds_1_299_1, scovo:dimension mortality:Hampshire) and from http://crime.psi.enakting.org (crime:ds1_37_1, scovo:dimension crime:Hampshire), marked as accessible resources; and resources from http://parliament.psi.enakting.org for Winchester (parliament:member/10395, dc:coverage parliament:cons-426), Eastleigh (parliament:member/101, dc:coverage parliament:cons-203) and Fareham (parliament:member/11884, dc:coverage parliament:cons-228), marked as inaccessible resources. Legend: owl:sameAs, contained_in, resource accessible, resource inaccessible.]
Figure 2: Resource irretrievable via geographical gap
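The granularity gap depicted in Figure 2 can be made concrete with a small sketch. It is only an illustration under stated assumptions: three in-memory dictionaries stand in for the co-reference service, the containment knowledge base and the backlinking service, their contents are toy data loosely modelled on the identifiers visible in Figure 2, and the names `SAME_AS`, `CONTAINS`, `BACKLINKS` and `backlinks_for` are hypothetical.

```python
# Toy stand-ins (assumptions) for the three knowledge sources involved.
HAMPSHIRE = "os:7000000000017765"

# Co-reference: URIs considered equivalent to the OS URI.
SAME_AS = {HAMPSHIRE: {"mortality:Hampshire", "crime:Hampshire"}}

# Containment: entities contained in the OS entity.
CONTAINS = {
    HAMPSHIRE: {"parliament:cons-426", "parliament:cons-203", "parliament:cons-228"}
}

# Backlinks: resources that link to a given URI.
BACKLINKS = {
    "mortality:Hampshire": {"mortality:ds_1_299_1"},
    "crime:Hampshire": {"crime:ds1_37_1"},
    "parliament:cons-426": {"parliament:member/10395"},
    "parliament:cons-203": {"parliament:member/101"},
    "parliament:cons-228": {"parliament:member/11884"},
}


def backlinks_for(uri, expand_containment=False):
    """Collect resources linking to `uri`, its co-referents and, optionally, its parts."""
    targets = {uri} | SAME_AS.get(uri, set())
    if expand_containment:
        targets |= CONTAINS.get(uri, set())
    found = set()
    for target in targets:
        found |= BACKLINKS.get(target, set())
    return found
```

Calling `backlinks_for(HAMPSHIRE)` reaches only the mortality and crime items, whereas `backlinks_for(HAMPSHIRE, expand_containment=True)` also reaches the resources attached to the contained constituencies, which is the gap a containment service is meant to bridge.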
⁵ http://enakting.org
Taking as an example the PSI data sets published recently⁶, we adopted the Ordnance Survey administrative ontology in order to provide context to our data items (i.e. SCOVO item instances⁷ and local governmental data). The SCOVO ontology allows us to describe statistical data as a collection of Items where each item describes a statistical value (i.e. a single cell in a multidimensional table) along with all the dimensions that characterise it. In the case of UK PSI statistics, many data sets collected were related to geographical regions (counties, districts, etc.).
In this case, users who wished to discover useful information about their own region (e.g. the County of Hampshire, top of Figure 2) would start their searching activity by browsing one of its available URIs. The OS URI for such a geographical entity would be os:7000000000017765⁸, but any equivalent URI provided by a co-reference system will provide the same results, as will be described in the following.
Using a backlinking service for resolving the entities linking to the given URI for Hampshire, we are able to retrieve links to mortality statistics (mortality:ds1_299_[1...3]⁹) and crime statistics (crime:ds1_37_[1...11]¹⁰). In Figure 2 those URIs are contained in boxes labelled as “accessible”, meaning that those URIs are retrievable by following already existing arcs backwards. Those SCOVO data sets’ items in fact address Hampshire county as one of their dimensions. What is missing is the further data collected that reports valuable information about regions contained in Hampshire. In particular, within the EnAKTing project, we published linked data about the single constituencies too. In detail we published, for each constituency, a historical record of the MP in charge of that constituency, his/her voting records and expenses. In Figure 2 those resources are contained in dotted boxes labelled as “inaccessible”, meaning that they cannot be retrieved with the existing knowledge.
Example URIs for such inaccessible resources are¹¹:
parliament:cons-637 rdfs:label "Winchester"
parliament:cons-203 rdfs:label "Eastleigh"
parliament:cons-228 rdfs:label "Fareham"
The URIs for, respectively, Winchester, Eastleigh, and Fareham are therefore not retrieved by the resolution of the Hampshire URI (obviously) or by the additional service provided by the backlinking service.
Despite the fact that an entity is still semantically different from the parts that compose it, the information relevant for all its constituting parts can still be relevant for the entity as a whole. Without covering such a geographical gap it is not possible to access all the relevant sets of information, provide them to the user or process them in some way in order to summarise their content.
The aim of this research is to exploit authoritative knowledge bases in order to cover such gaps, therefore allowing citizens to retrieve information resources relevant to their region of interest. Moreover, there are many data sources that describe geographical resources, and all of those are already partially aligned. The integration of different knowledge bases could lead to the possible exploitation of such alignments in order to bridge data sets and reuse the available knowledge in more than one context.
⁶ http://browser.psi.enakting.org
⁷ http://purl.org/NET/scovo
⁸ PREFIX os:
⁹ PREFIX mortality:
¹⁰ PREFIX crime:
¹¹ PREFIX parliament:
4. GEOGRAPHICAL SERVICE FOR UK
To support the user’s experience in browsing and discovery of new resources in the WoD, we have developed a geographical service for querying the UK territory structure. The decision to restrict the service to the UK territory is mainly due to the fact that the service is mainly used in order to support the discovery of UK PSI resources. Knowledge about geographical containment is exploited here to link information that is contextually related because of its spatial dimension.
For this use case we have implemented a service for querying the topological structure of the UK (from the broader entity to the more particular and the other way around) that can be easily integrated into a web of linked data. The service, accessible at http://geoservice.psi.enakting.org, is designed in order to be easily integrated both into web applications and in linked data resources and it follows a few basic principles:
Lightweight Service : The service should be easy to use and resolve a specific problem. A geographical service is a component of the WoD that supports discovery when geographical entities are involved; it is not a general purpose reasoning engine.
Linked Data Compatible : The geographical service should be used as a resolvable URI like any other resource, in order to be used in linked data content as a useful provider of relevant URIs. Moreover the service should provide the results in a number of different formats that will be decided using content negotiation and HTTP 303 redirection.
Co-reference Support : The service should exploit the already available knowledge about instance equivalence provided by co-reference systems¹² in order to return results useful in a number of different data sets.
¹² Like http://sameas.org
4.1 Data collection and normalisation
OS provides an ontology¹³ and an RDF dump about spatial relations between UK regions. The triples from OS have been parsed and only the relations of physical containment have been retained, normalised and completed with the inverse relations in a separate knowledge base. The service presented here, for the sake of simplicity and efficiency¹⁴, manages only the NTPP and its inverse, the NTPPi, relations. The knowledge extracted from the OS data set has then been normalised in terms of an internal ontology that represents qualitative spatial relations.
¹³ ontology/SpatialRelations/v0.2/SpatialRelations.owl
¹⁴ in order to provide a very focused service.
The normalisation step has been introduced in order to allow the service to integrate further geographical hierarchies in the future (e.g. geonames provides containment of
geographical features). The future integration of qualitative spatial knowledge bases is devised in order to extend the service outside the borders of the UK and to provide an assessment of co-references between geographical entities.
A simple example of how the normalised triples from the OS ontology are used in coupling with a co-reference service for bridging the navigational gap for different data sets is depicted in Figure 3; in the figure it is possible to see that a single statement from OS describing the fact that the County of Hampshire contains Fareham and Winchester¹⁵:
os:7...17765 os:contains os:7...25157.
os:7...17765 os:contains os:7...25128.
has been translated into an internal representation containing both relations, part and part_of, like the following:
os:7...17765 geoservice:part os:7...25157.
os:7...25157 geoservice:part_of os:7...17765 .
os:7...17765 geoservice:part os:7...25128.
os:7...25128 geoservice:part_of os:7...17765 .
¹⁵ OS URIs are shortened: the run of ’0’ characters is replaced by ’...’.
The containment relations so normalised (see central part of Figure 3) are then internally stored in the system and queried for serving users’ requests.
[Figure 3 shows dbpedia:Hampshire (http://dbpedia.org) and crime:Hampshire (http://crime.psi.enakting.org) linked by owl:sameAs to the Hampshire county resource os:7000000000017765 (http://data.ordnancesurvey.co.uk), which is connected by part / part_of to Fareham (os:7000000000025157) and Winchester (os:7000000000025128); these are in turn linked by owl:sameAs to the corresponding UK Parliament constituency resources in http://dbpedia.org and in http://parliament.psi.enakting.org (parliament:cons-637, parliament:cons-228).]
Figure 3: Coupling of co-reference and Ordnance Survey geographic ontology
The normalised containment relations are integrated with the information provided by the co-reference system, which allows us to bridge different data sources both in the input phase (i.e. where the input URI must be translated into the OS equivalent, see top part of Figure 3) and in the output phase (i.e. when the results must be translated into a target data set provided by the user, see bottom part of Figure 3). The co-reference service used in this paper is the http://sameas.org service from Glaser et al. [9]. The relevant bundles have been retrieved from the service and cached for performance. It is important to note that, in order to choose the wanted quality of service, one could opt for using one co-reference service instead of another. The functionality provided is transparent with respect to the provenance of the co-reference bundles.
Exploiting co-reference services and the OS ontology, it is therefore possible to infer containment relations between resources from different data sets. For example:
dbpedia:Hampshire owl:sameAs os:7...17765
AND
os:7...17765 geoservice:part os:7...25128
AND
os:7...25128 owl:sameAs dbpedia:Winchester
=⇒
dbpedia:Hampshire geoservice:part dbpedia:Winchester
4.2 RESTful API
The service is accessed via HTTP GET requests and provides two essential pieces of information: the list of entities contained in the input URI, and the list of entities that contain the input URI. The interface is accessible via the following URIs:
http://geoservice.psi.enakting.org/{command}/{dictionary}/{format}/{URI}
In the above API description, the parameters are enclosed in brackets and their meaning is the following:
command: can be either contains or container: in the first case it returns the URIs of the entities contained by the input URI; in the second case it returns the URIs of the entities that contain the input URI.
dictionary: can be one of the following (dbpedia, os, statistics, geonames, enakting, opencyc, openlylocal, or none) and instructs the service to use the co-reference system in order to retrieve the equivalent URIs in the respective data sets (i.e. DBpedia [1], Ordnance Survey [10], UK National Statistics¹⁶, Geonames¹⁷, PSI enAKTing¹⁸, OpenCYC [12], Openly Local project¹⁹). The value none is used for not applying any filter. In this case the URIs returned will be the ones from the Ordnance Survey plus the ones returned from the co-reference service.
format: the format parameter is optional and can be one of the following (rdf, text, ttl, or json). The value of the format parameter then decides the format of the returned content: RDF/XML for rdf; list of URIs separated by new lines for text; RDF/Turtle for turtle; and finally JSON²⁰ for json. If the parameter is not given, the right content is decided using the 303 HTTP redirection. Even for content requests with Accept:text/html made using the browser, the client is redirected to the HTML page of the service initialised with the input URI.
URI: is the URI of the input entity to query using the service. The service uses a co-reference system in order to find the equivalent URI for the Ordnance Survey and the Geonames data set. This means that the user can use one of the data sets of preference (e.g. DBpedia or Geonames) and ask for contained, or container, entities in one of the desired target data sets (e.g. again DBpedia, Geonames, or enAKTing published information).
¹⁶ http://statistics.data.gov.uk last accessed 10/02/10
¹⁷ http://geonames.org last accessed 10/02/10
¹⁸ http://browser.psi.enakting.org last accessed 10/02/10
¹⁹ Community devoted to providing linked data access for local government data, see http://openlylocal.com last accessed 10/02/10
²⁰ http://json.org last accessed 10/02/10
The service returns a list of URIs if the content type is text or json. The RDF content, for both rdf and turtle, describes the containment relations between the input URI and the resulting resources. In both cases the returned URIs are translated into the desired address space.
The procedure followed by the service, and an overall architecture, is depicted in Figure 4, and can be described as follows:
1. user generated request (HTTP GET request)
2. normalisation of the input URI to OS
3. computation of the property closure (i.e. part or part-of) over the normalised URI
4. optional phase of translation and filtering of the resulting URIs to the target URI space
5. formatted content, as per user request, returned to the user (HTTP Response)
[Figure 4 shows the numbered steps 1 to 5 as interactions between the client request (dbpedia:Hampshire), the geoservice (http://geoservice.psi.enakting.org), the co-reference service (http://sameas.org), the geoservice:KB (4store) queried with os:7000000000017765, and the returned result (dbpedia:Fareham_(UK_Parliament_constituency)).]
Figure 4: Overall architecture and interaction with co-reference system
As an example, consider the case of a software client who needs to know all the geographical entities contained in Hampshire. The request can adopt as input one of the many available URIs describing the county of Hampshire; a popular choice could be the DBpedia URI (i.e. URI = http://dbpedia.org/resource/Hampshire). The agent can then make explicit the desired target data set, for example the DBpedia data set itself (i.e. dictionary = dbpedia), and instruct the server to return the JSON format of the document (i.e. the HTTP header contains Accept:application/json). The HTTP request will then be the following:
GET /contains/dbpedia/http://dbpedia.org/resource/Hampshire
Host: geoservice.psi.enakting.org
Accept: application/json http://dbpedia.org/resource/Southampton_Test_
%28UK_Parliament_constituency%29
http://dbpedia.org/resource/Southampton_Itchen_
And the service will return a response redirecting the client
%28UK_Parliament_constituency%29
to the right URL:
HTTP/1.1 302 Found From those URIs we are then able to check then the iden-
Location: http://geoservice.psi.enakting.org/ tities of the MPs in charge (in the Southampton page from
contains/dbpedia/json/http://dbpedia.org/ DBpedia their are mentioned both as leaders of the city
resource/Hampshire whereas an MP is actually in charge only to its constituency
where s/he has been elected. Asking then for the URIs from
the data sets provided by the EnAKTing project we would
That, once resolved, will finally return the desired content, be able to retrieve the followings:
a JSON array of strings that represents the URI of the DB-
pedia resource describing entities contained in Hampshire: http://parliament.psi.enakting.org/id/cons-536
HTTP/1.1 200 OK http://parliament.psi.enakting.org/id/cons-535
Content-Type: application/json
Following such links the user would be able then to re-
trieve other information about the MPs from each constituen-
["http://dbpedia.org/resource/North_East
cy (even retrieving an historical record of them) and further
_Hampshire_%28UK_Parliament_constituency%29",
information about their political activity.
"http://dbpedia.org/resource/East_Hampshire
_%28UK_Parliament_constituency%29", ...
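The URI pattern of the RESTful API and the JSON response shown above can be exercised from a client with a short sketch. It is a hedged illustration only: the helper names `build_request_url` and `parse_json_response` are hypothetical, the allowed values are copied from the parameter descriptions in this paper, the input URI is appended unescaped as in the paper's own example, and no network call is made because the availability of the service is not assumed.

```python
import json

# Values taken from the API description in this paper; the helper names are assumptions.
BASE = "http://geoservice.psi.enakting.org"
COMMANDS = {"contains", "container"}
DICTIONARIES = {"dbpedia", "os", "statistics", "geonames",
                "enakting", "opencyc", "openlylocal", "none"}
FORMATS = {"rdf", "text", "ttl", "json"}


def build_request_url(command, dictionary, uri, fmt=None):
    """Build {BASE}/{command}/{dictionary}/[{format}/]{URI}; the format is optional."""
    if command not in COMMANDS:
        raise ValueError("unknown command: %s" % command)
    if dictionary not in DICTIONARIES:
        raise ValueError("unknown dictionary: %s" % dictionary)
    if fmt is not None and fmt not in FORMATS:
        raise ValueError("unknown format: %s" % fmt)
    parts = [BASE, command, dictionary]
    if fmt is not None:
        parts.append(fmt)
    # The input URI is appended as-is, mirroring the example request in the text.
    parts.append(uri)
    return "/".join(parts)


def parse_json_response(body):
    """Parse the JSON array of URI strings returned for the json format."""
    uris = json.loads(body)
    if not isinstance(uris, list) or not all(isinstance(u, str) for u in uris):
        raise ValueError("expected a JSON array of URI strings")
    return uris
```

With `fmt="json"` the helper yields the same URL as the `Location` header in the redirect example; without a format it yields the URL of the initial request, leaving the choice of format to content negotiation.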
5. EVALUATION
The client agent can obviously immediately refer to the We have evaluated our geographical service from two dif-
right URL and retrieve the content in the right format straight ferent perspectives. The first one looks at the direct ben-
away. A useful way to exploit such service can be seen when efit that our backlinking service for Public Sector Informa-
data sets other than OS one are queried. Not every data tion22 would gain from expanding its navigability through
set in fact provides a clear semantic representation about geographic containments (see Section 5.1). The second eval-
mereological relations. This is due to the fact that the focus uation is more analytic and looks at the new knowledge
of many data set is to provide information about a particu- generated as part of the translation process from an author-
lar region: encyclopaedic information from DBpedia, statis- itative geographic closure to the covered vocabularies (see
tics information from the UK National Statistics, geograph- Section 5.2) .
ical features from Geonames21 , conceptual description from
Open CYC, local government information from Openly Lo- 5.1 Backlinking Service Integration
cal, and UK PSI from EnAKTing. This section studies the navigability improvement that
Using the service presented in this paper is easy to ex- our backlinking service for the PSI in the UK would ex-
ploit the OS administrative ontology in order to retrieve ge- periment by plugging the containments from a wide range
ographically relevant information regardless from the start- of vocabularies.The PSI Backlinking Service provides an ac-
ing data set. As an example, let us consider the case where cess point to retrieve backlinks from Foreign URIs. Foreign
a user may want to retrieve information about local govern- URIs make data discovery difficult because it is not possible
ment of its own city, for example about Southampton, UK. to navigate the RDF documents of the WoD bidirectionally.
The easiest thing to do is to start from a recognizable URI http://backlinks.psi.enakting.org provides an API to
such as the DBpedia ones: retrieve collections of backlinks for a given URI. The study
of the covered knowledge bases23 in the UK PSI Backlinking
http://dbpedia.org/resource/Southampton Service made explicit that one of the most highly connected
From this URI the user can retrieve general information data sets in the PSI WoD are the ones representing some
about the city, even the names about some of the city lead- type of geographic information.
ers. No further information is available on the Southampton In this evaluation we have used the Backlinking Service
DBpedia page about local government information. Asking as a client of the Geographical Service in order to expand
the geographical service to return the contained entities from the backlinks that we can get from geographic resources. We
the Openly Local site we can then retrieve more resources: have kept the decentralization nature of the Backlinking and
Geographical services and basically the Backlinking Service
http://openlylocal.com/id/wards/4925 performs HTTP requests to get the geographic containments
http://openlylocal.com/id/wards/4929 (see Figure 5). When the geography extension is enabled
... the backlinking service gets the list of contained entities for
http://openlylocal.com/id/wards/4938 the input URI and returns the backlinks connected to any
Those URIs are the ones published for each one of the URI part of containments. The request to the Geography
wards present in the city of Southampton and provides not Service is performed using “contains” as command JSON as
only the names of the local councillors but also some other format and “none” as dictionary. The selected dictionary is
statistics about the ward (i.e. demographics and religious “none” because the Backlinking Service doesn’t know before
statistics). Moreover, asking again the service for the DB- 22
http://backlinks.psi.enakting.org last accessed
pedia URIs we are able to retrieve the followings: 10/02/10
21 23
Geonames provides a containment relation that does not http://backlinks.psi.enakting.org#KBs last accessed
however reflect any administrative subdivision 10/02/10
hand what type of URIs will be the source of backlinks for a certain geographic region. To improve the
coverage, we aim to get all the possible containments from all the dictionaries supported by the
geographical service.

Figure 5: Interaction of the backlinking and geographical services. [Diagram labels: a request
HTTP GET http://backlinks.psi.enakting.org/resource/URI?geo=enabled reaches the backlinking service
(http://backlinks.psi.enakting.org, RESTful API, backed by the Backlinks Knowledge Base in 4store), which
issues HTTP GET http://geoservice.psi.enakting.org/contains/none/json/URI to the geographical service
(http://geoservice.psi.enakting.org, RESTful API, connected to the co-reference service
http://sameAs.org) and then runs: for URI' in geoPartonomyBundle: BackLinks += GetBackLinks(URI').]

There is a natural outcome of this integration, which can be shown by looking at how the system works
when asking for backlinks connected to the URI dbpedia:Hampshire. Prior to the use of the geographical
extension, a request to retrieve backlinks for dbpedia:Hampshire would just give back 14 URIs related to
the UK region of dbpedia:Hampshire or to any equivalent URI that is part of the same co-reference bundle
in sameAs.org (see Figure 6). The same request, when the geographical service is integrated, returns the
following additional backlinks:

• 6 010 resources that represent schools from http://education.data.gov.uk. These RDF documents
  represent the totality of education entities in the region of Hampshire.

• 42 mortality statistical resources from http://mortality.psi.enakting.org. These statistics are
  segmented by geography and gender.

• 981 CO2 emission measurements from http://co2emission.psi.enakting.org. These resources represent the
  CO2 emissions for the region of Hampshire between 2005 and 2007.

• 300 resources with information on energy consumption from http://energy.psi.enakting.org. This data
  set publishes the energy consumption in the UK with respect to fuel in the road network between 2005
  and 2007. These results represent all the RDF documents linked to geographical regions contained in
  Hampshire.

• 4 788 population census resources segmented by age and sex from http://population.psi.enakting.org.

• 224 parliamentary identities from http://parliament.psi.enakting.org. These represent mandates for
  different members of the UK Parliament and House of Commons.

Figure 6 shows the output of the backlinking service with and without the geographical extension. None
of the resources enumerated above is specifically linked to dbpedia:Hampshire or equivalent URIs; they
are linked to geographic containments of it in at least one of the data sets covered by the Geographical
Service.

Figure 6: Output comparison for dbpedia:Hampshire with and without geographical partonomies. [Two panels:
geographical service integration disabled,
http://backlinks.psi.enakting.org/resource/doc/http://dbpedia.org/resource/Hampshire, and enabled,
http://backlinks.psi.enakting.org/resource/doc/http://dbpedia.org/resource/Hampshire?geo=enabled.]

This scenario has shown one of the possible cases where the exploitation of explicit semantics can
improve the accessibility of the resources in the Web of Data. In essence, the backlinking service is
improving its graph connectivity by being aware of the new layer of Linked Data that the Geographical
Service publishes via its RESTful API. This case study also shows how different Linked Data RESTful
services (such as co-reference, backlinking and geographical services) can cooperate in a layer built on
top of the current Web of Data to improve its navigability.

5.2 Vocabulary Closure Coverage
The geographical service can be seen as an extra layer of linked data based on an initial geographic
closure provided by Ordnance Survey and its extensions to other data sets via co-references. This extra
layer of linked data is an added value to the Web of Data. This section analyses the interlinking
improvement between the data sets by means of the number of triples produced by the Geographical Service.

Table 1 reports the number of triples generated by our service, i.e. triples whose predicate is
geoservice:part or geoservice:part of. The table shows the number of triples linking every pair of data
sets in the system. For instance, our Geographical Service has produced 30995 geographic containments
between dbpedia and mortality.psi.enakting.org.

Table 1: Datasets linkage improvement statistics
data set              OS     dbpedia  statistics   mortality  parliament       crime    geonames openlylocal     opencyc
OS              60469910     1757760    45354078     1035901     1338214      235906       94072    18559453     1106900
dbpedia          1757760       59640     1393322       30995       46077        9570        3035      540619       35250
statistics      45354078     1393322    36179867      813660     1056892      206217       71232    14430773      819965
mortality        1035901       30995      813660       19109       23929        4607        1488      344436       17415
parliament       1338214       46077     1056892       23929       37631        7654        2410      436883       28070
crime             235906        9570      206217        4607        7654        2249         334       82559        4160
geonames           94072        3035       71232        1488        2410         334         224       26427        2475
openlylocal     18559453      540619    14430773      344436      436883       82559       26427     6498462      312120
opencyc          1106900       35250      819965       17415       28070        4160        2475      312120       27975

Of particular interest are the results for the geonames data set. The number of containment relations
generated within that data set (224 in Table 1) is quite small compared to the number of containment
relations provided by geonames itself (a rough estimate done by the authors counts about 9K relations).
Such an additional source of spatial knowledge opens a scenario where the two knowledge bases can be
compared and integrated to provide a better recall for the service. An important aspect to take into
account in such a scenario would be the quality of the results computed by the integration.

The data seed that triggered this new knowledge is the set of OS to OS containments, about 60M
statements. The total number of triples generated is 223M (the sum of all the other cells of Table 1),
and these only partially interlink every pair of data sets: partially, because the completeness of the
closure for every pair of data sets relies on the accuracy of the co-reference bundles extracted from
sameAs.org. As the number of co-references in sameAs.org grows and their accuracy improves, the
Geographical Service will reflect those changes automatically. This side effect is one of the key aspects
of the Web of Data and its decentralized nature.

6. CONCLUSIONS
We have presented in this paper a service that helps users in browsing geographical resources from
different data sets (dbpedia, geonames, data.gov.uk, psi.enakting.org, ...) by exploiting an
authoritative ontology for the UK territory (Ordnance Survey). One of the novel aspects of this research
is the use of a co-reference system (http://sameas.org) to extend the containments from one geographic
data set to others where such containments are not so rich or complete. Moreover, the added value of
integrating such a geographical service with a backlinking service has been shown by demonstrating a
possible exploitation scenario on Public Sector Information. Due to the particular nature of the
knowledge provided (i.e. closure of geographical containment properties), there is the possibility of
overwhelming the user with information when asking about top level features (e.g. England). In order to
cope with this eventuality, the service will soon be provided with the capability to limit the results
by depth. Therefore, when asked about all the entities contained in the top level feature England at the
first level of depth, the service will return only: North East, North West, South East, Eastern, South
West, East Midlands, West Midlands, Yorkshire & the Humber, Scotland, Wales, London (different from the
City of London).

Another important aspect not tackled in this work, and a subject of future research, is the temporal
extent of administrative divisions. The administrative geography of the UK will change shortly and has
changed frequently over the years (e.g. the number and borders of constituencies are reviewed every 10 or
15 years). New entities can be defined, and old ones can be abolished or change status. For example
Southampton, once part of Hampshire, became a Unitary Authority on the 1st of April 1997. Since then,
Southampton has been administratively detached from the county of Hampshire (i.e. not contained any
more), although it is still part of it as a ceremonial county. Versioning of information resources is a
hot topic in the Linked Data community, and it is even more important when publishing Public Sector
Information, whose content and validity must be put into context.

The research work reported here tackles an important aspect of Linked Data: the exploitation of explicit
semantic content for enhancing resource retrieval and browsability. The choice to tackle geographical
knowledge rather than some other data facet is mainly due to the analysis of the available data sources,
their structure and the available knowledge exploitable for a better integration of the available
information.

The use of co-reference systems allowed us to exploit the knowledge created in one organization (Ordnance
Survey in this case) in different, and potentially novel, data collections, overlaying a qualitative
spatial dimension that was not present before. Such reuse of knowledge is potentially innovative but
poses many questions about the management of the quality of the knowledge and the entity alignments
used. The presence, integration, and comparison of different
geographical knowledge bases can be beneficial for the maintenance and discovery of entity alignments of
good quality.

Another interesting aspect related to the use of co-reference services integrated with an additional
knowledge source is the ability to exploit the data semantics in order to change the navigability of the
data sets. Such a change in navigability is clear when new arcs are provided within the same data set
(e.g. between dbpedia resources that were not linked before) or between resources belonging to different
data sets (see Table 1 for a complete account of the data sets connected).

7. ACKNOWLEDGEMENTS
This work was supported by the EnAKTing project funded by the Engineering and Physical Sciences Research
Council under contract EP/G008493/1.
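
The interaction between the two services described in Section 5.1 and Figure 5 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the URL pattern /contains/<dictionary>/<format>/<URI> is read off the request examples in the paper, while the function names (containment_url, expand_backlinks), the injected lookup functions, the toy data and the de-duplication step are hypothetical and do not describe the actual implementation of either service. The lookups are passed in as plain functions so that the sketch runs without network access.

```python
from typing import Callable, Iterable, List

GEOSERVICE = "http://geoservice.psi.enakting.org"


def containment_url(uri: str, dictionary: str = "none", fmt: str = "json") -> str:
    """Build a 'contains' request URL following the pattern shown in the paper."""
    return f"{GEOSERVICE}/contains/{dictionary}/{fmt}/{uri}"


def expand_backlinks(
    uri: str,
    get_containments: Callable[[str], Iterable[str]],
    get_backlinks: Callable[[str], Iterable[str]],
) -> List[str]:
    """Return the backlinks of `uri` plus those of every entity it contains.

    Mirrors the loop drawn in Figure 5:
        for URI' in geoPartonomyBundle: BackLinks += GetBackLinks(URI')
    Dropping duplicates is an assumption; the paper does not specify it.
    """
    backlinks: List[str] = []
    seen = set()
    for target in [uri, *get_containments(uri)]:
        for link in get_backlinks(target):
            if link not in seen:
                seen.add(link)
                backlinks.append(link)
    return backlinks


# Toy, made-up data standing in for the two remote services.
CONTAINS = {"ex:Hampshire": ["ex:Winchester", "ex:Eastleigh"]}
BACKLINKS = {
    "ex:Hampshire": ["ex:doc1"],
    "ex:Winchester": ["ex:school7", "ex:doc1"],
    "ex:Eastleigh": ["ex:ward3"],
}

if __name__ == "__main__":
    print(containment_url("http://dbpedia.org/resource/Hampshire"))
    print(expand_backlinks(
        "ex:Hampshire",
        lambda u: CONTAINS.get(u, []),
        lambda u: BACKLINKS.get(u, []),
    ))
```

Passing the two lookups as parameters keeps the services decoupled, in the spirit of the decentralized design described in the paper; in a real client they would wrap HTTP GET requests to the two REST APIs.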
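
The translation step discussed in Section 5.2, where containments from the Ordnance Survey closure are extended to other data sets through co-references, can be illustrated with the following Python sketch. It is a guess at the general idea rather than the authors' algorithm: the function name translate_containments, the bundle representation (a dictionary from a seed URI to its set of equivalent URIs) and all the example URIs are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (container, contained)


def translate_containments(
    seed_pairs: List[Pair],
    bundles: Dict[str, Set[str]],
) -> Set[Pair]:
    """Project each seed containment onto the co-referent URIs of both ends.

    For a seed pair (x, y), every URI equivalent to x is taken to contain
    every URI equivalent to y. The seed pairs themselves are excluded from
    the result, so only newly derived containments are returned.
    """
    derived: Set[Pair] = set()
    for container, contained in seed_pairs:
        containers = bundles.get(container, set()) | {container}
        parts = bundles.get(contained, set()) | {contained}
        derived.update(product(containers, parts))
    return derived - set(seed_pairs)


# Made-up seed closure and co-reference bundles, for illustration only.
SEED = [("os:Hampshire", "os:Winchester")]
BUNDLES = {
    "os:Hampshire": {"dbpedia:Hampshire"},
    "os:Winchester": {"dbpedia:Winchester", "openlylocal:Winchester"},
}

if __name__ == "__main__":
    for pair in sorted(translate_containments(SEED, BUNDLES)):
        print(pair)
```

Under this reading, the quality of the derived containments depends entirely on the accuracy of the bundles, which is the dependency on sameAs.org that the paper points out.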
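
As a consistency check on Table 1 and the totals quoted in Section 5.2, the short Python sketch below re-enters the table, verifies that it is symmetric, and adds up every cell except the OS to OS seed. The variable and function names are illustrative, and interpreting the 223M figure as "all cells minus the seed" is one reading of the text rather than something the paper states explicitly.

```python
DATASETS = ["OS", "dbpedia", "statistics", "mortality", "parliament",
            "crime", "geonames", "openlylocal", "opencyc"]

# Rows and columns both follow DATASETS; values are copied from Table 1.
TABLE_1 = [
    [60469910, 1757760, 45354078, 1035901, 1338214, 235906, 94072, 18559453, 1106900],
    [1757760, 59640, 1393322, 30995, 46077, 9570, 3035, 540619, 35250],
    [45354078, 1393322, 36179867, 813660, 1056892, 206217, 71232, 14430773, 819965],
    [1035901, 30995, 813660, 19109, 23929, 4607, 1488, 344436, 17415],
    [1338214, 46077, 1056892, 23929, 37631, 7654, 2410, 436883, 28070],
    [235906, 9570, 206217, 4607, 7654, 2249, 334, 82559, 4160],
    [94072, 3035, 71232, 1488, 2410, 334, 224, 26427, 2475],
    [18559453, 540619, 14430773, 344436, 436883, 82559, 26427, 6498462, 312120],
    [1106900, 35250, 819965, 17415, 28070, 4160, 2475, 312120, 27975],
]


def cell(a: str, b: str) -> int:
    """Number of containment triples linking data sets a and b."""
    return TABLE_1[DATASETS.index(a)][DATASETS.index(b)]


n = len(DATASETS)
is_symmetric = all(TABLE_1[i][j] == TABLE_1[j][i] for i in range(n) for j in range(n))

seed = cell("OS", "OS")                     # the OS to OS closure used as seed
total = sum(sum(row) for row in TABLE_1)    # every cell of the table
generated = total - seed                    # triples derived from the seed

if __name__ == "__main__":
    print("symmetric:", is_symmetric)
    print("seed:", seed)
    print("generated:", generated)
```

With the values above, the seed is 60,469,910 triples and the remaining cells add up to 223,294,873, which agrees with the 60M and 223M figures given in the text.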