Geographical Service: a compass for the Web of Data.

Gianluca Correndo, Manuel Salvadores, Yang Yang, Nicholas Gibbins, Nigel Shadbolt
Intelligence, Agents, Multimedia (IAM) Group
School of Electronics and Computer Science
Southampton, UK
{gc3, ms8, yy1402, nmg, nrs}@ecs.soton.ac.uk

Copyright is held by the author/owner(s).
LDOW2010, April 27, 2010, Raleigh, USA.

ABSTRACT

This paper describes a Linked Data service that supports the navigation and retrieval of geographical entities for the UK territory. Geographical entities, in the scope of this paper, are linked data resources that describe objects that have a geographical extension. The service presented in this paper allows the querying of resources that contain or are contained by a given entity URI. The recent publication of UK Public Sector Information (PSI) data sets has brought to the attention of the community the recurring presence of location based context. At the same time it stresses the inadequacy of current Linked Data services for exploiting the semantics of such contextual dimensions for easing entity retrieval and browsing. We present an approach for a geography based service that helps in querying qualitative spatial relations for the UK geography (proper containment so far). We also provide an exploitation scenario based on a backlinking service and PSI Open Linked Data, published within the EnAKTing project.

Categories and Subject Descriptors
H.3.4 [Systems and Software]: Distributed systems; H.5.4 [Web]: Navigation; H.3.5 [Online Information Services]: Web-based services

General Terms
Linked Data, geographical services

Keywords
Linked Data, geographical reasoning, Web of Data

1. INTRODUCTION

The Linked Data Initiative represents the first collaborative effort to create a Web of Data (WoD henceforth) of considerable scale, providing a few simple guidelines for publishing content using well established standards [3]. Such guidelines and standards are leading the way to a new paradigm of interaction between government and citizens in the UK. In order to pursue better access for citizens to information held by local as well as national public organisations, the UK government has recently launched a public initiative for publishing Public Sector Information (PSI), adopting Linked Data tenets as future best practices (public access to the site http://data.gov.uk was granted on the 19th of January, 2010). Data sets recently delivered to the public include: government expenses, NHS trusts' performances, public transportation, and a whole set of statistics about crime, mortality, census, environment, school and social indicators. Some of the data sets mentioned have been published already in Linked Data format, others have been translated within the EnAKTing project, and many others are waiting to be freed in the LOD cloud.

Such a prolific inflow of Linked Data poses new questions and challenges to the community of researchers and developers: how is it possible to integrate such different information into a meaningful schema? How is it possible to exploit the little semantics that goes a long way? How do we choreograph the publishing activity of separate organizations from the public sector? A common trait of PSI seems to be its locality: local and national public organisations are in fact mainly concerned with the collection of data about their territory, and the distribution of their resources.

In the WoD vision, links between resources from different publishers are particularly important since they are the ones that allow new data to be discovered and integrated into the current discourse. It is frequently the case that different URIs are used to refer to the same things, motivating the use of co-reference services for the resolution of instance equivalences. Knowledge of this type of relationship increases the potential for reuse, since information from previously unknown sources becomes accessible, and makes the problem of co-reference resolution of primary importance [9]. In any case, we can expect more and more of this linking data to be made available as the number of Linked Data publishers increases.

The publication of an authoritative geography of the UK (its regions, counties, districts and their connections) by Ordnance Survey (the national mapping agency for Great Britain, OS henceforth) as Linked Data has opened interesting scenarios for exploiting semantics in contextualising the information sources published on data.gov.uk. The geographical dimensions in PSI data sets are already represented, but their semantics may be lost if they are not exploited for creating new collections of data, browsing related resources, and making connections.

In this paper, we present a service for querying spatial relationships for the UK (extensible to other countries when authoritative knowledge bases are available). We start in Section 2, where the available knowledge bases are described along with an introduction to the qualitative spatial reasoning supported. Section 3 provides a rationale for developing such a service in support of Linked Data browsing and retrieval. In Section 4 the implementation of the geographical service and its APIs are described. The paper then concludes with a description of an evaluation of the presented service using public sector information from the UK government in Section 5 and some concluding notes in Section 6.

2. BACKGROUND

The World Wide Web and the WoD can both be understood as hypertext systems, where the general purpose of the hypertext system is information discovery by navigation. Providing reasoning over hyperlinks for the purpose of navigation can benefit information discovery. In 1990, Nanard brought the concept of "semantic network" from Artificial Intelligence [16] into the hypertext field by creating a Conceptual Hypertext System [13], in which hyperlinks can be reasoned about by using a domain model classification. In that system, typed links and typed chunks are used to define relationships between types in order to incorporate knowledge into a hypertext. The domain model classification is used to classify the documents, and to relate documents that share metadata and are deemed to be similar in some way. The Conceptual Open Hypermedia Service (COHSE) project [4] later took this approach forward by providing ontological reasoning based on link services to bridge the navigation gap between the Web and Linked Data, where the link services provided a mapping between concepts and the lexical labels on the web page.

Many of the PSI data sets published so far can be plotted within a spatial and temporal dimension; in other words, all data can be linked together by its spatial and temporal indexes. Within this context, the need to provide services to reason about the spatial and temporal aspects of the linked data is of key importance. This is unsurprising: spatial and temporal reasoning have always been considered to be an important part of common-sense reasoning in Artificial Intelligence. In this section, we will mainly focus on qualitative spatial representation and reasoning. There are two major approaches to qualitative spatial representation: point based and region based [6]. Region based approaches, such as Topology [7], which describe relationships between spatial regions, are more intuitive than point based approaches. The commonly known approaches for formalizing topological properties of spatial regions are based on work from Whitehead [17] and Clarke [5], who axiomatized mereotopologies (theories that combine mereology and topology) using a single primitive relation and binary connectivity relationships. By using these primitive relations, other relations can be defined. The Region Connection Calculus (RCC8) proposed by Randell, Cui and Cohn [14] defines a set of jointly exhaustive and pairwise disjoint relations DC, EC, PO, EQ, TPP, NTPP, TPPi and NTPPi, as illustrated in Figure 1, and is the most well-known approach in the domain. Since the RCC Calculus is expressed in first-order predicate calculus, a wide range of theorem provers can be used for reasoning. For instance, given a fixed vocabulary of relations Ri, and given R1(x,y) and R2(y,z), one can answer questions about the possible relations (from the set Ri) that can hold between x and z by looking up the composition table [8]. Although general first-order theorem proving is too inefficient to be useful for many purposes [11], such a lookup is relatively simple to implement and particularly useful in our case, where the reasoning concerns geographic location relationships.

[Figure 1: RCC Eight Jointly Exhaustive and Pairwise Disjoint Relations]

Within the Linked Data context, there are several services providing resolvable URIs for geographic locations. Geonames (http://www.geonames.org, last accessed 10/02/2010), for example, is a community based service that provides geographical representations of geographical entities covering all countries worldwide and manages eight million URIs for geographical resources. As a further example, the national mapping agency of Great Britain, Ordnance Survey, maintains a continuously updated database of the topography of Great Britain (with the exception of Northern Ireland, which is covered by a different agency, the Land and Property Services Northern Ireland) and is responsible for surveying the boundaries of the administrative areas.

In this paper, we exploited the Administrative Geography ontology provided by Ordnance Survey as an authoritative knowledge base for querying the UK geographical structure [10]. This ontology explicitly represents the mereological relationships within the administrative hierarchy, as well as topologically representing the boundary information between administrative units at the same hierarchical level. The following depicts the class hierarchy created in the administrative ontology from Ordnance Survey:

• CivilAdministrativeArea
  – EuropeanRegion
  – Country
  – UnitaryAuthority
  – MetropolitanDistrict
  – GreaterLondonAuthority
  – LondonBorough
  – District
  – CivilParish
  – Community
• Country

The topological relations adopted by this ontology were taken from the RCC8 and correspond to the properties NTPPi, TPPi, EC and EQ respectively. The topology of the administrative geography of Great Britain contains no overlapping regions; therefore, the PO relation was not required. Later versions of the ontology reported overlapping entities as well. The property of spatial containment used in the OS ontology (equivalent to the NTPP(i) and TPP(i) relations in Figure 1) implies a mereological relationship. For instance, if Hampshire spatially contains Fareham, then Fareham is a part of Hampshire.

Dereferenceable URIs adopted by the Linked Data community inherit the properties of hyperlinks in the Web hypertext system, among them uni-directionality. The problem with this kind of link is that it is not possible to navigate back to the original resource by using the dereferencing mechanism only. This problem becomes even more relevant when URIs from previous authoritative data sets are reused in order to provide context and meaning to new data. It is in fact possible to browse from the new data to the old, but not the other way around. The back-linking service (http://backlinks.psi.enakting.org) we have implemented for UK public sector information supports the discovery of back-links between datasets. The benefit of a back-linking service is that it enables users to discover, from a single dataset, other datasets which reference back to it, thereby creating data linkage opportunities between datasets, increasing the recall of valuable data sources, and doubling the network effect [15], which increases even more when co-reference systems are employed.

In this paper, we will mainly focus on exploring the possibility of exploiting semantics from authoritative knowledge bases to provide support for consuming Linked Data resources. The service provided will allow users to retrieve contained (and container) entity URIs from popular data sets by exploiting a co-reference service. Moreover, a backlinking service which we previously created in the EnAKTing project (http://enakting.org) will allow us to retrieve the information resources that address such URIs. Far from trying to provide a general purpose reasoner for geographical entities, the aim of the service described in the following sections is to exploit the semantically rich knowledge base for UK geography in order to ease users' navigation through the published PSI data sets. Similar capabilities were already provided by DBpedia Mobile [2], an application that retrieved DBpedia entries mashed up on a map based on users' geographical coordinates. The results provided by our service, however, are based on a spatial subdivision of the territory, a subdivision that is already used by public sector organizations to classify their data (e.g. crime statistics are based on a police based subdivision of the territory, while MPs' activities are related to the constituency they were voted in).

3. MOTIVATION

The Linked Data principles [3] promote a Web of Data whose architecture is inherently decentralised, relying on data already published (when available) in order to give semantics and context to new data. The growth the WoD has experienced over recent years relies on the simplicity of publishing and linking data. However, up to now a semantically coherent orchestration of data publishing is still a mirage. Relying purely on data linkage for the discovery and browsing of linked data resources would lead to a serious knot to untie in the near future. The use of ontologies and powerful ontology languages in publishing Linked Data will be an effort that must be justified against a scenario where such explicit semantics are rarely exploited.

In publishing UK Public Sector Information (UK PSI), we have identified an issue concerning data accessibility and navigability that stems in particular from the missing exploitation of semantics (in this case about the qualitative spatial description of geographical entities). In this paper we present a solution to overcome this issue that soundly enhances data retrieval and browsing when geographical dimensions are involved.

The issue is about the usage of geographical entities for contextualising local information (i.e. information that is related to a particular geographical location, for example the population of a region, the MPs of a constituency, or various statistical data based on territory). In publishing this kind of information, we provided alignments of our data (at least for the geographical dimensions represented in the data) to authoritative knowledge bases using co-reference systems [9]. The problem we have to deal with originates from the fact that, since the public sector information published was produced by different sectors of UK government, the spatial classifications used were highly heterogeneous, ranging from local parishes to counties and up to European regions (e.g. South East of England). The different granularities used to classify the data mean, in Linked Data terms, that related information sources link to different URIs. Some data may in fact be relevant for constituencies, while others may use a different granularity (by county for example), and the URI of a county is obviously different from the set of URIs of all its constituencies. Available knowledge bases about the geographical or administrative subdivision of a territory can be exploited to cover such a gap in data granularity.

[Figure 2: Resource irretrievable via geographical gap]

Taking as an example the PSI data sets published recently (http://browser.psi.enakting.org), we adopted the Ordnance Survey administrative ontology in order to provide context to our data items (i.e. SCOVO item instances, see http://purl.org/NET/scovo, and local governmental data). The SCOVO ontology allows us to describe statistical data as a collection of Items where each item describes a statistical value (i.e. a single cell in a multidimensional table) along with all the dimensions that characterise it. In the case of UK PSI statistics, many data sets collected were related to geographical regions (counties, districts, etc.).

In this case, users who wished to discover useful information about their own region (e.g. the County of Hampshire, top of Figure 2) would start their search by browsing one of its available URIs. The OS URI for this geographical entity would be os:7000000000017765, but any equivalent URI provided by a co-reference system will provide the same results, as will be described in the following. (The os:, mortality:, crime: and parliament: names used in this section are namespace prefixes for the respective data sets.)

Using a backlinking service for resolving the entities linking to the given URI for Hampshire, we are able to retrieve links to mortality statistics (mortality:ds1_299_[1...3]) and crime statistics (crime:ds1_37_[1...11]). In Figure 2 those URIs are contained in boxes labelled as "accessible", meaning that those URIs are retrievable by following back already existing arcs. The items of those SCOVO data sets in fact address Hampshire county as one of their dimensions. What is missing is the further data collected that reports valuable information about regions contained in Hampshire. In particular, within the EnAKTing project, we published linked data about the individual constituencies too. In detail we published, for each constituency, a historical record of the MP in charge of that constituency, his/her voting records and expenses. In Figure 2 those resources are contained in dotted boxes labelled as "inaccessible", meaning that they cannot be retrieved with the existing knowledge.

Example URIs for such inaccessible resources are:

    parliament:cons-637 rdfs:label "Winchester"
    parliament:cons-203 rdfs:label "Eastleigh"
    parliament:cons-228 rdfs:label "Fareham"

The URIs for, respectively, Winchester, Eastleigh, and Fareham are therefore not retrieved by the resolution of the Hampshire URI (obviously) or by the additional service provided by the backlinking service.

Despite the fact that an entity is still semantically different from the parts that compose it, the information relevant for its constituent parts can still be relevant for the entity as a whole. Without covering this geographical gap it is not possible to access all the relevant sets of information, provide them to the user or process them in some way in order to summarise their content.

The aim of this research is to exploit authoritative knowledge bases in order to cover such gaps, thereby allowing citizens to retrieve information resources relevant to their region of interest. Moreover, there are many data sources that describe geographical resources, and all of those are already partially aligned. The integration of different knowledge bases could lead to the exploitation of such alignments in order to bridge data sets and reuse the available knowledge in more than one context.

4. GEOGRAPHICAL SERVICE FOR UK

To support the user's experience in browsing and discovery of new resources in the WoD, we have developed a geographical service for querying the UK territory structure. The decision to restrict the service to the UK territory is mainly due to the fact that the service is used chiefly to support the discovery of UK PSI resources. Knowledge about geographical containment is exploited here to link information that is contextually related because of its spatial dimension.

For this use case we have implemented a service for querying the topological structure of the UK (from the broader entity to the more particular and the other way around) that can be easily integrated into a web of linked data. The service, accessible at http://geoservice.psi.enakting.org, is designed to be easily integrated both into web applications and into linked data resources, and it follows a few basic principles:

Lightweight Service: The service should be easy to use and resolve a specific problem. A geographical service is a component of the WoD that supports discovery when geographical entities are involved; it is not a general purpose reasoning engine.

Linked Data Compatible: The geographical service should be usable as a resolvable URI like any other resource, so that it can be used in linked data content as a useful provider of relevant URIs. Moreover the service should provide the results in a number of different formats that will be decided using content negotiation and HTTP 303 redirection.

Co-reference Support: The service should exploit the already available knowledge about instance equivalence provided by co-reference systems (like http://sameas.org) in order to return results useful in a number of different data sets.

4.1 Data collection and normalisation

OS provides an ontology (…ontology/SpatialRelations/v0.2/SpatialRelations.owl) and an RDF dump about spatial relations between UK regions. The triples from OS have been parsed and only the relations of physical containment have been retained, normalised and completed with the inverse relations in a separate knowledge base. The service presented here, for the sake of simplicity and efficiency (in order to provide a very focused service), manages only the NTPP relation and its inverse, NTPPi. The knowledge extracted from the OS data set has then been normalised in terms of an internal ontology that represents qualitative spatial relations.

The normalisation step has been introduced in order to allow the service to integrate further geographical hierarchies in the future (e.g. geonames provides containment of geographical features). The future integration of qualitative spatial knowledge bases is intended to extend the service outside the borders of the UK and to provide an assessment of co-references between geographical entities.

A simple example of how the normalised triples from the OS ontology are coupled with a co-reference service to bridge the navigational gap between different data sets is depicted in Figure 3. In the figure it is possible to see that the OS statements describing the fact that the County of Hampshire contains Fareham and Winchester (OS URIs are shortened: the run of '0' characters is replaced by '...'):

    os:7...17765 os:contains os:7...25157.
    os:7...17765 os:contains os:7...25128.

have been translated into an internal representation containing both relations, part and part_of, like the following:

    os:7...17765 geoservice:part os:7...25157.
    os:7...25157 geoservice:part_of os:7...17765 .
    os:7...17765 geoservice:part os:7...25128.
    os:7...25128 geoservice:part_of os:7...17765 .

[Figure 3: Coupling of co-reference and Ordnance Survey geographic ontology]

The containment relations so normalised (see central part of Figure 3) are then stored internally in the system and queried for serving users' requests.

The normalised containment relations are integrated with the information provided by the co-reference system, which allows different data sources to be bridged both in the input phase (i.e. where the input URI must be translated into the OS equivalent, see top part of Figure 3) and in the output phase (i.e. when the results must be translated into a target data set provided by the user, see bottom part of Figure 3). The co-reference service used in this paper is the http://sameas.org service from Glaser et al. [9]. The relevant bundles have been retrieved from the service and cached for performance. It is important to note that, in order to choose the desired quality of service, one could opt for using one co-reference service instead of another. The functionality provided is independent of the provenance of the co-reference bundles.

Exploiting co-reference services and the OS ontology, it is therefore possible to infer containment relations between resources from different data sets. For example:

    dbpedia:Hampshire owl:sameAs os:7...17765
    AND os:7...17765 geoservice:part os:7...25128
    AND os:7...25128 owl:sameAs dbpedia:Winchester
    =⇒ dbpedia:Hampshire geoservice:part dbpedia:Winchester

4.2 RESTful API

The service is accessed via HTTP GET requests and provides two essential pieces of information: the list of entities contained by the input URI, and the list of entities that contain the input URI. The interface is accessible via the following URIs:

    http://geoservice.psi.enakting.org/{command}/{dictionary}/{format}/{URI}

In the above API description, the parameters are enclosed in brackets and their meaning is the following:

command: can be either contains or container: in the first case the service returns the URIs of the entities contained by the input URI; in the second case it returns the URIs of the entities that contain the input URI.

dictionary: can be one of the following (dbpedia, os, statistics, geonames, enakting, opencyc, openlylocal, or none) and instructs the service to use the co-reference system in order to retrieve the equivalent URIs in the respective data sets (i.e. DBpedia [1]; Ordnance Survey [10]; UK National Statistics, http://statistics.data.gov.uk, last accessed 10/02/10; Geonames, http://geonames.org, last accessed 10/02/10; PSI enAKTing, http://browser.psi.enakting.org, last accessed 10/02/10; OpenCYC [12]; and the Openly Local project, a community devoted to providing linked data access for local government data, see http://openlylocal.com, last accessed 10/02/10). The value none is used for not applying any filter. In this case the URIs returned will be the ones from the Ordnance Survey plus the ones returned from the co-reference service.

format: the format parameter is optional and can be one of the following (rdf, text, ttl, or json). The value of the format parameter decides the format of the returned content: RDF/XML for rdf; a list of URIs separated by new lines for text; RDF/Turtle for ttl; and finally JSON (http://json.org, last accessed 10/02/10) for json. If the parameter is not given, the appropriate content is decided using the 303 HTTP redirection. Requests made from a browser (Accept: text/html) are likewise redirected to the HTML page of the service, initialised with the input URI.

URI: is the URI of the input entity to query using the service. The service uses a co-reference system in order to find the equivalent URI for the Ordnance Survey and the Geonames data sets. This means that the user can use one of the data sets of preference (e.g. DBpedia or Geonames) and ask for contained, or container, entities in one of the desired target data sets (e.g. again DBpedia, Geonames, or enAKTing published information).

The service returns a list of URIs if the content type is text or json. The RDF content, for both the RDF/XML and Turtle formats, describes the containment relations between the input URI and the resulting resources. In both cases the returned URIs are translated into the desired address space.

The procedure followed by the service, and an overall architecture, is depicted in Figure 4, and can be described as follows:

1. user generated request (HTTP GET request)
2. normalisation of the input URI to OS format
3. computation of the property closure (i.e. part or part-of) over the normalised URI
4. optional phase of translation and filtering of the resulting URIs to the target URI space
5. formatted content, as per user request, returned to the user (HTTP Response)

[Figure 4: Overall architecture and interaction with co-reference system]

As an example, consider the case of a software client that needs to know all the geographical entities contained in Hampshire. The request can adopt as input one of the many available URIs describing the county of Hampshire; a popular choice could be the DBpedia URI (i.e. URI = http://dbpedia.org/resource/Hampshire). The agent can then state the desired target data set, for example the DBpedia data set itself (i.e. dictionary = dbpedia), and instruct the server to return the JSON format of the document (i.e. the HTTP header contains Accept: application/json). The HTTP request will then be the following:

    GET /contains/dbpedia/http://dbpedia.org/resource/Hampshire
    Host: geoservice.psi.enakting.org
    Accept: application/json

And the service will return a response redirecting the client to the right URL:

    HTTP/1.1 302 Found
    Location: http://geoservice.psi.enakting.org/contains/dbpedia/json/http://dbpedia.org/resource/Hampshire

That, once resolved, will finally return the desired content, a JSON array of strings that represent the URIs of the DBpedia resources describing entities contained in Hampshire:

    HTTP/1.1 200 OK
    Content-Type: application/json

    ["http://dbpedia.org/resource/North_East_Hampshire_%28UK_Parliament_constituency%29",
    "http://dbpedia.org/resource/East_Hampshire_%28UK_Parliament_constituency%29",
    ...

The client agent can obviously refer immediately to the right URL and retrieve the content in the right format straight away. A useful way to exploit the service can be seen when data sets other than the OS one are queried. Not every data set in fact provides a clear semantic representation of mereological relations. This is due to the fact that the focus of many data sets is to provide information about a particular domain: encyclopaedic information from DBpedia, statistical information from the UK National Statistics, geographical features from Geonames (which provides a containment relation that does not, however, reflect any administrative subdivision), conceptual descriptions from Open CYC, local government information from Openly Local, and UK PSI from EnAKTing.

Using the service presented in this paper, it is easy to exploit the OS administrative ontology in order to retrieve geographically relevant information regardless of the starting data set. As an example, let us consider the case where a user may want to retrieve information about the local government of their own city, for example Southampton, UK. The easiest thing to do is to start from a recognizable URI such as the DBpedia one:

    http://dbpedia.org/resource/Southampton

From this URI the user can retrieve general information about the city, even the names of some of the city leaders. No further information about local government is available on the Southampton DBpedia page. Asking the geographical service to return the contained entities from the Openly Local site, we can then retrieve more resources:

    http://openlylocal.com/id/wards/4925
    http://openlylocal.com/id/wards/4929
    ...
    http://openlylocal.com/id/wards/4938

Those URIs are the ones published for each of the wards in the city of Southampton, and they provide not only the names of the local councillors but also some other statistics about the ward (i.e. demographic and religious statistics). Moreover, asking the service again for the DBpedia URIs, we are able to retrieve the following:

    http://dbpedia.org/resource/Southampton_Test_%28UK_Parliament_constituency%29
    http://dbpedia.org/resource/Southampton_Itchen_%28UK_Parliament_constituency%29

From those URIs we are then able to check the identities of the MPs in charge (in the Southampton page from DBpedia they are both mentioned as leaders of the city, whereas an MP is actually in charge only of the constituency where s/he has been elected). Asking then for the URIs from the data sets provided by the EnAKTing project, we would be able to retrieve the following:

    http://parliament.psi.enakting.org/id/cons-536
    http://parliament.psi.enakting.org/id/cons-535

Following such links the user would then be able to retrieve other information about the MPs of each constituency (even retrieving a historical record of them) and further information about their political activity.

5. EVALUATION

We have evaluated our geographical service from two different perspectives. The first one looks at the direct benefit that our backlinking service for Public Sector Information (http://backlinks.psi.enakting.org, last accessed 10/02/10) would gain from expanding its navigability through geographic containments (see Section 5.1). The second evaluation is more analytic and looks at the new knowledge generated as part of the translation process from an authoritative geographic closure to the covered vocabularies (see Section 5.2).

5.1 Backlinking Service Integration

This section studies the navigability improvement that our backlinking service for the PSI in the UK would experience by plugging in the containments from a wide range of vocabularies. The PSI Backlinking Service provides an access point to retrieve backlinks from Foreign URIs. Foreign URIs make data discovery difficult because it is not possible to navigate the RDF documents of the WoD bidirectionally. http://backlinks.psi.enakting.org provides an API to retrieve collections of backlinks for a given URI. The study of the covered knowledge bases (http://backlinks.psi.enakting.org#KBs, last accessed 10/02/10) in the UK PSI Backlinking Service made explicit that the most highly connected data sets in the PSI WoD include the ones representing some type of geographic information.

In this evaluation we have used the Backlinking Service as a client of the Geographical Service in order to expand the backlinks that we can get from geographic resources. We have kept the decentralized nature of the Backlinking and Geographical services: the Backlinking Service simply performs HTTP requests to get the geographic containments (see Figure 5). When the geography extension is enabled, the backlinking service gets the list of contained entities for the input URI and returns the backlinks connected to any URI that is part of the containments. The request to the Geography Service is performed using "contains" as command, JSON as format and "none" as dictionary. The selected dictionary is "none" because the Backlinking Service doesn't know before …

[Figure 5: the Backlinking Service (RESTful API at http://backlinks.psi.enakting.org) receives HTTP GET http://backlinks.psi.enakting.org/resource/URI?geo=enabled and issues HTTP GET http://geoservice.psi.enakting.org/contains/none/json/URI to the Geographical Service]

… linked to dbpedia:Hampshire or equivalent URIs but to geographic containments of it in at least one of the data sets covered by the Geographical Service.

This scenario has shown one of the possible cases where the exploitation of explicit semantics can improve the accessibility of the resources in the Web of Data.
In esence for URI' in geoPartonomyBundle: BackLinks += GetBackLinks(URI') the backlinking service is improving its graph connectivity http://geoservice.psi.enakting.org by being aware of the new layer of Linked Data that the (RESTFul API) Backlinks Knowledge Base Geographical Service publishes via its RESTFul API. This (4store) case study also shows how different Linked Data RESTFul Co-reference services (such as co-reference, backlinking and geographical http://sameAs.org services) can cooperate in a layer built on top of current Web of Data to improve its navigability. Figure 5: Interaction of the backlinking and geo- 5.2 Vocabulary Closure Coverage graphical services The geographical service can be seen as an extra layer of linked data based on an initial geographic closure provided by Ordnance Survey and its extensions to other data sets via hand what type of URIs will be the source of backlinks for co-references. This extra layer of linked data is obviously an a certain geographic region. So as to improve the coverage added value to the Web of Data. This section analyses the we aim to get all the possible containments from all the interlinking improvement between the data sets by means of dictionaries supported in the geographical service. number of triples produced by the Geographical Service. There is a natural outcome from this integration and it can Table 1 represents the amount of triples generated by our be shown using how the systems works when asking for back- service in terms of number of triples that contain where the links connected the URI dbpedia:Hampshire. Prior to the use predicate is geoservice:part or geoservice:part of. This of the geographical extension a request to retrieve backlinks table shows the numbers of triples linking every pair of data for dbpedia:Hampshire would just give back 14 URIs related set in the system. 
For instance our Geographical Service has the UK region of dbpedia:Hampshire or any equivalent URI produced 30995 geographic containments between dbpedia part of the same co-reference bundle in sameAs.org (see Fig- and mortality.psi.enakting.org. ure 6). This same request when the geographical service is Of particular interest are the results from the geonames integrated returns the following additional backlinks: data set. In fact, the number of containment relations within such dataset is quite small compared to the number of con- • 6 010 resources that represent schools from http:// tainment relations provided by geonames itself (a rough esti- education.data.gov.uk. These RDF documents rep- mate done by the authors counts about 9K relations). Such resents the totality of education entities in the region additional source of spatial knowledge open a scenario where of Hampshire. the two knowledge bases can be compared and integrated for providing a better recall for the service. An important as- • 42 mortality statistical resources from http://mortali pect to take into account in such a scenario would be the ty.psi.enakting.org. This statistics are segmented quality of the results computed by the integration. by geography and gender. The data seed that triggered this new knowledge is the OS to OS containments, 60M of statements. The total number • 981 CO2 emission measurements from http://co2emis of triples generated are 223M and these are partially inter- sion.psi.enakting.org. These resources represent linking every pair of data sets. Partially because the com- the CO2 emissions for the region of Hampshire be- pleteness of every pair of datasets’ closure relies on the accu- tween 2005 and 2007. racy of the co-reference bundles extracted from sameAs.org. As the number of co-references from sameAs.org grows and • 300 resources with information of energy consumption improves its accuracy the Geographical Service will reflect from http://energy.psi.enakting.org. 
This data those changes automatically. This side effects is one of key sets publishes the energy consumption in the UK in aspects of the Web of Data and its decentralized nature. respect to fuel in the road network between 2005 and 2007. These results represent all the RDF documents linked to geographical regions contained in Hampshire. 6. CONCLUSIONS We have presented in this paper a service that helps users • 4 788 population census information segmented by age in browsing geographical resources from different datasets and sex from http://population.psi.enakting.org. (dbpedia, geonames, data.gov.uk. psi.enakting.org, . . . ) by exploiting an authoritative ontology for the UK territory • 224 parliamentary identities from http://parliament. (Ordnance Survey). One of the novel aspects of this research psi.enakting.org. These represent mandates for dif- is the use of a co-reference system (http://sameas.org) to ferent members of the UK Parliament and House of extend the containments from one geographic data set to Commons. others where such containments are not so rich or com- plete. Moreover, the added value of integrating such geo- Figure 6 shows the output of the backlinking service with graphical service with a backlinking service has been shown and without geographical extensions in the Backlinking Ser- with respect to demonstrate a possible exploitation scenario vice. All the resources enumerated above are not specifically on Public Sector Information. 
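The integration summarised above, in which the backlinking service asks the geographical service for contained entities and then gathers backlinks for each of them (Section 5.1, Figure 5), can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the URL pattern /contains/<dictionary>/<format>/<URI> is taken from the examples in this paper, while the function names contains_url and expanded_backlinks, the two callables standing in for the HTTP services, and the ex: sample data are hypothetical and do not describe the actual implementation.

```python
from typing import Callable, Iterable, List

# Base URL of the geographical service, as given in the paper.
GEO_BASE = "http://geoservice.psi.enakting.org"


def contains_url(uri: str, dictionary: str = "none", fmt: str = "json") -> str:
    """Build the direct URL form /contains/<dictionary>/<format>/<URI>."""
    return f"{GEO_BASE}/contains/{dictionary}/{fmt}/{uri}"


def expanded_backlinks(
    uri: str,
    get_contained: Callable[[str], Iterable[str]],
    get_backlinks: Callable[[str], Iterable[str]],
) -> List[str]:
    """Return backlinks of `uri` followed by backlinks of every contained entity.

    Both callables are hypothetical stand-ins for the two HTTP services.
    Duplicates are dropped while the first-seen order is preserved.
    """
    seen = set()
    result: List[str] = []
    for target in [uri, *get_contained(uri)]:
        for link in get_backlinks(target):
            if link not in seen:
                seen.add(link)
                result.append(link)
    return result


# Illustrative in-memory data (invented for this sketch, not real service output).
CONTAINED = {"ex:Hampshire": ["ex:Southampton", "ex:Winchester"]}
BACKLINKS = {
    "ex:Hampshire": ["ex:doc1"],
    "ex:Southampton": ["ex:school7", "ex:doc1"],
    "ex:Winchester": ["ex:school9"],
}

links = expanded_backlinks(
    "ex:Hampshire",
    lambda u: CONTAINED.get(u, []),
    lambda u: BACKLINKS.get(u, []),
)
print(contains_url("http://dbpedia.org/resource/Hampshire"))
print(links)
```

Passing the two lookups as callables keeps the sketch independent of any HTTP library; in a real client they would presumably wrap GET requests to the two services.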
[Figure 6 panels. Panel 1: Backlinks, Geographical Service Integration Disabled, http://backlinks.psi.enakting.org/resource/doc/http://dbpedia.org/resource/Hampshire. Panel 2: Backlinks, Geographical Service Integration Enabled, http://backlinks.psi.enakting.org/resource/doc/http://dbpedia.org/resource/Hampshire?geo=enabled]
Figure 6: Output comparison for dbpedia:Hampshire with and without geographical partonomies

Table 1: Datasets linkage improvement statistics

                     OS     dbpedia  statistics   mortality  parliament       crime    geonames openlylocal     opencyc
OS             60469910     1757760    45354078     1035901     1338214      235906       94072    18559453     1106900
dbpedia         1757760       59640     1393322       30995       46077        9570        3035      540619       35250
statistics     45354078     1393322    36179867      813660     1056892      206217       71232    14430773      819965
mortality       1035901       30995      813660       19109       23929        4607        1488      344436       17415
parliament      1338214       46077     1056892       23929       37631        7654        2410      436883       28070
crime            235906        9570      206217        4607        7654        2249         334       82559        4160
geonames          94072        3035       71232        1488        2410         334         224       26427        2475
openlylocal    18559453      540619    14430773      344436      436883       82559       26427     6498462      312120
opencyc         1106900       35250      819965       17415       28070        4160        2475      312120       27975

Due to the particular nature of the knowledge provided (i.e. closure of geographical containment properties), there is the possibility of overwhelming the user with information when asking about top level features (e.g. England). In order to cope with this eventuality, the service will soon be provided with the capability to limit the results by depth. Therefore, when asked about all the entities contained in the top level feature England at the first level of depth, the service will return only: North East, North West, South East, Eastern, South West, East Midlands, West Midlands, Yorkshire & the Humber, Scotland, Wales, London (different from the City of London).

Another important aspect not tackled in this work, and the subject of future research, is the temporal extent of administrative divisions. The administrative geography of the UK will change shortly and has changed frequently over the years (e.g. the number and borders of constituencies are reviewed every 10 or 15 years). New entities can be defined, old ones can be abolished or change status. For example Southampton, once part of Hampshire, became a Unitary Authority on the 1st of April 1997. Since then, Southampton has been administratively detached from the county of Hampshire (i.e. not contained any more), although still being part of it as a ceremonial county. Versioning of information resources is a hot topic in the Linked Data community and it is even more important when publishing Public Sector Information, whose content and validity must be put into context.

The research work reported here tackles an important aspect of Linked Data, the exploitation of explicit semantic content for enhancing resource retrieval and browsability. The choice to tackle geographical knowledge rather than some other data facet is mainly due to the analysis of the available data sources, their structure and the available knowledge exploitable for a better integration of the available information.

The use of co-reference systems allowed us to exploit the knowledge created in one organization (Ordnance Survey in this case) in different, and potentially novel, data collections, overlaying a qualitative spatial dimension that was not present before. Such reuse of knowledge is potentially innovative but poses many questions about the management of the quality of the knowledge and the entity alignments used. The presence, integration, and comparison of different geographical knowledge bases can be beneficial for the maintenance and discovery of entity alignments of good quality.
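The reuse of Ordnance Survey containments in other data collections through co-reference, as discussed above, can be illustrated with a small Python sketch. It shows one plausible reading of the idea, not the service's actual algorithm: each seed pair (container, contained) is rewritten into the URIs of a target data set by looking up both ends in their co-reference bundles. The function name translate_containments, the prefix-based selection of the target data set, and all http://example.org URIs are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (container URI, contained URI)


def translate_containments(
    seed_pairs: Iterable[Pair],
    bundles: Dict[str, List[str]],
    target_prefix: str,
) -> Set[Pair]:
    """Rewrite seed containment pairs into URIs of the target data set.

    `bundles` maps a seed URI to its co-referent URIs (hypothetical data here).
    A pair is produced only when both ends have an equivalent in the target set.
    """
    out: Set[Pair] = set()
    for container, contained in seed_pairs:
        containers = [u for u in bundles.get(container, []) if u.startswith(target_prefix)]
        parts = [u for u in bundles.get(contained, []) if u.startswith(target_prefix)]
        out.update(product(containers, parts))
    return out


# Invented seed closure and bundles; only the dbpedia.org URIs appear in the paper.
SEED = [
    ("http://example.org/os/hampshire", "http://example.org/os/southampton-test"),
    ("http://example.org/os/hampshire", "http://example.org/os/unmatched"),
]
BUNDLES = {
    "http://example.org/os/hampshire": ["http://dbpedia.org/resource/Hampshire"],
    "http://example.org/os/southampton-test": [
        "http://dbpedia.org/resource/Southampton_Test_%28UK_Parliament_constituency%29",
        "http://example.org/other/southampton-test",
    ],
}

pairs = translate_containments(SEED, BUNDLES, "http://dbpedia.org/")
print(pairs)
```

The second seed pair yields nothing because its contained entity has no co-reference bundle, which mirrors the remark in Section 5.2 that the completeness of each pairwise closure depends on the accuracy of the co-reference bundles.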
Another interesting aspect related to the use of co-reference services integrated with an additional knowledge source is the ability to exploit the data semantics in order to change the navigability of the datasets. Such a change in navigability is clear when new arcs are provided within the same data set (e.g. between dbpedia resources that were not linked before) or between resources belonging to different data sets (see Table 1 for a complete account of the data sets connected).

7. ACKNOWLEDGEMENTS
This work was supported by the EnAKTing project funded by the Engineering and Physical Sciences Research Council under contract EP/G008493/1.