=Paper=
{{Paper
|id=Vol-440/paper-10
|storemode=property
|title=Unification of Geospatial Reasoning, Temporal Logic, & Social Network Analysis in an RDF Database
|pdfUrl=https://ceur-ws.org/Vol-440/paper10.pdf
|volume=Vol-440
|dblpUrl=https://dblp.org/rec/conf/oic/Aasman08
}}
==Unification of Geospatial Reasoning, Temporal Logic, & Social Network Analysis in an RDF Database==
Unification of Geospatial Reasoning, Temporal Logic, & Social Network
Analysis in an RDF Database
Jans Aasman
Franz, Inc.
555 12th Street, Suite 1450
Oakland, CA 94607
+1-510-452-2000
ja@franz.com
Abstract - This paper is about a new type of event supports a combination of description logic, geospatial reasoning,
database that enables efficient reasoning about things, people, temporal reasoning, and knowledge about the social relationships
companies, relationships between people and companies, and between people.
about places and events. The event database is built on top of a The commercial vendors of Semantic Technologies also see
scalable distributed RDF triple store that can handle literally a number of use cases that all center around events and require
billions of events. Like objects, events have at least one actor, the aforementioned query capabilities. We currently see
but usually more, a start-time and possibly an end-time, a companies using large data warehouses with very disparate RDF
place where the event happened, and the type of the event. An based triple stores describing various types of events where each
event can have many additional properties and annotations. event has at least two actors, usually a begin and end time, and
On top of this event database we implemented libraries for very often a geospatial component. These events are literally
RDFS++ logic reasoning, for geospatial and temporal everywhere: in Health Care applications we see hospital visits,
capabilities, and an extensive social network analysis package. drugstore visits, and medical procedures. In the Communications
This paper focuses on a query framework that makes it easy to Industry we see telephone call detail records, including location.
combine all of the aforementioned capabilities in a user An email and calendar database of a large company is nothing
friendly query language. more than a social network database filled with events in time
and, in many cases, space. In the Financial Industry every
Index Terms – Geotemporal logic, geospatial reasoning, transaction is essentially an event. In the Insurance Industry
claims are important events and they desperately need more
RDF database, graph database, RDFS, OWL, SPARQL, social
activity recognition. In the Intelligence community basically
network analytics, business intelligence, event-based systems,
everything revolves around events and actors. The REWERSE
event-driven architectures, metadata, semantic technologies.
program from the 6th Framework Programme of the EU
Commission [3] is one of the few systematic efforts to combine
RDFS/OWL with geotemporal reasoning, although the social
aspect hasn't been addressed yet. The recent book “The
I. INTRODUCTION Geospatial Web” [4] currently provides the state of the art
overview on how to work with people and events on a web scale
This paper describes the design and use of a unifying query and what kind of applications we might expect in the near future.
framework for geospatial reasoning, temporal logic, social
network analytics, RDFS and OWL in Event-based systems [1].
In this introduction we will first go into why we need such a II. FRAMEWORK REQUIREMENTS
framework and the requirements for such a framework.
The Semantic Web community has made great strides in the
The reason for such a framework can be answered by area of ontologies and description logic, and some initial work in
looking at the vision of the semantic web and understanding how the areas of geospatial reasoning [5], temporal reasoning [6],
companies use semantic technologies. Tim Berners-Lee, James social network analysis [7], and event ontologies [8]. All of this
Hendler and Ora Lassila’s Scientific American article (May, is based on RDF as the data representation. Based on this W3C
2000) [2] provides a compelling vision of the Semantic Web. It standard the combination of all these different reasoning
contains some interesting use cases for what the Semantic Web capabilities in one unified framework will propel further industry
will bring. These use cases assume that software agents know adoption of Semantic Technology. Given that we have seen a
how to roam the web and reason over things, people, companies, direct need for query capabilities that handle
relationships between people and companies and about places geospatial/temporal/social/rdfs/owl, we have designed a
and events. Clearly these agents need a query capability that framework. The main requirements we identified were:
of Allen’s 13 interval predicates. Note that we do purely
1. User and programmer friendly: We wanted the quantitative temporal reasoning. So if you provide a number of
framework to be an extension of SPARQL, with events with a start time and an end time or a duration then we
SPARQL as the foundation. Certainly the framework can perform queries like the following. This example will return
should not be anymore complex than SPARQL. all intervals ?i2 that happened in interval ?i1.
SPARQL is relatively user friendly, and as languages go, (select ?x (interval-during ?i1 ?i2))
the adoption rate is such that one could make the Temporal reasoning uses the range query capabilities to the
argument that it is sufficient to address most use cases. fullest extent. If you want to find all the events that happened
between Jan. 1, 2008 and Jan. 2, 2008, the triple store performs a
2. Implementer friendly: We need many people to straight triple query with only one cursor scan. It is still possible
experiment with this proposed framework such that the to blow up the query time spectacularly by doing things like
Semantic Web community can converge on a standard.
(select (?x ?y) (point-before ?x ?y))
3. Efficient: Given that we work with very large databases as that will generate every before/after pair. However, we do
with millions of events where the response time has to consider that to be the responsibility of the user. In many cases a
be on the sub second level, the implementation of the query optimizer can warn for that or rearrange the clauses to bind
query language and query engine needs to be very fast ?x or ?y.
4. We want the query language to work on distributed
databases. Currently we’ve designed the query engine to
work on federations of triple stores. Once we develop B. Geospatial Primitives
efficient caching techniques for distributed RDF
knowledge stores residing all over the web, it will also Our original intention of adding Geospatial capabilities was
be efficient for agents that need to roam the web. not so much to compete with existing spatial databases but
instead make it very easy for RDF users to be able to deal with
5. Practical & Easily Extendible: We want the API to be locations of objects very efficiently. In order to make this fast we
such that it can be easily modified to allow for ongoing implemented a variation of an R-Tree to encode two-dimensional
experimentation. data very efficiently directly in the triple indices [10]. A detailed
description of how this geospatial representation works can be
found in the geospatial tutorial included with the AllegroGraph
6. Works well with RDFS and OWL reasoning. documentation [11]. Currently we support a number of
predicates that can be used in the query language. Some
examples of the predicates:
III. DISCUSSION
(geo-distance ?x ?y ?dist) -> given, x and y, return distance
In the remainder of the paper we show how we can combine (geo-within-radius ?x ?y 10.0) -> find y within 10 miles from x
geospatial reasoning, temporal logic, social network analytics, (geo-inside-polygon ?polygon ?place ?lon ?lat)
and RDFS reasoning all in one query language.
One question that people ask who are familiar with triple
stores is: how can this work efficiently on very large data sets For our benchmarking we use the open source GeoNames
containing billions of triples? Most first generation triple stores database that can be freely downloaded from GeoNames.org [12].
store the URIs and literals that constitute the parts of a triple as The database contains nearly 7 million points of interest on
strings in a dictionary. So, when doing range queries over earth. From interesting points in nature, to populated areas, to
numeric values, for example, "select * from person where age > schools and churches, etc. Each point has 12 features such as
50”, the triple store engine has to go through each value for the asciiname, the local name, elevation level, longitude, latitude,
predicate ‘age’. One way around this is to add btrees for every population, etc. Actually, it is not a database but a csv file that
numeric type but that in general is a very inefficient solution in programmers can modify as necessary. For our purposes we
triple stores. The triple store that we use is AllegroGraph which obviously transform it into RDF triples. We can retrieve all 459
is actually a hybrid between a relational database and triple geo-points around Berkeley less than 4 miles away in less than 5
store, the internal representation of the triples is such that is milliseconds. We would argue that the basic retrieval speed is
allows for very efficient range queries. comparable to or better than current commercial spatial
databases. Here are some typical example queries that you can
do on the GeoNames database:
A. Temporal Reasoning
Our temporal reasoning is based on James Allen’s Interval Find the distance between Oakland and the one and only
Logic [9]. This logic looks at all the 13 ways two temporal Berkeley in California.
intervals can relate to each other. We provide predicates for each (select (?dist)
(q ?x geo:name “Oakland”) measure. This predicate will start with the most important one
(q ?y geo:name "Berkeley") first.
(q ?y geo:admin1_code "CA")
(geo-distance ?x ?y ?dist)) (select ?x
(ego-group fr:Person1 knows 2 ?group)
Put in a Google map all the places within 10 miles from Oakland (actor-centrality-members ?group knows ?x))
(google-map (select (?name ?lat ?lon)
(q ?x geo:asciiname “Oakland”) AllegroGraph is a native, general graph database, written
specifically to make graph search faster. However, the bottleneck
(geo-within-radius ?x ?y 10) is still getting triples from disk as fast as possible and having the
(q ?y geo:asciiname ?name) smartest algorithms and best caching available. For example,
many of the centrality measures that are used to compute the
(q ?y geo:isAt5 ?pos)
importance of an actor in a known group need to compute the
(pos->lon/lat ?pos ?lon ?lat))) shortest path between every actor in the group. We have created
special constructors to cache these groups in a transparent way so
that most computations can be done without minimal IO.
C. Social Network Analysis (SNA)
IV. AN OVERVIEW EXAMPLE
Many RDF resources are about people and relationships
between people, or between people and companies, or between
companies and other companies. We added Social Network In order to give the reader an impression of the breadth and
Analysis methods to make it easier to reason about relationships depth of the query language, we provide a typical example that
and groups. The functions that we provide address the five basic combines geospatial, temporal, SNA and RDFS reasoning.
questions from Social Network Analysis. (1) How far is person A (select (?x)
from person B, (2) if there is a link between A and B then how
(ego-group person:jans knows ?group 2)
strong is this relationship, (3) given a particular actor A, in what
group does this actor ‘live’, (4) given an actor in a group, how (actor-centrality-members ?group knows ?x ?num)
important is this actor in the group and finally, (5) given a group, (q ?event fr:actor ?x)
how dense are the relationships in the group and does this group
have a leader or a set of leaders. The SNA library encompasses a (qs ?event!rdf:type fr:Meeting)
set of well know SNA algorithms. We provide a set of general (interval-during ?event “2008-12-01” “2008-12-05”)
functions and have developed the concept of a generator. A (geo-box-around geoname:Berkeley ?event 5 miles)
generator is basically a function that takes as an input one node
and then creates a set of output nodes. The search functions and !)
SNA functions that we provide take these generators as first class In English this translates into:
arguments. For example: say we have a database with
Find the group of friends and friends of friends around the
relationships between people, the generator ‘knows’ will take as
person “Jans”. Find within this group the most important
an input a person and return a set of person(s) by following
person first. Find if this person was part of an event that
fr:went-to-dinner-with and fr:went-to-movies in both directions.
was of type Meeting and happened in a particular time
interval within 5 miles of Berkeley.
(defgenerator knows () Note that we seamlessly mix Social Network Analysis in the first
(bidirectional fr:went-to-dinner fr:went-to-movies)) two clauses, a simple database look up in the third, an RDFS
inference about the type of event, and then a temporal and a
geospatial constraint. This current example and the examples
We can use this generator to find, for example, the shortest shown above utilize Prolog. We expect in early 2009 to have a
path between two people. In this case the query will return a list SPARQL engine that will perform this identical query.
of persons.
The syntax of the SPARQL query will be slightly more
contrived due to the fact that SPARQL normally only allows
(select ?x patterns that map directly on triples (see example below). Note
that we introduced the non-standard ‘=’ or assignment construct.
(shortest-path knows fr:Person1 fr:Person2 ?x)) We are planning to discuss this topic with the SPARQL
committees.
Or we can use the generator to first create a group of friends select ?x where {
and friends of friends in the ego-group predicate, and then we ?group = ego-group(person:jans knows 2) .
find the importance of each member using the actor-centrality
?x = actor-centrality-members(?group knows ?x) .
?event fr:actor ?x ;
rdf:type fr:Meeting . [8] Raimond, Y. Abdallay, S., Event Ontology, 2007,
http://motools.sourceforge.net/event/event.html
FILTER (interval-during ?event '2007-12-01' '2007-12-31')
FILTER (geo-box-around geoname:Berkeley ?event 5miles) [9] Allen, J.F.: Time and Time Again: The Many Ways to
Represent Time. International Journal of Intelligent
}
Systems, Vol. 6, No. 4 (1991)
V. SUMMARY AND FUTURE RESEARCH [10] Wikipedia R-tree data structure,
http://en.wikipedia.org/wiki/Rtree
In this paper we have discussed how RDF can serve as a
basis for an event database where events are defined as ‘things’ [11] Geospatial tutorial section of Franz Inc.’s AllegroGraph 3.0
that (1) require RDFS++ reasoning because events have types, documentation,
(2) require geospatial reasoning because events happen http://agraph.franz.com/support/documentation/current/geos
somewhere, (3) require temporal reasoning because events nearly patial-tutorial.html
always have a start and duration and (4) require some form of
social analysis because most interesting events have one or more [12] GeoNames Data Access, http://www.geonames.org/export/
actors.
We demonstrated how all of these capabilities can be used
in one query language, in this case Prolog. And we expect that in
the near future these capabilities will be available in SPARQL as
well.
The primary research effort for the current version of the
query framework is to enhance query-optimization. Notice that in
the example shown above, most clauses are not direct matches
against the database but functors that do computations. Some of
these functors can act both as generators and as filters (as is
common in Prolog). In case a functor acts as generator we need to
research better statistical predictions for how many solutions can
be expected so that we can do better re-ordering of clauses.
VI. REFERENCES
[1] Aasman, J., Unification of geospatial reasoning, temporal
logic, & social network analysis in event-based systems,
Distributed Event Based Systems (DEBS 2008)
http://portal.acm.org/citation.cfm?id=1386007
[2] 1st Scientific American article on the Semantic Web,
http://www.sciam.com/article.cfm?articleID=00048144-
10D2-1C70-84A9809EC588EF21&ref=sciam
[3] REWERSE, http://rewerse.net
[4] Scharl, A., Tochtermann, K.: The Geospatial Web: How
Geobrowsers, Social Software and the Web 2.0 are Shaping
the Network Society. Springer (2007)
[5] W3C Geospatial Incubator Group,
http://www.w3.org/2005/Incubator/geo/
[6] Gutierrez, C., Hurtado, C., and Vaisman, A. Temporal RDF.
In European Conference on the Semantic Web (ECSW’05)
(Best paper award), pages 93–107, 2005,
http://www.dcc.uchile.cl/~cgutierr/papers/temporalRDF.pdf
[7] Mika, P.: Social Networks and the Semantic Web. Springer
(2007)