=Paper= {{Paper |id=Vol-440/paper-10 |storemode=property |title=Unification of Geospatial Reasoning, Temporal Logic, & Social Network Analysis in an RDF Database |pdfUrl=https://ceur-ws.org/Vol-440/paper10.pdf |volume=Vol-440 |dblpUrl=https://dblp.org/rec/conf/oic/Aasman08 }} ==Unification of Geospatial Reasoning, Temporal Logic, & Social Network Analysis in an RDF Database== https://ceur-ws.org/Vol-440/paper10.pdf
Unification of Geospatial Reasoning, Temporal Logic, & Social Network
                     Analysis in an RDF Database
                                                           Jans Aasman
                                                             Franz, Inc.
                                                     555 12th Street, Suite 1450
                                                       Oakland, CA 94607
                                                         +1-510-452-2000
                                                            ja@franz.com



      Abstract - This paper is about a new type of event             supports a combination of description logic, geospatial reasoning,
database that enables efficient reasoning about things, people,      temporal reasoning, and knowledge about the social relationships
companies, relationships between people and companies, and           between people.
about places and events. The event database is built on top of a          The commercial vendors of Semantic Technologies also see
scalable distributed RDF triple store that can handle literally      a number of use cases that all center around events and require
billions of events. Like objects, events have at least one actor,    the aforementioned query capabilities. We currently see
but usually more, a start-time and possibly an end-time, a           companies using large data warehouses with very disparate RDF
place where the event happened, and the type of the event. An        based triple stores describing various types of events where each
event can have many additional properties and annotations.           event has at least two actors, usually a begin and end time, and
On top of this event database we implemented libraries for           very often a geospatial component. These events are literally
RDFS++ logic reasoning, for geospatial and temporal                  everywhere: in Health Care applications we see hospital visits,
capabilities, and an extensive social network analysis package.      drugstore visits, and medical procedures. In the Communications
This paper focuses on a query framework that makes it easy to        Industry we see telephone call detail records, including location.
combine all of the aforementioned capabilities in a user             An email and calendar database of a large company is nothing
friendly query language.                                             more than a social network database filled with events in time
                                                                     and, in many cases, space. In the Financial Industry every
    Index Terms – Geotemporal logic, geospatial reasoning,           transaction is essentially an event. In the Insurance Industry
                                                                     claims are important events and they desperately need more
RDF database, graph database, RDFS, OWL, SPARQL, social
                                                                     activity recognition. In the Intelligence community basically
network analytics, business intelligence, event-based systems,
                                                                     everything revolves around events and actors. The REWERSE
event-driven architectures, metadata, semantic technologies.
                                                                     program from the 6th Framework Programme of the EU
                                                                     Commission [3] is one of the few systematic efforts to combine
                                                                     RDFS/OWL with geotemporal reasoning, although the social
                                                                     aspect hasn't been addressed yet. The recent book “The
                    I. INTRODUCTION                                  Geospatial Web” [4] currently provides the state of the art
                                                                     overview on how to work with people and events on a web scale
     This paper describes the design and use of a unifying query     and what kind of applications we might expect in the near future.
framework for geospatial reasoning, temporal logic, social
network analytics, RDFS and OWL in Event-based systems [1].
In this introduction we will first go into why we need such a                 II. FRAMEWORK REQUIREMENTS
framework and the requirements for such a framework.

                                                                         The Semantic Web community has made great strides in the
      The reason for such a framework can be answered by             area of ontologies and description logic, and some initial work in
looking at the vision of the semantic web and understanding how      the areas of geospatial reasoning [5], temporal reasoning [6],
companies use semantic technologies. Tim Berners-Lee, James          social network analysis [7], and event ontologies [8]. All of this
Hendler and Ora Lassila’s Scientific American article (May,          is based on RDF as the data representation. Based on this W3C
2000) [2] provides a compelling vision of the Semantic Web. It       standard the combination of all these different reasoning
contains some interesting use cases for what the Semantic Web        capabilities in one unified framework will propel further industry
will bring. These use cases assume that software agents know         adoption of Semantic Technology. Given that we have seen a
how to roam the web and reason over things, people, companies,       direct    need     for     query    capabilities   that     handle
relationships between people and companies and about places          geospatial/temporal/social/rdfs/owl, we have designed a
and events. Clearly these agents need a query capability that        framework. The main requirements we identified were:
                                                                       of Allen’s 13 interval predicates. Note that we do purely
   1.   User and programmer friendly: We wanted the                    quantitative temporal reasoning. So if you provide a number of
        framework to be an extension of SPARQL, with                   events with a start time and an end time or a duration then we
        SPARQL as the foundation. Certainly the framework              can perform queries like the following. This example will return
        should not be anymore complex than SPARQL.                     all intervals ?i2 that happened in interval ?i1.
        SPARQL is relatively user friendly, and as languages go,           (select ?x (interval-during ?i1 ?i2))
        the adoption rate is such that one could make the                 Temporal reasoning uses the range query capabilities to the
        argument that it is sufficient to address most use cases.      fullest extent. If you want to find all the events that happened
                                                                       between Jan. 1, 2008 and Jan. 2, 2008, the triple store performs a
   2.   Implementer friendly: We need many people to                   straight triple query with only one cursor scan. It is still possible
        experiment with this proposed framework such that the          to blow up the query time spectacularly by doing things like
        Semantic Web community can converge on a standard.
                                                                           (select (?x ?y) (point-before ?x ?y))
   3.   Efficient: Given that we work with very large databases        as that will generate every before/after pair. However, we do
        with millions of events where the response time has to         consider that to be the responsibility of the user. In many cases a
        be on the sub second level, the implementation of the          query optimizer can warn for that or rearrange the clauses to bind
        query language and query engine needs to be very fast          ?x or ?y.

   4.   We want the query language to work on distributed
        databases. Currently we’ve designed the query engine to
        work on federations of triple stores. Once we develop          B. Geospatial Primitives
        efficient caching techniques for distributed RDF
        knowledge stores residing all over the web, it will also          Our original intention of adding Geospatial capabilities was
        be efficient for agents that need to roam the web.             not so much to compete with existing spatial databases but
                                                                       instead make it very easy for RDF users to be able to deal with
   5.   Practical & Easily Extendible: We want the API to be           locations of objects very efficiently. In order to make this fast we
        such that it can be easily modified to allow for ongoing       implemented a variation of an R-Tree to encode two-dimensional
        experimentation.                                               data very efficiently directly in the triple indices [10]. A detailed
                                                                       description of how this geospatial representation works can be
                                                                       found in the geospatial tutorial included with the AllegroGraph
   6.   Works well with RDFS and OWL reasoning.                        documentation [11].       Currently we support a number of
                                                                       predicates that can be used in the query language. Some
                                                                       examples of the predicates:

                       III. DISCUSSION
                                                                         (geo-distance ?x ?y ?dist) -> given, x and y, return distance
    In the remainder of the paper we show how we can combine             (geo-within-radius ?x ?y 10.0) -> find y within 10 miles from x
geospatial reasoning, temporal logic, social network analytics,          (geo-inside-polygon ?polygon ?place ?lon ?lat)
and RDFS reasoning all in one query language.
      One question that people ask who are familiar with triple
stores is: how can this work efficiently on very large data sets         For our benchmarking we use the open source GeoNames
containing billions of triples? Most first generation triple stores    database that can be freely downloaded from GeoNames.org [12].
store the URIs and literals that constitute the parts of a triple as   The database contains nearly 7 million points of interest on
strings in a dictionary. So, when doing range queries over             earth. From interesting points in nature, to populated areas, to
numeric values, for example, "select * from person where age >         schools and churches, etc. Each point has 12 features such as
50”, the triple store engine has to go through each value for the      asciiname, the local name, elevation level, longitude, latitude,
predicate ‘age’. One way around this is to add btrees for every        population, etc. Actually, it is not a database but a csv file that
numeric type but that in general is a very inefficient solution in     programmers can modify as necessary. For our purposes we
triple stores. The triple store that we use is AllegroGraph which      obviously transform it into RDF triples. We can retrieve all 459
is actually a hybrid between a relational database and triple          geo-points around Berkeley less than 4 miles away in less than 5
store, the internal representation of the triples is such that is      milliseconds. We would argue that the basic retrieval speed is
allows for very efficient range queries.                               comparable to or better than current commercial spatial
                                                                       databases. Here are some typical example queries that you can
                                                                       do on the GeoNames database:
A. Temporal Reasoning

    Our temporal reasoning is based on James Allen’s Interval            Find the distance between Oakland and the one and only
Logic [9]. This logic looks at all the 13 ways two temporal            Berkeley in California.
intervals can relate to each other. We provide predicates for each     (select (?dist)
     (q ?x geo:name “Oakland”)                                         measure. This predicate will start with the most important one
     (q ?y geo:name "Berkeley")                                        first.

     (q ?y geo:admin1_code "CA")
     (geo-distance ?x ?y ?dist))                                            (select ?x
                                                                                    (ego-group fr:Person1 knows 2 ?group)

Put in a Google map all the places within 10 miles from Oakland                     (actor-centrality-members ?group knows ?x))

     (google-map (select (?name ?lat ?lon)
                   (q ?x geo:asciiname “Oakland”)                            AllegroGraph is a native, general graph database, written
                                                                       specifically to make graph search faster. However, the bottleneck
                   (geo-within-radius ?x ?y 10)                        is still getting triples from disk as fast as possible and having the
                   (q ?y geo:asciiname ?name)                          smartest algorithms and best caching available. For example,
                                                                       many of the centrality measures that are used to compute the
                   (q ?y geo:isAt5 ?pos)
                                                                       importance of an actor in a known group need to compute the
                   (pos->lon/lat ?pos ?lon ?lat)))                     shortest path between every actor in the group. We have created
                                                                       special constructors to cache these groups in a transparent way so
                                                                       that most computations can be done without minimal IO.
C. Social Network Analysis (SNA)
                                                                                     IV. AN OVERVIEW EXAMPLE
       Many RDF resources are about people and relationships
between people, or between people and companies, or between
companies and other companies. We added Social Network                     In order to give the reader an impression of the breadth and
Analysis methods to make it easier to reason about relationships       depth of the query language, we provide a typical example that
and groups. The functions that we provide address the five basic       combines geospatial, temporal, SNA and RDFS reasoning.
questions from Social Network Analysis. (1) How far is person A             (select (?x)
from person B, (2) if there is a link between A and B then how
                                                                              (ego-group person:jans knows ?group 2)
strong is this relationship, (3) given a particular actor A, in what
group does this actor ‘live’, (4) given an actor in a group, how              (actor-centrality-members ?group knows ?x ?num)
important is this actor in the group and finally, (5) given a group,          (q ?event fr:actor ?x)
how dense are the relationships in the group and does this group
have a leader or a set of leaders. The SNA library encompasses a              (qs ?event!rdf:type fr:Meeting)
set of well know SNA algorithms. We provide a set of general                 (interval-during ?event “2008-12-01” “2008-12-05”)
functions and have developed the concept of a generator. A                    (geo-box-around geoname:Berkeley ?event 5 miles)
generator is basically a function that takes as an input one node
and then creates a set of output nodes. The search functions and              !)
SNA functions that we provide take these generators as first class     In English this translates into:
arguments. For example: say we have a database with
                                                                            Find the group of friends and friends of friends around the
relationships between people, the generator ‘knows’ will take as
                                                                            person “Jans”. Find within this group the most important
an input a person and return a set of person(s) by following
                                                                            person first. Find if this person was part of an event that
fr:went-to-dinner-with and fr:went-to-movies in both directions.
                                                                            was of type Meeting and happened in a particular time
                                                                            interval within 5 miles of Berkeley.
      (defgenerator knows ()                                           Note that we seamlessly mix Social Network Analysis in the first
           (bidirectional fr:went-to-dinner fr:went-to-movies))        two clauses, a simple database look up in the third, an RDFS
                                                                       inference about the type of event, and then a temporal and a
                                                                       geospatial constraint. This current example and the examples
      We can use this generator to find, for example, the shortest     shown above utilize Prolog. We expect in early 2009 to have a
path between two people. In this case the query will return a list     SPARQL engine that will perform this identical query.
of persons.
                                                                            The syntax of the SPARQL query will be slightly more
                                                                       contrived due to the fact that SPARQL normally only allows
      (select ?x                                                       patterns that map directly on triples (see example below). Note
                                                                       that we introduced the non-standard ‘=’ or assignment construct.
              (shortest-path knows fr:Person1 fr:Person2 ?x))          We are planning to discuss this topic with the SPARQL
                                                                       committees.
     Or we can use the generator to first create a group of friends       select ?x where {
and friends of friends in the ego-group predicate, and then we               ?group = ego-group(person:jans knows 2) .
find the importance of each member using the actor-centrality
                                                                             ?x = actor-centrality-members(?group knows ?x) .
     ?event fr:actor ?x ;
            rdf:type fr:Meeting .                                    [8] Raimond, Y. Abdallay, S., Event Ontology, 2007,
                                                                         http://motools.sourceforge.net/event/event.html
     FILTER (interval-during ?event '2007-12-01' '2007-12-31')
     FILTER (geo-box-around geoname:Berkeley ?event 5miles)          [9] Allen, J.F.: Time and Time Again: The Many Ways to
                                                                         Represent Time. International Journal of Intelligent
    }
                                                                         Systems, Vol. 6, No. 4 (1991)

  V. SUMMARY AND FUTURE RESEARCH                                     [10] Wikipedia R-tree data structure,
                                                                          http://en.wikipedia.org/wiki/Rtree
     In this paper we have discussed how RDF can serve as a
basis for an event database where events are defined as ‘things’     [11] Geospatial tutorial section of Franz Inc.’s AllegroGraph 3.0
that (1) require RDFS++ reasoning because events have types,              documentation,
(2) require geospatial reasoning because events happen                    http://agraph.franz.com/support/documentation/current/geos
somewhere, (3) require temporal reasoning because events nearly           patial-tutorial.html
always have a start and duration and (4) require some form of
social analysis because most interesting events have one or more     [12] GeoNames Data Access, http://www.geonames.org/export/
actors.
     We demonstrated how all of these capabilities can be used
in one query language, in this case Prolog. And we expect that in
the near future these capabilities will be available in SPARQL as
well.
     The primary research effort for the current version of the
query framework is to enhance query-optimization. Notice that in
the example shown above, most clauses are not direct matches
against the database but functors that do computations. Some of
these functors can act both as generators and as filters (as is
common in Prolog). In case a functor acts as generator we need to
research better statistical predictions for how many solutions can
be expected so that we can do better re-ordering of clauses.



                       VI. REFERENCES
[1] Aasman, J., Unification of geospatial reasoning, temporal
    logic, & social network analysis in event-based systems,
    Distributed Event Based Systems (DEBS 2008)
    http://portal.acm.org/citation.cfm?id=1386007

[2] 1st Scientific American article on the Semantic Web,
    http://www.sciam.com/article.cfm?articleID=00048144-
    10D2-1C70-84A9809EC588EF21&ref=sciam

[3] REWERSE, http://rewerse.net

[4] Scharl, A., Tochtermann, K.: The Geospatial Web: How
    Geobrowsers, Social Software and the Web 2.0 are Shaping
    the Network Society. Springer (2007)

[5] W3C Geospatial Incubator Group,
    http://www.w3.org/2005/Incubator/geo/

[6] Gutierrez, C., Hurtado, C., and Vaisman, A. Temporal RDF.
    In European Conference on the Semantic Web (ECSW’05)
    (Best paper award), pages 93–107, 2005,
    http://www.dcc.uchile.cl/~cgutierr/papers/temporalRDF.pdf

[7] Mika, P.: Social Networks and the Semantic Web. Springer
    (2007)