Using OpenStreetMap Data to Create Benchmarks for Description Logic Reasoners⋆

Thomas Eiter1, Patrik Schneider1, Mantas Šimkus1, and Guohui Xiao2

1 Institute of Information Systems, Vienna University of Technology, Vienna, Austria
2 KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano, Italy

1 Introduction

Description Logics (DLs) are a well-established and popular family of decidable logics for knowledge representation and reasoning [6]. Several systems have been developed to reason over DL knowledge bases (KBs), which usually consist of a TBox and an ABox. A TBox describes the domain in terms of concepts and roles, while an ABox stores information about known instances of concepts and their participation in roles. Naturally, classical reasoning tasks like TBox satisfiability and subsumption under a TBox have received most attention, and many reasoners have been devoted to them, e.g. FaCT++ [24], HermiT [22], and ELK [14]. These and other mature systems geared towards TBox reasoning have been rigorously tested and compared (e.g., with JustBench [7]) using several real-life ontologies such as GALEN and SNOMED-CT.

A different category are reasoners for ontology-based query answering (OQA), which plays a major role in ontology-based data access (OBDA) [19]. They are designed to answer queries over DL KBs in the presence of large data instances (see e.g. Ontop [20], Pellet [21], Stardog [4], and OWL-BGP [16]). TBoxes in this setting are usually expressed in low-complexity DLs and are relatively small compared to the instance data, which makes OQA different from classical TBox reasoning. Several works have considered testing OQA systems [12, 17, 18, 23, 13]. Despite this, it has been acknowledged that judging the performance of OQA reasoners is difficult due to the lack of publicly available benchmarks consisting of large amounts of real-life instance data.
In this paper we consider publicly available geographic datasets as a source of test data for OQA systems. In a similar spirit, some benchmarks have recently been proposed to test implementations of geospatial extensions of SPARQL [15, 10]. The latter benchmarks are geared towards testing spatial reasoning capabilities and cannot be easily adapted for testing OQA systems. The main goal of this paper is to describe how benchmarks for OQA systems can be created from OpenStreetMap [2] (OSM) geospatial data by employing a rule-based data transformation framework.

The OSM project aims to collaboratively create an open map of the world. It has proven hugely successful and is constantly updated and extended. OSM data describes maps in terms of (possibly tagged) points, ways (geometries), and more complex aggregate objects called relations. We believe the following features make OSM a good source of instance data for OQA reasoners:

- Datasets of different sizes exist; e.g., OSM maps for all major cities, countries, and continents are directly available or can easily be generated.
- Depending on the location (e.g., urban versus rural), the density, separation, and compactness of object locations vary strongly.
- Spatial objects have an inherent structure of containment, bordering, and overlapping, which can be exploited to generate topological relations (e.g., contains).
- Spatial objects are tagged with semantic information like the type of an object (e.g., hospitals). This information can be naturally represented as DL concepts and roles.

⋆ The authors were supported by FWF projects P24090 and P25518, the WWTF project ICT12-015, and the EU project Optique FP7-318338.

The paper is organized as follows. We first develop a formalization of OSM that allows us to view OSM maps as relational structures. We then present a rule-based language to extract information from OSM data.
Afterwards, we describe the implementation of our approach together with some generated benchmarks. Finally, we present some experimental results and conclude.

2 Formalization of Data Extraction

In this section, we first formally describe our model for OSM data, and then employ it to describe our rule-based language for extracting instance data from OSM data.

Maps in OSM are represented using four basic constructs (a.k.a. elements): (1) nodes, which correspond to points with a geographic location; (2) geometries (a.k.a. ways), which are given as sequences of nodes; (3) tuples (a.k.a. relations), which are given as sequences of nodes, ways, and tuples; (4) tags, which are used to describe metadata about nodes, ways, and tuples. Geometries are used in OSM to express polylines and polygons, in this way describing streets, rivers, parks, etc. OSM tuples are used to relate several elements, e.g. to indicate the turn priority in an intersection of two streets.

Formalization of OSM. We assume infinite mutually disjoint sets Mnid, Mgid, Mtid and Mtags of node identifiers, geometry identifiers, tuple identifiers and tags, respectively. We let Mid = Mnid ∪ Mgid ∪ Mtid and call it the set of identifiers. An (OSM) map is a triple M = (D, E, L) such that:

1. D ⊆ Mid is a finite set of identifiers called the domain of M;
2. E is a function on D such that: (a) if e ∈ Mnid, then E(e) ∈ R × R; (b) if e ∈ Mgid, then E(e) = (e1, . . . , em) with {e1, . . . , em} ⊆ D ∩ Mnid; (c) if e ∈ Mtid, then E(e) = (e1, . . . , em) with {e1, . . . , em} ⊆ D;
3. L is a labeling function L : D → 2^Mtags.

Intuitively, in a map M = (D, E, L) the function E assigns to each node identifier a coordinate, to each geometry identifier a sequence of nodes, and to each tuple identifier a sequence of arbitrary identifiers.

Enriching Maps with Computable Relations. The above formalizes the raw representation of OSM data.
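For concreteness, the map model M = (D, E, L) can be sketched as a small Python structure. This is purely illustrative; the class and all names below are ours and not part of any OSM tooling:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a map M = (D, E, L); all names are ours.
@dataclass
class OSMMap:
    nodes: dict       # node id -> (lat, lon), i.e. E(e) in R x R
    geometries: dict  # geometry id -> sequence of node ids
    tuples: dict      # tuple id -> sequence of arbitrary ids
    labels: dict = field(default_factory=dict)  # L : D -> set of tags

    def domain(self):
        """D: the finite set of identifiers occurring in the map."""
        return set(self.nodes) | set(self.geometries) | set(self.tuples)

    def tagged(self, alpha):
        """All elements e with alpha in L(e)."""
        return {e for e, tags in self.labels.items() if alpha in tags}

# A toy map: one park polygon over three nodes, tagged "park".
m = OSMMap(
    nodes={1: (48.20, 16.37), 2: (48.21, 16.38), 3: (48.20, 16.39)},
    geometries={10: [1, 2, 3]},
    tuples={},
    labels={10: {"park"}},
)
print(sorted(m.domain()))  # [1, 2, 3, 10]
print(m.tagged("park"))    # {10}
```

Note how the labeling function L directly yields, for each tag, the set of elements carrying it, which is the basis for the tag relations used in data extraction.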
To make it more useful, we support the incorporation of information that can be computed from a map. In particular, we allow maps to be enriched with arbitrary computable relations over Mid. Let Mrels be an infinite set of map relation symbols, each with an associated nonnegative integer, called the arity. An enriched map is a tuple M = (D, E, L, ·^M), where (D, E, L) is a map and ·^M is a partial function that assigns to a map relation symbol R ∈ Mrels a relation R^M ⊆ D^n, where n is the arity of R. In this way, a map can be enriched with externally computed relations like the binary relations "is closer than 100m", "inside a country", "reachable from", etc. For the examples below, we assume that an enriched map M as above always defines the unary relation Tagα for every tag α ∈ Mtags. In particular, we let e ∈ Tagα^M iff α ∈ L(e), where e ∈ D. We will also use the binary relation Inside(x, y), which captures the fact that the location of a point x is inside a geometry y.

A Rule Language for Data Extraction. We define a rule-based language that can be used to describe how an ABox is created from an enriched map. Our language is based on Datalog with stratified negation [5]. Let Drels be an infinite set of Datalog relation symbols, each with an associated arity. For simplicity, and with a slight abuse of notation, we assume that DL concept and role names form a subset of the Datalog relations. Formally, we take an infinite set Dconcepts ⊆ Drels of unary relations called concept names and an infinite set Droles ⊆ Drels of binary relations called role names. Let Dvars be a countably infinite set of variables. Elements of Mid ∪ Dvars are called terms. An atom is an expression of the form R(t) or ¬R(t), where R is a map or a Datalog relation symbol of arity n, and t is an n-tuple of terms. A rule r is an expression of the form B1, . . . , Bn → R(t), where B1, . . . , Bn are atoms (called body atoms) and R(t) is an atom with R a Datalog relation. We assume Datalog safety, i.e.
each variable of a rule occurs in a non-negated body atom. A program P is any finite set of rules. A rule or program is ground if it has no occurrences of variables. We only consider stratified programs (see [5] for a definition).

The semantics of a program P is given relative to an enriched map M. The grounding of a program P w.r.t. M is the (ground) program ground(P, M) obtained from P by substituting in all possible ways the variables in rules of P with identifiers occurring in M or P. We use a variant of the Gelfond-Lifschitz reduct [11] to get rid of map atoms in a program. The reduct of P w.r.t. M is the program P^M obtained from ground(P, M) as follows: (a) from the body of every rule r, delete every map atom ¬R(t) with t ∉ R^M and every map atom R(t) with t ∈ R^M; (b) delete every rule r whose body contains a map atom ¬R(t) with t ∈ R^M or a map atom R(t) with t ∉ R^M. Observe that P^M is an ordinary stratified Datalog program with identifiers acting as constants. We let PM(M, P) denote the perfect model of the program P^M; see [5] for the construction of PM(M, P) by fixpoint computation along the stratification. We are finally ready to extract an ABox: given a map M and a program P, we denote by ABox(M, P) the restriction of PM(M, P) to the atoms over concept and role names.

We next illustrate some of the features and advantages of the presented rule language. E.g., the following rule collects in the role hasCinema the cinemas of a city (we use a sans-serif and a typewriter font for map and Datalog relations, respectively):

Point(x), Tagcinema(x), Geom(y), Tagcity(y), Inside(x, y) → hasCinema(y, x).

Negation in rule bodies can be used for default conclusions. E.g., the following rule states that recreational areas include all parks that are not known to be private:

Geom(x), Tagpark(x), ¬Tagprivate(x) → RecreationalArea(x).

3 Mapping Framework and Implementation

Mapping Framework.
The framework allows the user to define the data transformation for the instance generation in a declarative manner. As simplicity and extensibility are two of our main goals, we address them on two levels. The first level concerns the extensibility of the mapping language and the related evaluation method, by supporting: (a) an ETL mode, which resembles the extract, transform, and load (ETL) process of classical data transformation tools; (b) a Datalog mode, which offers the features of Datalog with stratified negation. The second level concerns the data sources (e.g., OSM) and external evaluation (e.g., calculating spatial relations). In certain cases we have to use external functionality for more sophisticated computations; e.g., in the ETL mode, we treat such scripts as custom data sources or as simple transformations between input and output.

The Datalog and ETL languages are combined in the following custom mapping language. The Data Source Declarations cover general definitions such as PostgreSQL/PostGIS connections. The ETL Mapping Axioms each define a single ETL step; the syntax is an extension of the Ontop mapping language, and an axiom is defined either as a pair of source and target or as a triple of source, transform, and target. The following sources and targets are currently available: (a) PostgreSQL/PostGIS with SQL statements as a source or target; (b) text files with regular expressions as a source or target; (c) RDF files with SPARQL queries as a source; (d) external Python scripts and constants as a source. The Datalog Rules are based on the syntax of the library pyDatalog and are embedded into the source and target of a mapping axiom.

Implementation.
The main Python 2.7 script of the developed generation framework is called as follows:

generate.py --mappingFile file.txt --mode etl|datalog

The ETL mode is designed for bulk processing, where scalability and performance are crucial and complex calculations are not used or are moved into the external scripts. We implemented the computation in a data streaming-based manner. The Datalog mode is designed for the features of stratified Datalog, where the entire result is always in memory. In this mode, dependencies between sources and targets are handled by the declarative nature of Datalog. We use the same source and target components as in the ETL mode but follow a three-step computation: 1. the source components are evaluated to store the n-tuples in the EDB; 2. the stratified Datalog program is evaluated; 3. the IDB predicate outputs are processed by the target components. The architecture naturally results from the two modes and the source and target components. Besides the generation script, we already provide a selection of external scripts for processing OSM data; e.g., spatialRelationsReader.py calculates the spatial relations (e.g., contains) between two RDF files.

4 Benchmarks and Results

All the mapping files and test data for the introduced benchmark are available online [1].

OSM Dataset and Benchmark Ontology. OSM offers different subset databases of different sizes and structures. Depending on the requirements, one can extract subsets fulfilling a variety of criteria such as instance size, density, or structure (e.g., spatially topological or graph-like). For the base dataset [3], we chose the cities of Cork, Riga, Bern, and Vienna. The cities are of increasing size and are European capitals or major cities with a high density of objects in the center and decreasing density towards the outskirts. The chosen DL-LiteR [8] ontology for the benchmarks is taken from the MyITS project [9].
It is tailored to geospatial data sources and, beyond these, to MyITS-specific sources (e.g., a restaurant guide). The top level is built on GeoOWL, the second level is based on the nine top features of GeoNames, and the third level is built mainly from OSM categories. The ontology is of average difficulty for DL-LiteR systems, since it contains only a few existential quantifications on the right-hand side of the inclusion axioms. However, due to its size and the depth of its concept and role hierarchies, it poses a challenge regarding the size of rewritten queries.

Spatial Relations Benchmark (BM1). For the cities, we transform all amenities (e.g., restaurants), leisure areas (e.g., playgrounds), and shops into concept assertions. Then, we calculate the roles for the spatial relation inside between leisure areas and amenities. Further, we generate several next relations between amenities and shops; the different next relations are generated according to distances of 50m, 100m, 250m, and 500m between two objects. We define five queries that use the different "spatial" roles. In q1 we evaluate over general concepts with a low selectivity of the role contains. Query q2 determines which shops are next to amenities, which in turn are inside a park. Query q3 resembles a star-shaped pattern of relationships and uses the role hasCuisine derived from OSM tags. In query q4, we have a triangular pattern of relationships and query amenities that are located in two overlapping leisure areas. With q5, we illustrate that arbitrary role chains can be used by connecting next.
q1(y, z) ← Amenity(y), contains(z, y), Leisure(z)
q2(x, y, z) ← Shop(x), hasOperator(x, w), SupermarketOp(w), next(x, y), Amenity(y), inside(y, z), Leisure(z)
q3(x, y, z) ← Amenity(x), hasQualitativeValue(x, u), Cuisine(u), next(x, y), Shop(y), contains(z, x), Leisure(z)
q4(x) ← Amenity(x), next(x, y), Leisure(y), intersect(y, z), Leisure(z), next(x, z)
q5(x, y) ← Amenity(x), next(x, z), next(z, y), Shop(y)

Evaluation and Results. We consider the following query answering systems: (a) Pellet [21], an OWL reasoner including an ABox query engine that supports answering conjunctive queries, using the optimization techniques of query simplification and query reordering; (b) Stardog, the successor of Pellet, a commercial semantic graph database supporting SPARQL; (c) OWL-BGP [16], a SPARQL wrapper for OWL-API-based reasoners (HermiT in this experiment). Finally, we remark that Ontop in "classic mode" can in principle also answer queries over ontologies directly; however, as the "classic mode" of Ontop is currently not well optimized, a proper evaluation is not feasible. Moreover, since Ontop is designed to also take R2RML mappings into account, we do not compare Ontop with "pure" OQA systems in this paper.

The experiments were performed on an HP ProLiant Linux server with 144 cores @3.47GHz, 106GB of RAM, and a 1TB 15K RPM HD. The systems were evaluated on Java 7 with 8GB of memory for the JVM. The timeout was set to 10min for query answering (indicated by '-') and to 1h for ontology loading.

The evaluation results are presented in Table 1, where S stands for Stardog, P for Pellet, and OB for OWL-BGP. They show that BM1 is indeed challenging for instances such as Vienna: we can see a gradual increase of difficulty with increasing city size and with the distance of next.

Table 1: Evaluation results on BM1 (time in sec.)

             Cork                 Riga                 Bern                 Vienna
         S     P     OB       S     P     OB       S     P      OB      S     P     OB
next250
q1       1.36  1.37  0.01     1.70  2.92  0.02     2.37  5.76   0.38    2.87  16.92 0.48
q2       1.56  0.04  0.02     1.55  0.05  0.05     1.90  0.08   0.14    2.21  15.83 1.04
q3       1.99  0.10  0.02     2.27  0.11  0.05     4.54  0.29   0.15    16.95 16.49 1.09
q4       1.90  0.10  0.29     1.88  0.51  0.60     5.07  1.05   1.75    21.31 20.99 16.39
q5       5.65  0.93  4.62     12.54 1.62  27.36    47.90 4.94   142.40  -     -     -
load     2.12  2.69  5.10     2.58  4.44  11.47    3.77  7.79   24.40   7.66  24.62 128.06
next500
q1       1.56  1.50  0.01     1.84  3.99  0.02     1.99  7.67   0.40    2.32  48.03 -
q2       1.42  0.04  0.03     1.63  0.10  0.12     1.83  7.86   0.31    3.84  52.23 -
q3       1.97  0.10  0.03     2.41  0.15  0.11     9.36  8.29   0.30    26.96 38.21 -
q4       1.45  0.20  0.23     2.11  1.84  23.88    22.42 10.77  4.13    8.59  74.84 -
q5       16.48 2.49  19.51    75.28 15.26 148.06   -     -      -       -     -     -
load     2.50  3.38  5.83     3.42  7.66  15.65    4.97  13.59  35.85   15.22 86.56 -

Queries q1 and q2 can be seen as baseline results with low/high selectivity and pose no challenge to the reasoners. With queries q3, q4, and q5 we see a breaking point at which performance starts to decline; for Stardog, Pellet, and OWL-BGP alike, this occurs at Vienna with next250 (a next distance of 250m). Remarkably, each reasoner has its own strengths: (a) OWL-BGP has the best performance on queries q1, q2, and q3; (b) Stardog outperforms the others on the large instances of q4 and q5; (c) Pellet is best on the medium to large instances of q4 and q5. Note that query q5 is the most challenging one due to the join of the large next relations, which contain 939 988 (resp. 2 857 488) instances for next250 (resp. next500) in Vienna.

5 Conclusion and Outlook

In this paper, we have presented a framework for extracting instance data from OSM.
Based on the framework, we have evaluated some state-of-the-art OQA systems with our benchmark and showed that the respective systems have their strengths and weaknesses depending on the particular input query and on the size and structure of the input ABox.

Future research is directed to variants and extensions of the framework. We aim to extend the implementation to capture more input and output sources, further parameters (e.g., various degrees of graph connectedness), and services. An important functional extension would allow us to remove assertions that are implied by the input ontology; in this way, information incompleteness could be better controlled. Another extension concerns the automatic generation of TBoxes by means of analysis of geospatial data. Further, our benchmark ontology could be extended to other important DLs such as EL and its relatives. Finally, based on our open repository, we aim to collect a wider range of benchmarks with the related transformations and data extracts.

References

1. City-bench. http://ghxiao.github.io/city-bench. Accessed: 2014-05-20.
2. OpenStreetMap. http://www.openstreetmap.org. Accessed: 2014-05-01.
3. OpenStreetMap data repository. http://download.bbbike.org/osm/bbbike/. Accessed: 2014-04-14.
4. Stardog. http://stardog.com/. Accessed: 2014-05-01.
5. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
6. Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007.
7. Samantha Bail, Bijan Parsia, and Ulrike Sattler. JustBench: A framework for OWL benchmarking. In Proc. of ISWC 2010, pages 32–47. Springer, 2010.
8. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007.
9. Thomas Eiter, Thomas Krennwallner, and Patrik Schneider. Lightweight spatial conjunctive query answering using keywords. In Proc. of ESWC 2013, pages 243–258, 2013.
10. George Garbis, Kostis Kyzirakos, and Manolis Koubarakis. Geographica: A benchmark for geospatial RDF stores. CoRR, abs/1305.5653, 2013.
11. Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming. In Proc. of ICLP/SLP, volume 88, pages 1070–1080, 1988.
12. Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semantics, 3(2-3):158–182, 2005.
13. Martha Imprialou, Giorgos Stoilos, and Bernardo Cuenca Grau. Benchmarking ontology-based query rewriting systems. In Proc. of AAAI 2012, 2012.
14. Yevgeny Kazakov, Markus Krötzsch, and František Simančík. Concurrent classification of EL ontologies. In Proc. of ISWC 2011, pages 305–320. Springer, 2011.
15. Dave Kolas. A benchmark for spatial semantic web systems. In 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2008), October 2008.
16. Ilianna Kollia and Birte Glimm. Optimizing SPARQL query answering over OWL ontologies. J. Artif. Intell. Res. (JAIR), 48:253–303, 2013.
17. Li Ma, Yang Yang, Zhaoming Qiu, Guotong Xie, Yue Pan, and Shengping Liu. Towards a complete OWL ontology benchmark. In Proc. of ESWC 2006, pages 125–139. Springer, 2006.
18. Héctor Pérez-Urbina, Ian Horrocks, and Boris Motik. Efficient query answering for OWL 2. In Proc. of ISWC 2009, pages 489–504, 2009.
19. Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. Data Semantics, 10:133–173, 2008.
20. Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology-based data access: Ontop of databases. In Proc. of ISWC 2013. Springer, 2013.
21. E. Sirin, B. Parsia, B. Cuenca Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. J. Web Sem., 5(2):51–53, 2007.
22. Giorgos Stoilos, Birte Glimm, Ian Horrocks, Boris Motik, and Rob Shearer. A novel approach to ontology classification. Web Semantics: Science, Services and Agents on the World Wide Web, 14, 2012.
23. Giorgos Stoilos, Bernardo Cuenca Grau, and Ian Horrocks. How incomplete is your semantic web reasoner? In Proc. of AAAI 2010. AAAI Press, 2010.
24. Dmitry Tsarkov and Ian Horrocks. FaCT++ description logic reasoner: System description. In Proc. of IJCAR 2006, pages 292–297. Springer, 2006.