Using OpenStreetMap Data to Create Benchmarks for Description Logic Reasoners⋆

Thomas Eiter1, Patrik Schneider1, Mantas Šimkus1, and Guohui Xiao2

1 Institute of Information Systems, Vienna University of Technology, Vienna, Austria
2 KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano, Italy

1 Introduction

Description Logics (DLs) are a well-established and popular family of decidable logics for knowledge representation and reasoning [6]. Several systems have been developed to reason over DL knowledge bases (KBs), which usually consist of a TBox and an ABox. A TBox describes the domain in terms of concepts and roles, while an ABox stores information about known instances of concepts and their participation in roles. Naturally, classical reasoning tasks like TBox satisfiability and subsumption under a TBox have received most attention, and many reasoners have been devoted to them, e.g. FaCT++ [24], HermiT [22], and ELK [14]. These and other mature systems geared towards TBox reasoning have been rigorously tested and compared (e.g., with JustBench [7]) using several real-life ontologies such as GALEN and SNOMED-CT.

A different category are reasoners for ontology-based query answering (OQA), which plays a major role in ontology-based data access (OBDA) [19]. They are designed to answer queries over DL KBs in the presence of large data instances (see e.g. Ontop [20], Pellet [21], Stardog [4], and OWL-BGP [16]). TBoxes in this setting are usually expressed in low-complexity DLs and are relatively small compared to the instance data, which makes OQA different from classical TBox reasoning. Several works have considered testing OQA systems [12, 17, 18, 23, 13]. Despite this, it has been acknowledged that judging the performance of OQA reasoners is difficult due to the lack of publicly available benchmarks consisting of large amounts of real-life instance data.
In this paper we consider publicly available geographic datasets as a source of test data for OQA systems. In a similar spirit, some benchmarks have recently been proposed to test implementations of geospatial extensions of SPARQL [15, 10]. The latter benchmarks are geared towards testing spatial reasoning capabilities and cannot be easily adapted for testing OQA systems. The main goal of this paper is to describe how benchmarks for OQA systems can be created from OpenStreetMap [2] (OSM) geospatial data by employing a rule-based data transformation framework.

The OSM project aims to collaboratively create an open map of the world. It has proven hugely successful and is constantly updated and extended. OSM data describes maps in terms of (possibly tagged) points, ways (geometries), and more complex aggregate objects called relations. We believe the following features make OSM a good source of instance data for OQA reasoners:

- Datasets of different sizes exist; e.g., OSM maps for all major cities, countries, and continents are directly available or can easily be generated.
- Depending on the location (e.g., urban versus rural), the density, separation, and compactness of object locations vary strongly.
- Spatial objects have an inherent structure of containment, bordering, and overlapping, which can be exploited to generate topological relations (e.g., contains).
- Spatial objects are tagged with semantic information like the type of an object (e.g., hospitals). This information can be naturally represented as DL concepts and roles.

⋆ The authors were supported by FWF projects P24090 and P25518, the WWTF project ICT12-015, and the EU project Optique FP7-318338.

The paper is organized as follows. We first develop a formalization of OSM that allows us to view OSM maps as relational structures. We then present a rule-based language to extract information from OSM data.
Afterwards, we describe the implementation of our approach together with some generated benchmarks. Finally, we present some experimental results and conclude.

2 Formalization of Data Extraction

In this section, we first formally describe our model for OSM data, and then employ it to describe our rule-based language for extracting instance data from OSM data.

Maps in OSM are represented using four basic constructs (a.k.a. elements): (1) nodes, which correspond to points with a geographic location; (2) geometries (a.k.a. ways), which are given as sequences of nodes; (3) tuples (a.k.a. relations), which are given as sequences of nodes, ways, and tuples; (4) tags, which are used to describe metadata about nodes, ways, and tuples. Geometries are used in OSM to express polylines and polygons, in this way describing streets, rivers, parks, etc. OSM tuples are used to relate several elements, e.g. to indicate the turn priority in an intersection of two streets.

Formalization of OSM. We assume infinite mutually disjoint sets Mnid, Mgid, Mtid and Mtags of node identifiers, geometry identifiers, tuple identifiers and tags, respectively. We let Mid = Mnid ∪ Mgid ∪ Mtid and call it the set of identifiers. An (OSM) map is a triple M = (D, E, L) such that:

1. D ⊆ Mid is a finite set of identifiers called the domain of M;
2. E is a function on D such that: (a) if e ∈ Mnid, then E(e) ∈ R × R; (b) if e ∈ Mgid, then E(e) = (e1, . . . , em) with {e1, . . . , em} ⊆ D ∩ Mnid; (c) if e ∈ Mtid, then E(e) = (e1, . . . , em) with {e1, . . . , em} ⊆ D;
3. L is a labeling function L : D → 2^Mtags.

Intuitively, in a map M = (D, E, L) the function E assigns to each node identifier a coordinate, to each geometry identifier a sequence of nodes, and to each tuple identifier a sequence of arbitrary identifiers.

Enriching Maps with Computable Relations. The above formalizes the raw representation of OSM data.
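For concreteness, the map model M = (D, E, L) can be sketched as a small Python structure. This is purely illustrative; the class and all names below are ours and not part of any OSM tooling:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a map M = (D, E, L); all names are ours.
@dataclass
class OSMMap:
    nodes: dict       # node id -> (lat, lon), i.e. E(e) in R x R
    geometries: dict  # geometry id -> sequence of node ids
    tuples: dict      # tuple id -> sequence of arbitrary ids
    labels: dict = field(default_factory=dict)  # L : D -> set of tags

    def domain(self):
        """D: the finite set of identifiers occurring in the map."""
        return set(self.nodes) | set(self.geometries) | set(self.tuples)

    def tagged(self, alpha):
        """All elements e with alpha in L(e)."""
        return {e for e, tags in self.labels.items() if alpha in tags}

# A toy map: one park polygon over three nodes, tagged "park".
m = OSMMap(
    nodes={1: (48.20, 16.37), 2: (48.21, 16.38), 3: (48.20, 16.39)},
    geometries={10: [1, 2, 3]},
    tuples={},
    labels={10: {"park"}},
)
print(sorted(m.domain()))  # [1, 2, 3, 10]
print(m.tagged("park"))    # {10}
```

Note how the labeling function L directly yields, for each tag, the set of elements carrying it, which is the basis for the tag relations used in data extraction.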
To make it more useful, we support the incorporation of information that can be computed from a map. In particular, we allow maps to be enriched with arbitrary computable relations over Mid. Let Mrels be an infinite set of map relation symbols, each with an associated nonnegative integer, called the arity. An enriched map is a tuple M = (D, E, L, ·^M), where (D, E, L) is a map and ·^M is a partial function that assigns to a map relation symbol R ∈ Mrels a relation R^M ⊆ D^n, where n is the arity of R. In this way, a map can be enriched with externally computed relations like the binary relations "is closer than 100m", "inside a country", "reachable from", etc. For the examples below, we assume that an enriched map M as above always defines the unary relation Tagα for every tag α ∈ Mtags. In particular, we let e ∈ Tagα^M iff α ∈ L(e), where e ∈ D. We will also use the binary relation Inside(x, y), which captures the fact that the location of a point x is inside a geometry y.

A Rule Language for Data Extraction. We define a rule-based language that can be used to describe how an ABox is created from an enriched map. Our language is based on Datalog with stratified negation [5]. Let Drels be an infinite set of Datalog relation symbols, each with an associated arity. For simplicity, and with a slight abuse of notation, we assume that DL concept and role names form a subset of the Datalog relations. Formally, we take an infinite set Dconcepts ⊆ Drels of unary relations called concept names and an infinite set Droles ⊆ Drels of binary relations called role names. Let Dvars be a countably infinite set of variables. Elements of Mid ∪ Dvars are called terms. An atom is an expression of the form R(t) or ¬R(t), where R is a map or a Datalog relation symbol of arity n, and t is an n-tuple of terms. A rule r is an expression of the form B1, . . . , Bn → R(t), where B1, . . . , Bn are atoms (called body atoms) and R(t) is an atom with R a Datalog relation. We assume Datalog safety, i.e.
each variable of a rule occurs in a non-negated body atom. A program P is any finite set of rules. A rule or program is ground if it has no occurrences of variables. We only consider stratified programs (see [5] for a definition).

The semantics of a program P is given relative to an enriched map M. The grounding of a program P w.r.t. M is the (ground) program ground(P, M) obtained from P by substituting in all possible ways the variables in rules of P with identifiers occurring in M or P. We use a variant of the Gelfond-Lifschitz reduct [11] to get rid of map atoms in a program. The reduct of P w.r.t. M is the program P^M obtained from ground(P, M) as follows: (a) from the body of every rule r, delete every map atom ¬R(t) with t ∉ R^M and every map atom R(t) with t ∈ R^M; (b) delete every rule r whose body contains a map atom ¬R(t) with t ∈ R^M or a map atom R(t) with t ∉ R^M. Observe that P^M is an ordinary stratified Datalog program with identifiers acting as constants. We let PM(M, P) denote the perfect model of the program P^M; see [5] for the construction of PM(M, P) by fixpoint computation along the stratification. We are finally ready to extract an ABox: given a map M and a program P, we denote by ABox(M, P) the restriction of PM(M, P) to the atoms over concept and role names.

We next illustrate some of the features and advantages of the presented rule language. E.g., the following rule collects in the role hasCinema the cinemas of a city (we use a sans-serif and a typewriter font for map and Datalog relations, respectively):

Point(x), Tagcinema(x), Geom(y), Tagcity(y), Inside(x, y) → hasCinema(y, x).

Negation in rule bodies can be used for default conclusions. E.g., the following rule states that recreational areas include all parks that are not known to be private:

Geom(x), Tagpark(x), ¬Tagprivate(x) → RecreationalArea(x).

3 Mapping Framework and Implementation

Mapping Framework.
The framework allows the user to define the data transformation for the instance generation in a declarative manner. As simplicity and extensibility are two of our main goals, we address them on two levels. The first level concerns the extensibility of the mapping language and the related evaluation method, by supporting: (a) an ETL mode, which resembles the extract, transform, and load (ETL) process of classical data transformation tools; (b) a Datalog mode, which offers the features of Datalog with stratified negation. The second level concerns the data sources (e.g., OSM) and external evaluation (e.g., calculating spatial relations). In certain cases we have to use external functionality for more sophisticated computations; e.g., in the ETL mode, we treat such scripts as custom data sources or as simple transformations between input and output.

The Datalog and ETL languages are combined in the following custom mapping language. The Data Source Declarations cover general definitions such as PostgreSQL/PostGIS connections. The ETL Mapping Axioms each define a single ETL step; the syntax is an extension of the Ontop mapping language, and an axiom is defined either as a pair of source and target or as a triple of source, transform, and target. The following sources and targets are currently available: (a) PostgreSQL/PostGIS with SQL statements as a source or target; (b) text files with regular expressions as a source or target; (c) RDF files with SPARQL queries as a source; (d) external Python scripts and constants as a source. The Datalog Rules are based on the syntax of the library pyDatalog and are embedded into the source and target of a mapping axiom.

Implementation.
The main Python 2.7 script of the developed generation framework is called as follows:

generate.py --mappingFile file.txt --mode etl|datalog

The ETL mode is designed for bulk processing, where scalability and performance are crucial and complex calculations are not used or are moved into the external scripts. We implemented the computation in a data streaming-based manner. The Datalog mode is designed for the features of stratified Datalog, where the entire result is always in memory. In this mode, dependencies between sources and targets are handled by the declarative nature of Datalog. We use the same source and target components as in the ETL mode but follow a three-step computation: 1. the source components are evaluated to store the n-tuples in the EDB; 2. the stratified Datalog program is evaluated; 3. the IDB predicate outputs are processed by the target components. The architecture naturally results from the two modes and the source and target components. Besides the generation script, we already provide a selection of external scripts for processing OSM data; e.g., spatialRelationsReader.py calculates the spatial relations (e.g., contains) between two RDF files.

4 Benchmarks and Results

All the mapping files and test data for the introduced benchmark are available online [1].

OSM Dataset and Benchmark Ontology. OSM offers different subset databases of different sizes and structures. Depending on the requirements, one can extract subsets fulfilling a variety of criteria such as instance size, density, or structure (e.g., spatially topological or graph-like). For the base dataset [3], we chose the cities of Cork, Riga, Bern, and Vienna. The cities are of increasing size and are European capitals or major cities with a high density of objects in the center and decreasing density towards the outskirts. The chosen DL-LiteR [8] ontology for the benchmarks is taken from the MyITS project [9].
It is tailored to geospatial data sources and, beyond these, to MyITS-specific sources (e.g., a restaurant guide). The top level is built on GeoOWL, the second level is based on the nine top features of GeoNames, and the third level is built mainly from OSM categories. The ontology is of average difficulty for DL-LiteR systems, since it contains only a few existential quantifications on the right-hand side of the inclusion axioms. However, due to its size and the depth of its concept and role hierarchies, it poses a challenge regarding the size of rewritten queries.

Spatial Relations Benchmark (BM1). For the cities, we transform all amenities (e.g., restaurants), leisure areas (e.g., playgrounds), and shops into concept assertions. Then, we calculate the roles for the spatial relation inside between leisure areas and amenities. Further, we generate several next relations between amenities and shops; the different next relations are generated according to distances of 50m, 100m, 250m, and 500m between two objects. We define five queries that use the different "spatial" roles. In q1 we evaluate over general concepts with a low selectivity of the role contains. Query q2 determines which shops are next to amenities, which in turn are inside a park. Query q3 resembles a star-shaped pattern of relationships and uses the role hasCuisine derived from OSM tags. In query q4, we have a triangular pattern of relationships and query amenities that are located in two overlapping leisure areas. With q5, we illustrate that arbitrary role chains can be used by connecting next.
q1(y, z) ← Amenity(y), contains(z, y), Leisure(z)
q2(x, y, z) ← Shop(x), hasOperator(x, w), SupermarketOp(w), next(x, y), Amenity(y), inside(y, z), Leisure(z)
q3(x, y, z) ← Amenity(x), hasQualitativeValue(x, u), Cuisine(u), next(x, y), Shop(y), contains(z, x), Leisure(z)
q4(x) ← Amenity(x), next(x, y), Leisure(y), intersect(y, z), Leisure(z), next(x, z)
q5(x, y) ← Amenity(x), next(x, z), next(z, y), Shop(y)

Evaluation and Results. We consider the following query answering systems: (a) Pellet [21], an OWL reasoner including an ABox query engine that supports answering conjunctive queries, using the optimization techniques of query simplification and query reordering; (b) Stardog, the successor of Pellet, a commercial semantic graph database supporting SPARQL; (c) OWL-BGP [16], a SPARQL wrapper for OWL-API-based reasoners (HermiT in this experiment). Finally, we remark that Ontop in "classic mode" can in principle also answer queries over ontologies directly; however, as the "classic mode" of Ontop is currently not well optimized, a proper evaluation is not feasible. Moreover, since Ontop is designed to also take R2RML mappings into account, we do not compare Ontop with "pure" OQA systems in this paper.

The experiments were performed on an HP ProLiant Linux server with 144 cores @3.47GHz, 106GB of RAM, and a 1TB 15K RPM HD. The systems were evaluated on Java 7 with 8GB of memory for the JVM. The timeout was set to 10min for query answering (indicated by '-') and to 1h for ontology loading.

The evaluation results are presented in Table 1, where S stands for Stardog, P for Pellet, and OB for OWL-BGP. They show that BM1 is indeed challenging for instances such as Vienna: we can see a gradual increase of difficulty with increasing city size and with the distance of next.

Table 1: Evaluation results on BM1 (time in sec.)

             Cork                 Riga                 Bern                 Vienna
         S     P     OB       S     P     OB       S     P      OB      S     P     OB
next250
q1       1.36  1.37  0.01     1.70  2.92  0.02     2.37  5.76   0.38    2.87  16.92 0.48
q2       1.56  0.04  0.02     1.55  0.05  0.05     1.90  0.08   0.14    2.21  15.83 1.04
q3       1.99  0.10  0.02     2.27  0.11  0.05     4.54  0.29   0.15    16.95 16.49 1.09
q4       1.90  0.10  0.29     1.88  0.51  0.60     5.07  1.05   1.75    21.31 20.99 16.39
q5       5.65  0.93  4.62     12.54 1.62  27.36    47.90 4.94   142.40  -     -     -
load     2.12  2.69  5.10     2.58  4.44  11.47    3.77  7.79   24.40   7.66  24.62 128.06
next500
q1       1.56  1.50  0.01     1.84  3.99  0.02     1.99  7.67   0.40    2.32  48.03 -
q2       1.42  0.04  0.03     1.63  0.10  0.12     1.83  7.86   0.31    3.84  52.23 -
q3       1.97  0.10  0.03     2.41  0.15  0.11     9.36  8.29   0.30    26.96 38.21 -
q4       1.45  0.20  0.23     2.11  1.84  23.88    22.42 10.77  4.13    8.59  74.84 -
q5       16.48 2.49  19.51    75.28 15.26 148.06   -     -      -       -     -     -
load     2.50  3.38  5.83     3.42  7.66  15.65    4.97  13.59  35.85   15.22 86.56 -

Queries q1 and q2 can be seen as baseline results with low/high selectivity and pose no challenge to the reasoners. With queries q3, q4, and q5 we see a breaking point at which performance starts to decline; for Stardog, Pellet, and OWL-BGP alike, this occurs at Vienna with next250 (a next distance of 250m). Remarkably, each reasoner has its own strengths: (a) OWL-BGP has the best performance on queries q1, q2, and q3; (b) Stardog outperforms the others on the large instances of q4 and q5; (c) Pellet is best on the medium to large instances of q4 and q5. Note that query q5 is the most challenging one due to the join of the large next relations, which contain 939 988 (resp. 2 857 488) instances for next250 (resp. next500) in Vienna.

5 Conclusion and Outlook

In this paper, we have presented a framework for extracting instance data from OSM.
Based on the framework, we have evaluated some state-of-the-art OQA systems with our benchmark and showed that the respective systems have their strengths and weaknesses depending on the particular input query and on the size and structure of the input ABox.

Future research is directed to variants and extensions of the framework. We aim to extend the implementation to capture more input and output sources, further parameters (e.g., various degrees of graph connectedness), and services. An important functional extension would allow us to remove assertions that are implied by the input ontology; in this way, information incompleteness could be better controlled. Another extension concerns the automatic generation of TBoxes by means of analysis of geospatial data. Further, our benchmark ontology could be extended to other important DLs such as EL and its relatives. Finally, based on our open repository, we aim to collect a wider range of benchmarks with the related transformations and data extracts.

References

1. City-bench. http://ghxiao.github.io/city-bench. Accessed: 2014-05-20.
2. OpenStreetMap. http://www.openstreetmap.org. Accessed: 2014-05-01.
3. OpenStreetMap data repository. http://download.bbbike.org/osm/bbbike/. Accessed: 2014-04-14.
4. Stardog. http://stardog.com/. Accessed: 2014-05-01.
5. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
6. Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007.
7. Samantha Bail, Bijan Parsia, and Ulrike Sattler. JustBench: A framework for OWL benchmarking. In Proc. of ISWC 2010, pages 32–47. Springer, 2010.
8. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007.
9. Thomas Eiter, Thomas Krennwallner, and Patrik Schneider. Lightweight spatial conjunctive query answering using keywords. In Proc. of ESWC 2013, pages 243–258, 2013.
10. George Garbis, Kostis Kyzirakos, and Manolis Koubarakis. Geographica: A benchmark for geospatial RDF stores. CoRR, abs/1305.5653, 2013.
11. Michael Gelfond and Vladimir Lifschitz. The stable model semantics for logic programming. In Proc. of ICLP/SLP, volume 88, pages 1070–1080, 1988.
12. Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semantics, 3(2-3):158–182, 2005.
13. Martha Imprialou, Giorgos Stoilos, and Bernardo Cuenca Grau. Benchmarking ontology-based query rewriting systems. In Proc. of AAAI 2012, 2012.
14. Yevgeny Kazakov, Markus Krötzsch, and František Simančík. Concurrent classification of EL ontologies. In Proc. of ISWC 2011, pages 305–320. Springer, 2011.
15. Dave Kolas. A benchmark for spatial semantic web systems. In 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2008), October 2008.
16. Ilianna Kollia and Birte Glimm. Optimizing SPARQL query answering over OWL ontologies. J. Artif. Intell. Res. (JAIR), 48:253–303, 2013.
17. Li Ma, Yang Yang, Zhaoming Qiu, Guotong Xie, Yue Pan, and Shengping Liu. Towards a complete OWL ontology benchmark. In Proc. of ESWC 2006, pages 125–139. Springer, 2006.
18. Héctor Pérez-Urbina, Ian Horrocks, and Boris Motik. Efficient query answering for OWL 2. In Proc. of ISWC 2009, pages 489–504, 2009.
19. Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. Data Semantics, 10:133–173, 2008.
20. Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology-based data access: Ontop of databases. In Proc. of ISWC 2013. Springer, 2013.
21. E. Sirin, B. Parsia, B. Cuenca Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. J. Web Sem., 5(2):51–53, 2007.
22. Giorgos Stoilos, Birte Glimm, Ian Horrocks, Boris Motik, and Rob Shearer. A novel approach to ontology classification. Web Semantics: Science, Services and Agents on the World Wide Web, 14, 2012.
23. Giorgos Stoilos, Bernardo Cuenca Grau, and Ian Horrocks. How incomplete is your semantic web reasoner? In Proc. of AAAI 2010. AAAI Press, 2010.
24. Dmitry Tsarkov and Ian Horrocks. FaCT++ description logic reasoner: System description. In Proc. of IJCAR 2006, pages 292–297. Springer, 2006.