Ontology-based database access with DIG-MASTRO and the OBDA Plugin for Protégé

Antonella Poggi¹, Mariano Rodriguez-Muro², Marco Ruzzi¹

¹ Dip. di Informatica e Sistemistica, SAPIENZA University of Rome
² Faculty of Computer Science, Free University of Bozen-Bolzano
¹ lastname@dis.uniroma1.it, ² rodriguez@inf.unibz.it

Abstract. In Ontology Based Data Access (OBDA), the aim is to use an ontology to mediate access to data. The main contribution of this work is to demonstrate two key components of an OBDA system, whose combination allows ontology practitioners to finally realize end-to-end OBDA systems. The first component is an OBDA-enabled reasoner, named DIG-MASTRO. The second component is the OBDA Plugin for Protégé, which, in conjunction with the ontology editing features of Protégé, functions as an OBDA system designer.

1 Introduction

Recent research and industrial efforts in Data Integration, the Semantic Web, and E-Science witness an increasing need for semantically driven data access, and in particular for so-called Ontology Based Data Access (OBDA). Abstracting from the specific context, the aim of OBDA is to use an ontology, i.e. a formal conceptualization of the application domain, to mediate access to data. The added value of OBDA, w.r.t. accessing a data source directly, is twofold. On the one hand, the ontology provides a semantic account of the application data domain. On the other hand, constraints expressed by the ontology make it possible to overcome incompleteness that may be present in the actual data.

A lot of work has been invested in recent years to realize OBDA technologies. Efforts have been put into defining the appropriate language for the semantic layer, defining the structure and language of the mappings used to link the data and the semantic layer, and studying the complexity of offering a set of useful services such as query answering, database schema extraction, inconsistency management, etc.
Prototype and industrial-level systems have been and are being built within academic and industrial research labs, including those of major database players such as IBM [4].

However, as pointed out in [3] in the context of data integration, the gap between theory and practice is still wide. The systems that have been made available to the eager community of early adopters do not offer a cohesive structure. They are based on different assumptions, rely on different user interaction mechanisms, or provide limited functionality. As a result, when brought together, sometimes in very rough ways, their interaction turns out to be poor.

The main contribution of this work is to present two key components of an OBDA system, whose combination allows ontology practitioners to finally realize end-to-end OBDA systems. The first component is an OBDA-enabled reasoner, named DIG-MASTRO. The second component is the OBDA plugin for the standard ontology editing tool Protégé¹, which, in conjunction with Protégé, functions as an OBDA system designer.

¹ http://protege.stanford.edu/

2 From DIG-compliant to OBDA-enabled reasoner: DIG-MASTRO

The DIG interface² is a standardized HTTP/XML interface to Description Logics (DL) reasoners that was developed by the DL Implementation Group with the aim of easing communication among tools that make use of DL reasoners³. In particular, by implementing the DIG interface, DIG-compliant reasoners provide standard reasoning services over DL ontologies, e.g. ontology consistency checking and query answering [1].

Similarly, OBDA-enabled reasoners are systems that implement the OBDA extension to the DIG interface proposed in [2]. By doing so, OBDA-enabled reasoners provide standard reasoning services over ontologies whose data layer is specified by means of mappings, i.e. a set of assertions that establish the relationships between the ontology elements and the data at any suitable, possibly autonomous, source.
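
To make the notion of a mapping-specified data layer more concrete, the following is a minimal sketch and not code from DIG-MASTRO or the OBDA Plugin. It assumes a hypothetical relational table `enrolment` and hypothetical ontology predicates `Student` and `takesCourse`; the class `Mapping` and the helper `virtual_assertions` are likewise names introduced here for illustration only. Each mapping pairs an SQL query over the source with the ontology predicate that its answers populate.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One mapping assertion (hypothetical structure, for illustration)."""
    sql: str        # query over the relational source
    predicate: str  # ontology concept or role populated by the query answers


def virtual_assertions(conn, mappings):
    """Collect the ontology assertions induced by the mappings over the source."""
    facts = set()
    for m in mappings:
        for row in conn.execute(m.sql):
            # Each answer tuple becomes one assertion on the mapped predicate.
            facts.add((m.predicate, tuple(row)))
    return facts


# A tiny in-memory source standing in for an autonomous database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolment (student_id INTEGER, course_code TEXT)")
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [(7, "DB1"), (8, "DB1")],
)

mappings = [
    Mapping("SELECT DISTINCT student_id FROM enrolment", "Student"),
    Mapping("SELECT student_id, course_code FROM enrolment", "takesCourse"),
]

facts = virtual_assertions(conn, mappings)
for predicate, args in sorted(facts, key=repr):
    print(predicate, args)
```

The sketch materializes the assertions only to show what the mappings express; an OBDA-enabled reasoner may instead leave the data in the source and answer queries over it directly.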
As an example of an OBDA-enabled reasoner, we demonstrate DIG-MASTRO. MASTRO is the realization of the research done on the DL-Lite_A language [5]. DL-Lite_A is a fragment of OWL-DL designed to maximize expressivity while keeping reasoning algorithms tractable. Specifically, on the one hand, DL-Lite_A is expressive enough to capture most of the standard UML and ER language constructs. On the other hand, reasoning in DL-Lite_A is LOGSPACE in data complexity, i.e. as efficient as query answering in relational databases. Therefore, the cost of instance-level reasoning in DL-Lite_A is orders of magnitude lower than that of reasoning in other DLs.

As far as we are aware, MASTRO is the only DL reasoner that, besides implementing traditional reasoning services, is specifically built for OBDA operations, incorporating facilities to specify data sources and mappings, and taking these into account when reasoning. Notably, the mapping techniques devised for MASTRO also address the impedance mismatch that exists between the values stored in relational data sources and the objects represented in the ontologies [5].

DIG-MASTRO enables MASTRO to interact with any client conforming to the OBDA extension of DIG. Moreover, it extends MASTRO's querying facilities by offering a set of additional services (the ones defined by the DIG interface), which are reduced to the answering of suitable Unions of Conjunctive Queries (UCQs) that MASTRO can answer. Given the low computational complexity of MASTRO, we argue that DIG-MASTRO is an attractive component of an OBDA system in which the amount of data stored in the sources is large and efficient reasoning is mandatory.

3 OBDA Plugin for Protégé

On the road to OBDA, one of the main issues is the lack of tools for the design of core components of an OBDA system.
With the aim of addressing this issue, we developed the OBDA Plugin for Protégé, which allows users to: (i) describe the data sources of the OBDA system; (ii) describe the mappings connecting the data sources and the entities of the ontology; (iii) send the descriptions of these components to an OBDA-enabled reasoner; and (iv) issue queries to an OBDA-enabled reasoner and view the results.

² http://dl.kr.org/dig/
³ http://www.cs.man.ac.uk/~sattler/reasoners.html

Moreover, with flexibility as an objective, we envision an add-on architecture for the OBDA Plugin for Protégé, to enable third parties to incorporate new features into OBDA systems. Add-ons may be specific to an OBDA setting, e.g. to enable the plugin to handle new kinds of mappings, data sources or OBDA queries. Alternatively, add-ons may provide generic OBDA functionalities, e.g. to facilitate the design process, data source inspection, query or mapping validation, etc.

In the current implementation, we provide facilities to (i) describe relational DBMS data sources; (ii) describe mappings of the form ψ ⇝ φ, where φ is a conjunctive query over the ontology and ψ is an arbitrary SQL query over the data sources [5]; (iii) issue UCQs in a restricted SPARQL syntax to the OBDA-enabled reasoner and view the results; and (iv) inspect and manipulate the relational sources.

4 Demonstration scenario

In this section we illustrate the scenario that we will use to demonstrate how the combination of DIG-MASTRO and the OBDA Plugin for Protégé, together with Protégé, makes it possible to realize and query an OBDA system. Specifically, we will consider the academic domain benchmark ontology LUBM⁴, and show how it can be used, within Protégé, to access a database from the Sapienza University of Rome.
To emphasize the efficiency of OBDA instance-level reasoning and query answering using DIG-MASTRO, we will exploit a very large database, storing information about courses, exams, students and administrative staff within 27 different tables, whose sizes range from a few to hundreds of thousands of tuples, with an overall size of 250,000 tuples.

Fig. 1. OBDA Plugin screenshots: (a) RDMS-Ontology Mapping Pane; (b) SPARQL UCQs Pane.

We next enumerate the key functionalities of DIG-MASTRO and the OBDA Plugin for Protégé that we will demonstrate in the above-mentioned scenario.

Ontology definition. Since the LUBM ontology is written in the OWL-DL language, and DIG-MASTRO works with ontologies expressed in DL-Lite_A (i.e. a fragment of OWL-DL), when the OBDA Plugin for Protégé "tells" DIG-MASTRO about LUBM, it rewrites non-DL-Lite_A axioms into equivalent DL-Lite_A axioms where possible, and discards those axioms that are beyond DL-Lite_A, issuing a warning whenever this happens. Remarkably, only a few axioms of LUBM are beyond DL-Lite_A.

⁴ http://swat.cse.lehigh.edu/projects/lubm/

Data source and mappings specification. Using the OBDA Plugin for Protégé, the user can specify a relational database as a data source, as well as a set of mappings between the data and the ontology. The syntax of the mappings in the OBDA plugin is of the form presented in Section 3. The semantics depends on the OBDA-enabled reasoner, which is DIG-MASTRO in our scenario. Hence, informally, a mapping of the form ψ ⇝ φ states that for each data tuple t̄ satisfying ψ, there exists a set of objects that are built starting from t̄ by means of suitable (Skolem) functors, and that satisfy φ over the ontology⁵. Figure 1(a) shows a screenshot of the plugin's RDMS-Ontology Mapping Pane, in which two mappings, M:1 and M:2, are specified.

Query answering. Query answering capabilities are provided by the OBDA Plugin for Protégé and its interaction with an OBDA-enabled reasoner.
At a glance, the user accesses the plugin's SPARQL UCQs pane, shown in Figure 1(b), to issue UCQs to an OBDA-enabled reasoner. As before, the semantics of query answering depends on the OBDA-enabled reasoner, which is DIG-MASTRO [5]. Informally, taking into account the ontology, the data source and the mappings, the reasoner implements a rewriting technique that translates the input UCQ into a set of queries over the source, the union of whose answers constitutes the answer to the original query. Finally, the plugin displays the results and allows the user to further manipulate them, e.g. by exporting or saving them.

5 Conclusion

We presented two contributions in the direction of realizing OBDA systems: (i) an OBDA-enabled reasoner named DIG-MASTRO, and (ii) a plugin that enables Protégé to act as an OBDA system designer. We described a scenario in which we use the OBDA Plugin for Protégé to set up and query an OBDA system based on DIG-MASTRO. We now conclude by enumerating some of the features of both components that will be additionally demonstrated during the demo session: (i) standard reasoning services, other than query answering, provided by an OBDA system based on DIG-MASTRO, through the use of the OBDA Plugin for Protégé; (ii) the workflow of an OBDA system design using the OBDA Plugin for Protégé; and (iii) the deployment of simple OBDA web applications that access an OBDA system based on DIG-MASTRO.

References

1. S. Bechhofer. The DIG Description Logic Interface: DIG/1.1. In Proc. of the 2003 Description Logic Workshop (DL 2003), 2003.
2. D. Calvanese and M. Rodriguez. An extension of DIG 2.0 for handling bulk data. In Proc. of the 3rd Workshop on OWL: Experiences and Directions (OWLED 2007), volume 258, 2007.
3. L. M. Haas. Beauty and the beast: The theory and practice of information integration. In T. Schwentick and D. Suciu, editors, ICDT, volume 4353 of Lecture Notes in Computer Science, pages 28–43. Springer, 2007.
4. L. Ma, J. Mei, Y.
Pan, K. Kulkarni, A. Fokoue, and A. Ranganathan. Semantic web technologies and data management. In Proc. of W3C Workshop on RDF Access to Relational Databases, 2007.
5. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. on Data Semantics, X:133–173, 2008.

⁵ Refer to [5] for a full specification of mappings in MASTRO.