Mastro Studio: a system for Ontology-Based Data Management

Cristina Civili, Marco Console, Domenico Lembo, Lorenzo Lepore, Riccardo Mancini, Antonella Poggi, Marco Ruzzi, Valerio Santarelli, and Domenico Fabio Savo

DIAG, Sapienza Università di Roma
lastname@dis.uniroma1.it

1 Introduction

Ontology-based data access (OBDA) is a computing paradigm in which access to data is realized through a three-level architecture, consisting of an ontology, a set of data sources, and the mapping between the two. In this paper we present the Mastro Studio system for data management based on the OBDA paradigm [5]. Mastro Studio is based on the Mastro reasoner for OBDA and therefore inherits from Mastro the characteristics that we discuss in this and the following paragraph. Ontologies in Mastro are specified in logics of the DL-Lite family of Description Logics [4, 5]. Such logics, which are at the basis of the OWL 2 QL profile, capture the main modeling features of a variety of representation languages, such as basic ontology languages and conceptual data models, and at the same time keep the computational complexity of reasoning low, in particular when it is measured with respect to the size of the input data only (i.e., in data complexity).

Data sources in Mastro are seen as a single relational database. When more than one source, or even non-relational sources, need to be accessed, such a database can be obtained through the use of off-the-shelf relational data federation tools. Finally, the mapping is essentially a set of GAV mapping assertions [7], which associate ontology elements with queries specified on the underlying database. By virtue of these design choices, query answering in Mastro can be done through a very efficient technique that reduces this task, via query rewriting, to standard SQL query evaluation.
Besides the reasoning capabilities offered by Mastro, Mastro Studio is also equipped with a web-based graphical user interface (GUI) that provides advanced mechanisms for inspecting the components of an OBDA specification, i.e., the ontology, the mapping, and the data sources. In particular, it represents the ontology in a graphical form resembling Entity-Relationship modeling, which makes the ontology accessible to non-experts in logical and ontology formalisms. Mastro Studio also provides wiki-like documentation in which every element of the ontology is associated with a natural language description, as well as with all ontology axioms and mapping assertions in which it is involved. The Mastro Studio GUI is realized through the Drupal¹ content management system.

¹ http://drupal.org

In the last few years, several works have studied OBDA in a simplified setting where no mappings are used to connect the (intensional level of the) ontology to external data sources [8, 12]. The only notable exception besides Mastro Studio is Quest [10], which indeed has common roots with our system. Quest is a system for query answering over DL-Lite_A ontologies, which can work in both “classical” mode (i.e., with a local ABox) and “virtual” mode (i.e., exploiting mappings). Although first experiments show the effectiveness of Quest in the classical scenario [10], the development of its virtual mode is still ongoing. Finally, we observe that, to the best of our knowledge, Mastro Studio is the only full-fledged ontology-based data management system that provides, along with OBDA functionalities, advanced features for documenting and inspecting an OBDA specification.

2 Technical background

We recall here the notions of OBDA specification and OBDA semantics, and survey the main reasoning services and optimizations offered by Mastro Studio.
These (optimized) reasoning services are in fact inherited from the Mastro reasoner, in which they are realized, and are suitably exposed as web services by Mastro Studio. For this reason, in the rest of this section we refer directly to the Mastro reasoner.

OBDA specification. In Mastro, an OBDA specification is a triple ⟨O, M, D⟩, where O is an ontology, D is a relational database instance, and M is the mapping between O and D. More precisely, O is specified in a logic of the DL-Lite family of lightweight Description Logics (DLs) [4, 5]. DLs are decidable fragments of first-order logic (FOL) that represent the domain of interest in terms of concepts, denoting sets of objects, roles, denoting binary relations between objects, and attributes, denoting relations between objects and values from predefined domains. DLs of the DL-Lite family have been specifically designed for OBDA and offer a good tradeoff between the expressive power of the language and the computational complexity of reasoning. Notably, query answering in such DLs can be done in LOGSPACE with respect to data complexity. DL-Lite logics essentially capture standard conceptual modeling formalisms, such as UML Class Diagrams and Entity-Relationship Schemas, and are at the basis of OWL 2 QL, one of the tractable profiles of OWL 2, the current W3C standard language for ontologies². M is a set of assertions of the form Φ ⇝ ψ, where Φ is an SQL query specified over the schema of D, and ψ is an element of the ontology O, i.e., a concept, a role, or an attribute (see also [9]). Intuitively, such a mapping assertion specifies that the tuples returned by the query Φ are used to generate the facts that instantiate ψ. M is therefore a GAV mapping, according to the data integration terminology [7].

OBDA semantics. The semantics of an OBDA specification is given in terms of FOL interpretations.
A FOL interpretation I is a model for an ontology O if it satisfies (in the classical FOL sense) all logical axioms specified in O [4]. Then, given an OBDA specification B = ⟨O, M, D⟩, a FOL interpretation I is a model for B if (i) I is a model for O, and (ii) I satisfies M, i.e., for each mapping assertion Φ ⇝ ψ and each tuple t in the evaluation of Φ over D, I satisfies the fact ψ(t) (see also [9]). Notice that the above notion of mapping satisfaction corresponds to the classical notion of satisfaction of sound GAV mappings in data integration [7]. An OBDA specification B is satisfiable if B admits at least one model.

² http://www.w3.org/TR/owl-profiles/

Reasoning in Mastro. Reasoning services that do not consider data are called intensional. Among these services, Mastro Studio supports the computation of all subsumption relationships inferred in an ontology between concepts, roles, and attributes. This, in particular, enables the construction of the classification tree of the ontology [2].

The main task involving data performed by Mastro is to answer (unions of) conjunctive queries ((U)CQs) posed over the ontology O of an OBDA specification B = ⟨O, M, D⟩. Answering one such query Q amounts to computing its certain answers, denoted CertAns(Q, B), i.e., the tuples that are in the interpretation of Q in every model of B (the FOL interpretation of a UCQ is the standard one [1])³. In Mastro, certain answers to queries are computed through a query rewriting process. The basic notion underlying this approach is that of perfect rewriting: a query Q_DB over D is a perfect rewriting of a query Q under B if the evaluation of Q_DB over D returns the set CertAns(Q, B). The perfect rewriting of a UCQ Q posed over O can be obtained in two steps: (i) compute an ontology-rewriting Q′ of Q with respect to the ontology O; (ii) compute the mapping-rewriting of Q′ by using the mapping M, thus obtaining an SQL query on D.
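The two steps above can be sketched on a toy case. The following Python fragment is a minimal illustration under strong assumptions, and is neither the Presto algorithm nor the actual implementation of the reasoner: the query is a single atomic concept, the ontology contains only inclusions between atomic concepts, and all names (INCLUSIONS, MAPPING, the tables staff and board) are hypothetical.

```python
import sqlite3

# Hypothetical ontology: (sub, sup) means "every sub is a sup".
INCLUSIONS = [("Manager", "Employee"), ("Director", "Manager")]

# Hypothetical GAV mapping: concept name -> SQL queries returning its instances.
MAPPING = {
    "Employee": ["SELECT id FROM staff"],
    "Director": ["SELECT id FROM board"],
}

def ontology_rewriting(concept):
    """Step (i): the concept together with all its direct and indirect subconcepts."""
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in INCLUSIONS:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return sorted(result)

def mapping_rewriting(concepts):
    """Step (ii): unfold each concept into its mapping queries and union them."""
    disjuncts = [sql for c in concepts for sql in MAPPING.get(c, [])]
    return " UNION ".join(disjuncts)

def certain_answers(conn, concept):
    """Evaluate the rewritten SQL query directly over the source database."""
    sql = mapping_rewriting(ontology_rewriting(concept))
    if not sql:  # no mapping assertion is relevant: no certain answers
        return []
    return sorted(row[0] for row in conn.execute(sql))

# Hypothetical source database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff(id INTEGER);
    CREATE TABLE board(id INTEGER);
    INSERT INTO staff VALUES (1), (2);
    INSERT INTO board VALUES (3);
""")
answers = certain_answers(conn, "Employee")
```

In this sketch Manager has no mapping assertion, so it contributes no disjunct, while the instances of Director still reach Employee through the chain of inclusions.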
Intuitively, an ontology-rewriting of Q is another query Q′, expressed over O, which incorporates all the relevant properties of the ontology axioms, so that, by using Q′, we can compute the certain answers of Q while ignoring O, i.e., CertAns(Q, ⟨O, M, D⟩) = CertAns(Q′, ⟨∅, M, D⟩). This step is realized in Mastro through the algorithm Presto [11], which rewrites Q into a new UCQ Q′ over O. Then, the mapping-rewriting step can be seen as a variant of the unfolding procedure in GAV data integration, as it essentially substitutes each atom in the query Q′ with the SQL query that the mapping associates with the atom predicate. After the rewriting process, the query is fully expressed in SQL and can be directly evaluated over the sources.

We notice also that checking ontology satisfiability in DL-Lite can be reduced to query answering. In particular, with each ontology axiom we can associate a query that looks for counterexamples, i.e., data violating the axiom (e.g., data contradicting axioms imposing disjointness of concepts or functionality of roles). This is indeed the way ontology satisfiability is realized in Mastro.

Optimizations. The perfect rewriting produced as described above is a union of SQL queries that may often contain a huge number of disjuncts. This is mainly due to the mapping-rewriting step, which combines in all possible ways the various mapping queries associated with each atom predicate, and may therefore produce a final SQL query whose size is exponential with respect to the size of the initial query and the size of the mappings [6]. However, in general, not all such disjuncts really contribute to the computation of the certain answers (for example, because a disjunct is contained in another). We developed in the Mastro reasoner a mechanism that is able to prune the rewriting and produce another perfect rewriting of smaller size.
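As an illustration of why contained disjuncts are redundant, the following Python sketch prunes a union of conjunctive queries using the classical homomorphism-based containment test. It is only a sketch under stated assumptions and not the mechanism implemented in the reasoner: it works on conjunctive queries without constants rather than on SQL, ignores database constraints, and uses a brute-force search. The function names and the example queries are hypothetical.

```python
from itertools import product

# A conjunctive query is (head_vars, atoms); an atom is (predicate, args).
# All terms are variables (strings); constants are left out of this sketch.

def terms(query):
    """All variables occurring in the head or in the body of the query."""
    head, atoms = query
    return sorted({t for _, args in atoms for t in args} | set(head))

def contained_in(q1, q2):
    """True if q1 is contained in q2: some homomorphism maps q2 into q1, head to head."""
    head1, atoms1 = q1
    head2, atoms2 = q2
    if len(head1) != len(head2):
        return False
    vars2, targets, body1 = terms(q2), terms(q1), set(atoms1)
    for image in product(targets, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        if all((p, tuple(h[a] for a in args)) in body1 for p, args in atoms2):
            return True
    return False

def prune(ucq):
    """Drop each disjunct contained in another; keep the first of equivalent ones."""
    kept = []
    for i, q in enumerate(ucq):
        redundant = any(
            contained_in(q, other) and (not contained_in(other, q) or j < i)
            for j, other in enumerate(ucq) if j != i
        )
        if not redundant:
            kept.append(q)
    return kept

# Hypothetical disjuncts: q_b only adds a condition to q_a, so q_b is redundant.
q_a = (("x",), [("Employee", ("x",))])
q_b = (("x",), [("Employee", ("x",)), ("worksFor", ("x", "y"))])
q_c = (("x",), [("Manager", ("x",))])
pruned = prune([q_a, q_b, q_c])
```

Every answer of a dropped disjunct is also an answer of a disjunct that is kept, so the pruned union returns the same tuples as the original one.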
The adoption of this pruning mechanism by Mastro reduces the evaluation time of the final rewriting.

³ In fact, Mastro even allows for processing more expressive queries interpreted under a semantics that approximates the standard FOL semantics (see [3] for details).

[Figure 1 diagram. Components shown: GUI (Inspection Environment, Reasoning Environment); Translator (from the ontology in GraphML syntax to the ontology in OWL syntax); Mastro reasoner (Query Rewriting, comprising Ontology Rewriting, Mapping Rewriting, and Perfect Mapping Management; Consistency Checking; Intensional Reasoning); Mappings; Data Sources. Legend: input flow; software component invocation.]

Fig. 1. The Mastro Studio system architecture

Also, to further optimize the rewriting process, Mastro allows for the use of so-called perfect mapping assertions. Given an OBDA specification B, a perfect mapping assertion is a pair ⟨cq, cq_DB⟩ such that cq is a conjunctive query and cq_DB is a perfect rewriting of cq under B. Perfect mapping assertions of this form can be used during both the ontology-rewriting and mapping-rewriting steps in the following way: if a conjunctive query q to be rewritten contains a subquery cq, Mastro substitutes cq with cq_DB (modulo some variable unification), and lets the rewriting process continue only on the remaining part of q. It can be shown that this is a drastic optimization, which heavily reduces the size of the perfect rewritings. Notice also that perfect mappings may be obtained by simply storing the perfect rewritings computed by Mastro itself. In other words, the set of perfect mappings can be considered as a memory of the previous perfect rewritings, suitably pruned according to the first optimization described above. For further details, see [6].

3 The Mastro Studio system

The basic principle adopted in the design of Mastro Studio is to provide seamless access to the ontology description and to the reasoning services over it.
The Mastro Studio GUI is web-based and is realized through the Drupal open source CMS (Content Management System). Via the GUI, the user can access two different environments: the Inspection Environment and the Reasoning Environment. The first environment provides the user with functionalities for easily inspecting all the OBDA system components, whereas the second one allows for invoking various reasoning services, and is therefore tightly coupled with the underlying reasoner (cf. Figure 1).

GUI inspection environment. This environment offers three main functionalities, each realized by a specific component: ontology inspection, mapping inspection, and data source inspection. Ontology inspection enables in-depth ontology navigation by means of the visualization of both a graphical representation of the ontology and its specification in the OWL functional syntax⁴, as well as the provision of hypertextual descriptions of ontology elements, organized in the form of a wiki.

The graphical representation of the ontology provided by Mastro Studio has a graph-like structure, similar to that of an Entity-Relationship diagram. It supports an easy inspection of the ontology, accessible also to non-experts in logical and ontology formalisms. The ontology graph is encoded in GraphML⁵, a standard XML-based graph exchange format. This encoding, besides being used as input to the inspection environment for the visualization of the ontology diagram, is transformed through the Translator module (cf. Figure 1) into the corresponding DL-Lite axioms, specified in the standard OWL functional syntax.

The ontology inspection component also contains wiki-like documentation of the ontology, provided through the use of contributed Drupal modules such as Wikitools, Freelinking, and Flexifilter⁶.
Moreover, a custom module has been added to the Drupal core in order to automatically generate the wiki pages associated with the ontology. Starting from an ontology, the module creates a wiki page for each concept, role, and attribute, according to a predefined template that includes some asserted information, as well as the axioms and mappings that are related to the element documented in the page. These pages are stored through the CMS and can be manually edited by the user in order to enrich the documentation with human-friendly information such as textual descriptions. The documentation can be inspected through a tree menu that represents the hierarchies of concepts, roles, and attributes as asserted in the ontology.

The mapping inspection and the data source inspection components make it possible to inspect the mapping assertions and the data sources, respectively. In particular, the latter allows the user to visualize the structure of the source relations and also to pose direct SQL queries over them.

GUI reasoning environment. The second environment is structured on the basis of the main reasoning services provided by Mastro Studio. In particular, it enables the invocation of intensional reasoning, ontology satisfiability, and query answering services. As for intensional reasoning, the user is provided with the visualization of all subsumptions inferred by the ontology, relying on the underlying intensional reasoning module (cf. Figure 1). Concerning ontology satisfiability, the user can get an indication of which axioms are contradicted by source data, and a preview of such data (counterexamples). Furthermore, the environment allows the user to specify queries (in SPARQL syntax) over the ontology and to visualize their certain answers returned by the reasoner. On user demand, details of the rewriting process, such as the ontology-rewriting and the mapping-rewriting, can be shown.

Mastro reasoner.
It consists of three main modules, i.e., the query rewriting, the consistency checking, and the intensional reasoning module.

The query rewriting module realizes the query rewriting process and its optimizations described in Section 2. In particular, the ontology rewriting sub-module receives as inputs the user query Q from the GUI and the OWL syntax specification of the ontology, and produces Q_r, which is the ontology-rewriting of Q. Q_r is then passed to the mapping rewriting sub-module (cf. Figure 1), which also takes as input the mapping specification and computes the perfect rewriting Q′_SQL of Q. The module is also in charge of pruning Q′_SQL according to the first optimization described in Section 2. The resulting query Q″_SQL is then sent to the underlying DBMS for evaluation. Furthermore, it is also passed to the perfect mapping manager sub-module, which stores (subject to user confirmation) the perfect mapping ⟨Q, Q″_SQL⟩. This sub-module feeds both the ontology and the mapping rewriting sub-modules, which can make use of perfect mapping assertions to “freeze” portions of the query to be rewritten (cf. Section 2).

⁴ http://www.w3.org/TR/owl-profiles/
⁵ http://graphml.graphdrawing.org/
⁶ http://drupal.org/project/

The intensional reasoning module realizes the intensional reasoning tasks described in the previous section. Besides providing its result to the reasoning environment of the GUI, it also gives the computed subsumptions as input to the query rewriting module, since such subsumptions are needed for the execution of the Presto algorithm [11].

Finally, the consistency checking module realizes the ontology satisfiability method sketched in the previous section: it checks each ontology axiom against the data by producing the associated query and sending it to the query rewriting module for reformulation and evaluation. The results are returned to the GUI reasoning environment.

References

1. S. Abiteboul, R. Hull, and V. Vianu.
Foundations of Databases. Addison Wesley Publ. Co., 1995.
2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007.
3. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. EQL-Lite: Effective first-order query processing in description logics. In Proc. of IJCAI 2007, pages 274–279, 2007.
4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.
5. G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo. Mastro: A reasoner for effective ontology-based data access. In Proc. of ORE-2012, volume 858 of CEUR, ceur-ws.org, 2012.
6. F. Di Pinto, D. Lembo, M. Lenzerini, R. Mancini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo. Optimizing query rewriting in ontology-based data access. In Proc. of EDBT 2013, 2013.
7. M. Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS 2002, pages 233–246, 2002.
8. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for DL-lite. In Proc. of DL 2009, volume 477 of CEUR, ceur-ws.org, 2009.
9. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. on Data Semantics, X:133–173, 2008.
10. M. Rodriguez-Muro and D. Calvanese. High performance query answering over DL-Lite ontologies. In Proc. of KR 2012, pages 308–318, 2012.
11. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc. of KR 2010, pages 290–300, 2010.
12. T. Venetis, G. Stoilos, and G. B. Stamou. Incremental query rewriting for OWL 2 QL. In Proc. of DL 2012, 2012.