Mastro Studio: a system for Ontology-Based
                      Data Management

                 Cristina Civili, Marco Console, Domenico Lembo,
                Lorenzo Lepore, Riccardo Mancini, Antonella Poggi,
              Marco Ruzzi, Valerio Santarelli, and Domenico Fabio Savo

                            DIAG, Sapienza Università di Roma
                               lastname@dis.uniroma1.it


1     Introduction
Ontology-based data access (OBDA) is a computing paradigm in which access to data
is realized through a three-level architecture, constituted by an ontology, a set of data
sources, and the mapping between the two.
    In this paper we present the MASTRO STUDIO system for data management based
on the OBDA paradigm [5]. MASTRO STUDIO is built on the MASTRO reasoner for
OBDA and therefore inherits from MASTRO the characteristics discussed in this
and the following paragraph. Ontologies in MASTRO are specified in logics of
the DL-Lite family of Description Logics [4, 5]. Such logics, which underlie
the OWL 2 QL profile, capture the main modeling features of a variety of
representation languages, such as basic ontology languages and conceptual data models,
while keeping the computational complexity of reasoning low, in particular
when measured with respect to the size of the input data only (i.e., in data complexity).
Data sources in MASTRO are seen as a single relational database. When more than
one source, or even non-relational sources, need to be accessed, such a database can be
obtained through off-the-shelf relational data federation tools. Finally, the
mapping is essentially a set of GAV mapping assertions [7], which associate ontology
elements with queries specified on the underlying database.
    By virtue of these design choices, query answering in MASTRO can be done through
a very efficient technique that reduces this task, via query rewriting, to standard SQL
query evaluation.
    Besides the reasoning capabilities offered by MASTRO, MASTRO STUDIO is also
equipped with a web-based graphical user interface (GUI) that provides advanced
mechanisms for inspecting the components of an OBDA specification, i.e., the
ontology, the mapping, and the data sources. In particular, it represents the
ontology in a graphical form resembling Entity-Relationship modeling, which
makes the ontology accessible to non-experts in logical and ontology formalisms. Also,
MASTRO STUDIO provides wiki-like documentation in which every element of the
ontology is associated with a natural language description, as well as with all ontology
axioms and mapping assertions in which it is involved. The MASTRO STUDIO GUI is
realized through the Drupal content management system (footnote 1).
 1   http://drupal.org
    In the last few years, several studies have addressed OBDA in a simplified
setting where no mappings are used to connect the (intensional level of the) ontology
to external data sources [8, 12]. The only notable exception besides MASTRO STUDIO
is Quest [10], which indeed shares common roots with our system. Quest is a system for
query answering over DL-Lite_A ontologies, which can work in both “classical” mode (i.e.,
with a local ABox) and “virtual” mode (i.e., exploiting mappings). Although first
experiments show the effectiveness of Quest in the classical scenario [10], the development
of its virtual mode is still ongoing. Finally, we observe that, to the best of
our knowledge, MASTRO STUDIO is the only full-fledged ontology-based data
management system that provides, along with OBDA functionalities, advanced features
for documenting and inspecting an OBDA specification.


2     Technical background

We recall here the notions of OBDA specification and OBDA semantics, and survey the
main reasoning services and optimizations offered by MASTRO STUDIO. These
(optimized) reasoning services are in fact inherited from the MASTRO reasoner, in which
they are realized, and suitably exposed as web services by MASTRO STUDIO. For these
reasons, in the rest of this section we refer directly to the MASTRO reasoner.
OBDA specification. In MASTRO, an OBDA specification is a triple ⟨O, M, D⟩, where
O is an ontology, D is a relational database instance, and M is the mapping between
O and D. More precisely, O is specified in a logic of the DL-Lite family of lightweight
Description Logics (DLs) [4, 5]. DLs are decidable fragments of first-order logic (FOL)
that represent the domain of interest in terms of concepts, denoting sets of
objects, roles, denoting binary relations between objects, and attributes, denoting relations
between objects and values from predefined domains. DLs of the DL-Lite family have
been specifically designed for OBDA and offer a good tradeoff between the
expressive power of the language and the computational complexity of reasoning. Notably,
query answering in such DLs can be done in LOGSPACE with respect to data
complexity. DL-Lite logics essentially capture standard conceptual modeling formalisms, such
as UML Class Diagrams and Entity-Relationship Schemas, and are at the basis of OWL
2 QL, one of the tractable profiles of OWL 2, the current W3C standard language for
ontologies (footnote 2). M is a set of assertions of the form Φ ⇝ ψ, where Φ is an SQL query
specified over the schema of D, and ψ is an element of the ontology O, i.e., a concept,
a role, or an attribute (see also [9]). Intuitively, such a mapping assertion specifies that
the tuples returned by the query Φ are used to generate the facts that instantiate ψ. M
is therefore a GAV mapping, according to the data integration terminology [7].
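
As an illustration only, the three components of an OBDA specification could be held in a structure like the following Python sketch. The class names, field names, table names, and the `assertions_for` helper are assumptions made for this example and are not part of the MASTRO API; axioms are kept as plain strings rather than parsed DL-Lite axioms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MappingAssertion:
    """One GAV assertion: an SQL query (Phi) feeding one ontology element (psi)."""
    sql: str        # Phi, an SQL query over the schema of the database
    predicate: str  # psi, the name of a concept, role, or attribute
    arity: int      # 1 for concepts, 2 for roles and attributes


@dataclass
class OBDASpecification:
    """Hypothetical container for the ontology, the mapping, and the database."""
    axioms: List[str] = field(default_factory=list)                # the ontology
    mapping: List[MappingAssertion] = field(default_factory=list)  # the mapping
    database_url: str = ""                                         # where the data lives

    def assertions_for(self, predicate: str) -> List[MappingAssertion]:
        """All mapping assertions whose right-hand side is `predicate`."""
        return [m for m in self.mapping if m.predicate == predicate]


# Illustrative content; tables and predicates are invented for the example.
spec = OBDASpecification(
    axioms=["SubClassOf(Professor Person)"],
    mapping=[
        MappingAssertion("SELECT id FROM prof", "Professor", 1),
        MappingAssertion("SELECT ssn FROM citizen", "Person", 1),
        MappingAssertion("SELECT prof_id, course_id FROM teaching", "teaches", 2),
    ],
    database_url="sqlite:///:memory:",
)
```

Since the mapping is GAV, every assertion has exactly one ontology element on its right-hand side, which is why a lookup by predicate name is enough to find the queries that populate it.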
OBDA semantics. The semantics of an OBDA specification is given in terms of FOL
interpretations. A FOL interpretation I is a model for an ontology O if it satisfies (in
the classical FOL sense) all logical axioms specified in O [4]. Then, given an OBDA
specification B = ⟨O, M, D⟩, a FOL interpretation I is a model for B if (i) I is a model
for O, and (ii) I satisfies M, i.e., for each mapping assertion Φ ⇝ ψ and each tuple
t in the evaluation of Φ over D, I satisfies the fact ψ(t) (see also [9]). Notice that the
above notion of mapping satisfaction corresponds to the classical notion of satisfaction
of sound GAV mapping in data integration [7]. An OBDA specification B is satisfiable
if B admits at least one model.
 2   http://www.w3.org/TR/owl-profiles/
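
To make condition (ii) concrete, the following Python sketch evaluates each mapping query over a small SQLite database and collects the facts that every model must satisfy. The tables, the mapping, and the function name `retrieved_facts` are invented for illustration. The sketch materializes the facts only to show what the condition requires; the rewriting approach described later in this section avoids computing this set explicitly.

```python
import sqlite3

# A toy database instance; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prof(id TEXT);
    INSERT INTO prof VALUES ('p1'), ('p2');
    CREATE TABLE teaching(prof_id TEXT, course_id TEXT);
    INSERT INTO teaching VALUES ('p1', 'c1');
""")

# A toy mapping: pairs (SQL query, ontology predicate).
mapping = [
    ("SELECT id FROM prof", "Professor"),
    ("SELECT prof_id, course_id FROM teaching", "teaches"),
]


def retrieved_facts(connection, mapping_assertions):
    """For each assertion and each tuple t returned by its query,
    produce the fact predicate(t) that any model must satisfy."""
    facts = set()
    for sql, predicate in mapping_assertions:
        for row in connection.execute(sql):
            facts.add((predicate, tuple(row)))
    return facts


facts = retrieved_facts(conn, mapping)
```

Because the mapping is sound rather than exact, a model may contain further instances of each predicate beyond these facts.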
Reasoning in MASTRO. Reasoning services that do not consider data are called
intensional. Among these services, MASTRO STUDIO computes all subsumption
relationships inferred in an ontology between concepts, roles, and attributes.
This, in particular, enables the construction of the classification tree of the ontology [2].
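
A very rough idea of what computing inferred subsumptions involves is given by the Python sketch below, which closes a set of asserted inclusions between named predicates under transitivity. This is a deliberate simplification and not the classification procedure used by MASTRO: a full DL-Lite classifier must also account for existential restrictions, inverse roles, and unsatisfiable predicates. The function name and the example inclusions are assumptions.

```python
from itertools import product


def subsumption_closure(asserted):
    """Transitive closure of asserted inclusions given as (sub, sup) pairs.

    Only inclusions between named predicates are handled here.
    """
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


asserted = {("FullProfessor", "Professor"), ("Professor", "Person")}
inferred = subsumption_closure(asserted)
```

The classification tree can then be read off the closure by keeping, for each predicate, only its most specific subsumers.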
     The main task involving data performed by MASTRO is to answer (unions of)
conjunctive queries ((U)CQs) posed over the ontology O of an OBDA specification B =
⟨O, M, D⟩. Answering one such query Q amounts to computing its certain answers,
denoted CertAns(Q, B), i.e., the tuples that are in the interpretation of Q in every model
of B (the FOL interpretation of a UCQ is the standard one [1]) (footnote 3).
     In MASTRO, certain answers to queries are computed through a query rewriting
process. The basic notion underlying this approach is that of perfect rewriting: a
query Q_DB over D is a perfect rewriting of a query Q under B if the evaluation of
Q_DB over D returns the set CertAns(Q, B). The perfect rewriting of a UCQ Q posed
over O can be obtained in two steps: (i) compute an ontology-rewriting Q' of Q with
respect to the ontology O; (ii) compute the mapping-rewriting of Q' by using the
mapping M, thus obtaining an SQL query on D. Intuitively, an ontology-rewriting of Q is
another query Q', expressed over O, which incorporates all the relevant properties of
the ontology axioms, so that, by using Q', we can compute the certain answers of Q
while ignoring O, i.e., CertAns(Q, ⟨O, M, D⟩) = CertAns(Q', ⟨∅, M, D⟩). This step is
realized in MASTRO through the algorithm Presto [11], which rewrites Q into a new
UCQ Q' over O. Then, the mapping-rewriting step can be seen as a variant of the
unfolding procedure in GAV data integration, as it essentially substitutes each atom in the
query Q' with the SQL query that the mapping associates with the atom predicate. After
the rewriting process, the query is fully expressed in SQL and can be directly evaluated
over the sources.
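
The two steps can be illustrated on a deliberately tiny case: a single-atom query over a concept, with an ontology made only of inclusions between named concepts. The Python sketch below is not Presto; it uses a naive expansion that is adequate only for this restricted setting. All names (`INCLUSIONS`, `MAPPING`, the tables, and both functions) are assumptions introduced for the example.

```python
import sqlite3

# Ontology: asserted inclusions between named concepts, as (sub, sup) pairs.
INCLUSIONS = [("Professor", "Person"), ("Student", "Person")]

# Mapping: each concept is fed by SQL queries returning one column named x.
MAPPING = {
    "Professor": ["SELECT id AS x FROM prof"],
    "Student": ["SELECT matr AS x FROM enrolled"],
    "Person": ["SELECT ssn AS x FROM citizen"],
}


def ontology_rewriting(concept):
    """Step (i): the concepts whose instances are certain instances of `concept`."""
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in INCLUSIONS:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return sorted(result)


def mapping_rewriting(concepts):
    """Step (ii): unfold every atom into the SQL queries mapped to its predicate."""
    disjuncts = [sql for c in concepts for sql in MAPPING.get(c, [])]
    return "\nUNION\n".join(disjuncts)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prof(id TEXT);
    INSERT INTO prof VALUES ('p1');
    CREATE TABLE enrolled(matr TEXT);
    INSERT INTO enrolled VALUES ('s1');
    CREATE TABLE citizen(ssn TEXT);
    INSERT INTO citizen VALUES ('z1');
""")

q_prime = ontology_rewriting("Person")  # the rewritten union, as a list of concepts
sql = mapping_rewriting(q_prime)        # the final SQL over the database
answers = {row[0] for row in conn.execute(sql)}
```

The query for Person returns the professor and the student as well, although neither appears in the `citizen` table, because the ontology-rewriting step has already compiled the two inclusions into the query.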
     We notice also that checking ontology satisfiability in DL-Lite can be reduced to
query answering. In particular, with each ontology axiom we can associate a query that
looks for counterexamples, i.e., data violating that axiom (e.g.,
data contradicting axioms imposing disjointness of concepts or functionality of roles).
This is indeed the way ontology satisfiability is realized in MASTRO.
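
As a sketch of this reduction, consider an axiom stating that two concepts are disjoint. The associated query asks for individuals that are instances of both, and any answer is a counterexample. The Python fragment below builds such a query directly in SQL from a toy mapping. The mapping, the tables, and the function name `violation_sql` are assumptions, and for brevity the violation query is unfolded without first rewriting it with respect to the ontology, which a complete check would have to do.

```python
import sqlite3

# Toy mapping: one SQL query per concept, each returning a column named x.
MAPPING = {
    "Professor": "SELECT id AS x FROM prof",
    "Student": "SELECT matr AS x FROM enrolled",
}


def violation_sql(concept_a, concept_b):
    """SQL retrieving individuals that belong to both concepts, i.e. the
    counterexamples to an axiom declaring the two concepts disjoint."""
    return (
        f"SELECT a.x FROM ({MAPPING[concept_a]}) AS a "
        f"JOIN ({MAPPING[concept_b]}) AS b ON a.x = b.x"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prof(id TEXT);
    INSERT INTO prof VALUES ('p1'), ('u7');
    CREATE TABLE enrolled(matr TEXT);
    INSERT INTO enrolled VALUES ('s1'), ('u7');
""")

counterexamples = [
    row[0] for row in conn.execute(violation_sql("Professor", "Student"))
]
satisfiable = not counterexamples
```

Returning the offending tuples, rather than a yes/no answer, is what makes it possible to show the user a preview of the data that contradicts an axiom.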
Optimizations. The perfect rewriting produced as described above is a union of SQL
queries that may often contain a huge number of disjuncts. This is mainly due to
the mapping-rewriting step, which combines in all possible ways the various mapping
queries associated with each atom predicate, and may therefore produce a final SQL
query whose size is exponential with respect to the size of the initial query and the size
of the mappings [6]. However, in general, not all such disjuncts really contribute to the
computation of the certain answers (for example, because a disjunct is contained in
another). We developed in the MASTRO reasoner a mechanism that prunes the
rewriting and produces another perfect rewriting of smaller size. The adoption of this
technique by MASTRO reduces the evaluation time of the final rewriting.
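
The pruning mechanism itself is described in [6]. To convey the idea of dropping a disjunct that is contained in another, the Python sketch below applies the classical homomorphism test for conjunctive query containment to a union of CQs. It operates on CQs rather than on SQL disjuncts, and it should be read as an illustration of the principle, not as the technique implemented in MASTRO. The query encoding (variables are strings starting with `?`) and all function names are assumptions.

```python
def is_var(term):
    return term.startswith("?")


def bind(assignment, src, dst):
    """Extend `assignment` so that src maps to dst; return False on a clash."""
    if is_var(src):
        return assignment.setdefault(src, dst) == dst
    return src == dst  # constants must match exactly


def homomorphism_exists(src_atoms, dst_atoms, assignment):
    """Backtracking search mapping every source atom onto some target atom."""
    if not src_atoms:
        return True
    (pred, args), rest = src_atoms[0], src_atoms[1:]
    for dst_pred, dst_args in dst_atoms:
        if dst_pred != pred or len(dst_args) != len(args):
            continue
        extended = dict(assignment)
        if all(bind(extended, s, d) for s, d in zip(args, dst_args)):
            if homomorphism_exists(rest, dst_atoms, extended):
                return True
    return False


def contained_in(q1, q2):
    """True if every answer of CQ q1 is an answer of CQ q2, i.e. there is a
    homomorphism from q2 to q1 that maps the head of q2 onto the head of q1."""
    (head1, body1), (head2, body2) = q1, q2
    if len(head1) != len(head2):
        return False
    assignment = {}
    if not all(bind(assignment, h2, h1) for h2, h1 in zip(head2, head1)):
        return False
    return homomorphism_exists(list(body2), list(body1), assignment)


def prune(ucq):
    """Drop disjuncts strictly contained in another disjunct; among
    equivalent disjuncts keep only the first one."""
    kept = []
    for i, q in enumerate(ucq):
        redundant = any(
            contained_in(q, other) and (not contained_in(other, q) or j < i)
            for j, other in enumerate(ucq)
            if j != i
        )
        if not redundant:
            kept.append(q)
    return kept


# q_a(x) :- teaches(x, y), Professor(x)      q_b(x) :- teaches(x, z)
q_a = (("?x",), [("teaches", ("?x", "?y")), ("Professor", ("?x",))])
q_b = (("?x",), [("teaches", ("?x", "?z"))])
pruned = prune([q_a, q_b])
```

Here `q_a` is more restrictive than `q_b`, so every answer it produces is already produced by `q_b`, and only `q_b` survives the pruning.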
 3   In fact, MASTRO even allows for processing more expressive queries interpreted under a
     semantics that approximates standard FOL semantics (see [3] for details).
[Figure 1: architecture diagram. Components shown: Ontology (GraphML Syntax), Translator, Ontology (OWL Syntax), Mappings, Data Sources; GUI (Inspection Environment, Reasoning Environment); Mastro (Consistency Checking, Query Rewriting with Ontology Rewriting, Qr, and Mapping Rewriting, Perfect Mapping Management, Intensional Reasoning). Legend: input flow; software component invocation.]

                              Fig. 1. The MASTRO STUDIO system architecture



    Also, to further optimize the rewriting process, MASTRO allows for the use of
so-called perfect mapping assertions. Given an OBDA specification B, a perfect mapping
assertion is a pair ⟨cq, cq_DB⟩ such that cq is a conjunctive query and cq_DB is a perfect
rewriting of cq under B. Perfect mapping assertions of the above form can be used
during both the ontology-rewriting and mapping-rewriting steps in the following way: if
a conjunctive query q to be rewritten contains a subquery cq, MASTRO substitutes cq
with cq_DB (modulo some variable unification), and continues the rewriting process
only on the remaining part of q. It can be shown that this is a drastic optimization
that heavily reduces the size of the perfect rewritings.
    Notice also that perfect mappings may be obtained by simply storing the perfect
rewritings computed by MASTRO itself. In other words, the set of perfect mappings can
be considered as a memory of the previous perfect rewritings, suitably pruned according
to the first optimization described above. For further details, see [6].
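
Viewed as a memory of earlier rewritings, a store of perfect mappings behaves much like a cache keyed by subqueries. The Python sketch below shows one possible shape for such a store. It is an assumption-laden illustration: the class and method names are invented, a subquery is recognized only when its atoms appear in the query with identical variable names (the variable unification mentioned above is not implemented), and the stored SQL is an arbitrary example string.

```python
class PerfectMappingStore:
    """Hypothetical store of pairs (conjunctive query, perfect SQL rewriting)."""

    def __init__(self):
        self._entries = []  # list of (frozenset of atoms, SQL string)

    def store(self, cq_atoms, sql):
        self._entries.append((frozenset(cq_atoms), sql))

    def split(self, query_atoms):
        """Find the largest stored subquery contained in the query.

        Returns (sql, remaining_atoms): the SQL that replaces the matched
        part, and the atoms that still have to be rewritten. If nothing
        matches, sql is None and all atoms remain.
        """
        atoms = frozenset(query_atoms)
        best = None
        for cq, sql in self._entries:
            if cq <= atoms and (best is None or len(cq) > len(best[0])):
                best = (cq, sql)
        if best is None:
            return None, sorted(atoms)
        return best[1], sorted(atoms - best[0])


store = PerfectMappingStore()
teaching_prof = [("Professor", ("?x",)), ("teaches", ("?x", "?y"))]
store.store(
    teaching_prof,
    "SELECT t.prof_id, t.course_id FROM teaching t JOIN prof p ON p.id = t.prof_id",
)

query = teaching_prof + [("Course", ("?y",))]
frozen_sql, remaining = store.split(query)
```

Only the `Course` atom is left for the rewriting process, while the two matched atoms are replaced in one step by the stored SQL.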


3   The MASTRO STUDIO system

The basic principle adopted in the design of MASTRO STUDIO is to provide
seamless access to the ontology description and the reasoning services over it. The MASTRO
STUDIO GUI is web-based and is realized through the Drupal open source CMS
(Content Management System). Via the GUI, the user can access two different environments:
the Inspection Environment and the Reasoning Environment. The first environment
provides the user with functionalities for easily inspecting all the OBDA system
components, whereas the second one allows for invoking various reasoning services, and is
therefore tightly coupled with the underlying reasoner (cf. Figure 1).
GUI inspection environment. This environment allows for three main functionalities,
each realized by a specific component: ontology inspection, mapping inspection, and
data source inspection. Ontology inspection enables in-depth ontology navigation by
means of the visualization of both a graphical representation of the ontology and its
specification through the OWL functional syntax (footnote 4), as well as the provision of
hypertextual descriptions of ontology elements, organized in the form of a wiki.
    The graphical representation of the ontology provided by MASTRO STUDIO has a
graph-like structure, similar to that of an Entity-Relationship diagram. It allows for a
gentle inspection of the ontology, accessible also to non-experts in logical and ontology
formalisms. The ontology graph is encoded into GraphML (footnote 5), a standard XML-based
graph exchange format. Such encoding, besides being used as input to the inspection
environment for visualization of the ontology diagram, is transformed into OWL
functional syntax through the Translator module (cf. Figure 1), which generates the
corresponding DL-Lite axioms specified through the standard OWL functional syntax.
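
The GraphML encoding used by MASTRO STUDIO is not detailed here, so the following Python sketch assumes a purely hypothetical one: nodes carrying a `kind` data element with value `concept`, and edges with `kind` equal to `isa` for concept inclusions. Under that assumption it shows the kind of translation such a module performs, emitting axioms in OWL functional syntax. Only the GraphML namespace URI and the shape of the OWL axioms are standard; everything else is invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# A tiny ontology graph in the assumed encoding.
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="kind" for="all" attr.name="kind" attr.type="string"/>
  <graph id="ontology" edgedefault="directed">
    <node id="Professor"><data key="kind">concept</data></node>
    <node id="Person"><data key="kind">concept</data></node>
    <edge source="Professor" target="Person"><data key="kind">isa</data></edge>
  </graph>
</graphml>
"""


def kind_of(element):
    """Value of the (assumed) 'kind' data element of a node or edge."""
    data = element.find("g:data[@key='kind']", NS)
    return data.text if data is not None else None


def translate(graphml_text):
    """Emit OWL functional-syntax axioms for concept nodes and isa edges."""
    root = ET.fromstring(graphml_text)
    axioms = []
    for node in root.iterfind(".//g:node", NS):
        if kind_of(node) == "concept":
            axioms.append(f"Declaration(Class(:{node.get('id')}))")
    for edge in root.iterfind(".//g:edge", NS):
        if kind_of(edge) == "isa":
            axioms.append(
                f"SubClassOf(:{edge.get('source')} :{edge.get('target')})"
            )
    return axioms


axioms = translate(GRAPHML)
```

A realistic translator would also cover roles, attributes, and the remaining DL-Lite constructs, and would emit prefix declarations together with the enclosing `Ontology(...)` element.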
    The ontology inspection component also contains wiki-like documentation of the
ontology, provided through the use of contributed Drupal modules such as Wikitools,
Freelinking, and Flexifilter (footnote 6). Moreover, a custom module has been added to the
Drupal core in order to automatically generate the wiki pages associated with the ontology.
Starting from an ontology, the module creates a wiki page for each concept,
role, and attribute, according to a predefined template that includes some asserted
information, as well as axioms and mappings that are related to the element documented
in the page. These pages are stored through the CMS and can be manually edited by
the user in order to enrich the documentation with human-friendly information such as
textual descriptions. The documentation can be inspected through a tree-menu that
represents the hierarchies of concepts, roles, and attributes as asserted in the ontology. The
Mapping inspection and the Data Source inspection components provide the ability to
inspect, respectively, the mapping assertions and the data sources. In particular, the latter
allows the user to visualize the structure of the source relations and also to pose direct
SQL queries over them.
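
The page generation step can be pictured with the short Python sketch below. The actual module is a Drupal extension, so this is only an illustration in a different language; the page layout, the function name `wiki_page`, and the sample axioms and mapping are assumptions.

```python
AXIOMS = [
    "SubClassOf(Professor Person)",
    "SubClassOf(Student Person)",
]
MAPPING = [
    ("SELECT id FROM prof", "Professor"),
    ("SELECT matr FROM enrolled", "Student"),
]


def wiki_page(element, kind, axioms, mapping):
    """Plain-text page for one ontology element: a slot for the description,
    the axioms mentioning the element, and the mapping assertions feeding it."""

    def mentions(axiom):
        tokens = axiom.replace("(", " ").replace(")", " ").split()
        return element in tokens

    lines = [
        f"{kind}: {element}",
        "",
        "Description: (to be written by the user)",
        "",
        "Axioms:",
    ]
    lines += [f"  {a}" for a in axioms if mentions(a)]
    lines.append("Mapping assertions:")
    lines += [f"  {sql} ~> {pred}" for sql, pred in mapping if pred == element]
    return "\n".join(lines)


page = wiki_page("Professor", "Concept", AXIOMS, MAPPING)
```

The description slot is left empty on purpose, since that part of the page is meant to be filled in manually.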
GUI reasoning environment. The second environment is structured on the basis of the
main reasoning services provided by MASTRO STUDIO. In particular, it enables the
invocation of intensional reasoning, ontology satisfiability, and query answering services. As
for intensional reasoning, the user is provided with the visualization of all
subsumptions inferred by the ontology, relying on the underlying intensional reasoning
module (cf. Figure 1). Concerning ontology satisfiability, the user can get an indication of
which axioms are contradicted by source data, and a preview of such data
(counterexamples). Furthermore, the environment allows the user to specify queries (in SPARQL syntax)
over the ontology and to visualize their certain answers returned by the reasoner. On
user demand, details of the rewriting process, such as the ontology-rewriting and the
mapping-rewriting, can be shown.
MASTRO reasoner. The reasoner consists of three main modules, i.e., the query rewriting,
the consistency checking, and the intensional reasoning module.
    The query rewriting module realizes the query rewriting process and its
optimizations described in Section 2. In particular, the ontology rewriting sub-module receives
the user query Q from the GUI and the OWL syntax specification of the ontology as
inputs and produces Qr, which is the ontology-rewriting of Q. Qr is then passed to
the mapping rewriting sub-module (cf. Figure 1), which also takes as input the
mapping specification and computes the perfect rewriting Q'_SQL of Q. The module is also
in charge of pruning Q'_SQL according to the first optimization described in Section 2.
The resulting query Q''_SQL is then sent to the underlying DBMS for evaluation.
Furthermore, it is also passed to the perfect mapping manager sub-module, which stores
(subject to user confirmation) the perfect mapping ⟨Q, Q''_SQL⟩. This sub-module feeds both
the ontology and the mapping rewriting sub-modules, which can make use of perfect mapping
assertions to “freeze” portions of the query to be rewritten (cf. Section 2).
 4   http://www.w3.org/TR/owl-profiles/
 5   http://graphml.graphdrawing.org/
 6   http://drupal.org/project/
    The intensional reasoning module realizes the intensional reasoning tasks described
in the previous section. Besides providing its result to the reasoning environment of the
GUI, it also gives the computed subsumptions as input to the query rewriting module
since such subsumptions are needed for the execution of the Presto algorithm [11].
    Finally, the consistency checking module realizes the ontology satisfiability method
sketched in the previous section, verifying the consistency of each ontology axiom by
producing the associated query and sending it to the query rewriting module for refor-
mulation and evaluation. The results are returned to the GUI reasoning environment.

References
 1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison Wesley Publ. Co.,
    1995.
 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The
    Description Logic Handbook: Theory, Implementation and Applications. Cambridge
    University Press, 2nd edition, 2007.
 3. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. EQL-Lite: Effective
    first-order query processing in description logics. In Proc. of IJCAI 2007, pages 274–279,
    2007.
 4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. of Automated
    Reasoning, 39(3):385–429, 2007.
 5. G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo.
    Mastro: A reasoner for effective ontology-based data access. In Proc. of ORE-2012, volume
    858 of CEUR, ceur-ws.org, 2012.
 6. F. Di Pinto, D. Lembo, M. Lenzerini, R. Mancini, A. Poggi, R. Rosati, M. Ruzzi, and D. F.
    Savo. Optimizing query rewriting in ontology-based data access. In Proc. of EDBT 2013,
    2013.
 7. M. Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS 2002, pages
    233–246, 2002.
 8. H. Pérez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for
    DL-lite. In Proc. of DL 2009, volume 477 of CEUR, ceur-ws.org, 2009.
 9. A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking
    data to ontologies. J. on Data Semantics, X:133–173, 2008.
10. M. Rodriguez-Muro and D. Calvanese. High performance query answering over DL-Lite
    ontologies. In Proc. of KR 2012, pages 308–318, 2012.
11. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. In Proc.
    of KR 2010, pages 290–300, 2010.
12. T. Venetis, G. Stoilos, and G. B. Stamou. Incremental query rewriting for OWL 2 QL. In
    Proc. of DL 2012, 2012.