=Paper= {{Paper |id=Vol-1350/paper-33 |storemode=property |title=Mapping Analysis in Ontology-based Data Access: Algorithms and Complexity (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-1350/paper-33.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/LemboMRST15 }} ==Mapping Analysis in Ontology-based Data Access: Algorithms and Complexity (Extended Abstract)== https://ceur-ws.org/Vol-1350/paper-33.pdf
         Mapping Analysis in Ontology-based Data Access:
         Algorithms and Complexity (Extended Abstract)

                      Domenico Lembo1 , José Mora1 , Riccardo Rosati1 ,
                       Domenico Fabio Savo1 , Evgenij Thorstensen2
     1                                          2
         DIAG, Sapienza Università di Roma         Dept. of Informatics, University of Oslo
         lastname@dis.uniroma1.it                        evgenit@ifi.uio.no



1        Introduction
Ontology-based data access (OBDA) is a recent paradigm for accessing data sources
through an ontology that acts as a conceptual, integrated view of the data, and declara-
tive mappings that connect the ontology to the data sources [13, 6, 14, 5, 2] .
     One important aspect in OBDA concerns the construction of a system specification,
i.e., defining the ontology and the mappings over an existing set of data sources. Map-
pings are indeed the most complex part of an OBDA specification, since they have to
capture the semantics of the data sources and express such semantics in terms of the on-
tology. The first experiences in the application of the OBDA framework in real-world
scenarios (e.g., [2, 9]) have shown that the semantic distance between the conceptual
and the data layer is often very large, because data sources are mostly application-
oriented: this often makes the definition, debugging, and maintenance of mappings a
hard and complex task. Such experiences have clearly shown the need of tools for sup-
porting the management of mappings.
     The recent work [11] has started providing a theoretical basis for mapping manage-
ment support in OBDA, focusing on the formal analysis of mappings in ontology-based
data access. In particular, the two most important semantic anomalies of mappings have
been analyzed: inconsistency and redundancy. Roughly speaking, an inconsistent map-
ping for an ontology and a source schema is a specification that gives rise to logical
contradictions with the ontology and/or the source schema. Then, a mapping M is
redundant with respect to an OBDA specification if adding the mapping M to the spec-
ification does not change its semantics. The work presented in [11] has defined both a
local notion of mapping inconsistency and redundancy, which focuses on single map-
ping assertions, and a global notion, where inconsistency and redundancy is considered
with respect to a whole mapping specification (set of mapping assertions).
     In this paper, we study the computational properties of verifying both local and
global mapping inconsistency and redundancy in an OBDA specification. We consider
a wide range of ontology languages that comprises the description logics underlying
OWL 2 and all its profiles (OWL 2 EL, OWL 2 QL, and OWL 2 RL),1 and examine
mapping languages of different expressiveness (the so-called GAV and GLAV map-
pings [7]) over sources corresponding to relational databases. We provide algorithms
and establish tight complexity bounds for the decision problems associated with both
local and global mapping inconsistency and mapping redundancy, and for both com-
bined complexity and TBox complexity (which only considers the size of the TBox).
 1
     http://www.w3.org/TR/owl2-profiles/
      The outcome of our analysis is twofold:
    – in our framework, it is possible to define modular techniques that are able to reduce
      the analysis of mappings to the composition of standard reasoning tasks over the
      ontology (inconsistency, instance checking, query answering) and over the data
      sources (query answering and containment). This is a non-trivial result, because
      mappings are formulas combining both ontology and data source elements;
    – the above forms of mapping analysis enjoy nice computational properties, in the
      sense that they are not harder than the above mentioned standard reasoning tasks
      over the ontology and the data sources (see Figure 1 and Figure 2) .
According to the above results, in our OBDA framework, the analysis of mappings is
feasible for languages with nice computational properties, like the three OWL profiles.

2     Theoretical background
An OBDA specification is a triple J = hT , S, Mi, where T is a DL TBox, S is
a source schema, and M is a mapping between the two. In this paper, we consider
TBoxes specified through DLs that are the logical basis of the W3C standard OWL and
of its profiles, i.e., SROIQ [8], which underpins OWL 2, DL-LiteR [4], which is the
basis of OWL 2 QL, RL [10], a simplified version of OWL 2 RL, and EL⊥ , a slight
extension of the DL EL [3], which is the basis of OWL 2 EL. The source schema is
assumed to be relational, and we consider both simple schemas, i.e., without integrity
constraints, and FD schemas, i.e., simple schemas with functional dependencies [1].
The mapping is a set of assertions m of the form φ(x) ; ψ(x), where φ(x), called the
body of m, and ψ(x), called the head of m, are conjunctive queries (CQs) over S and
T , respectively. We use head (m) and body(m) to denote the head and the body of m.
     Mappings of the form above are called GLAV, and are the most expressive com-
monly studied mappings [12, 7]. Besides them, we refer also to GLAVBE mappings,
which are GLAV mappings where ψ(x) is a CQ with a bounded number of occur-
rences of existential variables, and to GAV mappings, which are GLAV mappings where
head (m) does not admit existential variables.
     The semantics of an OBDA specification J = hT , S, Mi is given in terms of first-
order interpretations that satisfy both T and M, given a source instance D legal for S,
i.e., an instance for S that satisfies the constraints of S. We denote with Mod (J , D) the
set of models of J w.r.t. D We also say that a mapping assertion m is active on a source
instance D if the evaluation of the query body(m) over D is non-empty. A mapping M
is active on D if all its mapping assertions m ∈ M are active on D.
     Below we recall the definitions given in [11] that formalize the mapping analysis
services that we study in this paper. Given a TBox T , a source schema S, a mapping
assertion m, a mapping M, and an OBDA specification J = hT , S, Mi, we have that
  – m is (locally) inconsistent for hT , Si if m is head-inconsistent for T , i.e., T |=
      ∀x.(¬ψ(x)), or m is body-inconsistent for S, i.e., S |= ∀x.(¬φ(x)).
  – M is globally inconsistent for hT , Si if there does not exist a source instance D
      legal for S such that M is active on D and Mod (J , D) 6= ∅.
  – A mapping M0 is globally redundant for J if, for every source instance D that is
      legal for S, Mod (hT , S, Mi, D) = Mod (hT , S, M ∪ M0 i, D).
    Local mapping redundancy is a special case of global mapping redundancy in which
the mappings M and M0 are both composed of a single assertion.
                                 GAV                         GLAV
            task      DL-LiteR RL EL⊥ SROIQ       DL-LiteR RL EL⊥   SROIQ
         local inc. =NLOGSPACE =P =P =N2EXPTIME =NLOGSPACE =P   =P =N2EXPTIME
        global inc. =NLOGSPACE =P =P =N2EXPTIME =NLOGSPACE =P   =P =N2EXPTIME
         local red. =NLOGSPACE =P =P =N2EXPTIME     =NP    =NP =NP    open
        global red. =NLOGSPACE =P =P =N2EXPTIME     =NP    =NP =NP    open


Fig. 1. TBox compl. of mapping inconsistency and redundancy (for both simple and FD schemas).

                                 GAV                             GLAVBE
        task       DL-LiteR     RL EL⊥ SROIQ         DL-LiteR     RL EL⊥    SROIQ
     local inc. =NLOGSPACE (SI) =P =P =N2EXPTIME =NLOGSPACE (SI)∗ =P∗ =P∗ =N2EXPTIME∗
                                                            ∗
                    =P (FD)                          =P (FD)
    global inc.      =NP        =NP =NP =N2EXPTIME     =NP        =NP =NP =N2EXPTIME
     local red.      =NP        =NP =NP =N2EXPTIME     =NP        =NP =NP     open
    global red.      =NP        =NP =NP =N2EXPTIME     =NP        =NP =NP     open


Fig. 2. Combined complexity of mapping inconsistency and redundancy (SI = simple schemas,
FD = FD schemas). ∗ The result also holds for arbitrary GLAV mappings.


3     Complexity Results
We summarize below our complexity results. We consider both TBox complexity, i.e.,
the complexity computed w.r.t. the size of the TBox only, and combined complexity.
     For both simple and FD schemas, and for both GAV and GLAV mappings, the
TBox complexity of local mapping inconsistency turns out to be the same as the TBox
complexity of ontology inconsistency. As for the combined complexity, simple and FD
schemas behave differently. For simple schemas, it is not necessary to check body-
inconsistency (since there are no constraints in S), and thus the combined complexity
is the same as for mapping head-inconsistency, which in turn is the same as the com-
bined complexity of ontology inconsistency. For FD schemas we further need to check
whether the mapping assertion is body-consistent, which can be done in PTIME. Com-
bining together this result with the above bounds for simple schemas, we obtain the
exact bounds for combined complexity shown in Fig. 2.
     Global inconsistency can be reduced to checking the consistency of an OBDA spec-
ification w.r.t. a (minimal) source database that activates M. In particular we have that
for both simple and FD schemas, for both GAV and GLAV mappings, the TBox com-
plexity of global mapping inconsistency is the same as the TBox complexity of ontol-
ogy inconsistency. As for combined complexity, we devise a non-deterministic algo-
rithm exploiting the above mentioned correspondence of global mapping inconsistency
and OBDA inconsistency. This algorithm allows us to prove that for both simple and
FD schemas, and for both GAV and GLAVBE mappings, it holds that: (i) if the on-
tology language is DL-LiteR , RL, or EL⊥ , then the combined complexity of global
mapping inconsistency is in NP; (ii) if the ontology language is SROIQ, then it is in
N2EXPTIME. We also prove that these bounds are in fact exact.
     As for redundancy, our investigation shows that both local and global redundancy
have the same computational behaviour. The complexity results are obtained with tech-
niques that resemble those used for establishing complexity of global inconsistency. All
our complexity results are reported in the tables in Fig. 1 and Fig. 2.
Acknowledgments. This research has been partially supported by the EU under FP7
Large-scale integrating project Optique (grant n. FP7-318338).
References
 1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison Wes-
    ley Publ. Co., 1995.
 2. Natalia Antonioli, Francesco Castanò, Spartaco Coletta, Stefano Grossi, Domenico Lembo,
    Maurizio Lenzerini, Antonella Poggi, Emanuela Virardi, and Patrizia Castracane. Ontology-
    based data management for the Italian public debt. In Proc. of FOIS 2014, pages 372–385,
    2014.
 3. Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In Proc. of
    IJCAI 2005, pages 364–369, 2005.
 4. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Ric-
    cardo Rosati. Tractable reasoning and efficient query answering in description logics: The
    DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.
 5. Diego Calvanese, Martin Giese, Peter Haase, Ian Horrocks, Thomas Hubauer, Yannis E.
    Ioannidis, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Herald Kllapi, Johan W. Klüwer,
    Manolis Koubarakis, Steffen Lamparter, Ralf Möller, Christian Neuenstadt, T. Nordtveit,
    Özgür L. Özçep, Mariano Rodriguez-Muro, Mikhail Roshchin, Domenico Fabio Savo,
    Michael Schmidt, Ahmet Soylu, Arild Waaler, and Dmitriy Zheleznyakov. Optique: OBDA
    solution for big data. In Proc. of ESWC 2013 Satellite Events, volume 7955 of LNCS, pages
    293–295. Springer, 2013.
 6. Cristina Civili, Marco Console, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenz-
    erini, Lorenzo Lepore, Riccardo Mancini, Antonella Poggi, Riccardo Rosati, Marco Ruzzi,
    Valerio Santarelli, and Domenico Fabio Savo. MASTRO STUDIO: Managing ontology-
    based data access applications. PVLDB, 6:1314–1317, 2013.
 7. AnHai Doan, Alon Y. Halevy, and Zachary G. Ives. Principles of Data Integration. Morgan
    Kaufmann, 2012.
 8. Ian Horrocks, Oliver Kutz, and Ulrike Sattler. The even more irresistible SROIQ. In Proc.
    of KR 2006, pages 57–67, 2006.
 9. Evgeny Kharlamov, Martin Giese, Ernesto Jimnez-Ruiz, Martin G. Skjveland, Ahmet Soylu,
    Dmitriy Zheleznyakov, Timea Bagosi, Marco Console, Peter Haase, Ian Horrocks, Sarunas
    Marciuska, Christoph Pinkel, Mariano Rodriguez-Muro, Marco Ruzzi, Valerio Santarelli,
    Domenico Fabio Savo, Kunal Sengupta, Michael Schmidt, Evgenij Thorstensen, Johannes
    Trame, and Arild Waaler. Optique 1.0: Semantic access to big data: The case of Norwegian
    petroleum directorate’s factpages. In Proc. of ISWC 2013 Posters & Demos Track, pages
    65–68, 2013.
10. Roman Kontchakov and Michael Zakharyaschev. An introduction to Description Logics and
    query rewriting. In Manolis Koubarakis, Giorgos B. Stamou, Giorgos Stoilos, Ian Horrocks,
    Phokion G. Kolaitis, Georg Lausen, and Gerhard Weikum, editors, RW 2014 Tutorial Lec-
    tures, volume 8714 of LNCS, pages 195–244. Springer, 2014.
11. Domenico Lembo, José Mora, Riccardo Rosati, Domenico Fabio Savo, and Evgenij
    Thorstensen. Towards mapping analysis in ontology-based data access. In Proc. of RR 2014,
    volume 8741 of LNCS, pages 108–123. Springer, 2014.
12. Maurizio Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS 2002,
    pages 233–246, 2002.
13. Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio
    Lenzerini, and Riccardo Rosati. Linking data to ontologies. J. on Data Semantics, X:133–
    173, 2008.
14. Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Zakharyaschev. Ontology-
    based data access: Ontop of databases. In Proc. of ISWC 2013, volume 8218 of LNCS,
    pages 558–573. Springer, 2013.