Reducing global consistency to local consistency in
              Ontology-based Data Access

                          Marco Console, Maurizio Lenzerini

        Dipartimento di Ing. Informatica, Automatica e Gestionale “Antonio Ruberti”
                               SAPIENZA Università di Roma
                            Via Ariosto 25, I-00186 Roma, Italy
                    {console,lenzerini}@dis.uniroma1.it


1   Introduction


Ontology-based data access (OBDA) is a paradigm aiming at accessing and managing
the data of an information system by means of an ontology [6]. An OBDA system
is constituted by an OBDA specification, representing its intensional level, and one
or more data sources, representing the extensional one. Depending on the relation the
specification shares with the information system, we can divide OBDA systems into
two main branches: (1) simple, if the information system is specifically designed to
store the ontology instances, or (2) composite, if the information system is constituted
by pre-existing data sources that are not under the control of the OBDA modeler. In
this paper we address the latter scenario, and assume that data sources are managed by
a relational Database Management System (DBMS).
     Most of the research on OBDA has concentrated on making query answering ef-
ficient. However, query answering is not the only service that an OBDA system must
provide. Another crucial service is consistency checking. Current approaches to this
problem involve executing expensive queries at run time. Here, we address a fundamental
problem for OBDA systems: given an OBDA specification, can we avoid the
consistency check on the whole OBDA system (global consistency check), and rely
instead on the constraint checking carried out by the DBMS on the data source (local
consistency checking)? If this is the case, whenever the DBMS accepts a database at
the source, we know that its data are consistent with the OBDA system. In other words,
we know that we can reduce global consistency to local consistency.
     In the next sections, we present a formal framework for defining global and local
consistency in OBDA systems, characterizing their relationship. We actually split this
relationship into two parts, which we call protection and faithfulness. Intuitively, a source
schema is faithful to an OBDA system if it does not block any data consistent with the
ontology, and protects an OBDA system from inconsistency if its integrity constraints
block all data that are in conflict with the ontology. By using these two notions, we
present an algorithm for checking whether we can indeed reduce global consistency to
local consistency in a relevant class of OBDA systems.

2   Ontology-based data access
We consider relational databases, and refer the reader to [1] for a more detailed account
of databases. A schema S is a pair ⟨ΣS, CS⟩, where ΣS is the alphabet of S, and CS
is the set of integrity constraints of S, which are rules that each database conforming
to the schema must obey. A database for S, or simply a ΣS-database, is a finite set
of ground atoms over the predicates in ΣS and the constants in an alphabet Γ, subject
to the unique name assumption. A ΣS-database D is legal for S, written D |= S, if
it satisfies all the integrity constraints in CS, written D |= CS.
    An ontology is a conceptualization of a domain of interest expressed in terms of
a formal language. Here, we consider logic-based languages, and, more specifically,
Description Logics (DLs) [2].
    An OBDA specification provides the characteristics of the three basic components
of the system, as specified by the following definition.

Definition 1. An OBDA specification B is a triple ⟨T, M, S⟩, where
 – T is a TBox, called the ontology of B;
 – S = ⟨ΣS, CS⟩ is a database schema, called the source schema of B;
 – M is a finite set of mapping assertions [4, 5] between S and T, called the mapping
    of B.
Pairing an OBDA specification B = ⟨T, M, S⟩ with a ΣS-database D, we obtain an
OBDA system. We define the semantics of an OBDA system by specifying the
models of B relative to D, denoted by ModD(B).

Definition 2. Let B = ⟨T, M, S⟩ be an OBDA specification, and let D be a
ΣS-database. Then ModD(B) = { I | I |= T, (D, I) |= M, and D |= CS }.

    Checking whether an OBDA system, constituted by B and D, is satisfiable amounts
to checking whether ModD(B) ≠ ∅. In practice, the system is managed by suitable
software components, including a database management system ensuring that D |= CS.


3   Framework for global and local consistency
We begin our analysis of global and local consistency with the formal definition of these
two notions.

Definition 3. Let B = ⟨T, M, ⟨ΣS, CS⟩⟩ be an OBDA specification, and let D be a
ΣS-database. Then the OBDA system constituted by B and D is said to be locally
consistent if D |= CS, and globally consistent if ModD(⟨T, M, ⟨ΣS, ∅⟩⟩) ≠ ∅.

    The above definition captures the idea that, while the domain ontology T forms the
intensional level of the whole system, the database D together with M determines its
extensional level. The schema S is simply the structure designed for accommodating the
data stored at the source, but it does not really contribute to the semantics of the OBDA
system. So global consistency is indeed different from checking the satisfiability of the
whole B, while local consistency merely means that the database D is legal with respect
to the source schema.
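
The difference between the two notions of Definition 3 can be illustrated on a toy case. The following is only a minimal sketch under strong simplifying assumptions, not the machinery used in this paper: predicates are unary, the mapping sends each source predicate to one atomic concept, and the TBox contains only inclusions and disjointness between atomic concepts. All names (person, company, Person, Organization, Agent, no_overlap) are hypothetical.

```python
# Toy illustration of Definition 3; every name and fact here is hypothetical.
# A database is a set of ground atoms (predicate, constant), unary predicates only.

MAPPING = {"person": "Person", "company": "Organization"}      # source predicate -> concept
INCLUSIONS = {("Person", "Agent"), ("Organization", "Agent")}  # first is included in second
DISJOINT = {("Person", "Organization")}                        # the two concepts are disjoint


def retrieved_abox(db):
    """Apply the mapping to the source database to obtain ontology facts."""
    return {(MAPPING[p], c) for (p, c) in db if p in MAPPING}


def saturate(abox):
    """Close a set of ontology facts under the inclusions of the TBox."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        for (sub, sup) in INCLUSIONS:
            for (concept, c) in list(facts):
                if concept == sub and (sup, c) not in facts:
                    facts.add((sup, c))
                    changed = True
    return facts


def globally_consistent(db):
    """A model exists when C_S is ignored: no constant lies in two disjoint concepts."""
    facts = saturate(retrieved_abox(db))
    return not any((a, c) in facts and (b, c) in facts
                   for (a, b) in DISJOINT
                   for (_, c) in facts)


def locally_consistent(db, constraints):
    """D |= C_S: every integrity constraint, given as a Boolean check on db, holds."""
    return all(check(db) for check in constraints)


def no_overlap(db):
    """Denial constraint: no constant occurs in both person and company."""
    return not any(("person", c) in db and ("company", c) in db for (_, c) in db)


D = {("person", "ann"), ("company", "ann")}
print(locally_consistent(D, []), globally_consistent(D))  # True False
print(locally_consistent(D, [no_overlap]))                # False
```

With an empty set of constraints the database D is accepted by the source although it is globally inconsistent. Adding the denial constraint makes the source reject it.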
    Further, global consistency can be reduced to local consistency for B exactly
when, for all ΣS-databases D, ModD(⟨T, M, ⟨ΣS, ∅⟩⟩) ≠ ∅ is equivalent to D |= CS.
We actually split this notion into two parts, corresponding to the two directions of the
equivalence, and we call such parts protection and faithfulness, respectively.

Definition 4. Let B = ⟨T, M, S⟩ be an OBDA specification, where S = ⟨ΣS, CS⟩.
Then, S is said to protect T and M from inconsistency if for every ΣS-database D such
that ModD(⟨T, M, ⟨ΣS, ∅⟩⟩) = ∅, we have that D ⊭ CS.

    Intuitively, the schema S protects B from inconsistency whenever its constraints
block every database which would break global consistency.

Definition 5. Let B = ⟨T, M, S⟩ be an OBDA specification, where S = ⟨ΣS, CS⟩.
Then, S is said to be faithful to T and M in B if for every ΣS-database D such that
ModD(⟨T, M, ⟨ΣS, ∅⟩⟩) ≠ ∅, we have that D |= CS.

    Intuitively, the schema S is faithful to B if it does not constrain the source in such a
way as to filter out data that would not cause the OBDA system to fall into inconsistency.

Theorem 1. Let B = ⟨T, M, ⟨ΣS, CS⟩⟩ be an OBDA specification. Then S is faithful
to B if and only if, for every ΣS-database D,
ModD(⟨T, M, ⟨ΣS, CS⟩⟩) = ModD(⟨T, M, ⟨ΣS, ∅⟩⟩).
    The two notions of protection and faithfulness give rise to two decision problems,
namely checking whether S protects B (Protection) and checking whether S is faithful to B
(Faithfulness).
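
To make the two notions concrete, the sketch below tests protection and faithfulness by enumerating every database over a small, fixed set of constants. This is only a bounded illustration under assumed simplifications (unary predicates, a predicate-to-concept mapping, a TBox with atomic inclusions and disjointness). It is not a decision procedure and not the algorithm of this paper. All names and constraints in it are hypothetical.

```python
from itertools import chain, combinations

# Hypothetical toy specification: unary predicates only.
MAPPING = {"person": "Person", "company": "Organization"}      # source predicate -> concept
INCLUSIONS = {("Person", "Agent"), ("Organization", "Agent")}  # first is included in second
DISJOINT = {("Person", "Organization")}                        # the two concepts are disjoint


def globally_consistent(db):
    """A model exists when C_S is ignored (valid for this toy fragment only)."""
    facts = {(MAPPING[p], c) for (p, c) in db if p in MAPPING}
    changed = True
    while changed:  # close the facts under the inclusions
        new = {(sup, c) for (sub, sup) in INCLUSIONS
               for (concept, c) in facts if concept == sub}
        changed = not new <= facts
        facts |= new
    return not any((a, c) in facts and (b, c) in facts
                   for (a, b) in DISJOINT for (_, c) in facts)


def locally_consistent(db, constraints):
    """D |= C_S, with each constraint given as a Boolean check on db."""
    return all(check(db) for check in constraints)


def all_databases(predicates, constants):
    """Every set of ground atoms over the given predicates and constants."""
    atoms = [(p, c) for p in predicates for c in constants]
    subsets = chain.from_iterable(combinations(atoms, r) for r in range(len(atoms) + 1))
    return [frozenset(s) for s in subsets]


def protects(constraints, dbs):
    """Definition 4, restricted to dbs: globally inconsistent implies rejected."""
    return all(not locally_consistent(db, constraints)
               for db in dbs if not globally_consistent(db))


def faithful(constraints, dbs):
    """Definition 5, restricted to dbs: globally consistent implies accepted."""
    return all(locally_consistent(db, constraints)
               for db in dbs if globally_consistent(db))


def no_overlap(db):
    """Denial constraint: no constant occurs in both person and company."""
    return not any(("person", c) in db and ("company", c) in db for (_, c) in db)


def no_company(db):
    """Overly strict constraint: the company relation must be empty."""
    return not any(p == "company" for (p, _) in db)


DBS = all_databases(["person", "company"], ["a", "b"])
for name, cs in [("empty", []), ("no_overlap", [no_overlap]), ("no_company", [no_company])]:
    print(name, protects(cs, DBS), faithful(cs, DBS))
```

On this toy case the empty constraint set is faithful but does not protect, the overly strict constraint protects but is not faithful, and the denial constraint is both, so global consistency reduces to local consistency for it.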

4   Results
Unfortunately, even for OBDA specifications having decidable query answering
procedures, the decision problems associated with protection and faithfulness are both
undecidable. In recent studies, we discovered cases in which an algorithm for solving
those problems actually exists. In particular, in one relevant scenario, we restricted the
TBox to be expressed in the DL-LiteR fragment (see [3, 7]), the mapping language to
be GLAV-based (see [4, 5]), with both the head and the body of each mapping assertion
being conjunctive queries, and the source schemata to be expressed in terms of the
relational model with key, foreign key and denial constraints. Note that this combination
of languages allows us to capture a large number of real-world scenarios.
    Relying on the finite controllability of query answering under keys and foreign keys
(see [8]), we were able to prove the following.
Theorem 2. Protection can be solved in PTIME with respect to T and M, and in NP
with respect to S.
Theorem 3. Faithfulness can be solved in PTIME with respect to S and T , and in NP
with respect to M.
    We plan to continue our investigation by considering the case of OBDA systems
where the source schema contains constraints that do not fall into the class of constraints
studied here, or where the DL used for expressing the ontology goes beyond DL-LiteR.
Acknowledgements: Work partially supported by the EU under FP7, project Optique
(Scalable End-user Access to Big Data), grant n. FP7-318338.
References
1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley Publ. Co. (1995)
2. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.): The De-
   scription Logic Handbook: Theory, Implementation, and Applications. Cambridge University
   Press (2010), paperback edition
3. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: DL-Lite: Tractable
   description logics for ontologies. In: Proc. of AAAI 2005. pp. 602–607 (2005)
4. Halevy, A.Y.: Answering queries using views: A survey. VLDB Journal 10(4), 270–294 (2001)
5. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. of PODS 2002. pp. 233–
   246 (2002)
6. Lenzerini, M.: Ontology-based data management. In: Proc. of CIKM 2011. pp. 5–6 (2011)
7. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data
   to ontologies. J. on Data Semantics X, 133–173 (2008)
8. Rosati, R.: On the decidability and finite controllability of query processing in databases with
   incomplete information. In: Proc. of PODS 2006. pp. 356–365 (2006)