Reducing global consistency to local consistency in Ontology-based Data Access

Marco Console, Maurizio Lenzerini

Dipartimento di Ing. Informatica, Automatica e Gestionale “Antonio Ruberti”
Sapienza Università di Roma
Via Ariosto 25, I-00186 Roma, Italy
{console,lenzerini}@dis.uniroma1.it

1 Introduction

Ontology-based data access (OBDA) is a paradigm for accessing and managing the data of an information system by means of an ontology [6]. An OBDA system consists of an OBDA specification, representing its intensional level, and one or more data sources, representing its extensional level. Depending on the relation between the specification and the information system, we can divide OBDA systems into two main classes: (1) simple, if the information system is specifically designed to store the ontology instances, or (2) composite, if the information system consists of pre-existing data sources that are not under the control of the OBDA modeler. In this paper we address the latter scenario, and assume that data sources are managed by a relational Data Base Management System (DBMS).

Most of the research on OBDA has concentrated on making query answering efficient. However, query answering is not the only service that an OBDA system must provide. Another crucial service is consistency checking. Current approaches to this problem involve executing expensive queries at run-time. Here, we address a fundamental problem for OBDA systems: given an OBDA specification, can we avoid the consistency check on the whole OBDA system (global consistency checking), and rely instead on the constraint checking carried out by the DBMS on the data source (local consistency checking)? If this is the case, whenever the DBMS accepts a database at the source, we know that its data are consistent with the OBDA system. In other words, we know that we can reduce global consistency to local consistency.
In the next sections, we present a formal framework for defining global and local consistency in OBDA systems and for characterizing their relationship. We split this relationship into two parts, which we call protection and faithfulness. Intuitively, a source schema is faithful to an OBDA system if it does not block any data consistent with the ontology, and it protects an OBDA system from inconsistency if its integrity constraints block all data that are in conflict with the ontology. Using these two notions, we present an algorithm for checking whether we can indeed reduce global consistency to local consistency in a relevant class of OBDA systems.

2 Ontology-based data access

We consider relational databases, and refer the reader to [1] for a more detailed account of databases. A schema $S$ is a pair $\langle \Sigma_S, C_S \rangle$, where $\Sigma_S$ is the alphabet of $S$, and $C_S$ is the set of integrity constraints of $S$, which are rules that each database conforming to the schema must obey. A database for $S$, or simply a $\Sigma_S$-database, is a finite set of ground atoms over the predicates in $\Sigma_S$ and the constants in an alphabet $\Gamma$, subject to the unique name assumption. A $\Sigma_S$-database $D$ is legal for $S$, written $D \models S$, if it satisfies all the integrity constraints in $C_S$, written $D \models C_S$.

An ontology is a conceptualization of a domain of interest expressed in terms of a formal language. Here, we consider logic-based languages, and, more specifically, Description Logics (DLs) [2]. An OBDA specification provides the characteristics of the three basic components of the system, as specified by the following definition.

Definition 1. An OBDA specification $B$ is a triple $\langle T, M, S \rangle$, where
– $T$ is a TBox, called the ontology of $B$;
– $S = \langle \Sigma_S, C_S \rangle$ is a database schema, called the source schema of $B$;
– $M$ is a finite set of mapping assertions [4, 5] between $S$ and $T$, called the mapping of $B$.

Pairing an OBDA specification $B = \langle T, M, S \rangle$ with a $\Sigma_S$-database $D$, we obtain an OBDA system.
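
To make the notions of database and legality more concrete, the following is a minimal sketch in Python. It is not part of the formal framework: it only illustrates a database as a finite set of ground atoms and the check that such a database satisfies a set of integrity constraints, using one key constraint and one denial constraint. The predicate names (`Employee`, `Intern`), the helper names (`key`, `denial_disjoint`, `is_legal`) and the example data are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Tuple

# A ground atom: a predicate name plus a tuple of constants,
# e.g. ("Employee", ("e1", "Ann")). All names here are hypothetical.
Atom = Tuple[str, Tuple[str, ...]]
Database = FrozenSet[Atom]
Constraint = Callable[[Database], bool]


def key(predicate: str, key_positions: Tuple[int, ...]) -> Constraint:
    """Key constraint: no two distinct atoms of `predicate`
    agree on all of the given key positions."""
    def holds(db: Database) -> bool:
        rows = [args for (p, args) in db if p == predicate]
        return all(
            any(r1[i] != r2[i] for i in key_positions)
            for r1, r2 in combinations(rows, 2)
        )
    return holds


def denial_disjoint(p1: str, pos1: int, p2: str, pos2: int) -> Constraint:
    """A simple denial constraint: no constant occurs both at position
    `pos1` of predicate `p1` and at position `pos2` of predicate `p2`."""
    def holds(db: Database) -> bool:
        left = {args[pos1] for (p, args) in db if p == p1}
        right = {args[pos2] for (p, args) in db if p == p2}
        return left.isdisjoint(right)
    return holds


def is_legal(db: Database, constraints: Iterable[Constraint]) -> bool:
    """The database is legal if it satisfies every integrity constraint."""
    return all(c(db) for c in constraints)


# Illustrative constraints: the first argument of Employee is a key,
# and no identifier is both an Employee and an Intern.
constraints = [key("Employee", (0,)), denial_disjoint("Employee", 0, "Intern", 0)]

legal_db = frozenset({
    ("Employee", ("e1", "Ann")),
    ("Employee", ("e2", "Bob")),
    ("Intern", ("i1",)),
})
key_violation = frozenset({
    ("Employee", ("e1", "Ann")),
    ("Employee", ("e1", "Bob")),
})
denial_violation = frozenset({
    ("Employee", ("e1", "Ann")),
    ("Intern", ("e1",)),
})
```

In this sketch, `is_legal(legal_db, constraints)` holds, while the other two databases are rejected. This mirrors the kind of check that a DBMS performs at the source, independently of the ontology and the mapping.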
We define the semantics of an OBDA system by specifying the models of $B$ relative to $D$, denoted by $\mathit{Mod}_D(B)$.

Definition 2. Let $B = \langle T, M, S \rangle$ be an OBDA specification, and let $D$ be a $\Sigma_S$-database. Then

$$\mathit{Mod}_D(B) = \{\, I \mid I \models T,\ (D, I) \models M,\ \text{and}\ D \models C_S \,\}.$$

Checking whether an OBDA system, constituted by $B$ and $D$, is satisfiable amounts to checking whether $\mathit{Mod}_D(B) \neq \emptyset$. In practice, the system is managed by suitable software components, including a database management system ensuring that $D \models C_S$.

3 Framework for global and local consistency

We begin our analysis of global and local consistency with the formal definition of these two notions.

Definition 3. Let $B = \langle T, M, \langle \Sigma_S, C_S \rangle \rangle$ be an OBDA specification, and let $D$ be a $\Sigma_S$-database. Then the OBDA system constituted by $B$ and $D$ is said to be locally consistent if $D \models C_S$, and it is said to be globally consistent if $\mathit{Mod}_D(\langle T, M, \langle \Sigma_S, \emptyset \rangle \rangle) \neq \emptyset$.

The above definition captures the idea that, while the domain ontology $T$ forms the intensional level of the whole system, the database $D$ together with $M$ determines its extensional level. The schema $S$ is simply the structure designed for accommodating the data stored at the source, but it does not really contribute to the semantics of the OBDA system. So global consistency is indeed different from checking the satisfiability of the whole $B$, while local consistency merely means that the database $D$ is legal with respect to the source schema. Further, global consistency of $B$ and $D$ can be reduced to local consistency exactly when, for all $\Sigma_S$-databases $D$, $\mathit{Mod}_D(\langle T, M, \langle \Sigma_S, \emptyset \rangle \rangle) \neq \emptyset$ is equivalent to $D \models C_S$. We split this notion into two parts, corresponding to the two directions of the equivalence, and we call them protection and faithfulness, respectively.

Definition 4. Let $B = \langle T, M, S \rangle$ be an OBDA specification, where $S = \langle \Sigma_S, C_S \rangle$.
Then, $S$ is said to protect $T$ and $M$ from inconsistency if for every $\Sigma_S$-database $D$ such that $\mathit{Mod}_D(\langle T, M, \langle \Sigma_S, \emptyset \rangle \rangle) = \emptyset$, we have that $D \not\models C_S$.

Intuitively, the schema $S$ protects $B$ from inconsistency whenever its constraints block every database that would break global consistency.

Definition 5. Let $B = \langle T, M, S \rangle$ be an OBDA specification, where $S = \langle \Sigma_S, C_S \rangle$. Then, $S$ is said to be faithful to $T$ and $M$ in $B$ if for every $\Sigma_S$-database $D$ such that $\mathit{Mod}_D(\langle T, M, \langle \Sigma_S, \emptyset \rangle \rangle) \neq \emptyset$, we have that $D \models C_S$.

Intuitively, the schema $S$ is faithful to $B$ if it does not constrain the source in such a way as to filter out data that would not cause the OBDA system to become inconsistent.

Theorem 1. Let $B = \langle T, M, \langle \Sigma_S, C_S \rangle \rangle$ be an OBDA specification. Then $S$ is faithful to $B$ if and only if, for every $\Sigma_S$-database $D$, $\mathit{Mod}_D(\langle T, M, \langle \Sigma_S, C_S \rangle \rangle) = \mathit{Mod}_D(\langle T, M, \langle \Sigma_S, \emptyset \rangle \rangle)$.

The two notions of protection and faithfulness give rise to two decision problems, namely checking whether $S$ protects $B$ (Protection) and checking whether $S$ is faithful to $B$ (Faithfulness).

4 Results

Unfortunately, even for OBDA specifications having decidable query answering procedures, the decision problems associated with protection and faithfulness are both undecidable. In recent studies, we discovered cases in which an algorithm for solving those problems actually exists. In particular, in one relevant scenario, we restricted the TBox to be expressed in the DL-Lite$_R$ fragment (see [3, 7]), the mapping language to be GLAV-based (see [4, 5]), with both the head and the body of each mapping assertion being conjunctive queries, and the source schemata to be expressed in terms of the relational model with key, foreign key and denial constraints. Note that this combination of languages allows us to capture a large number of real-world scenarios. Relying on the finite controllability of query answering under keys and foreign keys (see [8]), we were able to prove the following.

Theorem 2.
Protection can be solved in PTIME with respect to $T$ and $M$, and in NP with respect to $S$.

Theorem 3. Faithfulness can be solved in PTIME with respect to $S$ and $T$, and in NP with respect to $M$.

We plan to continue our investigation by considering OBDA systems where the source schema contains constraints that do not fall into the class of constraints studied here, or where the DL used for expressing the ontology goes beyond DL-Lite$_R$.

Acknowledgements: Work partially supported by the EU under FP7, project Optique (Scalable End-user Access to Big Data), grant n. FP7-318338.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley Publ. Co. (1995)
2. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press (2010), paperback edition
3. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: DL-Lite: Tractable description logics for ontologies. In: Proc. of AAAI 2005. pp. 602–607 (2005)
4. Halevy, A.Y.: Answering queries using views: A survey. VLDB Journal 10(4), 270–294 (2001)
5. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. of PODS 2002. pp. 233–246 (2002)
6. Lenzerini, M.: Ontology-based data management. In: Proc. of CIKM 2011. pp. 5–6 (2011)
7. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data Semantics X, 133–173 (2008)
8. Rosati, R.: On the decidability and finite controllability of query processing in databases with incomplete information. In: Proc. of PODS 2006. pp. 356–365 (2006)