Probabilistic Ontological Data Exchange with Bayesian Networks Thomas Lukasiewicz1 , Maria Vanina Martinez3 , Livia Predoiu12 , and Gerardo I. Simari3 1 Department of Computer Science, University of Oxford, UK 2 Department of Computer Science, Otto-von-Guericke Universität Magdeburg, Germany 3 Dept. of Comp. Sci. and Eng., Univ. Nacional del Sur and CONICET, Argentina Abstract. We study the problem of exchanging probabilistic data between onto- logy-based probabilistic databases. The probabilities of the probabilistic source databases are compactly encoded via Boolean formulas with the variables ad- hering to the dependencies imposed by a Bayesian network, which are closely related to the management of provenance. For the ontologies and the ontology mappings, we consider different kinds of existential rules from the Datalog+/– family. We provide a complete picture of the computational complexity of the problem of deciding whether there exists a probabilistic (universal) solution for a given probabilistic source database relative to a (probabilistic) ontological data exchange problem. We also analyze the complexity of answering UCQs (unions of conjunctive queries) in this framework. 1 Introduction Large volumes of uncertain data are best modeled, stored, and processed in probabilis- tic databases [22]. Enriching databases with terminological knowledge encoded in on- tologies has recently gained increasing importance in the form of ontology-based data access (OBDA) [21]. A crucial problem in OBDA is to integrate and exchange knowl- edge. Not only in the context of OBDA, but also in the area of the Semantic Web, there are distributed ontologies that we may have to map and integrate to enable query an- swering over them. Here, apart from the uncertainty attached to source databases, there may also be uncertainty regarding the ontology mappings establishing the proper corre- spondence between items in the source ontology and items in the target ontology. This especially happens when the mappings are created automatically. Data exchange [11] is an important theoretical framework used for studying data- interoperability tasks that require data to be transferred from existing databases to a target database that comes with its own (independently created) schema and schema constraints. The expressivity of the data exchange framework goes beyond the classi- cal data integration framework [17]. For the translation, schema mappings are used, which are declarative specifications that describe the relationship between two database schemas. In classical data exchange, we have a source database, a target database, a de- terministic mapping, and deterministic target dependencies. Recently, a framework for probabilistic data exchange [10] has been proposed where the classical data exchange framework based on weakly acyclic existential rules has been extended to consider a probabilistic source database and a probabilistic source-to-target mapping. In this paper, we study an expressive extension of the probabilistic data exchange framework in [10], where the source and the target are ontological knowledge bases, each consisting of a probabilistic database and a deterministic ontology describing terminological knowledge about the data stored in the database. The two ontologies and the mapping between them are expressed via existential rules. Our extension of the data exchange framework is strongly related to exchanging data between incom- plete databases, as proposed in [3], which considers an incomplete deterministic source database in the data exchange problem. However, in that work, the databases are deter- ministic, and the mappings and the target database constraints are full existential rules only. In our complexity analysis in this paper, we consider a host of different classes of existential rules, including some subclasses of full existential rules. In addition, our source is a probabilistic database relative to an underlying ontology. Our work in this paper is also related to the recently proposed knowledge base ex- change framework [2, 1], which allows knowledge to be exchanged between determin- istic DL-LiteRDF S and DL-LiteR ontologies. In this paper, besides considering proba- bilistic source databases, we are also using more expressive ontology languages, since already linear existential rules from the Datalog+/– family are strictly more expressive than the description logics (DLs) DL-LiteX of the DL-Lite family [9] as well as their extensions with n-ary relations DLR-LiteX . Guarded existential rules are sufficiently ex- pressive to model the tractable DL EL [4, 5] (and ELI f [16]). Note that existential rules are also known as tuple-generating dependencies (TGDs) and Datalog+/– rules [7]. The main contributions of this paper are summarized as follows. − We introduce deterministic and probabilistic ontological data exchange problems, where probabilistic knowledge is exchanged between two Bayesian network-based prob- abilistic databases relative to their underlying deterministic ontologies, and the deter- ministic and probabilistic mapping between the two ontologies is defined via determin- istic and probabilistic existential mapping rules, respectively. − We provide an in-depth analysis of the data and combined complexity of deciding the existence of probabilistic (universal) solutions and obtain a (fairly) complete picture of the data complexity, general combined complexity, bounded-arity (ba) combined, and fixed-program combined (fp) complexity for the main sublanguages of the Datalog+/– family. We also delineate some tractable special cases, and provide complexity results for exact UCQ (union of conjunctive queries) answering. − For the complexity analysis, we consider a compact encoding of probabilistic source databases and mappings, which is used in the area of both incomplete and probabilistic databases, and also known as data provenance or data lineage [14, 12, 13, 22]. Here, we consider data provenance for probabilistic data that is structured according to an underlying Bayesian network. 2 Preliminaries We assume infinite sets of constants C, (labeled) nulls N, and regular variables V. A term t is a constant, null, or variable. An atom has the form p(t1 , . . . , tn ), where p is an n-ary predicate, and t1 , . . . , tn are terms. Conjunctions of atoms are often identified with the sets of their atoms. An instance I is a (possibly infinite) set of atoms p(t), where t is a tuple of constants and nulls. A database D is a finite instance that contains only constants. A homomorphism is a substitution h : C ∪ N ∪ V → C ∪ N ∪ V that is the identity on C. We assume familiarity with conjunctive queries (CQs). The answer to a CQ q over an instance I is denoted q(I). A Boolean CQ (BCQ) q evaluates to true over I, denoted I |= q, if q(I) 6= ∅. A tuple-generating dependency (TGD) σ is a first-order formula ∀X ϕ(X) → ∃Y p(X, Y), where X ∪ Y ⊆ V, ϕ(X) is a conjunction of atoms, and p(X, Y) is an atom. We call ϕ(X) the body of σ, denoted body(σ), and p(X, Y) the head of σ, de- noted head (σ). We consider only TGDs with a single atom in the head, but our results can be extended to TGDs with a conjunction of atoms in the head. An instance I satis- fies σ, written I |= σ, if the following holds: whenever there exists a homomorphism h such that h(ϕ(X)) ⊆ I, then there exists h0 ⊇ h|X , where h|X is the restriction of h to X, such that h0 (p(X, Y)) ∈ I. A negative constraint (NC) ν is a first-order formula ∀X ϕ(X) → ⊥, where X ⊆ V, ϕ(X) is a conjunction of atoms, called the body of ν, denoted body(ν), and ⊥ denotes the truth constant false. An instance I satisfies ν, de- noted I |= ν, if there is no homomorphism h such that h(ϕ(X)) ⊆ I. Given a set Σ of TGDs and NCs, I satisfies Σ, denoted I |= Σ, if I satisfies each TGD and NC of Σ. For brevity, we omit the universal quantifiers in front of TGDs and NCs. Given a database D and a set Σ of TGDs and NCs, the answers that we consider are those that are true in all models of D and Σ. Formally, the models of D and Σ, denoted mods(D, Σ), is the set of instances {I | I ⊇ D and I |= Σ}. TheTanswer to a CQ q rel- ative to D and Σ is defined as the set of tuples ans(q, D, Σ) = I∈mods(D,Σ) {t | t ∈ q(I)}. The answer to a BCQ q is true, denoted D ∪ Σ |= q, if ans(q, D, Σ) 6= ∅. The problem of CQ answering is defined as follows: given a database D, a set Σ of TGDs and NCs, a CQ q, and a tuple of constants t, decide whether t ∈ ans(q, D, Σ). Following Vardi’s taxonomy [23], the combined complexity of BCQ answering is cal- culated by considering all the components, i.e., the database, the set of dependencies, and the query, as part of the input. The bounded-arity combined complexity (or sim- ply ba-combined complexity) is calculated by assuming that the arity of the underlying schema is bounded by an integer constant. Notice that in the context of description logics (DLs), whenever we refer to the combined complexity in fact we refer to the ba-combined complexity since, by definition, the arity of the underlying schema is at most two. The fixed-program combined complexity (or simply fp-combined complexity) is calculated by considering the set of TGDs and NCs as fixed. 3 Ontological Data Exchange In this section, we define the notions of deterministic and probabilistic ontological data exchange. The source (resp., target) of the deterministic/probabilistic ontological data exchange problems that we consider in this paper is a probabilistic database (resp., probabilistic instance), each relative to a deterministic ontology. Here, a probabilistic database (resp., probabilistic instance) over a schema S is a probability space P r = (I, µ) such that I is the set of all (possibly infinitelyP many) databases (resp., instances) over S, and µ : I → [0, 1] is a function that satisfies I∈I µ(I) = 1. 3.1 Deterministic Ontological Data Exchange Ontological data exchange formalizes data exchange from a probabilistic database rel- ative to a source ontology Σs (consisting of TGDs and NCs) over a schema S to a probabilistic target instance Prt relative to a target ontology Σt (consisting of a set of TGDs and NCs) over a schema T via a (source-to-target) mapping (also consisting of a set of TGDs and NCs). More specifically, an ontological data exchange (ODE) problem M= (S, T, Σs , Σt , Σst ) consists of (i) a source schema S, (ii) a target schema T disjoint from S, (iii) a finite set Σs of TGDs and NCs over S (called source ontology), (iv) a fi- nite set Σt of TGDs and NCs over T (called target ontology), and (v) a finite set Σst of TGDs and NCs σ over S ∪ T (called (source-to-target) mapping) such that body(σ) and head(σ) are defined over S ∪ T and T, respectively. Ontological data exchange with deterministic databases is based on defining a tar- get instance J over T as being a solution for a deterministic source database I over S relative to an ODE problem M = (S, T, Σs , Σt , Σst ), if (I ∪ J) |= Σs ∪ Σt ∪ Σst . We denote by SolM the set of all such pairs (I, J). Among the possible deterministic solu- tions J to a deterministic source database I relative to M in SolM , we prefer universal solutions, which are the most general ones carrying only the necessary information for data exchange, i.e., those that transfer only the source database along with the relevant implicit derivations via Σs to the target ontology. A universal solution can be homo- morphically mapped to all other solutions leaving the constants unchanged. Hence, a deterministic target instance J over S is a universal solution for a deterministic source database I over T relative to a schema mapping M, if (i) J is a solution, and (ii) for each solution J 0 for I relative to M, there is a homomorphism h : J → J 0 . We denote by USol M (⊆ SolM ) the set of all pairs (I, J) of deterministic source databases I and target instances J such that J is a universal solution for I relative to M. When considering probabilistic databases and instances, a joint probability space Pr over the solution relation SolM and the universal solution relation USolM must exist. More specifically, a probabilistic target instance Prt = (J , µt ) is a probabilistic solution (resp., probabilistic universal solution) for a probabilistic source database Prs = (I, µs ) relative to an ODE problem M = (S, T, Σs , Σt , Σst ), if there exists a probability space Pr = (I × J , µ) such thatP(i) the left and right marginals of Pr are PrP s and Prt , respec- tively, i.e., (i.a) µs (I) = J∈J µ(I, J) for all I ∈ I, (i.b) µt (J) = I∈I µ(I, J) for all J ∈ J ; and (ii) µ(I, J) = 0 for all (I, J) 6∈ SolM (resp., (I, J) 6∈ USolM ). Note that this intuitively says that all non-solutions (I, J) have probability zero and the existence of a solution does not exclude that some source databases with probability zero have no corresponding target instance. Example 1. An ontological data exchange (ODE) problem M = (S, T, Σs , Σt , Σst ) is given by the source schema S = {Researcher/2, ResearchArea/2, Publication/3} (the number after each predicate denotes its arity), the target schema T = {UResearch- Area/3, Lecturer/2}, the source ontology Σs = {σs , νs }, the target ontology Σt = {σt , νt }, and the mapping Σst = {σst , νm }, where: σs : Publication(X, Y, Z) → ResearchArea(X, Y), νs : Researcher(X, Y) ∧ ResearchArea(X, Y) → ⊥, σt : UResearchArea(U, D, T) → ∃Z Lecturer(T, Z), νt : Lecturer(X, Y) ∧ Lecturer(Y, X) → ⊥, Possible source database facts ra Researcher(Alice, UnivOx) Derived source database facts rp Researcher(Paul, UnivOx) aaml ResearchArea(Alice, ML) paml Publication(Alice, ML, JMLR) aadb ResearchArea(Alice, DB) padb Publication(Alice, DB, TODS) apdb ResearchArea(Paul, DB) ppdb Publication(Paul, DB, TODS) apai ResearchArea(Paul, AI) ppai Publication(Paul, AI, AIJ) Possible target instance facts Probabilistic source database Prs = (I, µs ) uml UResearchArea(UnivOx, N1 , ML) I1 = {ra ,rp ,paml ,ppdb ,aaml ,apdb } 0.5 uai UResearchArea(UnivOx, N2 , AI) I2 = {ra ,rp ,paml ,ppai ,aaml ,apai } 0.2 udb UResearchArea(UnivOx, N3 , DB) I3 = {ra ,rp ,padb ,ppai ,aadb ,apai } 0.15 lml Lecturer(ML, N4 ) I4 = {ra ,rp ,padb ,ppdb ,aadb ,apdb } 0.075 lai Lecturer(AI, N5 ) I5 = {ra ,padb ,aadb } 0.075 ldb Lecturer(DB, N6 ) Probabilistic target instance Prt1 = (J1 , µt1 ) Probabilistic target instance Prt2 = (J2 , µt2 ) J1 = {uml ,udb ,lml ,ldb } 0.5 J5 = {uml ,udb ,lml ,ldb } 0.55 J2 = {uml ,uai ,lml ,lai } 0.2 J6 = {uml ,uai ,lml ,lai } 0.1 J3 = {uai ,udb ,lai ,ldb } 0.15 J7 = {uml ,uai ,udb ,lml ,lai ,ldb } 0.35 J4 = {udb ,ldb } 0.15 Table 1. Probabilistic source database and two probabilistic target instances for Example 1 (N1 , . . . , N6 are nulls); both are probabilistic solutions, but only Prt1 is universal. Fig. 1. Probabilistic universal solution Prt1 . Fig. 2. Probabilistic solution Prt2 . σst : ResearchArea(N, T) ∧ Researcher(N, U) → ∃D UResearchArea(U, D, T), νst : ResearchArea(N, T) ∧ UResearchArea(U, T, N) → ⊥. Given the probabilistic source database in Table 1, two probabilistic instances Prt1 = (J1 , µt1 ) and Prt2 = (J2 , µt2 ) that are probabilistic solutions are shown in Table 1. Note that only Prt1 is also a probabilistic universal solution. Note also that Figures 1 and 2 show the probability spaces over Prt1 and Prt2 , respectively.  Query answering in ontological data exchange is performed over the target ontology and is generalized from deterministic data exchange. A union of conjunctive queries (or Wk UCQ) has the form q(X) = i=1 ∃Yi Φi (X, Yi , Ci ), where each ∃Yi Φi (X, Yi , Ci ) with i ∈ {1, . . . , k} is a CQ with exactly the variables X and Yi , and the constants Ci . Given an ODE problem M = (S, T, Σs , Σt , Σst ), probabilistic source database Wk Prs = (I, µs ), UCQ q(X) = i=1 ∃Yi Φi (X, Yi , Ci ), and tuple t (a ground instance of X in q) over C, the confidence of t relative to q, denoted conf q (t), in Prs relative to M is the infimum of Prt (q(t)) subject to all probabilistic solutions Prt for Prs relative to M. Here, Prt (q(t)) for Prt = (J , µt ) is the sum of all µt (J) such that q(t) evaluates to true in the instance J ∈ J (i.e., some BCQ ∃Yi Φi (t, Yi , Ci ) with i ∈ {1, . . . , k} evaluates to true in J). Example 2. Consider again the setting of Example 1, and let q be a UCQ of a stu- dent who wants to know whether she can study either machine learning or artificial intelligence at the University of Oxford: q() = ∃X, Z(Lecturer(AI, X) ∧ UResearch- Area(UnivOx, Z, AI)) ∨ ∃X, Z(Lecturer(ML, X) ∧ UResearchArea(UnivOx, Z, ML)). Then, q yields the probabilities 0.85 and 1 on Prt1 and Prt2 , respectively.  3.2 Probabilistic Ontological Data Exchange Probabilistic ontological data exchange extends deterministic ontological data exchange by turning the deterministic source-to-target mapping into a probabilistic source-to- target mapping, i.e., we have a probability distribution over the set of all subsets of Σst . More specifically, a probabilistic ontological data exchange (PODE) problem M = (S, T, Σs , Σt , Σst , µst ) consists of (i) a source schema S, (ii) a target schema T dis- joint from S, (iii) a finite set Σs of TGDs and NCs over S (called source ontology), (iv) a finite set Σt of TGDs and NCs over T (called target ontology), (v) a finite set Σst Pst of TGDs and0 NCs σ over S ∪ T, and (vi) a function µst : 2 Σ → [0, 1] such that 0 Σ ⊆Σst µ st (Σ ) = 1 (called probabilistic (source-to-target) mapping). A probabilistic target instance Prt = (J , µt ) is a probabilistic solution (resp., prob- abilistic universal solution) for a probabilistic source database Prs = (I, µs ) relative to a PODE problem M = (S, T, Σs , Σt , Σst , µst ), if there exists a probability space Pr = = (I ×J ×2Σ P , µ) such that: (i) the three st marginals of µ are µs , µt ,P and µst , such that: (i.a) µs (I) = J∈J , Σ 0 ⊆Σst µ(I, J, Σ 0 ) for all I ∈ I, (i.b) µt (J) = I∈I, Σ 0 ⊆Σst µ(I, J, Σ 0 ) for all J ∈ J , and (i.c) µst (Σ 0 ) = I∈I, J∈J µ(I, J, Σ 0 ) for all Σ 0 ⊆ Σst ; and P (ii) µ(I, J, Σ 0 ) = 0 for all (I, J) 6∈ Sol (S,T,Σ 0 ) (resp., (I, J) 6∈ USol (S,T,Σ 0 ) ). Using probabilistic (universal) solutions for probabilistic source databases relative to PODE problems, the semantics of UCQs is lifted to PODE problems as follows. Given a PODE problem M = (S, T, Σs , Σt , Σst , µst ), a probabilistic source database Wk Prs = (I, µs ), a UCQ q(X) = i=1 ∃Yi Φi (X, Yi , Ci ), and a tuple t (a ground in- stance of X in q) over C, the confidence of t relative to q, denoted conf q (t), in Prs rel- ative to M is the infimum of Prt (q(t)) subject to all probabilistic solutions Prt for Prs relative to M. Here, Prt (q(t)) for Prt = (J , µt ) is the sum of all µt (J) such that q(t) evaluates to true in the instance J ∈ J . 3.3 Compact Encoding We use a compact encoding of both probabilistic databases and probabilistic map- pings, which is based on annotating facts, TGDs, and NCs by probabilistic events in a Bayesian network, rather than explicitly specifying the whole probability space. We first define annotations and annotated atoms. Let e1 , . . . , en be n ≥ 1 elemen- tary events. A world w is a conjunction `1 ∧· · ·∧`n , where each `i , i ∈ {1, . . . , n}, is ei- ther the elementary event ei or its negation ¬ei . An annotation λ is any Boolean combi- nation of elementary events (i.e., all elementary events are annotations, and if λ1 and λ2 Possible source database facts Annotation ra Researcher(Alice, UnivOx) true rp Researcher(Paul, UnivOx) e1 ∨ e2 ∨ e3 ∨ e4 paml Publication(Alice, ML, JMLR) e1 ∨ e2 padb Publication(Alice, DB, TODS) ¬ e1 ∧ ¬ e2 ppdb Publication(Paul, DB, TODS) e1 ∨ (¬ e2 ∧ ¬ e3 ∧ e4 ) ppai Publication(Paul, AI, AIJ) (¬ e1 ∧ e2 ) ∨ (¬ e1 ∧ e3 ) Table 2. Annotation encoding of the probabilistic source database in Table 1. Table 3. Bayesian network over the elementary events. are annotations, then also ¬λ1 and λ1 ∧ λ2 ). An annotated atom has the form a : λ, where a is an atom, and λ is an annotation. The compact encoding of probabilistic databases can then be defined as follows. Note that this encoding is also underlying our complexity analysis in Section 4. A set A of annotated atoms along with a probability µ(w) ∈ [0, 1] for every world w compactly encodes a probabilistic database P r = (I, µ) whenever: (i) the probability µ of ev- ery annotation λ is the sum of the probabilities of all worlds in which λ is true, and (ii) the probability µ of every subset-maximal database {a1 , . . . , am } ∈ I 4 such that {a1 : λ1 , . . . , am : λm } ⊆ A for some annotations λ1 , . . . , λm is the probability µ of λ1 ∧ · · · ∧ λm (and the probability µ of every other database in I is 0). We assume that the probability distributions for the underlying events are given by a Bayesian network, which is usually used for compactly specifying a joint probability space, encoding also a certain causal structure between the variables. The following example in Tables 2 and 3 illustrates the compact encoding of probabilistic source data- bases via Boolean annotations relative to an underlying Bayesian network. If the mapping is probabilistic as well, then we use two disjoint sets of elementary events, one for encoding the probabilistic source database and the other one for the mapping. In this way, the probabilistic source database is independent from the proba- bilistic mapping. We now define the compact encoding of probabilistic mappings. An annotated TGD (resp., NC) has the form σ : λ, where σ is a TGD (resp., NC), and λ is an annotation. A set Σ of annotated TGDs and NCs σ : λ with σ ∈ Σst along with a probability µ(w) ∈ [0, 1] for every world w compactly encodes a probabilistic map- pings µst : 2Σst → [0, 1] whenever (i) the probability µ of every annotation λ is the sum of the probabilities of all worlds in which λ is true, and (ii) the probability µst of every 4 That is, we do not consider subsets of the databases here. subset-maximal {σ1 , . . . , σk } ⊆ Σst such that {σ1 : λ1 , . . . , σk : λk } ⊆ Σ for some annotations λ1 , . . . , λk is the probability µ of λ1 ∧ · · · ∧ λk (and the probability µst of every other subset of Σst is 0). 3.4 Computational Problems We consider the following computational problems: Existence of a solution (resp., universal solution): Given an ODE or a PODE prob- lem M and a probabilistic source database Prs , decide whether there exists a prob- abilistic (resp., probabilistic universal) solution for Prs relative to M. Answering UCQs: Given an ODE or a PODE problem M, a probabilistic source data- base Prs , a UCQ q(X), and a tuple t over C, compute conf Q (t) in Prs w.r.t. M. 4 Computational Complexity We now analyze the computational complexity of deciding the existence of a (univer- sal) probabilistic solution for deterministic and probabilistic ontological data exchange problems. We also delineate some tractable special cases, and we provide some com- plexity results for exact UCQ answering for ODE and PODE problems. We assume some elementary background in complexity theory [15, 20]. We now briefly recall the complexity classes that we encounter in our complexity results. The complexity classes PSPACE (resp., P, EXP, 2EXP) contain all decision problems that can be solved in polynomial space (resp., polynomial, exponential, double exponential time) on a deterministic Turing machine, while the complexity classes NP and NEXP contain all decision problems that can be solved in polynomial and exponential time on a nondeterministic Turing machine, respectively; coNP and coNEXP are their com- plementary classes, where “Yes” and “No” instances are interchanged. The complexity class AC0 is the class of all languages that are decidable by uniform families of Boolean circuits of polynomial size and constant depth. The inclusion relationships among the above (decision) complexity classes (all currently believed to be strict) are as follows: 0 AC ⊆ P ⊆ NP, coNP ⊆ PSPACE ⊆ EXP ⊆ NEXP, coNEXP ⊆ 2EXP The (function) complexity class #P is the set of all functions that are computable by a polynomial-time nondeterministic Turing machine whose output for a given input string I is the number of accepting computations for I. 4.1 Decidability Paradigms The main (syntactic) conditions on TGDs that guarantee the decidability of CQ answer- ing are guardedness [6], stickiness [8], and acyclicity. Each one of these conditions has its “weak” counterpart: weak guardedness [6], weak stickiness [8], and weak acyclic- ity [11], respectively. A TGD σ is guarded if there exists an atom in its body that contains (or “guards”) all the body variables of σ. The class of guarded TGDs, denoted G, is defined as the Data Comb. ba-comb. fp-comb. Data Comb. ba-comb. fp-comb. L, LF, AF in AC0 PSPACE NP NP L, LF, AF coNP PSPACE coNP coNP G P 2EXP EXP NP G coNP 2EXP EXP coNP WG EXP 2EXP EXP EXP WG EXP 2 EXP EXP EXP 0 S, SF in AC EXP NP NP S, SF coNP EXP coNP coNP F, GF P EXP NP NP F, GF coNP EXP coNP coNP A in AC0 NEXP NEXP NP A coNP coNEXP coNEXP coNP WS, WA P 2EXP 2EXP NP WS, WA coNP 2EXP 2EXP coNP Fig. 3. Complexity of BCQ answering [18]. Fig. 4. Complexity of existence of a proba- All entries except for “in AC0 ” are com- bilistic (universal) solution (for both deter- pleteness ones, where hardness in all cases ministic and probabilistic ODE). All entries holds even for ground atomic BCQs. are completeness results. family of all possible sets of guarded TGDs. A key subclass of guarded TGDs are the so-called linear TGDs with just one body atom (which is automatically a guard), and the corresponding class is denoted L. Weakly guarded TGDs extend guarded TGDs by requiring only “harmful” body variables to appear in the guard, and the associated class is denoted WG. It is easy to verify that L ⊂ G ⊂ WG. Stickiness is inherently different from guardedness, and its central property can be described as follows: variables that appear more than once in a body (i.e., join variables) are always propagated (or “stick”) to the inferred atoms. A set of TGDs that enjoys the above property is called sticky, and the corresponding class is denoted S. Weak sticki- ness is a relaxation of stickiness where only “harmful” variables are taken into account. A set of TGDs which enjoys weak stickiness is weakly sticky, and the associated class is denoted WS. Observe that S ⊂ WS. A set Σ of TGDs is acyclic if its predicate graph is acyclic, and the underlying class is denoted A. In fact, an acyclic set of TGDs can be seen as a nonrecursive set of TGDs. We say Σ is weakly acyclic if its dependency graph enjoys a certain acyclicity condition, which actually guarantees the existence of a finite canonical model; the associated class is denoted WA. Clearly, A ⊂ WA. Another key fragment of TGDs, which deserves our attention, are the so-called full TGDs, i.e., TGDs without existentially quantified variables, and the corresponding class is denoted F. If we further assume that full TGDs enjoy linearity, guardedness, stickiness, or acyclicity, then we obtain the classes LF, GF, SF, and AF, respectively. 4.2 Overview of Complexity Results Our complexity results for deciding the existence of a probabilistic (universal) solution for both ODE and PODE problems with annotations over events relative to an under- lying Bayesian network are summarized in Fig. 4 for all classes of existential rules discussed above in the data, combined, ba-combined, and fp-combined complexity (all entries are completeness results). For L, LF, AF, S, SF, and A in the data complexity, we obtain tractability when the underlying Bayesian network is a polytree. For all other cases, hardness holds even when the underlying Bayesian network is a polytree. Finally, for all classes of existential rules discussed above except for WG, answering UCQs for both ODE and PODE problems is in #P in the data complexity. 4.3 Deterministic Ontological Data Exchange The first result shows that deciding whether there exists a probabilistic (or probabilistic universal) solution for a probabilistic source database relative to an ODE problem is complete for C (resp., coC), if BCQ answering for the involved sets of TGDs and NCs is complete for a deterministic (resp., nondeterministic) complexity class C ⊇ PSPACE (resp., C ⊇ NP), and hardness holds even for ground atomic BCQs. As a corollary, by the complexity of BCQ answering with TGDs and NCs in Figure 3 [18], we imme- diately obtain the complexity results shown in Figure 4 for deciding the existence of a probabilistic (universal) solution (in deterministic ontological data exchange) in the combined, ba-combined, and fp-combined complexity, and for the class WG of TGDs and NCs in the data complexity. The hardness results hold even when the underlying Bayesian network is a polytree. Theorem 1. Given a probabilistic source database P rs relative to a source ontology Σs and an ODE problem M = (S, T, Σs , Σt , Σst ) such that Σs ∪ Σt ∪ Σst belongs to a class of TGDs and NCs for which BCQ answering is complete for a deterministic (resp., nondeterministic) complexity class C ⊇ PSPACE (resp., C ⊇ NP), and hardness holds even for ground atomic BCQs, deciding the existence of a probabilistic (universal) solution for P rs relative to Σs and M is complete for C (resp., coC). Hardness holds even when the underlying Bayesian network is a polytree. The following result shows that deciding whether there exists a probabilistic (uni- versal) solution for a probabilistic source database relative to an ODE problem is com- plete for coNP in the data complexity, for all classes of sets of TGDs and NCs considered in this paper, except for WG. Hardness for coNP for the classes G, F, GF, WS, and WA holds even when the underlying Bayesian network is a polytree. Theorem 2. Given a probabilistic source database P rs relative to a source ontology Σs and an ODE problem M = (S, T, Σs , Σt , Σst ) such that Σs ∪ Σt ∪ Σst belongs to a class among L, LF, AF, G, S, SF, F, GF, A, WS, and WA, deciding whether there exists a probabilistic (or probabilistic universal) solution for P rs relative to Σs and M is coNP-complete in the data complexity. Hardness for coNP for the classes G, F, GF, WS, and WA holds even when the underlying Bayesian network is a polytree. The following result shows that deciding whether there exists a probabilistic (or probabilistic universal) solution for a probabilistic source database relative to an ODE problem is in P in the data complexity, if BCQ answering for the involved sets of TGDs and NCs is first-order rewritable as a Boolean UCQ, and the underlying Bayesian net- work is a polytree. As a corollary, by the complexity of BCQ answering with TGDs and NCs, deciding the existence of a solution is in P for the classes L, LF, AF, S, SF, and A in the data complexity, if the underlying Bayesian network is a polytree. Theorem 3. Given a probabilistic source database P rs relative to a source ontology Σs , with a polytree as Bayesian network, and an ODE problem M = (S, T, Σs , Σt , Σst ) such that Σs ∪ Σt ∪ Σst belongs to a class of TGDs and NCs for which BCQ answering is first-order rewritable as a Boolean UCQ, deciding whether there exists a probabilis- tic (universal) solution for P rs relative to Σs and M is in P in the data complexity. Finally, the following theorem shows that answering UCQs for probabilistic source databases relative to an ODE problem is complete for #P in the data complexity for all above classes of existential rules except for WG. Theorem 4. Given (i) an ODE problem M = (S, T, Σt , Σs , Σst ) such that Σs ∪ Σst ∪ Σt belongs to a class among L, LF, AF, G, S, SF, F, GF, A, WS, and WA, and (ii) a prob- abilistic source database P rs relative to Σs such that there exists a solution for P rs relative to M, (iii) a UCQ Q = q(X) over T, and (iv) a tuple a, computing confQ (a) is #P-complete in the data complexity. 4.4 Probabilistic Ontological Data Exchange All the results of Section 4.3 in Theorems 1 and 4 carry over to the case of proba- bilistic ontological data exchange. Clearly, the hardness results carry over immediately, since deterministic ontological data exchange is a special case of probabilistic ontolog- ical data exchange. As for the membership results, we additionally consider the worlds for the probabilistic mapping, which are iterated through in the data complexity and guessed in the combined, the ba-combined, and the fp-combined complexity. 5 Summary and Outlook We have defined deterministic and probabilistic ontological data exchange problems, where probabilistic knowledge is exchanged between two ontologies. The two ontolo- gies and the mapping between them are defined via existential rules, where the rules for the mapping are deterministic and probabilistic, respectively. We have given a precise analysis of the computational complexity of deciding the existence of a probabilistic (universal) solution for different classes of existential rules in both deterministic and probabilistic ontological data exchange. We also have delineated some tractable special cases, and we have provided some complexity results for exact UCQ answering. An interesting topic for future research is to further explore the tractable cases of probabilistic solution existence and whether they can be extended, e.g., by slightly gen- eralizing the type of the mapping rules. Another issue for future work is to further analyze the complexity of answering UCQs for different classes of existential rules in deterministic and probabilistic ontological data exchange. Acknowledgments. This work was supported by an EU (FP7/2007-2013) Marie-Curie Intra-European Fellowship (“PRODIMA”), the UK EPSRC grant EP/J008346/1 (“PrO- QAW”), the ERC grant 246858 (“DIADEM”), a Yahoo! Research Fellowship, and funds from Universidad Nacional del Sur and CONICET, Argentina. This paper is a short version of a paper that appeared in Proc. RuleML 2015 [19]. References 1. Arenas, M., Botoeva, E., Calvanese, D., Ryzhikov, V.: Exchanging OWL2 QL knowledge bases. In: Proc. IJCAI. pp. 703–710 (2013) 2. Arenas, M., Botoeva, E., Calvanese, D., Ryzhikov, V., Sherkhonov, E.: Exchanging descrip- tion logic knowledge bases. In: Proc. KR. pp. 563–567 (2012) 3. Arenas, M., Pérez, J., Reutter, J.L.: Data exchange beyond complete data. J. ACM 60(4), 28:1–28:59 (2013) 4. Baader, F.: Least common subsumers and most specific concepts in a description logic with existential restrictions and terminological cycles. In: Proc. IJCAI. pp. 364–369 (2003) 5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. IJCAI. pp. 364–369 (2005) 6. Calı̀, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013) 7. Cali, A., Gottlob, G., Lukasiewicz, T., Marnette, B., Pieris, A.: Datalog+/–: A family of logical knowledge representation and query languages for new applications. In: Proc. LICS. pp. 228–242 (2010) 8. Calı̀, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: The query answering problem. Artif. Intell. 193, 87–128 (2012) 9. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning 39(3), 385–429 (2007) 10. Fagin, R., Kimelfeld, B., Kolaitis, P.G.: Probabilistic data exchange. J. ACM 58(4), 15:1– 15:55 (2011) 11. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: Semantics and query answer- ing. Theor. Comput. Sci. 336(1), 89–124 (2005) 12. Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Sys. 15(1), 32–66 (1997) 13. Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: Proc. PODS. pp. 31– 40 (2007) 14. Imielinski, T., Witold Lipski, J.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984) 15. Johnson, D.S.: A catalog of complexity classes. In: van Leeuwen, J. (ed.) Handbook of The- oretical Computer Science, vol. A, chap. 2, pp. 67–161. MIT Press (1990) 16. Krisnadhi, A., Lutz, C.: Data complexity in the EL family of description logics. In: Proc. LPAR, LNCS, vol. 4790, pp. 333–347. Springer (2007) 17. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. PODS. pp. 233–246 (2002) 18. Lukasiewicz, T., Martinez, M.V., Pieris, A., Simari, G.I.: From classical to consistent query answering under existential rules. In: Proc. AAAI. pp. 1546–1552 (2015) 19. Lukasiewicz, T., Martinez, M.V., Predoiu, L., Simari, G.I.: Existential rules and Bayesian networks for probabilistic ontological data exchange. In: Proc. RuleML. LNCS, vol. 9202, pp. 294–310. Springer (2015) 20. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley (1994) 21. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. Data Sem. 10, 133–173 (2008) 22. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic Databases. M & C (2011) 23. Vardi, M.Y.: The complexity of relational query languages (extended abstract). In: Proc. STOC. pp. 137–146 (1982)