Probabilistic Ontological Data Exchange
                      with Bayesian Networks

                       Thomas Lukasiewicz1 , Maria Vanina Martinez3 ,
                          Livia Predoiu12 , and Gerardo I. Simari3
                   1
                     Department of Computer Science, University of Oxford, UK
    2
        Department of Computer Science, Otto-von-Guericke Universität Magdeburg, Germany
         3
           Dept. of Comp. Sci. and Eng., Univ. Nacional del Sur and CONICET, Argentina


          Abstract. We study the problem of exchanging probabilistic data between onto-
          logy-based probabilistic databases. The probabilities of the probabilistic source
          databases are compactly encoded via Boolean formulas with the variables ad-
          hering to the dependencies imposed by a Bayesian network, which are closely
          related to the management of provenance. For the ontologies and the ontology
          mappings, we consider different kinds of existential rules from the Datalog+/–
          family. We provide a complete picture of the computational complexity of the
          problem of deciding whether there exists a probabilistic (universal) solution for
          a given probabilistic source database relative to a (probabilistic) ontological data
          exchange problem. We also analyze the complexity of answering UCQs (unions
          of conjunctive queries) in this framework.


1       Introduction

Large volumes of uncertain data are best modeled, stored, and processed in probabilis-
tic databases [22]. Enriching databases with terminological knowledge encoded in on-
tologies has recently gained increasing importance in the form of ontology-based data
access (OBDA) [21]. A crucial problem in OBDA is to integrate and exchange knowl-
edge. Not only in the context of OBDA, but also in the area of the Semantic Web, there
are distributed ontologies that we may have to map and integrate to enable query an-
swering over them. Here, apart from the uncertainty attached to source databases, there
may also be uncertainty regarding the ontology mappings establishing the proper corre-
spondence between items in the source ontology and items in the target ontology. This
especially happens when the mappings are created automatically.
    Data exchange [11] is an important theoretical framework used for studying data-
interoperability tasks that require data to be transferred from existing databases to a
target database that comes with its own (independently created) schema and schema
constraints. The expressivity of the data exchange framework goes beyond the classi-
cal data integration framework [17]. For the translation, schema mappings are used,
which are declarative specifications that describe the relationship between two database
schemas. In classical data exchange, we have a source database, a target database, a de-
terministic mapping, and deterministic target dependencies. Recently, a framework for
probabilistic data exchange [10] has been proposed where the classical data exchange
framework based on weakly acyclic existential rules has been extended to consider a
probabilistic source database and a probabilistic source-to-target mapping.
     In this paper, we study an expressive extension of the probabilistic data exchange
framework in [10], where the source and the target are ontological knowledge bases,
each consisting of a probabilistic database and a deterministic ontology describing
terminological knowledge about the data stored in the database. The two ontologies
and the mapping between them are expressed via existential rules. Our extension of
the data exchange framework is strongly related to exchanging data between incom-
plete databases, as proposed in [3], which considers an incomplete deterministic source
database in the data exchange problem. However, in that work, the databases are deter-
ministic, and the mappings and the target database constraints are full existential rules
only. In our complexity analysis in this paper, we consider a host of different classes
of existential rules, including some subclasses of full existential rules. In addition, our
source is a probabilistic database relative to an underlying ontology.
     Our work in this paper is also related to the recently proposed knowledge base ex-
change framework [2, 1], which allows knowledge to be exchanged between determin-
istic DL-LiteRDF S and DL-LiteR ontologies. In this paper, besides considering proba-
bilistic source databases, we are also using more expressive ontology languages, since
already linear existential rules from the Datalog+/– family are strictly more expressive
than the description logics (DLs) DL-LiteX of the DL-Lite family [9] as well as their
extensions with n-ary relations DLR-LiteX . Guarded existential rules are sufficiently ex-
pressive to model the tractable DL EL [4, 5] (and ELI f [16]). Note that existential rules
are also known as tuple-generating dependencies (TGDs) and Datalog+/– rules [7].
The main contributions of this paper are summarized as follows.
− We introduce deterministic and probabilistic ontological data exchange problems,
where probabilistic knowledge is exchanged between two Bayesian network-based prob-
abilistic databases relative to their underlying deterministic ontologies, and the deter-
ministic and probabilistic mapping between the two ontologies is defined via determin-
istic and probabilistic existential mapping rules, respectively.
− We provide an in-depth analysis of the data and combined complexity of deciding the
existence of probabilistic (universal) solutions and obtain a (fairly) complete picture of
the data complexity, general combined complexity, bounded-arity (ba) combined, and
fixed-program combined (fp) complexity for the main sublanguages of the Datalog+/–
family. We also delineate some tractable special cases, and provide complexity results
for exact UCQ (union of conjunctive queries) answering.
− For the complexity analysis, we consider a compact encoding of probabilistic source
databases and mappings, which is used in the area of both incomplete and probabilistic
databases, and also known as data provenance or data lineage [14, 12, 13, 22]. Here,
we consider data provenance for probabilistic data that is structured according to an
underlying Bayesian network.

2   Preliminaries
We assume infinite sets of constants C, (labeled) nulls N, and regular variables V.
A term t is a constant, null, or variable. An atom has the form p(t1 , . . . , tn ), where p is
an n-ary predicate, and t1 , . . . , tn are terms. Conjunctions of atoms are often identified
with the sets of their atoms. An instance I is a (possibly infinite) set of atoms p(t),
where t is a tuple of constants and nulls. A database D is a finite instance that contains
only constants. A homomorphism is a substitution h : C ∪ N ∪ V → C ∪ N ∪ V that is
the identity on C. We assume familiarity with conjunctive queries (CQs). The answer
to a CQ q over an instance I is denoted q(I). A Boolean CQ (BCQ) q evaluates to true
over I, denoted I |= q, if q(I) 6= ∅.
     A tuple-generating dependency (TGD) σ is a first-order formula ∀X ϕ(X) →
∃Y p(X, Y), where X ∪ Y ⊆ V, ϕ(X) is a conjunction of atoms, and p(X, Y) is
an atom. We call ϕ(X) the body of σ, denoted body(σ), and p(X, Y) the head of σ, de-
noted head (σ). We consider only TGDs with a single atom in the head, but our results
can be extended to TGDs with a conjunction of atoms in the head. An instance I satis-
fies σ, written I |= σ, if the following holds: whenever there exists a homomorphism h
such that h(ϕ(X)) ⊆ I, then there exists h0 ⊇ h|X , where h|X is the restriction of h
to X, such that h0 (p(X, Y)) ∈ I. A negative constraint (NC) ν is a first-order formula
∀X ϕ(X) → ⊥, where X ⊆ V, ϕ(X) is a conjunction of atoms, called the body of ν,
denoted body(ν), and ⊥ denotes the truth constant false. An instance I satisfies ν, de-
noted I |= ν, if there is no homomorphism h such that h(ϕ(X)) ⊆ I. Given a set Σ of
TGDs and NCs, I satisfies Σ, denoted I |= Σ, if I satisfies each TGD and NC of Σ.
For brevity, we omit the universal quantifiers in front of TGDs and NCs.
     Given a database D and a set Σ of TGDs and NCs, the answers that we consider are
those that are true in all models of D and Σ. Formally, the models of D and Σ, denoted
mods(D, Σ), is the set of instances {I | I ⊇ D and I |= Σ}. TheTanswer to a CQ q rel-
ative to D and Σ is defined as the set of tuples ans(q, D, Σ) = I∈mods(D,Σ) {t | t ∈
q(I)}. The answer to a BCQ q is true, denoted D ∪ Σ |= q, if ans(q, D, Σ) 6= ∅.
The problem of CQ answering is defined as follows: given a database D, a set Σ of
TGDs and NCs, a CQ q, and a tuple of constants t, decide whether t ∈ ans(q, D, Σ).
Following Vardi’s taxonomy [23], the combined complexity of BCQ answering is cal-
culated by considering all the components, i.e., the database, the set of dependencies,
and the query, as part of the input. The bounded-arity combined complexity (or sim-
ply ba-combined complexity) is calculated by assuming that the arity of the underlying
schema is bounded by an integer constant. Notice that in the context of description
logics (DLs), whenever we refer to the combined complexity in fact we refer to the
ba-combined complexity since, by definition, the arity of the underlying schema is at
most two. The fixed-program combined complexity (or simply fp-combined complexity)
is calculated by considering the set of TGDs and NCs as fixed.

3   Ontological Data Exchange
In this section, we define the notions of deterministic and probabilistic ontological data
exchange. The source (resp., target) of the deterministic/probabilistic ontological data
exchange problems that we consider in this paper is a probabilistic database (resp.,
probabilistic instance), each relative to a deterministic ontology. Here, a probabilistic
database (resp., probabilistic instance) over a schema S is a probability space P r =
(I, µ) such that I is the set of all (possibly infinitelyP
                                                         many) databases (resp., instances)
over S, and µ : I → [0, 1] is a function that satisfies I∈I µ(I) = 1.
3.1   Deterministic Ontological Data Exchange
Ontological data exchange formalizes data exchange from a probabilistic database rel-
ative to a source ontology Σs (consisting of TGDs and NCs) over a schema S to a
probabilistic target instance Prt relative to a target ontology Σt (consisting of a set of
TGDs and NCs) over a schema T via a (source-to-target) mapping (also consisting of a
set of TGDs and NCs). More specifically, an ontological data exchange (ODE) problem
M= (S, T, Σs , Σt , Σst ) consists of (i) a source schema S, (ii) a target schema T disjoint
from S, (iii) a finite set Σs of TGDs and NCs over S (called source ontology), (iv) a fi-
nite set Σt of TGDs and NCs over T (called target ontology), and (v) a finite set Σst of
TGDs and NCs σ over S ∪ T (called (source-to-target) mapping) such that body(σ) and
head(σ) are defined over S ∪ T and T, respectively.
    Ontological data exchange with deterministic databases is based on defining a tar-
get instance J over T as being a solution for a deterministic source database I over S
relative to an ODE problem M = (S, T, Σs , Σt , Σst ), if (I ∪ J) |= Σs ∪ Σt ∪ Σst . We
denote by SolM the set of all such pairs (I, J). Among the possible deterministic solu-
tions J to a deterministic source database I relative to M in SolM , we prefer universal
solutions, which are the most general ones carrying only the necessary information for
data exchange, i.e., those that transfer only the source database along with the relevant
implicit derivations via Σs to the target ontology. A universal solution can be homo-
morphically mapped to all other solutions leaving the constants unchanged. Hence, a
deterministic target instance J over S is a universal solution for a deterministic source
database I over T relative to a schema mapping M, if (i) J is a solution, and (ii) for
each solution J 0 for I relative to M, there is a homomorphism h : J → J 0 . We denote
by USol M (⊆ SolM ) the set of all pairs (I, J) of deterministic source databases I and
target instances J such that J is a universal solution for I relative to M.
    When considering probabilistic databases and instances, a joint probability space Pr
over the solution relation SolM and the universal solution relation USolM must exist.
More specifically, a probabilistic target instance Prt = (J , µt ) is a probabilistic solution
(resp., probabilistic universal solution) for a probabilistic source database Prs = (I, µs )
relative to an ODE problem M = (S, T, Σs , Σt , Σst ), if there exists a probability space
Pr = (I × J , µ) such thatP(i) the left and right marginals of Pr are PrP  s and Prt , respec-
tively, i.e., (i.a) µs (I) = J∈J µ(I, J) for all I ∈ I, (i.b) µt (J) = I∈I µ(I, J) for
all J ∈ J ; and (ii) µ(I, J) = 0 for all (I, J) 6∈ SolM (resp., (I, J) 6∈ USolM ). Note that
this intuitively says that all non-solutions (I, J) have probability zero and the existence
of a solution does not exclude that some source databases with probability zero have no
corresponding target instance.
Example 1. An ontological data exchange (ODE) problem M = (S, T, Σs , Σt , Σst )
is given by the source schema S = {Researcher/2, ResearchArea/2, Publication/3}
(the number after each predicate denotes its arity), the target schema T = {UResearch-
Area/3, Lecturer/2}, the source ontology Σs = {σs , νs }, the target ontology Σt =
{σt , νt }, and the mapping Σst = {σst , νm }, where:
                     σs : Publication(X, Y, Z) → ResearchArea(X, Y),
                     νs : Researcher(X, Y) ∧ ResearchArea(X, Y) → ⊥,
                     σt : UResearchArea(U, D, T) → ∃Z Lecturer(T, Z),
                     νt : Lecturer(X, Y) ∧ Lecturer(Y, X) → ⊥,
                  Possible source database facts
               ra Researcher(Alice, UnivOx)                           Derived source database facts
               rp Researcher(Paul, UnivOx)                           aaml ResearchArea(Alice, ML)
               paml Publication(Alice, ML, JMLR)                     aadb ResearchArea(Alice, DB)
               padb Publication(Alice, DB, TODS)                     apdb ResearchArea(Paul, DB)
               ppdb Publication(Paul, DB, TODS)                      apai ResearchArea(Paul, AI)
               ppai Publication(Paul, AI, AIJ)
                                                                      Possible target instance facts
           Probabilistic source database Prs = (I, µs )
                                                                 uml UResearchArea(UnivOx, N1 , ML)
           I1 = {ra ,rp ,paml ,ppdb ,aaml ,apdb } 0.5
                                                                 uai UResearchArea(UnivOx, N2 , AI)
           I2 = {ra ,rp ,paml ,ppai ,aaml ,apai } 0.2
                                                                 udb UResearchArea(UnivOx, N3 , DB)
           I3 = {ra ,rp ,padb ,ppai ,aadb ,apai } 0.15
                                                                 lml Lecturer(ML, N4 )
           I4 = {ra ,rp ,padb ,ppdb ,aadb ,apdb } 0.075
                                                                 lai Lecturer(AI, N5 )
           I5 = {ra ,padb ,aadb }                 0.075
                                                                 ldb Lecturer(DB, N6 )
          Probabilistic target instance Prt1 = (J1 , µt1 )
                                                             Probabilistic target instance Prt2 = (J2 , µt2 )
          J1 = {uml ,udb ,lml ,ldb } 0.5
                                                             J5 = {uml ,udb ,lml ,ldb }           0.55
          J2 = {uml ,uai ,lml ,lai } 0.2
                                                             J6 = {uml ,uai ,lml ,lai }           0.1
          J3 = {uai ,udb ,lai ,ldb } 0.15
                                                             J7 = {uml ,uai ,udb ,lml ,lai ,ldb } 0.35
          J4 = {udb ,ldb }           0.15


Table 1. Probabilistic source database and two probabilistic target instances for Example 1
(N1 , . . . , N6 are nulls); both are probabilistic solutions, but only Prt1 is universal.


  Fig. 1. Probabilistic universal solution Prt1 .                    Fig. 2. Probabilistic solution Prt2 .


        σst : ResearchArea(N, T) ∧ Researcher(N, U) → ∃D UResearchArea(U, D, T),
        νst : ResearchArea(N, T) ∧ UResearchArea(U, T, N) → ⊥.

Given the probabilistic source database in Table 1, two probabilistic instances Prt1 =
(J1 , µt1 ) and Prt2 = (J2 , µt2 ) that are probabilistic solutions are shown in Table 1.
Note that only Prt1 is also a probabilistic universal solution. Note also that Figures 1
and 2 show the probability spaces over Prt1 and Prt2 , respectively.                   


    Query answering in ontological data exchange is performed over the target ontology
and is generalized from deterministic data exchange. A union of conjunctive queries (or
                              Wk
UCQ) has the form q(X) = i=1 ∃Yi Φi (X, Yi , Ci ), where each ∃Yi Φi (X, Yi , Ci )
with i ∈ {1, . . . , k} is a CQ with exactly the variables X and Yi , and the constants
Ci . Given an ODE problem M = (S, T, Σs , Σt , Σst ), probabilistic source database
                               Wk
Prs = (I, µs ), UCQ q(X) = i=1 ∃Yi Φi (X, Yi , Ci ), and tuple t (a ground instance
of X in q) over C, the confidence of t relative to q, denoted conf q (t), in Prs relative to
M is the infimum of Prt (q(t)) subject to all probabilistic solutions Prt for Prs relative
to M. Here, Prt (q(t)) for Prt = (J , µt ) is the sum of all µt (J) such that q(t) evaluates
to true in the instance J ∈ J (i.e., some BCQ ∃Yi Φi (t, Yi , Ci ) with i ∈ {1, . . . , k}
evaluates to true in J).

Example 2. Consider again the setting of Example 1, and let q be a UCQ of a stu-
dent who wants to know whether she can study either machine learning or artificial
intelligence at the University of Oxford: q() = ∃X, Z(Lecturer(AI, X) ∧ UResearch-
Area(UnivOx, Z, AI)) ∨ ∃X, Z(Lecturer(ML, X) ∧ UResearchArea(UnivOx, Z, ML)).
Then, q yields the probabilities 0.85 and 1 on Prt1 and Prt2 , respectively.     


3.2   Probabilistic Ontological Data Exchange

Probabilistic ontological data exchange extends deterministic ontological data exchange
by turning the deterministic source-to-target mapping into a probabilistic source-to-
target mapping, i.e., we have a probability distribution over the set of all subsets of Σst .
More specifically, a probabilistic ontological data exchange (PODE) problem M =
(S, T, Σs , Σt , Σst , µst ) consists of (i) a source schema S, (ii) a target schema T dis-
joint from S, (iii) a finite set Σs of TGDs and NCs over S (called source ontology),
(iv) a finite set Σt of TGDs and NCs over T (called target ontology), (v) a finite set
                                                                             Σst
Pst of TGDs and0 NCs σ over S ∪ T, and (vi) a function µst : 2
Σ                                                                                → [0, 1] such that
     0
    Σ ⊆Σst   µ st (Σ  ) = 1 (called probabilistic  (source-to-target)      mapping).
     A probabilistic target instance Prt = (J , µt ) is a probabilistic solution (resp., prob-
abilistic universal solution) for a probabilistic source database Prs = (I, µs ) relative to
a PODE problem M = (S, T, Σs , Σt , Σst , µst ), if there exists a probability space Pr =
 = (I ×J ×2Σ      P , µ) such that: (i) the three
                   st
                                                  marginals of µ are µs , µt ,P  and µst , such that:
(i.a) µs (I) = J∈J , Σ 0 ⊆Σst µ(I, J, Σ 0 ) for    all  I  ∈ I,  (i.b) µt (J) =    I∈I, Σ 0 ⊆Σst µ(I,
J, Σ 0 ) for all J ∈ J , and (i.c) µst (Σ 0 ) = I∈I, J∈J µ(I, J, Σ 0 ) for all Σ 0 ⊆ Σst ; and
                                                P
(ii) µ(I, J, Σ 0 ) = 0 for all (I, J) 6∈ Sol (S,T,Σ 0 ) (resp., (I, J) 6∈ USol (S,T,Σ 0 ) ).
     Using probabilistic (universal) solutions for probabilistic source databases relative
to PODE problems, the semantics of UCQs is lifted to PODE problems as follows.
Given a PODE problem M = (S, T, Σs , Σt , Σst , µst ), a probabilistic source database
                                     Wk
Prs = (I, µs ), a UCQ q(X) = i=1 ∃Yi Φi (X, Yi , Ci ), and a tuple t (a ground in-
stance of X in q) over C, the confidence of t relative to q, denoted conf q (t), in Prs rel-
ative to M is the infimum of Prt (q(t)) subject to all probabilistic solutions Prt for Prs
relative to M. Here, Prt (q(t)) for Prt = (J , µt ) is the sum of all µt (J) such that q(t)
evaluates to true in the instance J ∈ J .


3.3   Compact Encoding

We use a compact encoding of both probabilistic databases and probabilistic map-
pings, which is based on annotating facts, TGDs, and NCs by probabilistic events in
a Bayesian network, rather than explicitly specifying the whole probability space.
    We first define annotations and annotated atoms. Let e1 , . . . , en be n ≥ 1 elemen-
tary events. A world w is a conjunction `1 ∧· · ·∧`n , where each `i , i ∈ {1, . . . , n}, is ei-
ther the elementary event ei or its negation ¬ei . An annotation λ is any Boolean combi-
nation of elementary events (i.e., all elementary events are annotations, and if λ1 and λ2
                           Possible source database facts Annotation
                        ra   Researcher(Alice, UnivOx)    true
                        rp   Researcher(Paul, UnivOx)     e1 ∨ e2 ∨ e3 ∨ e4
                        paml Publication(Alice, ML, JMLR) e1 ∨ e2
                        padb Publication(Alice, DB, TODS) ¬ e1 ∧ ¬ e2
                        ppdb Publication(Paul, DB, TODS) e1 ∨ (¬ e2 ∧ ¬ e3 ∧ e4 )
                        ppai Publication(Paul, AI, AIJ)   (¬ e1 ∧ e2 ) ∨ (¬ e1 ∧ e3 )

           Table 2. Annotation encoding of the probabilistic source database in Table 1.


                       Table 3. Bayesian network over the elementary events.


are annotations, then also ¬λ1 and λ1 ∧ λ2 ). An annotated atom has the form a : λ,
where a is an atom, and λ is an annotation.
     The compact encoding of probabilistic databases can then be defined as follows.
Note that this encoding is also underlying our complexity analysis in Section 4. A set A
of annotated atoms along with a probability µ(w) ∈ [0, 1] for every world w compactly
encodes a probabilistic database P r = (I, µ) whenever: (i) the probability µ of ev-
ery annotation λ is the sum of the probabilities of all worlds in which λ is true, and
(ii) the probability µ of every subset-maximal database {a1 , . . . , am } ∈ I 4 such that
{a1 : λ1 , . . . , am : λm } ⊆ A for some annotations λ1 , . . . , λm is the probability µ of
λ1 ∧ · · · ∧ λm (and the probability µ of every other database in I is 0).
     We assume that the probability distributions for the underlying events are given by
a Bayesian network, which is usually used for compactly specifying a joint probability
space, encoding also a certain causal structure between the variables. The following
example in Tables 2 and 3 illustrates the compact encoding of probabilistic source data-
bases via Boolean annotations relative to an underlying Bayesian network.
     If the mapping is probabilistic as well, then we use two disjoint sets of elementary
events, one for encoding the probabilistic source database and the other one for the
mapping. In this way, the probabilistic source database is independent from the proba-
bilistic mapping. We now define the compact encoding of probabilistic mappings. An
annotated TGD (resp., NC) has the form σ : λ, where σ is a TGD (resp., NC), and λ
is an annotation. A set Σ of annotated TGDs and NCs σ : λ with σ ∈ Σst along with
a probability µ(w) ∈ [0, 1] for every world w compactly encodes a probabilistic map-
pings µst : 2Σst → [0, 1] whenever (i) the probability µ of every annotation λ is the sum
of the probabilities of all worlds in which λ is true, and (ii) the probability µst of every
 4
     That is, we do not consider subsets of the databases here.
subset-maximal {σ1 , . . . , σk } ⊆ Σst such that {σ1 : λ1 , . . . , σk : λk } ⊆ Σ for some
annotations λ1 , . . . , λk is the probability µ of λ1 ∧ · · · ∧ λk (and the probability µst of
every other subset of Σst is 0).

3.4   Computational Problems
We consider the following computational problems:
 Existence of a solution (resp., universal solution): Given an ODE or a PODE prob-
   lem M and a probabilistic source database Prs , decide whether there exists a prob-
   abilistic (resp., probabilistic universal) solution for Prs relative to M.
 Answering UCQs: Given an ODE or a PODE problem M, a probabilistic source data-
   base Prs , a UCQ q(X), and a tuple t over C, compute conf Q (t) in Prs w.r.t. M.


4     Computational Complexity
We now analyze the computational complexity of deciding the existence of a (univer-
sal) probabilistic solution for deterministic and probabilistic ontological data exchange
problems. We also delineate some tractable special cases, and we provide some com-
plexity results for exact UCQ answering for ODE and PODE problems.
    We assume some elementary background in complexity theory [15, 20]. We now
briefly recall the complexity classes that we encounter in our complexity results. The
complexity classes PSPACE (resp., P, EXP, 2EXP) contain all decision problems that
can be solved in polynomial space (resp., polynomial, exponential, double exponential
time) on a deterministic Turing machine, while the complexity classes NP and NEXP
contain all decision problems that can be solved in polynomial and exponential time
on a nondeterministic Turing machine, respectively; coNP and coNEXP are their com-
plementary classes, where “Yes” and “No” instances are interchanged. The complexity
class AC0 is the class of all languages that are decidable by uniform families of Boolean
circuits of polynomial size and constant depth. The inclusion relationships among the
above (decision) complexity classes (all currently believed to be strict) are as follows:
                    0
               AC       ⊆ P ⊆ NP, coNP ⊆ PSPACE ⊆ EXP ⊆ NEXP, coNEXP ⊆ 2EXP

    The (function) complexity class #P is the set of all functions that are computable
by a polynomial-time nondeterministic Turing machine whose output for a given input
string I is the number of accepting computations for I.

4.1   Decidability Paradigms
The main (syntactic) conditions on TGDs that guarantee the decidability of CQ answer-
ing are guardedness [6], stickiness [8], and acyclicity. Each one of these conditions has
its “weak” counterpart: weak guardedness [6], weak stickiness [8], and weak acyclic-
ity [11], respectively.
     A TGD σ is guarded if there exists an atom in its body that contains (or “guards”)
all the body variables of σ. The class of guarded TGDs, denoted G, is defined as the
                  Data Comb. ba-comb. fp-comb.                 Data Comb. ba-comb. fp-comb.
      L, LF, AF   in AC0 PSPACE   NP      NP       L, LF, AF   coNP PSPACE coNP      coNP
          G          P    2EXP    EXP     NP           G       coNP 2EXP     EXP     coNP
         WG         EXP   2EXP    EXP    EXP          WG        EXP 2 EXP    EXP      EXP
                        0
        S, SF     in AC    EXP     NP    NP          S, SF     coNP EXP     coNP     coNP
        F, GF       P     EXP      NP    NP          F, GF     coNP EXP     coNP     coNP
          A       in AC0 NEXP     NEXP   NP            A       coNP coNEXP coNEXP    coNP
      WS, WA         P   2EXP     2EXP   NP        WS, WA      coNP 2EXP    2EXP     coNP


  Fig. 3. Complexity of BCQ answering [18].      Fig. 4. Complexity of existence of a proba-
  All entries except for “in AC0 ” are com-      bilistic (universal) solution (for both deter-
  pleteness ones, where hardness in all cases    ministic and probabilistic ODE). All entries
  holds even for ground atomic BCQs.             are completeness results.


family of all possible sets of guarded TGDs. A key subclass of guarded TGDs are the
so-called linear TGDs with just one body atom (which is automatically a guard), and
the corresponding class is denoted L. Weakly guarded TGDs extend guarded TGDs by
requiring only “harmful” body variables to appear in the guard, and the associated class
is denoted WG. It is easy to verify that L ⊂ G ⊂ WG.
     Stickiness is inherently different from guardedness, and its central property can be
described as follows: variables that appear more than once in a body (i.e., join variables)
are always propagated (or “stick”) to the inferred atoms. A set of TGDs that enjoys the
above property is called sticky, and the corresponding class is denoted S. Weak sticki-
ness is a relaxation of stickiness where only “harmful” variables are taken into account.
A set of TGDs which enjoys weak stickiness is weakly sticky, and the associated class
is denoted WS. Observe that S ⊂ WS.
     A set Σ of TGDs is acyclic if its predicate graph is acyclic, and the underlying class
is denoted A. In fact, an acyclic set of TGDs can be seen as a nonrecursive set of TGDs.
We say Σ is weakly acyclic if its dependency graph enjoys a certain acyclicity condition,
which actually guarantees the existence of a finite canonical model; the associated class
is denoted WA. Clearly, A ⊂ WA.
     Another key fragment of TGDs, which deserves our attention, are the so-called
full TGDs, i.e., TGDs without existentially quantified variables, and the corresponding
class is denoted F. If we further assume that full TGDs enjoy linearity, guardedness,
stickiness, or acyclicity, then we obtain the classes LF, GF, SF, and AF, respectively.


4.2    Overview of Complexity Results

Our complexity results for deciding the existence of a probabilistic (universal) solution
for both ODE and PODE problems with annotations over events relative to an under-
lying Bayesian network are summarized in Fig. 4 for all classes of existential rules
discussed above in the data, combined, ba-combined, and fp-combined complexity (all
entries are completeness results). For L, LF, AF, S, SF, and A in the data complexity,
we obtain tractability when the underlying Bayesian network is a polytree. For all other
cases, hardness holds even when the underlying Bayesian network is a polytree. Finally,
for all classes of existential rules discussed above except for WG, answering UCQs for
both ODE and PODE problems is in #P in the data complexity.
4.3   Deterministic Ontological Data Exchange
The first result shows that deciding whether there exists a probabilistic (or probabilistic
universal) solution for a probabilistic source database relative to an ODE problem is
complete for C (resp., coC), if BCQ answering for the involved sets of TGDs and NCs
is complete for a deterministic (resp., nondeterministic) complexity class C ⊇ PSPACE
(resp., C ⊇ NP), and hardness holds even for ground atomic BCQs. As a corollary, by
the complexity of BCQ answering with TGDs and NCs in Figure 3 [18], we imme-
diately obtain the complexity results shown in Figure 4 for deciding the existence of
a probabilistic (universal) solution (in deterministic ontological data exchange) in the
combined, ba-combined, and fp-combined complexity, and for the class WG of TGDs
and NCs in the data complexity. The hardness results hold even when the underlying
Bayesian network is a polytree.
Theorem 1. Given a probabilistic source database P rs relative to a source ontology
Σs and an ODE problem M = (S, T, Σs , Σt , Σst ) such that Σs ∪ Σt ∪ Σst belongs to
a class of TGDs and NCs for which BCQ answering is complete for a deterministic
(resp., nondeterministic) complexity class C ⊇ PSPACE (resp., C ⊇ NP), and hardness
holds even for ground atomic BCQs, deciding the existence of a probabilistic (universal)
solution for P rs relative to Σs and M is complete for C (resp., coC). Hardness holds
even when the underlying Bayesian network is a polytree.

    The following result shows that deciding whether there exists a probabilistic (uni-
versal) solution for a probabilistic source database relative to an ODE problem is com-
plete for coNP in the data complexity, for all classes of sets of TGDs and NCs considered
in this paper, except for WG. Hardness for coNP for the classes G, F, GF, WS, and WA
holds even when the underlying Bayesian network is a polytree.
Theorem 2. Given a probabilistic source database P rs relative to a source ontology
Σs and an ODE problem M = (S, T, Σs , Σt , Σst ) such that Σs ∪ Σt ∪ Σst belongs to
a class among L, LF, AF, G, S, SF, F, GF, A, WS, and WA, deciding whether there
exists a probabilistic (or probabilistic universal) solution for P rs relative to Σs and
M is coNP-complete in the data complexity. Hardness for coNP for the classes G, F,
GF, WS, and WA holds even when the underlying Bayesian network is a polytree.

    The following result shows that deciding whether there exists a probabilistic (or
probabilistic universal) solution for a probabilistic source database relative to an ODE
problem is in P in the data complexity, if BCQ answering for the involved sets of TGDs
and NCs is first-order rewritable as a Boolean UCQ, and the underlying Bayesian net-
work is a polytree. As a corollary, by the complexity of BCQ answering with TGDs and
NCs, deciding the existence of a solution is in P for the classes L, LF, AF, S, SF, and A
in the data complexity, if the underlying Bayesian network is a polytree.
Theorem 3. Given a probabilistic source database P rs relative to a source ontology
Σs , with a polytree as Bayesian network, and an ODE problem M = (S, T, Σs , Σt , Σst )
such that Σs ∪ Σt ∪ Σst belongs to a class of TGDs and NCs for which BCQ answering
is first-order rewritable as a Boolean UCQ, deciding whether there exists a probabilis-
tic (universal) solution for P rs relative to Σs and M is in P in the data complexity.
    Finally, the following theorem shows that answering UCQs for probabilistic source
databases relative to an ODE problem is complete for #P in the data complexity for all
above classes of existential rules except for WG.
Theorem 4. Given (i) an ODE problem M = (S, T, Σt , Σs , Σst ) such that Σs ∪ Σst ∪
Σt belongs to a class among L, LF, AF, G, S, SF, F, GF, A, WS, and WA, and (ii) a prob-
abilistic source database P rs relative to Σs such that there exists a solution for P rs
relative to M, (iii) a UCQ Q = q(X) over T, and (iv) a tuple a, computing confQ (a) is
#P-complete in the data complexity.

4.4   Probabilistic Ontological Data Exchange
All the results of Section 4.3 in Theorems 1 and 4 carry over to the case of proba-
bilistic ontological data exchange. Clearly, the hardness results carry over immediately,
since deterministic ontological data exchange is a special case of probabilistic ontolog-
ical data exchange. As for the membership results, we additionally consider the worlds
for the probabilistic mapping, which are iterated through in the data complexity and
guessed in the combined, the ba-combined, and the fp-combined complexity.


5     Summary and Outlook
We have defined deterministic and probabilistic ontological data exchange problems,
where probabilistic knowledge is exchanged between two ontologies. The two ontolo-
gies and the mapping between them are defined via existential rules, where the rules for
the mapping are deterministic and probabilistic, respectively. We have given a precise
analysis of the computational complexity of deciding the existence of a probabilistic
(universal) solution for different classes of existential rules in both deterministic and
probabilistic ontological data exchange. We also have delineated some tractable special
cases, and we have provided some complexity results for exact UCQ answering.
    An interesting topic for future research is to further explore the tractable cases of
probabilistic solution existence and whether they can be extended, e.g., by slightly gen-
eralizing the type of the mapping rules. Another issue for future work is to further
analyze the complexity of answering UCQs for different classes of existential rules in
deterministic and probabilistic ontological data exchange.
Acknowledgments. This work was supported by an EU (FP7/2007-2013) Marie-Curie
Intra-European Fellowship (“PRODIMA”), the UK EPSRC grant EP/J008346/1 (“PrO-
QAW”), the ERC grant 246858 (“DIADEM”), a Yahoo! Research Fellowship, and
funds from Universidad Nacional del Sur and CONICET, Argentina. This paper is a
short version of a paper that appeared in Proc. RuleML 2015 [19].


References
 1. Arenas, M., Botoeva, E., Calvanese, D., Ryzhikov, V.: Exchanging OWL2 QL knowledge
    bases. In: Proc. IJCAI. pp. 703–710 (2013)
 2. Arenas, M., Botoeva, E., Calvanese, D., Ryzhikov, V., Sherkhonov, E.: Exchanging descrip-
    tion logic knowledge bases. In: Proc. KR. pp. 563–567 (2012)
 3. Arenas, M., Pérez, J., Reutter, J.L.: Data exchange beyond complete data. J. ACM 60(4),
    28:1–28:59 (2013)
 4. Baader, F.: Least common subsumers and most specific concepts in a description logic with
    existential restrictions and terminological cycles. In: Proc. IJCAI. pp. 364–369 (2003)
 5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. IJCAI. pp. 364–369
    (2005)
 6. Calı̀, A., Gottlob, G., Kifer, M.: Taming the infinite chase: Query answering under expressive
    relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)
 7. Cali, A., Gottlob, G., Lukasiewicz, T., Marnette, B., Pieris, A.: Datalog+/–: A family of
    logical knowledge representation and query languages for new applications. In: Proc. LICS.
    pp. 228–242 (2010)
 8. Calı̀, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: The query
    answering problem. Artif. Intell. 193, 87–128 (2012)
 9. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning
    39(3), 385–429 (2007)
10. Fagin, R., Kimelfeld, B., Kolaitis, P.G.: Probabilistic data exchange. J. ACM 58(4), 15:1–
    15:55 (2011)
11. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: Semantics and query answer-
    ing. Theor. Comput. Sci. 336(1), 89–124 (2005)
12. Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of information
    retrieval and database systems. ACM Trans. Inf. Sys. 15(1), 32–66 (1997)
13. Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: Proc. PODS. pp. 31–
    40 (2007)
14. Imielinski, T., Witold Lipski, J.: Incomplete information in relational databases. J. ACM
    31(4), 761–791 (1984)
15. Johnson, D.S.: A catalog of complexity classes. In: van Leeuwen, J. (ed.) Handbook of The-
    oretical Computer Science, vol. A, chap. 2, pp. 67–161. MIT Press (1990)
16. Krisnadhi, A., Lutz, C.: Data complexity in the EL family of description logics. In: Proc.
    LPAR, LNCS, vol. 4790, pp. 333–347. Springer (2007)
17. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. PODS. pp. 233–246
    (2002)
18. Lukasiewicz, T., Martinez, M.V., Pieris, A., Simari, G.I.: From classical to consistent query
    answering under existential rules. In: Proc. AAAI. pp. 1546–1552 (2015)
19. Lukasiewicz, T., Martinez, M.V., Predoiu, L., Simari, G.I.: Existential rules and Bayesian
    networks for probabilistic ontological data exchange. In: Proc. RuleML. LNCS, vol. 9202,
    pp. 294–310. Springer (2015)
20. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley (1994)
21. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking
    data to ontologies. J. Data Sem. 10, 133–173 (2008)
22. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic Databases. M & C (2011)
23. Vardi, M.Y.: The complexity of relational query languages (extended abstract). In: Proc.
    STOC. pp. 137–146 (1982)