Query Answering in Bayesian Description Logics İsmail İlkan Ceylan? Theoretical Computer Science, TU Dresden, Germany ceylan@tcs.inf.tu-dresden.de Abstract. The Bayesian Description Logic (BDL) BEL is a probabilistic DL, which extends the lightweight DL EL by defining a joint probability distribution over EL axioms with the help of a Bayesian network (BN). In the recent work, extensions of standard logical reasoning tasks in BEL are shown to be reducible to inferences in BNs. This work concentrates on a more general reasoning task, namely on conjunctive query answering in BEL where every query is associated to a probability leading to different reasoning problems. In particular, we study the probabilistic query entailment, top-k answers, and top-k con- texts as reasoning problems. Our complexity analysis suggests that all of these problems are tractable under certain assumptions. 1 Introduction Description Logics (DLs) [3], as a successful family of knowledge representation (KR) formalisms, have been employed in various application domains such as conceptual modeling, databases, bio-medical ontologies, natural language pro- cessing, configuration, and the semantic web1 . Arguably, all these domains, as is real world, are subject to imprecision; may it be an assertion about an individual or a terminological statement, it often comes along with a degree of uncertainty. The fact that classical DLs had severe limitations in representing and rea- soning under uncertainty led to a body of work [20] tailored towards this goal. Several extensions to DLs have been proposed with different characteristics in terms of their logical expressivity, their semantics, and their independence as- sumptions. BDLs [6] have been proposed as a means of representing the uncertainty over DL axioms that are being asserted. In BDLs, every axiom is associated with a probability, which is encoded with the help of a BN. This family of logics provides a compact and easy way of encoding probabilities over DL axioms. Two important features of BDLs are that they do not force any independence assumptions, and they are based on the so-called multiple world semantics. The focus of this work is the DL BEL [7], a Bayesian extension of the lightweight DL EL [2] for which several probabilistic reasoning tasks have been ? Supported by DFG within the Research Training Group “RoSI” (GRK 1907). 1 http://www.w3.org/TR/owl2-overview/ Table 1: Syntax and Semantics of EL Name Syntax Semantics Top > ∆I Conjunction C uD C I ∩ DI Exist. Rest. ∃r.C {d | ∃e ∈ ∆I : (d, e) ∈ rI , e ∈ C I } GCI CvD C I ⊆ DI Concept assertion C(a) aI ∈ C I Role assertion r(a, b) (aI , bI ) ∈ rI studied such as the probabilistic entailment, or finding most likely context (sub- ontology) for an entailment. In fact, tight complexity bounds have been obtained for these problems [8]. Nevertheless, problems related to query answering, and in particular con- junctive query (CQ) answering, has not been studied in the context of BDLs, so far. In this paper, we close this gap and focus on i) probabilistic query en- tailment: “What is the probability of a query to be entailed?” ii) probabilistic query answering: “What are the top-k answers to a query?” and finally iii) the most likely context: “What are the top-k contexts that entail a query?” Consequently, we argue that these problems generalize the reasoning prob- lems that have been considered so far. Unsurprisingly, reasoning in BEL is in- tractible as is CQ answering in EL and inference in BNs. Further analysis shows that tractability can be regained by fixing the BN and the query. 2 Conjunctive Query Answering in EL We briefly review the DL EL [5] and query answering in EL, which constitute the basis of this paper. Formally, let NI , NC and NR be disjoint sets of individual-, concept- and role-names, respectively. EL concept language is defined by the grammar rule C ::= A | > | C u C | ∃r.C, where A ∈ NC and r ∈ NR . The semantics of EL is given by an interpretation: that is a tuple I = (∆I , ·I ) where ∆I is a non-empty domain and ·I is an interpretation function that maps every individual name a to an element aI ∈ ∆I ; every concept name A to a set AI ⊆ ∆I and every role name r to a binary relation rI ⊆ ∆I × ∆I . The interpretation function ·I is extended to EL concepts as shown in the upper part of Table 1. The domain knowledge is encoded through a set of axioms, which restrict the interpretation domain of the concepts. A TBox T is a finite set of general concept inclusions (GCIs) of the form C v D, where C, D are concepts. An ABox is a finite set of concept assertions C(a) and role assertions r(a, b), where a, b ∈ NI , C is a concept and r ∈ NR . A knowledge base is a pair K = (T , A) where T is a TBox and A is an ABox. We use the term axiom as a general expression for GCIs and assertions. The interpretation I satisfies an axiom λ iff it satisfies the conditions on the lower part of Table 1. It is a model of the TBox T if it satisfies all GCIs in T and a model of the ABox A if it satisfies all the assertions in A. An interpretation is a model of the KB K = (T , A) iff it is a model of both T and A. For the rest of this paper we will denote as NI (A) the set of all individual names that appear in the ABox A. CQA is an important reasoning task for DLs that has been investigated in the context of EL. Let NV be a set of variables disjoint from NC , NR , and NI . An atom is an expression of the form A(χ) or r(χ, ψ), where A ∈ NC , r ∈ NR , and χ, ψ ∈ NI ∪ NV. A conjunctive query (CQ) q is a non-empty set of atoms associated to a set DV(q) ⊆ NV of distinguished variables. If DV(q) = ∅, then q is called a Boolean CQ. A special case of a CQ is an instance query (IQ), which consists of only one atom A(χ) with A ∈ NC . Let q be a Boolean CQ and IV(q) be the set of all individual names and variables appearing in q. The interpretation I satisfies q if there exists a function π : IV(q) → ∆I such that (i) π(a) = aI for all a ∈ NI ∩IV(q), (ii) π(χ) ∈ AI for all A(χ) ∈ q, and (iii) (π(χ), π(ψ)) ∈ rI for all r(χ, ψ) ∈ q. In this case, we call π a match for I and q. The ontology O entails q (O |= q) iff every model of O satisfies q. For an arbitrary CQ q, a function a : DV(q) → NI (A) is an answer to q w.r.t. O iff O entails the Boolean CQ a(q) obtained by replacing every distinguished variable χ ∈ DV(q) with a(χ). Conjunctive query answering (CQA) is the task of finding all answers of a CQ, and query entailment is the problem of deciding whether an ontology entails a given Boolean CQ by replacing every distinguished variable χ ∈ DV(q) with a(χ). It is well known that query entailment in EL is polynomial w.r.t. data and KB complexity, but NP-complete w.r.t. combined complexity [23]. Notice that, EL does not enjoy the so-called full first order rewritability which has been considered as a key feature for CQA, since it allows one to reduce the problem to standard tasks in Relational Database Management Systems (RDMSs). Yet, CQA in EL can be successfully employed using a combined approach as described in [22]. 3 The Bayesian Description Logic BEL The Bayesian DL BEL [7] has been introduced as a probabilistic extension of the light-weight DL EL. In BEL probabilities are encoded through a Bayesian network (BN) [11]; that is, a pair B = (G, Φ), where G = (V, E) is a finite directed acyclic graph (DAG) whose nodes represent (boolean) random variables, and Φ contains, for every node x ∈ V , a conditional probability distribution PB (x | π(x)) of x given its parents π(x). If V is the set of nodes in G, we say that B is a BN over V . BNs are widely studied probabilistic graphical models where the underlying graph G = (V, E) encodes a series of conditional independence assumptions between the random variables. Every variable x ∈ V is known to be conditionally independent of its non-descendants given its parents. Thus, every BN B defines a unique joint probability distribution (JPD) over V given by Y PB (V ) = PB (x | π(x)). x∈V The concept language of BEL is the same as the EL concept language. The difference appears in encoding the domain knowledge, i.e. in forming axioms. BEL generalizes classical TBoxes (resp. ABoxes) by annotating the GCIs (resp. assertions) with a context defined by a set of literals belonging to a BN. Formally, let NI be a set of individual names and V a finite set of boolean variables. A V -context is a conjunction of literals over V . A V -restricted general concept inclusion (V -GCI) is an expression of the form hC v D : κi where C, D are BEL concepts and κ is a V -context. A V -restricted assertion (V -assertion) is an expression of the form hC(a) : κi, or hr(a, b) : κi where a, b ∈ NI , C, D are BEL concepts and κ is a V -context. A V -TBox (resp.V -ABox) is a finite set of V -GCIs (resp.V -assertions). A BEL knowledge base (KB) is a tuple K = (B, T , A) where B is a BN over V , T is a V -TBox and A is a V -ABox. We will sometimes speak of contextual axioms to address both V -GCIs and V -assertions. The intuition behind the contextual axioms is to enforce an axiom to hold within a given context, but not necessarily in others. The semantic of such axioms is realized with the so-called contextual interpretations, which differently from the classical interpretations also evaluate the context variables. Formally, given a finite set of Boolean variables V , (I, V I ) is a contextual interpretation where V I is a propositional interpretation over V , and I = (∆I , ·I ) is a classical EL interpretation. We will usually ignore the prefix and speak simply of e.g. a KB, a TBox, an ABox, or an interpretation. The interpretation function ·I is extended to arbitrary BEL concepts as in EL, i.e. using the rules in Table 1. We say that the contextual interpretation (I, V I ) is a model of an axiom hλ : κi denoted as (I, V I ) |= hλ : κi, iff either (i) V I 6|= κ, or (ii) I |= λ. It is a model of the TBox T (resp. ABox A) iff it is a model of all the axioms in T (resp. A). A contextual interpretation (I, V I ) needs to satisfy only the axioms asserted within a context κ for which it holds that V I |= κ. Formally, let K = (B, T , A) be a BEL KB: Given a contextual interpretation (I, V I ) where V I = W, we define the EL KB KW = (TW , AW ) that needs to be satisfied by I as: TW := {C v D | hC v D : ϕi ∈ T , W |= ϕ}, AW := {C(a) | hC(a) : ϕi ∈ A, W |= ϕ} ∪ {r(a, b) | hr(a, b) : ϕi ∈ A, W |= ϕ}. In BEL, uncertainty is represented through a BN that describes a joint proba- bility distribution over the context variables. Semantically, BEL is linked to this distribution with the so called multiple world semantics: A probabilistic inter- pretation defines a probability distribution over a set of (contextual) interpre- tations; this distribution is required to be consistent with the joint probability distribution provided by the BN. Formally, a probabilistic interpretation is a pair P = (I, PI ), where I is a set of contextual interpretations and PI is a probability distribution over I such that PI (I, V I ) > 0 only for finitely many interpretations (I, V I ) ∈ I. P is a model of the TBox T (resp. ABox A) if every (I, V I ) ∈ I is a model of T (resp. A). P is consistent with the BN B if for every possible valuation W of the variables in V it holds that X PI (I, V I ) = PB (W). (I,V I )∈I, V I =W The probabilistic interpretation P is a model of the KB (B, T , A) iff it is a (probabilistic) model of T , A and consistent with B. To provide a fine-grained analysis of the complexity of reasoning in BEL, we use different measures for the size of the input. In data complexity, we measure only the size of the ABox, and consider the rest of the KB and the query fixed. For ontology complexity we use the size of the TBox and the ABox; in network complexity the relevant input is the BN, while the combined complexity considers the size of the whole input. 4 Probabilistic Query Entailment Different reasoning tasks have been studied in the context of Bayesian DLs; per- haps the most prominent one being the probabilistic entailment [6]. Although, probabilistic entailment has been considered generally, its focus was on entail- ments of simple consequences, i.e. consequences of the form subsumption, in- stance checking etc., all of which are tasks that can be decided in time polynomial in EL. Thus, the class of problems based on entailments of simple consequences has lead tight complexity bounds in BEL [8]. Here we generalize these results and study probabilistic query entailment. In this setting, we are not just interested in the entailment of a query q but also in the probability of such entailment. Definition 1 (probabilistic query entailment). Let K = (B, T , A) be a BEL KB over V and P = (I, P ) a probabilistic interpretation. P defines a probability distribution PP over all conjunctive queries q given by X PP (q) := P (I, V I ). (I,V I )∈I, I|=q The probability of the query q w.r.t. K is PK (q) := inf P|=K PP (q). A query q is entailed with probability p ∈ (0, 1] iff PK (q) ≥ p. Recall that every valuation W defines an EL ontology that contains all the axioms that must be satisfied by any contextual interpretation using the valua- tion W. Given a Boolean CQ q, we can build a probabilistic model Pq = (I, P ) of K such that for every valuation W there is exactly one contextual interpretation IW ∈ I, and it satisfies that IW |= q iff KW |= q. It is easy to see that every other model P of K is such that PP (q) ≥ PPq (q), which yields the following theorem. x x 0.7 y z x 0.7 y x y 0.3 ¬x 0.5 x ¬y 0.1 ¬x y 0 z ¬x ¬y 0.9 Fig. 1: The BN BABC over the variables {x, y, z} P Theorem 2. For every BEL KB K and Boolean CQ q PK (q) = KW |=q PB (W). Given Theorem 2, one can compute the probability of any query by summing up the probabilities of the worlds that entail the query q. We illustrate probabilistic query entailment with a simple example. Example 3. Consider the BEL KB K = ((TABC , AABC ), BABC ) where TABC := { hA v ∃r.B : {y}i , hB v C : {x}i} AABC := { hA(a) : {x}i , hr(a, b) : {z}i , hC(b) : {x, z}i , hA(c) : {y}i} BABC is the BN given in Figure 1 and the Boolean CQ q = {A(χ), r(χ, ψ), C(ψ)}. Clearly, KW |= q only for worlds W such that W |= (x ∧ y) ∨ (x ∧ z). Hence, we get PK (q) = PBABC ((x ∧ y) ∨ (x ∧ z)) = 0.411. Clearly the number of worlds might be exponential in |V |. In fact, this corre- sponds to exponentially many query entailment tests, which can be performed using polynomial space only. Theorem 4. Probabilistic query entailment is polynomial w.r.t. data and ontol- ogy complexity; and in PSpace w.r.t. network and combined complexity. The bounds for network and combined complexity can be improved if we restrict the queries to instance queries only. It is then possible to use a novel structure, called the proof structure such as the one presented in [8]. The general idea is to reduce probabilistic reasoning in BEL knowledge bases to standard inferences in a BN. In essence, a proof structure compactly describes the class of contexts that entail the wanted consequence. Using this proof-structure, it is possible to construct a BN from which the probability of such consequence can be computed. Importantly, it has been shown that such reduction can be performed in polynomial time. In a nutshell, a proof structure is a directed acyclic hyper-graph, in which every node represents an axiom. It is constructed in a bottom up manner with the help of a set of deduction rules. Starting from an initial set of axioms given by the KB, it adds new nodes for the axioms resulting from 1-step application of the deduction rules. Edges are used for denoting the axioms that have been used for the deduction. This process continues until the rules are saturated under the set of axioms. This structure enables us to trace back all the causes for a consequence. Thus, once transformed into a BN, it represents all contexts for a consequence, the probability of which can then be computed via the BN. For the details, we refer to [8]. To provide a better complexity bound for probabilistic query entailment, we extend the proof structure to also handle the assertional knowledge, which was not present so far. Following a naı̈ve approach it is possible to introduce a new set of deduction rules; instead, we make use of nominals to handle the assertions. Briefly, the DL ELO extends EL with nominals; that is, it allows special types of concepts of the form {a} with the semantics {aI }. It is well-known that in the presence of nominals, EL KBs can be represented without an ABox. Thus, for an ELO KB it is possible to benefit from the deduction rules presented in [19] to construct a proof structure. Using the approach in [8] with the new rules given in [19] over an ELO KB, we construct a proof structure for an EL KB K = (T , A) that is guaranteed to contain the information of all possible causes for a consequence to follow from K. Moreover, this hypergraph is acyclic and has polynomially many nodes, on the size of K, by the properties of the rules and their applications. A BEL KB can be transformed into a BELO KB in the obvious way. Let K = {B, T , A} be a BEL KB, we construct the BELO KB K0 = {B, T 0 } where T 0 = T ∪ { h{a} v C : κi | hC(a) : κi ∈ A } ∪ { h{a} v ∃.r{b} : κi | hr(a, b) : κi ∈ A }. Clearly, K |= c iff K0 |= c for any consequence c. Hence, for any ABox assertion, it is possible to construct a proof structure of polynomial size. To check the probability of an instance query C(χ) we construct a BN using the proof structures of C(a) where a is an individual appearing in the ABox. Observe that the number of proof structures is bound with the individuals available in the ABox and we obtain a polynomial construction w.r.t. the size of the input. Together with the hardness of probabilistic entailment of simple consequences in BEL without ABoxes, we get the following result. Lemma 5. Probabilistic query entailment restricted to IQs is PP-complete w.r.t. the combined complexity. 5 Probabilistic Query Answering Query answering is the problem of finding mappings for a query, i.e. one is not just interested whether a query is entailed or not, but also with the witnesses of such entailment. Typically, data is assumed to be large and it is not always very feasible to return all answers to a query q to the user. One of the most important applications of query answering is returning the top-k answers to a given query q w.r.t. a measure. By this way users do not only get a feasible number of answers but also a fine grained view over the data. In the context of probabilities, we are interested in finding the answers that are most likely. Let q be a query with the distinguished variables DV(q), and K = (B, T , A) a BEL KB. We denote by Ind(A) the set of all individual names appearing in A. Recall that every function a : DV(q) → Ind(A) defines a CQ obtained by replacing every χ ∈ DV(q) in q with a(χ). Abusing the notation, we call this query a(q). We call any function a : DV(q) → Ind(A) an answer to q w.r.t. K, and define its probability as PK (a) := PK (a(q)). Since an answer defines a boolean CQ, all complexity results for CQs transfer immediately. Every answer to a query q, has a probability, which we use as a measure to distinguish the answers. We refine the set of answers w.r.t. their probabilities and return top answers only. Definition 6 (top-k answer). Let q be a query, K be a BEL KB, and k ∈ N. A top-k answer to q w.r.t. K is a tuple (a1 , . . . , ak ) of different answers to q w.r.t. K such that (i) for all i, 1 ≤ i < k, PK (ai ) ≥ PK (ai+1 ), and (ii) for every other answer a, PK (ak ) ≥ PK (a). In other words, a top-k answer is an ordered tuple of the k answers with the highest probability. We assume that k is a constant that is fixed a priori. Thus, it is not considered part of the input of the problem. Obviously, since different answers may have the same probability, top-k answers are not unique. Here we are only interested in finding one of them. Stating it as a decision problem, we want to verify whether a given tuple is a top-k answer. Example 7. Consider the BEL KB K = ((TABC , AABC ), BABC ) provided in Example 3 and the query q = {A(χ)} with χ ∈ DV. We are interested in identifying the top-1 answer to q w.r.t. K. Notice that both a0 : χ 7→ a and a1 : χ 7→ c are answers to q with positive probability. Clearly, a0 is the top-1 answer since PK (a0 ) > PK (a1 ). Assuming that the size of q and the BN B are fixed, there are polynomially many answers to q w.r.t. K, and for each answer a, we can compute PB (a) performing polynomially many EL query entailment tests. Thus, it is possible to verify whether (a1 , . . . , ak ) is a top-k answer in polynomial time w.r.t. ontology complexity. If we consider the combined complexity, the problem can be decided as fol- lows. For every answer to the query, we only keep track of those answers that are best by checking the probabilities PB (a) of the individual answers iteratively. Since the latter can be done in PSpace, we obtain an upper bound. Theorem 8. Let A = (a1 , . . . , ak ) be a tuple of answers to q w.r.t. K. Deciding whether A is a top-k answer is polynomial w.r.t. data and ontology complexity, in PSpace w.r.t. network complexity and combined complexity. We show a lower bound for this problem w.r.t. the combined complexity. by providing a reduction from the decision version of the maximum a-posteriori (D-MAP) problem for BNs [11]. Formally, given a BN B over V , a set Q ⊆ V , a context κ, and p > 0, the D-MAP problem consists of deciding whether there exists a valuation µ of the variables in Q such that PB (κ ∧ µ) > p. Consider an arbitrary but fixed instance of D-MAP described by the BN B = ((V, E), Φ), the context κ, Q ⊆ V , and p > 0. We introduce a new Boolean random variable z not appearing in V . Using this variable, we construct a new DAG (V 0 , E) with V 0 = V ∪ {z} and a new BN B 0 = ((V 0 , E), Φ0 ), where PB0 (v | π(v)) = PB (v | π(x)) for all v ∈ V , and PB0 (z) = p. Consider the BEL KB K = (B 0 , ∅, A) where A := {hAx (ax ) : xi , hAx (bx ) : ¬xi , hAx (c) : zi | x ∈ Q} ∪ {hB(a) : κi , hB(c) : zi}, and query q := {Ax (χx ) | x ∈ Q} ∪ {B(χ)}, where all the variables are distin- guished; i.e., DV(q) = {χx | x ∈ Q} ∪ {χ}. It is easy to see that the mapping a0 : DV(q) → {c} is an answer to this query and PK (a0 ) = p. Moreover, any other answer that maps any variable to c will have the probability at most p, since it can only be entailed in contexts satisfying z. Suppose that there is an answer a such that PK (a) > p. This V answer must V map every variable χx to either ax or bx and χ to a. Let µa := a(χx )=ax x ∧ a(χx )=bx ¬x. By construction, µa is a valuation of the variables in Q, PB (κ ∧ µa ) > p, and a(q) is only entailed by valuations satisfying the context κ∧µa . Overall this means that a0 is not a top-1 answer iff there is a valuation µ of the variables in Q such that PB (κ ∧ µ) > p. Theorem 9. Deciding whether a tuple A is a top-k answer is coNPPP -hard w.r.t. combined complexity. Notice that the proof uses a very simple query which is in fact acyclic. Thus, contrary to classical EL [4], restricting to acyclic queries does not suffice for reducing the complexity of reasoning. Clearly, if we consider IQs this hardness might not hold any more. Obtaining most probable answers for a query is a crucial task for the domains where imprecise characterizations of knowledge is necessary. The next section is dedicated to another reasoning task that can be seen dual to top-k answers, namely top-k contexts. 6 Most Likely Contexts for a Query Dually to finding the most likely answers to a query, we are also interested in finding the k most likely contexts that entail a given Boolean query q. More precisely, suppose that we have already observed that the query q holds; then, we are interested in finding out which is the current context. As in the previous section, we do not consider one, but search for a fixed number of contexts that are the most likely to hold. To define this reasoning task formally, we must generalize the notion of the ontology KW defined to consider arbitrary contexts κ, which we denote as Kκ . For any contextual interpretation (I, V I ) with V I |= κ it must hold that I |= Kκ . If Kκ entails the Boolean query q, then we say that q holds in context κ. We are interested in finding out the most likely contexts in which a given query holds. Definition 10 (top-k contexts). Let q be a CQ, K a BEL KB, and k ∈ N. κ1 , . . . , κk are top-k contexts for q w.r.t. K if Kκi entails q for all i, 1 ≤ i ≤ k; PB (κi ) ≥ PB (κi+1 ) for all i, 1 ≤ i ≤ k; and there is no other context κ such that Kκ |= q and PB (κ) > PB (κk ). We illustrate top-k mlc with our continuing example. In this case, we are interested in finding out the 2 most likely context that entail the query. Example 11. Consider the BEL KB K = ((TABC , AABC ), BABC ) and query q provided in Example 3. Clearly all contexts κ that entail q are such that κ |= {x, y}∨{x, z}. The top-2 contexts are then h{x, y}, {x, z}i since PBABC ({x, y}) > PBABC ({x, z}). The problem of finding one most likely context has been studied for simple queries. In those special cases, it was shown to be coNPPP -complete problem w.r.t. combined complexity [8]. The coNPPP upper bound holds also for top-k contexts w.r.t. combined complexity: if a tuple is not a top-k mlc, then guess a new context κ and show using a PP oracle that Kκ |= q and PB (κ) > PB (κk ). If the BN is fixed, then the number of contexts is constant, and they can be ordered w.r.t. their complexity in constant time. The top-k mlc problem is then solved by applying a constant number of EL CQ entailment tests, yielding a polynomial upper bound w.r.t. ontology complexity. All these complexity results are summarized in the following theorem. Theorem 12. Deciding whether κ1 , . . . , κk are top-k mlc for q w.r.t. the KB K is polynomial w.r.t. data, and ontology complexity, PP-hard and in NPPP w.r.t. network complexity, and NPPP -complete w.r.t. combined complexity. Given the hardness of deciding top-k contexts, we consider a special case of this problem: Suppose now that all contexts are of a special form, i.e. they are valuations, we call this problem top-k worlds. In this case, we need to guess a world W and decide whether i) KW |= q and ii) PK (W) > PK (Wk ), where the former requires an NP oracle whereas the latter can be decided in polynomial time using tha standard chain rule of BNs. Notice that, top-k contexts and top-k answers are dual to each other, but they do not necessarily overlap. Consider for instance the case, where all top-k answers to a query q are retrieved from the same context κ. In this case, top-k contexts for q will contain other contexts than κ with the assumption that k > 1. Deciding top-k contexts is particularly informative for cases where the diversity of knowledge is important. We have discussed several reasoning problems in BEL w.r.t. CQs which we considered as natural problems that could arise in several domains. For a sum- mary of the results, see Table 2. 7 Related Work The literature on probabilistic extensions of DLs consists of various formalisms, each of which with different characteristics [20]. Despite the fact that probabilis- tic query answering has been studied widely in relational databases [15, 13, 9], Table 2: BEL reasoning problems and their complexity Problem data ontology network combined probabilistic CQ entailment P P PP-c PP/PSpace probabilistic IQ entailment P P PP-c PP-c. PP top-k answer P P PP/PSpace coNP /PSpace PP PP top-k contexts P P PP/coNP coNP /PSpace top-k worlds P P coNP-c coNP/Π2p RDF graphs [16] and XML databases [1, 17], only few of the probabilistic DLs considered CQA as a reasoning task. In the probabilistic extension of Datalog+/- [14] authors are interested in retrieving the answers that are above a threshold value that is set a priori. In contrast to BEL, in probabilistic Datalog+/- the underlying semantics is based on Markov logic networks. The Prob-DL family [21] extends classical DLs with subjective probabilities, also known as Type II probabilities [18]. The main dif- ference with our logic is that Prob-EL introduces probabilities as a concept con- structor, whereas we allow only probabilities over axioms. More closely related to BEL is BDL-Lite [10]. As is in BEL, BDL-Lite only allows probabilities over axioms and conditional dependencies are represented faithfully. However, as it has been pointed before [8], the authors use a closed world assumption, which easily leads to inconsistencies for the Bayesian extension of EL. 8 Conclusions We have studied probabilistic query entailment, top-k answers and top-k con- texts as reasoning problems. Though not being complete, for each of these prob- lems, we provided a complexity analysis. Moreover, we have shown that assuming that the given BN and query are relatively small, all problems become tractable. Removing this assumption immediately results in the loss of tractability, which is not surprising given the intractability results in BNs and CQA in EL. As a future work, we want to obtain tight bounds w.r.t. all measures provided. We have shown tight complexity bounds for the query entailment problem of IQs. Restricting our attention to IQs, other problems might also get easier under widely accepted assumptions of complexity theory. It should be reminded that this is unfortunately not the case for acyclic queries. On the practical side, we will consider optimizing the reasoning mechanisms and we will implement a system for reasoning in BEL that will benefit both from techniques in DLs, such as module extraction, query processing and from techniques in reasoning with lifted BNs, mainly based on logic programming as in [12]. References 1. Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in XML. In: Proc. of EDBT’06. LNCS, vol. 3896. Springer Verlag (2006) 2. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. of IJCAI’05. Morgan Kaufmann Publishers (2005) 3. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applica- tions. Cambridge University Press, 2nd edn. (2007) 4. Bienvenu, M., Ortiz, M., Šimkus, M., Xiao, G.: Tractable queries for lightweight description logics. In: Proc. of IJCAI’13. AAAI Press (2013) 5. Brandt, S.: Polynomial Time Reasoning in a Description Logic with Existential Restrictions, GCI Axioms, and—What Else? In: Proc. of ECAI’04. vol. 110. IOS Press (2004) 6. Ceylan, İ.İ., Peñaloza, R.: Bayesian Description Logics. In: Proc. of DL’14. CEUR Workshop Proceedings, vol. 1193. CEUR-WS (2014) 7. Ceylan, İ.İ., Peñaloza, R.: The Bayesian Description Logic BEL. In: Proc. of IJ- CAR’14. LNCS, vol. 8562. Springer Verlag (2014) 8. Ceylan, İ.İ., Peñaloza, R.: Tight Complexity Bounds for Reasoning in the Descrip- tion Logic BEL. In: Proc. of JELIA’14. LNCS, vol. 8761. Springer Verlag (2014) 9. Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB Journal 16(4) (2007) 10. D’Amato, C., Fanizzi, N., Lukasiewicz, T.: Tractable Reasoning with Bayesian Description Logics. In: Proc. of SUM’08. LNCS, vol. 5291. Springer Verlag (2008) 11. Darwiche, A.: Modeling and Reasoning with Bayesian Networks. Cambridge Uni- versity Press (2009) 12. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic prolog and its application in link discovery. In: Proc. of IJCAI’07. Morgan-Kaufmann Pub. (2007) 13. Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of in- formation retrieval and database systems. ACM TOIS’97 15(1) (1997) 14. Gottlob, G., Lukasiewicz, T., Martinez, M.V., Simari, G.I.: Query answering under probabilistic uncertainty in datalog +/- ontologies. Ann. Math. AI 69(1) (2013) 15. Grädel, E., Gurevich, Y., Hirsch, C.: The complexity of query reliability. In: Proc. ACM SIGACT-SIGMOD-SIGART’98 (1998) 16. Huang, H., Liu, C.: Query evaluation on probabilistic RDF databases. In: WISE09,. LNCS, vol. 5802. Springer Verlag (2009) 17. Hung, E., Getoor, L., Subrahmanian, V.S.: PXML: A probabilistic semistructured data model and algebra. In: Proc. ICDE’03 (2003) 18. Joseph, H.: An Analysis of First - Order Logics of Probability. In: Proc. of IJCAI’89. Morgan Kaufmann Publishers (1989) 19. Kazakov, Y., Krötzsch, M.R., Simančı́k, F.: Practical Reasoning with Nominals in the EL Family of Description Logics. In: Proc. of KR’12. AAAI Press (2012) 20. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description logics for the Semantic Web. Web Semantics: Science, Services and Agents on the World Wide Web 6(4), 291–308 (Nov 2008) 21. Lutz, C., Schröder, L.: Probabilistic Description Logics for Subjective Uncertainty. In: Proc. of KR’10. AAAI Press (2010) 22. Lutz, C., Toman, D., Wolter, F.: Conjunctive Query Answering in the Description Logic EL Using a Relational Database System. In: Proc. of IJCAI’09. AAAI (2009) 23. Rosati, R.: On conjunctive query answering in EL. In: Proc. of DL’07. CEUR Workshop Proceedings, vol. 250. CEUR-WS (2007)