Query Answering in Bayesian Description Logics

                                 İsmail İlkan Ceylan?

                Theoretical Computer Science, TU Dresden, Germany
                          ceylan@tcs.inf.tu-dresden.de


        Abstract. The Bayesian Description Logic (BDL) BEL is a probabilistic
        DL, which extends the lightweight DL EL by defining a joint probability
        distribution over EL axioms with the help of a Bayesian network (BN).
        In the recent work, extensions of standard logical reasoning tasks in BEL
        are shown to be reducible to inferences in BNs.
        This work concentrates on a more general reasoning task, namely on
        conjunctive query answering in BEL where every query is associated to
        a probability leading to different reasoning problems. In particular, we
        study the probabilistic query entailment, top-k answers, and top-k con-
        texts as reasoning problems. Our complexity analysis suggests that all
        of these problems are tractable under certain assumptions.


1     Introduction

Description Logics (DLs) [3], as a successful family of knowledge representation
(KR) formalisms, have been employed in various application domains such as
conceptual modeling, databases, bio-medical ontologies, natural language pro-
cessing, configuration, and the semantic web1 . Arguably, all these domains, as is
real world, are subject to imprecision; may it be an assertion about an individual
or a terminological statement, it often comes along with a degree of uncertainty.
    The fact that classical DLs had severe limitations in representing and rea-
soning under uncertainty led to a body of work [20] tailored towards this goal.
Several extensions to DLs have been proposed with different characteristics in
terms of their logical expressivity, their semantics, and their independence as-
sumptions.
    BDLs [6] have been proposed as a means of representing the uncertainty
over DL axioms that are being asserted. In BDLs, every axiom is associated
with a probability, which is encoded with the help of a BN. This family of logics
provides a compact and easy way of encoding probabilities over DL axioms.
Two important features of BDLs are that they do not force any independence
assumptions, and they are based on the so-called multiple world semantics.
    The focus of this work is the DL BEL [7], a Bayesian extension of the
lightweight DL EL [2] for which several probabilistic reasoning tasks have been
?
    Supported by DFG within the Research Training Group “RoSI” (GRK 1907).
1
    http://www.w3.org/TR/owl2-overview/
                       Table 1: Syntax and Semantics of EL
        Name                  Syntax      Semantics
        Top                   >           ∆I
        Conjunction           C uD        C I ∩ DI
        Exist. Rest.          ∃r.C        {d | ∃e ∈ ∆I : (d, e) ∈ rI , e ∈ C I }
        GCI                   CvD         C I ⊆ DI
        Concept assertion     C(a)        aI ∈ C I
        Role assertion        r(a, b)     (aI , bI ) ∈ rI


studied such as the probabilistic entailment, or finding most likely context (sub-
ontology) for an entailment. In fact, tight complexity bounds have been obtained
for these problems [8].
    Nevertheless, problems related to query answering, and in particular con-
junctive query (CQ) answering, has not been studied in the context of BDLs,
so far. In this paper, we close this gap and focus on i) probabilistic query en-
tailment: “What is the probability of a query to be entailed?” ii) probabilistic
query answering: “What are the top-k answers to a query?” and finally iii) the
most likely context: “What are the top-k contexts that entail a query?”
    Consequently, we argue that these problems generalize the reasoning prob-
lems that have been considered so far. Unsurprisingly, reasoning in BEL is in-
tractible as is CQ answering in EL and inference in BNs. Further analysis shows
that tractability can be regained by fixing the BN and the query.


2   Conjunctive Query Answering in EL

We briefly review the DL EL [5] and query answering in EL, which constitute the
basis of this paper. Formally, let NI , NC and NR be disjoint sets of individual-,
concept- and role-names, respectively. EL concept language is defined by the
grammar rule C ::= A | > | C u C | ∃r.C, where A ∈ NC and r ∈ NR .
    The semantics of EL is given by an interpretation: that is a tuple I = (∆I , ·I )
where ∆I is a non-empty domain and ·I is an interpretation function that maps
every individual name a to an element aI ∈ ∆I ; every concept name A to a
set AI ⊆ ∆I and every role name r to a binary relation rI ⊆ ∆I × ∆I . The
interpretation function ·I is extended to EL concepts as shown in the upper part
of Table 1.
    The domain knowledge is encoded through a set of axioms, which restrict the
interpretation domain of the concepts. A TBox T is a finite set of general concept
inclusions (GCIs) of the form C v D, where C, D are concepts. An ABox is a
finite set of concept assertions C(a) and role assertions r(a, b), where a, b ∈ NI ,
C is a concept and r ∈ NR . A knowledge base is a pair K = (T , A) where T is
a TBox and A is an ABox. We use the term axiom as a general expression for
GCIs and assertions.
    The interpretation I satisfies an axiom λ iff it satisfies the conditions on the
lower part of Table 1. It is a model of the TBox T if it satisfies all GCIs in T and
a model of the ABox A if it satisfies all the assertions in A. An interpretation is
a model of the KB K = (T , A) iff it is a model of both T and A. For the rest of
this paper we will denote as NI (A) the set of all individual names that appear
in the ABox A.
    CQA is an important reasoning task for DLs that has been investigated in
the context of EL. Let NV be a set of variables disjoint from NC , NR , and NI .
An atom is an expression of the form A(χ) or r(χ, ψ), where A ∈ NC , r ∈ NR ,
and χ, ψ ∈ NI ∪ NV. A conjunctive query (CQ) q is a non-empty set of atoms
associated to a set DV(q) ⊆ NV of distinguished variables. If DV(q) = ∅, then q
is called a Boolean CQ. A special case of a CQ is an instance query (IQ), which
consists of only one atom A(χ) with A ∈ NC .
    Let q be a Boolean CQ and IV(q) be the set of all individual names and
variables appearing in q. The interpretation I satisfies q if there exists a function
π : IV(q) → ∆I such that (i) π(a) = aI for all a ∈ NI ∩IV(q), (ii) π(χ) ∈ AI for all
A(χ) ∈ q, and (iii) (π(χ), π(ψ)) ∈ rI for all r(χ, ψ) ∈ q. In this case, we call π a
match for I and q. The ontology O entails q (O |= q) iff every model of O satisfies
q. For an arbitrary CQ q, a function a : DV(q) → NI (A) is an answer to q w.r.t.
O iff O entails the Boolean CQ a(q) obtained by replacing every distinguished
variable χ ∈ DV(q) with a(χ). Conjunctive query answering (CQA) is the task
of finding all answers of a CQ, and query entailment is the problem of deciding
whether an ontology entails a given Boolean CQ by replacing every distinguished
variable χ ∈ DV(q) with a(χ).
    It is well known that query entailment in EL is polynomial w.r.t. data and
KB complexity, but NP-complete w.r.t. combined complexity [23]. Notice that,
EL does not enjoy the so-called full first order rewritability which has been
considered as a key feature for CQA, since it allows one to reduce the problem
to standard tasks in Relational Database Management Systems (RDMSs). Yet,
CQA in EL can be successfully employed using a combined approach as described
in [22].


3   The Bayesian Description Logic BEL

The Bayesian DL BEL [7] has been introduced as a probabilistic extension of
the light-weight DL EL. In BEL probabilities are encoded through a Bayesian
network (BN) [11]; that is, a pair B = (G, Φ), where G = (V, E) is a finite
directed acyclic graph (DAG) whose nodes represent (boolean) random variables,
and Φ contains, for every node x ∈ V , a conditional probability distribution
PB (x | π(x)) of x given its parents π(x). If V is the set of nodes in G, we say
that B is a BN over V .
    BNs are widely studied probabilistic graphical models where the underlying
graph G = (V, E) encodes a series of conditional independence assumptions
between the random variables. Every variable x ∈ V is known to be conditionally
independent of its non-descendants given its parents. Thus, every BN B defines
a unique joint probability distribution (JPD) over V given by
                                      Y
                           PB (V ) =     PB (x | π(x)).
                                       x∈V

    The concept language of BEL is the same as the EL concept language. The
difference appears in encoding the domain knowledge, i.e. in forming axioms.
BEL generalizes classical TBoxes (resp. ABoxes) by annotating the GCIs (resp.
assertions) with a context defined by a set of literals belonging to a BN.
    Formally, let NI be a set of individual names and V a finite set of boolean
variables. A V -context is a conjunction of literals over V . A V -restricted general
concept inclusion (V -GCI) is an expression of the form hC v D : κi where C, D
are BEL concepts and κ is a V -context. A V -restricted assertion (V -assertion) is
an expression of the form hC(a) : κi, or hr(a, b) : κi where a, b ∈ NI , C, D are BEL
concepts and κ is a V -context. A V -TBox (resp.V -ABox) is a finite set of V -GCIs
(resp.V -assertions). A BEL knowledge base (KB) is a tuple K = (B, T , A) where
B is a BN over V , T is a V -TBox and A is a V -ABox.
    We will sometimes speak of contextual axioms to address both V -GCIs and
V -assertions. The intuition behind the contextual axioms is to enforce an axiom
to hold within a given context, but not necessarily in others. The semantic of such
axioms is realized with the so-called contextual interpretations, which differently
from the classical interpretations also evaluate the context variables. Formally,
given a finite set of Boolean variables V , (I, V I ) is a contextual interpretation
where V I is a propositional interpretation over V , and I = (∆I , ·I ) is a classical
EL interpretation. We will usually ignore the prefix and speak simply of e.g. a
KB, a TBox, an ABox, or an interpretation.
    The interpretation function ·I is extended to arbitrary BEL concepts as in
EL, i.e. using the rules in Table 1. We say that the contextual interpretation
(I, V I ) is a model of an axiom hλ : κi denoted as (I, V I ) |= hλ : κi, iff either (i)
V I 6|= κ, or (ii) I |= λ. It is a model of the TBox T (resp. ABox A) iff it is a
model of all the axioms in T (resp. A).
    A contextual interpretation (I, V I ) needs to satisfy only the axioms asserted
within a context κ for which it holds that V I |= κ. Formally, let K = (B, T , A)
be a BEL KB: Given a contextual interpretation (I, V I ) where V I = W, we
define the EL KB KW = (TW , AW ) that needs to be satisfied by I as:

 TW := {C v D | hC v D : ϕi ∈ T , W |= ϕ},
 AW := {C(a) | hC(a) : ϕi ∈ A, W |= ϕ} ∪ {r(a, b) | hr(a, b) : ϕi ∈ A, W |= ϕ}.

In BEL, uncertainty is represented through a BN that describes a joint proba-
bility distribution over the context variables. Semantically, BEL is linked to this
distribution with the so called multiple world semantics: A probabilistic inter-
pretation defines a probability distribution over a set of (contextual) interpre-
tations; this distribution is required to be consistent with the joint probability
distribution provided by the BN. Formally, a probabilistic interpretation is a pair
P = (I, PI ), where I is a set of contextual interpretations and PI is a probability
distribution over I such that PI (I, V I ) > 0 only for finitely many interpretations
(I, V I ) ∈ I. P is a model of the TBox T (resp. ABox A) if every (I, V I ) ∈ I
is a model of T (resp. A). P is consistent with the BN B if for every possible
valuation W of the variables in V it holds that
                            X
                                 PI (I, V I ) = PB (W).
                      (I,V I )∈I, V I =W


The probabilistic interpretation P is a model of the KB (B, T , A) iff it is a
(probabilistic) model of T , A and consistent with B.
   To provide a fine-grained analysis of the complexity of reasoning in BEL, we
use different measures for the size of the input. In data complexity, we measure
only the size of the ABox, and consider the rest of the KB and the query fixed.
For ontology complexity we use the size of the TBox and the ABox; in network
complexity the relevant input is the BN, while the combined complexity considers
the size of the whole input.


4   Probabilistic Query Entailment

Different reasoning tasks have been studied in the context of Bayesian DLs; per-
haps the most prominent one being the probabilistic entailment [6]. Although,
probabilistic entailment has been considered generally, its focus was on entail-
ments of simple consequences, i.e. consequences of the form subsumption, in-
stance checking etc., all of which are tasks that can be decided in time polynomial
in EL. Thus, the class of problems based on entailments of simple consequences
has lead tight complexity bounds in BEL [8].
    Here we generalize these results and study probabilistic query entailment. In
this setting, we are not just interested in the entailment of a query q but also in
the probability of such entailment.

Definition 1 (probabilistic query entailment). Let K = (B, T , A) be a BEL
KB over V and P = (I, P ) a probabilistic interpretation. P defines a probability
distribution PP over all conjunctive queries q given by
                                      X
                        PP (q) :=              P (I, V I ).
                                    (I,V I )∈I, I|=q


The probability of the query q w.r.t. K is PK (q) := inf P|=K PP (q). A query q is
entailed with probability p ∈ (0, 1] iff PK (q) ≥ p.

    Recall that every valuation W defines an EL ontology that contains all the
axioms that must be satisfied by any contextual interpretation using the valua-
tion W. Given a Boolean CQ q, we can build a probabilistic model Pq = (I, P ) of
K such that for every valuation W there is exactly one contextual interpretation
IW ∈ I, and it satisfies that IW |= q iff KW |= q. It is easy to see that every
other model P of K is such that PP (q) ≥ PPq (q), which yields the following
theorem.
                                                   x
                                             x
                                                  0.7
                   y                                                     z

             x    0.7     y                                   x y       0.3
            ¬x    0.5                                         x ¬y      0.1
                                                              ¬x y       0
                                                        z     ¬x ¬y     0.9

                 Fig. 1: The BN BABC over the variables {x, y, z}

                                                                      P
Theorem 2. For every BEL KB K and Boolean CQ q PK (q) =                  KW |=q PB (W).

Given Theorem 2, one can compute the probability of any query by summing up
the probabilities of the worlds that entail the query q. We illustrate probabilistic
query entailment with a simple example.

Example 3. Consider the BEL KB K = ((TABC , AABC ), BABC ) where

        TABC := { hA v ∃r.B : {y}i , hB v C : {x}i}
       AABC := { hA(a) : {x}i , hr(a, b) : {z}i , hC(b) : {x, z}i , hA(c) : {y}i}

BABC is the BN given in Figure 1 and the Boolean CQ q = {A(χ), r(χ, ψ), C(ψ)}.
Clearly, KW |= q only for worlds W such that W |= (x ∧ y) ∨ (x ∧ z). Hence, we
get PK (q) = PBABC ((x ∧ y) ∨ (x ∧ z)) = 0.411.

Clearly the number of worlds might be exponential in |V |. In fact, this corre-
sponds to exponentially many query entailment tests, which can be performed
using polynomial space only.
Theorem 4. Probabilistic query entailment is polynomial w.r.t. data and ontol-
ogy complexity; and in PSpace w.r.t. network and combined complexity.
    The bounds for network and combined complexity can be improved if we
restrict the queries to instance queries only. It is then possible to use a novel
structure, called the proof structure such as the one presented in [8]. The general
idea is to reduce probabilistic reasoning in BEL knowledge bases to standard
inferences in a BN. In essence, a proof structure compactly describes the class
of contexts that entail the wanted consequence. Using this proof-structure, it
is possible to construct a BN from which the probability of such consequence
can be computed. Importantly, it has been shown that such reduction can be
performed in polynomial time.
    In a nutshell, a proof structure is a directed acyclic hyper-graph, in which
every node represents an axiom. It is constructed in a bottom up manner with
the help of a set of deduction rules. Starting from an initial set of axioms given
by the KB, it adds new nodes for the axioms resulting from 1-step application
of the deduction rules. Edges are used for denoting the axioms that have been
used for the deduction. This process continues until the rules are saturated under
the set of axioms. This structure enables us to trace back all the causes for a
consequence. Thus, once transformed into a BN, it represents all contexts for a
consequence, the probability of which can then be computed via the BN. For
the details, we refer to [8].
    To provide a better complexity bound for probabilistic query entailment, we
extend the proof structure to also handle the assertional knowledge, which was
not present so far. Following a naı̈ve approach it is possible to introduce a new
set of deduction rules; instead, we make use of nominals to handle the assertions.
    Briefly, the DL ELO extends EL with nominals; that is, it allows special types
of concepts of the form {a} with the semantics {aI }. It is well-known that in the
presence of nominals, EL KBs can be represented without an ABox. Thus, for
an ELO KB it is possible to benefit from the deduction rules presented in [19]
to construct a proof structure. Using the approach in [8] with the new rules
given in [19] over an ELO KB, we construct a proof structure for an EL KB
K = (T , A) that is guaranteed to contain the information of all possible causes
for a consequence to follow from K. Moreover, this hypergraph is acyclic and has
polynomially many nodes, on the size of K, by the properties of the rules and
their applications.
    A BEL KB can be transformed into a BELO KB in the obvious way. Let
K = {B, T , A} be a BEL KB, we construct the BELO KB K0 = {B, T 0 } where

             T 0 = T ∪ { h{a} v C : κi | hC(a) : κi ∈ A }
                       ∪ { h{a} v ∃.r{b} : κi | hr(a, b) : κi ∈ A }.

   Clearly, K |= c iff K0 |= c for any consequence c. Hence, for any ABox
assertion, it is possible to construct a proof structure of polynomial size. To
check the probability of an instance query C(χ) we construct a BN using the
proof structures of C(a) where a is an individual appearing in the ABox. Observe
that the number of proof structures is bound with the individuals available in
the ABox and we obtain a polynomial construction w.r.t. the size of the input.
   Together with the hardness of probabilistic entailment of simple consequences
in BEL without ABoxes, we get the following result.
Lemma 5. Probabilistic query entailment restricted to IQs is PP-complete w.r.t.
the combined complexity.


5   Probabilistic Query Answering
Query answering is the problem of finding mappings for a query, i.e. one is not
just interested whether a query is entailed or not, but also with the witnesses of
such entailment. Typically, data is assumed to be large and it is not always very
feasible to return all answers to a query q to the user. One of the most important
applications of query answering is returning the top-k answers to a given query q
w.r.t. a measure. By this way users do not only get a feasible number of answers
but also a fine grained view over the data. In the context of probabilities, we are
interested in finding the answers that are most likely.
   Let q be a query with the distinguished variables DV(q), and K = (B, T , A)
a BEL KB. We denote by Ind(A) the set of all individual names appearing in
A. Recall that every function a : DV(q) → Ind(A) defines a CQ obtained by
replacing every χ ∈ DV(q) in q with a(χ). Abusing the notation, we call this
query a(q). We call any function a : DV(q) → Ind(A) an answer to q w.r.t.
K, and define its probability as PK (a) := PK (a(q)). Since an answer defines a
boolean CQ, all complexity results for CQs transfer immediately. Every answer
to a query q, has a probability, which we use as a measure to distinguish the
answers. We refine the set of answers w.r.t. their probabilities and return top
answers only.

Definition 6 (top-k answer). Let q be a query, K be a BEL KB, and k ∈ N.
A top-k answer to q w.r.t. K is a tuple (a1 , . . . , ak ) of different answers to q
w.r.t. K such that (i) for all i, 1 ≤ i < k, PK (ai ) ≥ PK (ai+1 ), and (ii) for every
other answer a, PK (ak ) ≥ PK (a).

In other words, a top-k answer is an ordered tuple of the k answers with the
highest probability. We assume that k is a constant that is fixed a priori. Thus,
it is not considered part of the input of the problem. Obviously, since different
answers may have the same probability, top-k answers are not unique. Here we
are only interested in finding one of them. Stating it as a decision problem, we
want to verify whether a given tuple is a top-k answer.

Example 7. Consider the BEL KB K = ((TABC , AABC ), BABC ) provided in Example 3
and the query q = {A(χ)} with χ ∈ DV. We are interested in identifying the top-1
answer to q w.r.t. K. Notice that both a0 : χ 7→ a and a1 : χ 7→ c are answers to q
with positive probability. Clearly, a0 is the top-1 answer since PK (a0 ) > PK (a1 ).

    Assuming that the size of q and the BN B are fixed, there are polynomially
many answers to q w.r.t. K, and for each answer a, we can compute PB (a)
performing polynomially many EL query entailment tests. Thus, it is possible to
verify whether (a1 , . . . , ak ) is a top-k answer in polynomial time w.r.t. ontology
complexity.
    If we consider the combined complexity, the problem can be decided as fol-
lows. For every answer to the query, we only keep track of those answers that
are best by checking the probabilities PB (a) of the individual answers iteratively.
Since the latter can be done in PSpace, we obtain an upper bound.

Theorem 8. Let A = (a1 , . . . , ak ) be a tuple of answers to q w.r.t. K. Deciding
whether A is a top-k answer is polynomial w.r.t. data and ontology complexity,
in PSpace w.r.t. network complexity and combined complexity.

    We show a lower bound for this problem w.r.t. the combined complexity.
by providing a reduction from the decision version of the maximum a-posteriori
(D-MAP) problem for BNs [11]. Formally, given a BN B over V , a set Q ⊆ V ,
a context κ, and p > 0, the D-MAP problem consists of deciding whether there
exists a valuation µ of the variables in Q such that PB (κ ∧ µ) > p.
   Consider an arbitrary but fixed instance of D-MAP described by the BN
B = ((V, E), Φ), the context κ, Q ⊆ V , and p > 0. We introduce a new Boolean
random variable z not appearing in V . Using this variable, we construct a new
DAG (V 0 , E) with V 0 = V ∪ {z} and a new BN B 0 = ((V 0 , E), Φ0 ), where PB0 (v |
π(v)) = PB (v | π(x)) for all v ∈ V , and PB0 (z) = p. Consider the BEL KB
K = (B 0 , ∅, A) where

            A := {hAx (ax ) : xi , hAx (bx ) : ¬xi , hAx (c) : zi | x ∈ Q} ∪
                  {hB(a) : κi , hB(c) : zi},

and query q := {Ax (χx ) | x ∈ Q} ∪ {B(χ)}, where all the variables are distin-
guished; i.e., DV(q) = {χx | x ∈ Q} ∪ {χ}. It is easy to see that the mapping
a0 : DV(q) → {c} is an answer to this query and PK (a0 ) = p. Moreover, any
other answer that maps any variable to c will have the probability at most p,
since it can only be entailed in contexts satisfying z. Suppose that there is an
answer a such that PK (a) > p. This
                                 V answer must  V map every variable χx to either
ax or bx and χ to a. Let µa := a(χx )=ax x ∧ a(χx )=bx ¬x. By construction, µa
is a valuation of the variables in Q, PB (κ ∧ µa ) > p, and a(q) is only entailed by
valuations satisfying the context κ∧µa . Overall this means that a0 is not a top-1
answer iff there is a valuation µ of the variables in Q such that PB (κ ∧ µ) > p.
Theorem 9. Deciding whether a tuple A is a top-k answer is coNPPP -hard
w.r.t. combined complexity.
   Notice that the proof uses a very simple query which is in fact acyclic. Thus,
contrary to classical EL [4], restricting to acyclic queries does not suffice for
reducing the complexity of reasoning. Clearly, if we consider IQs this hardness
might not hold any more.
   Obtaining most probable answers for a query is a crucial task for the domains
where imprecise characterizations of knowledge is necessary. The next section is
dedicated to another reasoning task that can be seen dual to top-k answers,
namely top-k contexts.


6   Most Likely Contexts for a Query

Dually to finding the most likely answers to a query, we are also interested in
finding the k most likely contexts that entail a given Boolean query q. More
precisely, suppose that we have already observed that the query q holds; then,
we are interested in finding out which is the current context. As in the previous
section, we do not consider one, but search for a fixed number of contexts that
are the most likely to hold.
    To define this reasoning task formally, we must generalize the notion of the
ontology KW defined to consider arbitrary contexts κ, which we denote as Kκ .
For any contextual interpretation (I, V I ) with V I |= κ it must hold that I |= Kκ .
If Kκ entails the Boolean query q, then we say that q holds in context κ. We are
interested in finding out the most likely contexts in which a given query holds.
Definition 10 (top-k contexts). Let q be a CQ, K a BEL KB, and k ∈ N.
κ1 , . . . , κk are top-k contexts for q w.r.t. K if Kκi entails q for all i, 1 ≤ i ≤ k;
PB (κi ) ≥ PB (κi+1 ) for all i, 1 ≤ i ≤ k; and there is no other context κ such that
Kκ |= q and PB (κ) > PB (κk ).
    We illustrate top-k mlc with our continuing example. In this case, we are
interested in finding out the 2 most likely context that entail the query.
Example 11. Consider the BEL KB K = ((TABC , AABC ), BABC ) and query q provided
in Example 3. Clearly all contexts κ that entail q are such that κ |= {x, y}∨{x, z}.
The top-2 contexts are then h{x, y}, {x, z}i since PBABC ({x, y}) > PBABC ({x, z}).
The problem of finding one most likely context has been studied for simple
queries. In those special cases, it was shown to be coNPPP -complete problem
w.r.t. combined complexity [8]. The coNPPP upper bound holds also for top-k
contexts w.r.t. combined complexity: if a tuple is not a top-k mlc, then guess a
new context κ and show using a PP oracle that Kκ |= q and PB (κ) > PB (κk ).
If the BN is fixed, then the number of contexts is constant, and they can be
ordered w.r.t. their complexity in constant time. The top-k mlc problem is then
solved by applying a constant number of EL CQ entailment tests, yielding a
polynomial upper bound w.r.t. ontology complexity. All these complexity results
are summarized in the following theorem.
Theorem 12. Deciding whether κ1 , . . . , κk are top-k mlc for q w.r.t. the KB K
is polynomial w.r.t. data, and ontology complexity, PP-hard and in NPPP w.r.t.
network complexity, and NPPP -complete w.r.t. combined complexity.
    Given the hardness of deciding top-k contexts, we consider a special case of
this problem: Suppose now that all contexts are of a special form, i.e. they are
valuations, we call this problem top-k worlds. In this case, we need to guess a
world W and decide whether i) KW |= q and ii) PK (W) > PK (Wk ), where the
former requires an NP oracle whereas the latter can be decided in polynomial
time using tha standard chain rule of BNs.
    Notice that, top-k contexts and top-k answers are dual to each other, but
they do not necessarily overlap. Consider for instance the case, where all top-k
answers to a query q are retrieved from the same context κ. In this case, top-k
contexts for q will contain other contexts than κ with the assumption that k > 1.
Deciding top-k contexts is particularly informative for cases where the diversity
of knowledge is important.
    We have discussed several reasoning problems in BEL w.r.t. CQs which we
considered as natural problems that could arise in several domains. For a sum-
mary of the results, see Table 2.


7    Related Work
The literature on probabilistic extensions of DLs consists of various formalisms,
each of which with different characteristics [20]. Despite the fact that probabilis-
tic query answering has been studied widely in relational databases [15, 13, 9],
               Table 2: BEL reasoning problems and their complexity
            Problem               data   ontology    network         combined
    probabilistic CQ entailment    P        P         PP-c          PP/PSpace
    probabilistic IQ entailment    P        P         PP-c                PP-c.
                                                                      PP
           top-k answer            P        P       PP/PSpace      coNP    /PSpace
                                                              PP      PP
          top-k contexts           P        P       PP/coNP        coNP    /PSpace
           top-k worlds            P        P        coNP-c          coNP/Π2p


RDF graphs [16] and XML databases [1, 17], only few of the probabilistic DLs
considered CQA as a reasoning task.
    In the probabilistic extension of Datalog+/- [14] authors are interested in
retrieving the answers that are above a threshold value that is set a priori. In
contrast to BEL, in probabilistic Datalog+/- the underlying semantics is based
on Markov logic networks. The Prob-DL family [21] extends classical DLs with
subjective probabilities, also known as Type II probabilities [18]. The main dif-
ference with our logic is that Prob-EL introduces probabilities as a concept con-
structor, whereas we allow only probabilities over axioms. More closely related
to BEL is BDL-Lite [10]. As is in BEL, BDL-Lite only allows probabilities over
axioms and conditional dependencies are represented faithfully. However, as it
has been pointed before [8], the authors use a closed world assumption, which
easily leads to inconsistencies for the Bayesian extension of EL.


8      Conclusions

We have studied probabilistic query entailment, top-k answers and top-k con-
texts as reasoning problems. Though not being complete, for each of these prob-
lems, we provided a complexity analysis. Moreover, we have shown that assuming
that the given BN and query are relatively small, all problems become tractable.
Removing this assumption immediately results in the loss of tractability, which
is not surprising given the intractability results in BNs and CQA in EL.
    As a future work, we want to obtain tight bounds w.r.t. all measures provided.
We have shown tight complexity bounds for the query entailment problem of IQs.
Restricting our attention to IQs, other problems might also get easier under
widely accepted assumptions of complexity theory. It should be reminded that
this is unfortunately not the case for acyclic queries.
    On the practical side, we will consider optimizing the reasoning mechanisms
and we will implement a system for reasoning in BEL that will benefit both
from techniques in DLs, such as module extraction, query processing and from
techniques in reasoning with lifted BNs, mainly based on logic programming as
in [12].
References
 1. Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in
    XML. In: Proc. of EDBT’06. LNCS, vol. 3896. Springer Verlag (2006)
 2. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. of IJCAI’05.
    Morgan Kaufmann Publishers (2005)
 3. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.
    (eds.): The Description Logic Handbook: Theory, Implementation, and Applica-
    tions. Cambridge University Press, 2nd edn. (2007)
 4. Bienvenu, M., Ortiz, M., Šimkus, M., Xiao, G.: Tractable queries for lightweight
    description logics. In: Proc. of IJCAI’13. AAAI Press (2013)
 5. Brandt, S.: Polynomial Time Reasoning in a Description Logic with Existential
    Restrictions, GCI Axioms, and—What Else? In: Proc. of ECAI’04. vol. 110. IOS
    Press (2004)
 6. Ceylan, İ.İ., Peñaloza, R.: Bayesian Description Logics. In: Proc. of DL’14. CEUR
    Workshop Proceedings, vol. 1193. CEUR-WS (2014)
 7. Ceylan, İ.İ., Peñaloza, R.: The Bayesian Description Logic BEL. In: Proc. of IJ-
    CAR’14. LNCS, vol. 8562. Springer Verlag (2014)
 8. Ceylan, İ.İ., Peñaloza, R.: Tight Complexity Bounds for Reasoning in the Descrip-
    tion Logic BEL. In: Proc. of JELIA’14. LNCS, vol. 8761. Springer Verlag (2014)
 9. Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB
    Journal 16(4) (2007)
10. D’Amato, C., Fanizzi, N., Lukasiewicz, T.: Tractable Reasoning with Bayesian
    Description Logics. In: Proc. of SUM’08. LNCS, vol. 5291. Springer Verlag (2008)
11. Darwiche, A.: Modeling and Reasoning with Bayesian Networks. Cambridge Uni-
    versity Press (2009)
12. De Raedt, L., Kimmig, A., Toivonen, H.: ProbLog: A probabilistic prolog and its
    application in link discovery. In: Proc. of IJCAI’07. Morgan-Kaufmann Pub. (2007)
13. Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of in-
    formation retrieval and database systems. ACM TOIS’97 15(1) (1997)
14. Gottlob, G., Lukasiewicz, T., Martinez, M.V., Simari, G.I.: Query answering under
    probabilistic uncertainty in datalog +/- ontologies. Ann. Math. AI 69(1) (2013)
15. Grädel, E., Gurevich, Y., Hirsch, C.: The complexity of query reliability. In: Proc.
    ACM SIGACT-SIGMOD-SIGART’98 (1998)
16. Huang, H., Liu, C.: Query evaluation on probabilistic RDF databases. In: WISE09,.
    LNCS, vol. 5802. Springer Verlag (2009)
17. Hung, E., Getoor, L., Subrahmanian, V.S.: PXML: A probabilistic semistructured
    data model and algebra. In: Proc. ICDE’03 (2003)
18. Joseph, H.: An Analysis of First - Order Logics of Probability. In: Proc. of IJCAI’89.
    Morgan Kaufmann Publishers (1989)
19. Kazakov, Y., Krötzsch, M.R., Simančı́k, F.: Practical Reasoning with Nominals in
    the EL Family of Description Logics. In: Proc. of KR’12. AAAI Press (2012)
20. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in description
    logics for the Semantic Web. Web Semantics: Science, Services and Agents on the
    World Wide Web 6(4), 291–308 (Nov 2008)
21. Lutz, C., Schröder, L.: Probabilistic Description Logics for Subjective Uncertainty.
    In: Proc. of KR’10. AAAI Press (2010)
22. Lutz, C., Toman, D., Wolter, F.: Conjunctive Query Answering in the Description
    Logic EL Using a Relational Database System. In: Proc. of IJCAI’09. AAAI (2009)
23. Rosati, R.: On conjunctive query answering in EL. In: Proc. of DL’07. CEUR
    Workshop Proceedings, vol. 250. CEUR-WS (2007)