-

Answering EL Queries in the Presence of Preferences

İsmail İlkan Ceylan

ceylan@tcs.inf.tu-dresden.de 2

Thomas Lukasiewicz

Thomas.Lukasiewicz@cs.ox.ac.uk 0

Rafael Peñaloza

rafael.penaloza@unibz.it 1 0 Department of Computer Science, University of Oxford , UK 1 KRDB Research Centre, Free University of Bozen-Bolzano , Italy 2 Theoretical Computer Science , TU Dresden , Germany

Conjunctive query (CQ) answering is an important reasoning task in description logics (DLs). Its goal is to retrieve the tuples of individuals that satisfy a conjunctive query; i.e., a finite set of atomic queries. These tuples are called answers. Clearly, a given CQ may have a considerable number of answers, specially if the set of individual names appearing in the ABox is large, as is the case for many existing DL ontologies. In order to manage all these answers in a structural manner, one can try to extend query answering with preference criteria, in such a way that the most preferred answers are returned first. Possibilistic networks (PNs) have arisen as a way of representing conditional preferences over a finite set of events in a compact way [1]. The general idea is to provide a possibility degree to each conditional event which is proportional to the preference given to that event. We apply this idea to model the preferences of query answers indirectly, by modeling the preferences over the contexts that entail them. In a nutshell, we divide an EL knowledge base (KB) into contexts, and use a possibilistic network to describe the joint possibility distribution over these contexts. Our formalism is based on ideas previously presented for reasoning under probabilistic uncertainty described by a Bayesian network [3]. The preference of an answer to the query is the possibility degree of the best context that entails this answer. Dually, we also compute, given a query, the most preferred source; that is, the context with the highest degree that entails this query. Similar to Bayesian networks [4], PNs are graphical models providing a compact representation of a discrete possibility distribution, through some independence assumptions [2]. A possibility distribution over a set is a function Pos : ! [0; 1] that intuitively provides a degree of how possible is an event ! 2 to happen. This function is extended to sets by defining Pos( ) = sup!2 Pos(!). The product conditional distribution which is defined by the equation Pos( \ ) = Pos( j ) Pos( ). Possibilistic networks decompose a possibility distribution into a product of conditional probability distributions that depend on the structure of a graph. A

? Supported by DFG within the Research Training Group “RoSI” (GRK 1907). ?? The work was developed while the author was still affiliated with TU Dresden and the author has been partially supported by the Cluster of Excellence “cfAED”. x :x y :y possibilistic network (PN) is a pair P = (G; ), where G = (V; E) is a DAG, and contains a conditional possibility distribution PosP (x j pa(x)) of every variable x 2 V given its parents pa(x) (see Figure 1). This PN defines the joint possibility distribution over the valuations of the variables in V

PosP (V ) = Y PosP (x j pa(x)):

x2V

Let V be a fixed but arbitrary finite set of propositional variables. A V -context is a propositional formula over V . A V -GCI is of the form hC v D : 'i with C; D concepts and ' a V -context. A V -TBox is a finite set of V -GCIs. V -assertions are of the form hC(a) : 'i or hr(a; b) : 'i where r 2 NC, a; b 2 NI, C is a concept and ' is a V -context. A V -ABox is a finite set of V -assertions. A PEL KB is a tuple K = (P; T ; A) where P is a PN over V , T is a V -TBox and A is a V -ABox.

The semantics of this logic is defined using multiple worlds. A contextual interpretation is a pair (I; W) where I is an EL interpretation and W is a valuation of the variables in V . (I; W) satisfies the axiom h : 'i ((I; W) j= h : 'i), iff either (i) W 6j= ', or (ii) I j= . It is a model of the PEL TBox T (resp. ABox A) iff it satisfies all the axioms in T (resp. A). A possibilistic interpretation is a pair P = (J; Pos), where J is a finite set of contextual interpretations and Pos is a possibility distribution over I. P is a model of the PEL TBox T (resp. ABox A) if every (I; W) 2 J is a model of T (resp. A). P is a model of the PN P if for every valuation W,

max Pos(I; W) = PosP (W): (I;W)2J P is a model of the PEL KB K = (P; T ; A) iff it is a model of T , A, and P.

Each possibilistic interpretation P = (I; Pos) defines a possibility distribution PosP over all CQs given by PosP(q) := max(I;W)2I; Ij=qfPos(I; W)g: The entailment degree of q w.r.t. the PEL KB K is

PosK(q) := Pinj=fKfPosP(q)g: These possibility distributions are extended to contexts in the obvious way, by setting PosP(') := PosP (') = maxWj=' PosP (W). We can then define the conditional possibilities of a query given a context, and of a context given a query, using the standard product rule. Formally,

PosK(q ^ ') = PosK(q j ')PosK(');

PosK(q ^ ') = PosK(' j q)PosK(q); where

PosK(q ^ ') =

inf max (I;Pos)j=K Ij=q; Wj=

Pos(I; W)g :

We consider three main reasoning problems in this setting; namely, deciding p-entailment, retrieving the top-k answers to a query, and the k most preferred worlds entailing a given query. We formally define these problems next. The problem of p-entailment refers to deciding whether PosK(q) p for some given p 2 (0; 1]. The top-k answer problem consists in deciding whether a tuple (a1; : : : ; ak) of different answers to q w.r.t. K is such that (i) for all i; 1 i < k, PosK(ai) PosK(ai+1), and (ii) for every other answer a, PosK(ak) PosK(a). This problem can be generalized to consider additional contextual evidence; that is, verify whether (a1; : : : ; ak) are the top-k answers to q given the context '. Finally, the k most preferred worlds problem is the problem of deciding whether a tuple of k valuations of the variables V (W1; : : : ; Wk) is such that PosK(Wi j q) PosK(Wi+1 j q) holds for all i; 1 i < k, and there exists no other valuation W such that PosK(W j q) > PosK(Wk j q).

The complexity of all these problems is summarized in Table 1, where network complexity refers to the complexity considering only the size of the PN as input, KB complexity considers the size of the ABox and TBox, while combined complexity considers the whole KB together with the PN and the query as the size of the input. As it can be seen, all the problems remain tractable w.r.t. data and KB complexity, but the complexity increases as soon as the PN or the query is considered part of the input. This corresponds to the behaviour exhibited by query answering in the classical EL [ 5 ]. The full details of these results can be found in the appendix.

Although all the complexity bounds are tight, they are all based on performing black-box query entailment tests on EL KBs. As future work we plan to adapt specific query answering techniques to produce effective algorithms that can be used in practice. We will also extend our framework to other kinds of standard and non-stardard reasoning tasks.

1. BenAmor , N., Dubois , D. , Gouider , H. , Prade , H.: Possibilistic Networks : A New Setting for Modeling Preferences . In: Proc. of SUM'14. LNCS , vol. 8720 . Springer Verlag ( 2014 )

2. Benferhat , S. , Dubois , D. , Garcia , L. , Prade , H.: Possibilistic logic bases and possibilistic graphs . In: Proc. of UAI'99 . Morgan-Kaufmann Publishers ( 1999 )

3. Ceylan , İ.İ., Peñaloza , R. : The Bayesian Description Logic BEL . In: Proc. of IJCAR'14. LNCS , vol. 8562 . Springer Verlag ( 2014 )

4. Darwiche , A. : Modeling and Reasoning with Bayesian Networks . Cambridge University Press ( 2009 )

5. Rosati , R.: On conjunctive query answering in EL . In: Proc. of DL'07. CEUR Workshop Proceedings , vol. 250 . CEUR-WS ( 2007 )