Preference-Based Query Answering in Datalog+/– Ontologies Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari Department of Computer Science, University of Oxford, UK firstname.lastname@cs.ox.ac.uk Abstract. The study of preferences has a long tradition in many disciplines, but it has only relatively recently entered the realm of data management through their application in answering queries to relational databases. The current revolution in data availability through the Web and, perhaps most importantly in the last few years, social media sites and applications, puts ontology languages at the forefront of data and information management technologies. In this paper, we propose the first (to our knowledge) integration of ontology languages with pref- erences as in relational databases by developing PrefDatalog+/–, an extension of the Datalog+/– family of languages with preference management formalisms closely related to those previously studied for relational databases. We focus on two kinds of answers to queries that are relevant to this setting, skyline and k- rank (a generalization of top-k queries), and develop algorithms for computing these answers to DAQs (disjunctions of atomic queries) as well as preliminary algorithms for CQs (conjunctive queries) based on related work in the databases and ontologies literature. We show that DAQ answering in PrefDatalog+/– can be done in polynomial time in the data complexity, as in relational databases, as long as query answering can also be done in polynomial time (in the data complexity) in the underlying classical ontology. 1 Introduction and Related Work In the past two decades, there has been a rapid development in the research and devel- opment of technologies to make an effective use of Web data. For instance, ontology languages, along with associated query answering algorithms, have been extensively studied with computational tractability in mind in order to guarantee their applicabil- ity on large scale knowledge bases. However, there remains much work to be done in central areas, such as search, where the state of the art is still centered in receiving key- words from a user and returning a set of links to Web documents. The main drawback to this paradigm is that the quality of the set of answers is much more important to users than the quantity. The evolution that seeks to address these issues is called se- mantic search, and is focused on identifying objects on the Web instead of documents and keywords. The other revolution that has recently taken place in the Web is often re- ferred to as the Social Web (as an evolution of the Semantic Web [2]) or Web 2.0, and is centered around a system composed of platforms that users adopt to share information and collaborate in a social environment. This makes the main tenets of semantic search ever more relevant for virtually all activities on the Web, since the fundamentally hu- man component of these systems makes each user’s personal preferences have a much more prevalent role than what was observed prior to this paradigm shift. The current challenge for Web search is thus inherently linked to leveraging the social components of Web content in order to develop some form of semantic search and query answering on the Web as a whole. Preferences have long been studied as an important part of people’s daily lives, and have thus received much attention in many areas of study such as philosophy (as early as Aristotle), choice theory (for instance, in rational decision making), and certain so- cial sciences (such as certain areas dealing with voting, auctions, and other mechanisms for social choice). Though these and many other avenues of study have naturally led to the study of preferences in computer science, the most relevant to our work is that of their incorporation into query answering mechanisms. To date (and to our knowledge), the state of the art in this respect is centered around relational databases. The seminal work in preference-based query answering was that of [9], in which the authors extend the SQL language to incorporate user preferences, showing that the resulting formal- ism could be translated into the domain relational calculus. Preference formulas were introduced in [6], where a logical formalism is proposed that allows an embedding of preference specifications into SQL through a winnow operator that is parameterized by a preference formula; the winnow operator is a generalization of the skyline operator, first introduced in [3]. Perhaps closest in nature to our approach is that of preference Datalog programs [8], which are a restriction of preference logic programs [7] that con- tain no uninterpreted function symbols and extend classical Datalog with constructs for determining which predicates must be optimized along with the optimization criteria (i.e., the set of preferences); the authors develop algorithms based on optimizations of the basic bottom-up evaluation approach to query answering based on the set of input preferences. Also, conditional preference bases [10] are closely related since they con- sist of a DL knowledge base (in SHOIN (D) or SHIF(D)) and a preference model built from variable-strength conditional preferences. For a recent survey of preference- based query answering formalisms, see [11]. In this paper, we propose a novel integration of ontology languages with preference management mechanisms by developing an extension of the Datalog+/– family of on- tology languages [5] with preference specification languages that are closely related to those previously studied in the relational databases literature. The main contributions of this paper are: (i) the introduction of the PrefDatalog+/– framework, the first (to our knowledge) combination of ontology languages with preferences as in relational databases; (ii) the development of algorithms for answering both skyline and k-rank queries (the latter is a generalization of top-k queries) for disjunctions of atomic queries (based on a novel augmented chase procedure) and conjunctive queries (based on the classical chase procedure and leveraging algorithms from related work in relational databases), along with proofs of correctness and running times showing that answering disjunctions of atomic queries (DAQs) in PrefDatalog+/– can be done in polynomial time in the data complexity, i.e., within the same complexity bounds as in relational databases, as long as query answering in the underlying classical ontology can also be done in polynomial time in the data complexity; and (iii) a complexity result showing that CQ answering is intractable in general. The rest of this paper is organized as follows. In Section 2, we present prelimi- nary concepts on classical Datalog+/– and preference specification formalisms from relational databases. Section 3 introduces the syntax and semantics of PrefDatalog+/–, along with the two kinds of queries that we focus on in this work. Section 4 presents algorithms for skyline and k-rank queries to PrefDatalog+/– ontologies, focusing espe- cially on DAQs; though an intractability result is proved for CQs, we also discuss how algorithms from relational databases can be leveraged for these queries as well. Finally, in Section 5, we conclude and discuss future work. Proofs of all stated results can be found in the full version of this paper. 2 Preliminary Concepts We now briefly recall some necessary background concepts. 2.1 Datalog+/– First, we present some basics on Datalog+/– [5], namely, on relational databases, (Boolean) conjunctive queries ((B)CQs), tuple-generating dependencies (TGDs), negative con- straints, the chase, and ontologies in Datalog+/–. Databases and Queries. We assume (i) an infinite universe of (data) constants ∆ (which constitute the “normal” domain of a database), (ii) an infinite set of (labeled) nulls ∆N (used as “fresh” Skolem terms, which are placeholders for unknown values, and can thus be seen as variables), and (iii) an infinite set of variables V (used in queries, dependencies, and constraints). Different constants represent different values (unique name assumption), while different nulls may represent the same value. We assume a lexicographic order on ∆ ∪ ∆N , with every symbol in ∆N following all symbols in ∆. We denote by X sequences of variables X1 , . . . , Xk with k > 0. We assume a relational schema R, which is a finite set of predicate symbols (or simply predicates). A term t is a constant, null, or variable. An atomic formula (or atom) a has the form P (t1 , ..., tn ), where P is an n-ary predicate, and t1 , ..., tn are terms. A database (instance) D for a relational schema R is a (possibly infinite) set of atoms with predicates from R and arguments from ∆. A conjunctive query (CQ) over R has the form Q(X) = ∃Y Φ(X, Y), where Φ(X, Y) is a conjunction of atoms (possibly equalities, but not inequalities) with the variables X and Y, and possibly constants, but without nulls. A Boolean CQ (BCQ) over R is a CQ of the form Q(), often written as the set of all its atoms, without quantifiers. Answers to CQs and BCQs are defined via homomorphisms, which are mappings µ : ∆ ∪ ∆N ∪ V → ∆ ∪ ∆N ∪ V such that (i) c ∈ ∆ implies µ(c) = c, (ii) c ∈ ∆N implies µ(c) ∈ ∆ ∪ ∆N , and (iii) µ is naturally extended to atoms, sets of atoms, and conjunctions of atoms. The set of all answers to a CQ Q(X) = ∃Y Φ(X, Y) over a database D, denoted Q(D), is the set of all tuples t over ∆ for which there exists a homomorphism µ : X ∪ Y → ∆ ∪ ∆N such that µ(Φ(X, Y)) ⊆ D and µ(X) = t. The answer to a BCQ Q() over a database D is Yes, denoted D |= Q, iff Q(D) ̸= ∅. Given a relational schema R, a tuple-generating dependency (TGD) σ is a first- order formula of the form ∀X∀Y Φ(X, Y) → ∃Z Ψ (X, Z), where Φ(X, Y) and Ψ (X, Z) are conjunctions of atoms over R (without nulls), called the body and the head of σ, denoted body(σ) and head (σ), respectively. Such σ is satisfied in a database D for R iff, whenever there exists a homomorphism h that maps the atoms of Φ(X, Y) to atoms of D, there exists an extension h′ of h that maps the atoms of Ψ (X, Z) to atoms of D. All sets of TGDs are finite here. Since TGDs can be reduced to TGDs with only single atoms in their heads, in the sequel, every TGD has w.l.o.g. a single atom in its head. A TGD σ is guarded iff it contains an atom in its body that contains all universally quantified variables of σ. The leftmost such atom is the guard atom (or guard) of σ. Query answering under TGDs, i.e., the evaluation of CQs and BCQs on databases under a set of TGDs is defined as follows. For a database D for R, and a set of TGDs Σ on R, the set of models of D and Σ, denoted mods(D, Σ), is the set of all (possibly infinite) databases B such that (i) D ⊆ B and (ii) every σ ∈ Σ is satisfied in B. The set of answers for a CQ Q to D and Σ, denoted ans(Q, D, Σ), is the set of all tuples a such that a ∈ Q(B) for all B ∈ mods(D, Σ). The answer for a BCQ Q to D and Σ is Yes, denoted D ∪ Σ |= Q, iff ans(Q, D, Σ) ̸= ∅. Note that query answering under general TGDs is undecidable [1], even when the schema and TGDs are fixed [4]. Decidability of query answering for the guarded case follows from a bounded tree-width property. The data complexity of query answering in this case is P-complete. Negative constraints (or simply constraints) γ are first-order formulas ∀XΦ(X) → ⊥, where Φ(X) (called the body of γ) is a conjunction of atoms (without nulls). Under the standard semantics of query answering of BCQs in Datalog+/– with TGDs, adding neg- ative constraints is computationally easy, as for each constraint ∀XΦ(X) → ⊥, we only have to check that the BCQ Φ(X) evaluates to false in D under Σ; if one of these checks fails, then the answer to the original BCQ Q is false, otherwise the constraints can simply be ignored when answering the BCQ Q. The final component of the language corresponds to equality generating dependen- cies; since they are not directly used here, we refer to [5] for their details. We usually omit the universal quantifiers in TGDs and negative constraints, and we implicitly as- sume that all sets of dependencies and/or constraints are finite. The Chase. The chase was first introduced to enable checking implication of depen- dencies, and later also for checking query containment. By “chase”, we refer both to the chase procedure and to its output. The TGD chase works on a database via so-called TGD chase rules. TGD Chase Rule. Let D be a database, and σ a TGD of the form Φ(X, Y) → ∃Z Ψ (X, Z). Then, σ is applicable to D if there exists a homomorphism h that maps the atoms of Φ(X, Y) to atoms of D. Let σ be applicable to D, and h1 be a homomorphism that extends h as follows: for each Xi ∈ X, h1 (Xi ) = h(Xi ); for each Zj ∈ Z, h1 (Zj ) = zj , where zj is a “fresh” null, i.e., zj ∈ ∆N , zj does not occur in D, and zj lexicographically follows all other nulls already introduced. The application of σ on D adds to D the atom h1 (Ψ (X, Z)) if not already in D. The chase algorithm for a database D and a set of TGDs Σ consists of an ex- haustive application of the TGD chase rule in a breadth-first (level-saturating) fashion, which outputs a (possibly infinite) chase for D and Σ. Formally, the chase of level up to 0 of D relative to Σ, denoted chase0 (D, Σ), is defined as D, assigning to every atom in D the (derivation) level 0. For every k > 1, the chase of level up to k of D relative to Σ, denoted chasek (D, Σ), is constructed as follows: let I1 , . . . , In be all possible images of bodies of TGDs in Σ relative to some homomorphism such that (i) I1 , . . . , In ⊆ chasek−1 (D, Σ) and (ii) the highest level of an atom in every Ii is k − 1; then, perform every corresponding TGD application on chasek−1 (D, Σ), choosing the applied TGDs and homomorphisms in a (fixed) linear and lexicographic order (resp.), and assigning to every new atom the level k. The chase of D relative to Σ, denoted chase(D, Σ), is defined as the limit of chasek (D, Σ) for k → ∞. The (possibly infinite) chase relative to TGDs is a universal model, i.e., there exists a homomorphism from chase(D, Σ) onto every B ∈ mods(D, Σ) [5]. This implies that BCQs Q over D and Σ can be evaluated on the chase for D and Σ, i.e., D ∪ Σ |= Q is equivalent to chase(D, Σ) |= Q. For guarded TGDs Σ, such BCQs Q can be evaluated on an initial fragment of chase(D, Σ) of constant depth k · |Q|, which is possible in polynomial time in the data complexity. Datalog+/– Ontologies. A Datalog+/– ontology KB = (D, Σ), where Σ = ΣT ∪ ΣNC , consists of a database D, a set of TGDs ΣT , and a set of negative constraints ΣNC . We say KB is guarded (resp., linear) iff ΣT is guarded (resp., linear). Example 1 (used in the sequel as a running example) illustrates a simple Datalog+/– ontology. Example 1. Let O = ⟨D, Σ⟩ be a simple ontology describing gift ideas for friends: Σ = { scifi book(T, A) → book(T, A), fant book(T, A) → book(T, A), mmorpg(X) → vidGame(X), actAdv(X) → vidGame(X), book(T, A) → educ(T ), puzzle(N ) → educ(N ) }, and D = {scifi book(b1 , asimov), scifi book(b2 , asimov), fant book(b3 .tolkien), puzzle(p1 ), mmorpg(v1 ), actAdv(v2 )}. This ontology considers two kinds of book and video game genres, and states that books and puzzles are categorized under an “educational” label. Database D provides some instances for each of these genres.  Finally, we would like to recall that different fragments of Datalog+/– have been shown to generalize the EL and F-Logic Lite description logics, as well as the DL-Lite family of tractable description logics [5]. 2.2 Preference Relations via Preference Formulas Though the PrefDatalog+/– formalism introduced in Section 3 allows essentially any preference framework to be used, in this paper we adopt a formalism slightly general- izing the preference formulas approach that was proposed in [6] in order to develop the concrete approaches to query answering discussed in Section 4. In the following, let ∆Pref be a finite set of constants, RPref be a finite set of predicate names, and VPref be an infinite set of variables; we denote with HPref the Herbrand base relative to these sets, respectively. A preference relation is any binary relation ≻ ⊆ HPref × HPref . Note that, slightly generalizing the framework in [6], preference relations can be defined between atoms corresponding to different predicates, and that formulas are not “iff” statements. Definition 1. Let a1 and a2 be atoms over RPref , VPref , and ∆Pref . A preference formula pf is of the form “a1 ≻ a2 if C(a1 , a2 )”, where C(a1 , a2 ) is a first-order formula. We call C(a1 , a2 ) the condition of pf, denoted cond(pf). The smallest (with respect to set inclusion) preference relation defined via a set P of preference formulas is denoted with ≻P . Example 2. Following the same topic of Example 1, the following formulas might rep- resent a specific friend’s preferences for children’s gifts: C1 : educ(X) ≻ vidGame(Y ) if ⊤ C2 : book(T1 , A1 ) ≻ book(T2 , A2 ) if scifi book(T1 , A1 ) ∧ fant book(T2 , A2 ) C3 : book(T1 , A1 ) ≻ book(T2 , A2 ) if T1 = b1 ∧ T2 = b2 C4 : educ(X) ≻ educ(Y ) if puzzle(X) ∧ fant book(Y ) C5 : vidGame(X) ≻ vidGame(Y ) if actAdv(X) ∧ mmorpg(Y ) For instance, formula C1 states that educational gifts are preferable to video games (unconditionally), while C4 states that an educational gift that is a puzzle is preferable to one that is a fantasy book.  3 PrefDatalog+/–: Syntax and Semantics Recall that we have (as discussed in Section 2), the logical languages for ontologies and preferences. For the former, we have an infinite universe of constants ∆Ont , an infinite set of variables VOnt , and a finite set of predicate names ROnt ; analogously for the latter, we have a finite set of constants ∆Pref , an infinite set of variables VPref , and a finite set of predicate names RPref . In the following, we assume w.l.o.g. that RPref ⊆ ROnt , ∆Pref ⊆ ∆Ont , and VPref ⊆ VOnt . These sets give rise to corresponding Herbrand bases consisting of all possible ground atoms that can be formed, which we denote by HOnt and HPref , respectively. Clearly, we have HPref ⊆ HOnt , meaning that preference relations are defined over a subset of the possible ground atoms. Definition 2. Let O be a Datalog+/– ontology and P be a set of preference formulas with Herbrand bases HOnt and HPref , respectively. A preference-based Datalog+/– on- tology (PrefDatalog+/– ontology, or knowledge base) is of the form KB = (O, P ), where HPref ⊆ HOnt . If O is a guarded Datalog+/– ontology, we say that KB is a guarded PrefDatalog+/– ontology. Semantics. The semantics of PrefDatalog+/– arises as a direct combination of the se- mantics of Datalog+/– and that of preference formulas. ∨ Let KB = (O, P ): KB |= a1 ≻P a2 iff (i) O |= a1 and O |= a2 ; and (ii) |= pfi ∈P cond(pfi )(a1 , a2 ). Intu- itively, the consequences of a knowledge base KB = (O, P ) are computed in terms of the chase for the classical Datalog+/– ontology O, and the formulas in P describe the preference relation over pairs of atoms in HOnt . The following is a simple example. Example 3. Let KB = (O, P ) be a PrefDatalog+/– ontology, with ontology O from Example 1 and set of preference formulas P from Example 2. We can see that KB |= educ(b1 ) ≻P vidGame(v1 ), since O |= educ(b1 ) and C1 ∈ P allows us to conclude that educ(b1 ) ≻P vidGame(v1 ). On the other hand, we have KB ̸|= educ(b1 ) ≻P educ(b2 ), since condition (ii) is not satisfied.  Queries. In this paper, we are interested in two kinds of preference-based queries that can be issued over PrefDatalog+/– ontologies: skyline and k-rank. In order to define answers to such queries, we adopt the following definitions from classical logic. A substitution is a function from variables to variables or constants. Two sets S and T unify via a substitution θ iff θS = θT , where θA denotes the application of θ to all variables in all elements of A (here, θ is a unifier). A most general unifier (mgu) is a unifier θ such that for all other unifiers ω, there exists a substitution σ such that ω = σ ◦ θ. We consider two kinds of classical queries: conjunctive queries (CQs) and disjunctive atomic queries (disjunctions of atoms – DAQs). Definition 3. Consider PrefDatalog+/– ontology KB = (O, P ), k > 0, and DAQ Q(X) = q1 (X1 ) ∨ ... ∨ qn (Xn ) where the qi ’s{are atoms and X1 ∪ ... ∪ Xn = X. ′ ′ ′ } as: θqi ′| O |= θqi and @θ s.t. O |= The set of skyline answers to Q is defined θ qj and θ qj ≻P θqi , with 1 6 i, j 6 n , where θ, θ are ground substitutions for the variables in Q(X). A k-rank answer to Q(X) is defined ⟨ for transitive ⟩ relations ≻P as a sequence of maximal length of mgu’s for X: S = θ1 , . . . , θk′ such that O |= θi Q for 1 6 i 6 k ′ 6 k, and S is built by subsequently appending the skyline answers to Q, removing these atoms from consideration, and repeating the process until either S = k or no more answers to Q remain. In Definition. 3, note that k-rank answers are only defined when the preference relation is transitive—this is required for the answers to make sense, since the atoms in each rank are supposed to be preferred to those in the next ranks. This kind of answer can be seen as a generalization of traditional top-k answers [11] that are still defined when ≻P is not a weak order, and their name arises from the concept of rank introduced in [6]. Intuitively, for DAQs, both kinds of answers can be seen as atomic consequences of O that satisfy the query: the skyline answers can be seen as sets of atoms that are not dominated by any other such atom, while k-rank answers are k-tuples sorted according to the preference relation. We refer to these as answers in atom form. Example 4. Consider again ontology KB = (O, P ) from Example 3, Q1 (X, Y ) = book(X, Y ), and Q2 (X) = educ(X); the ≻P relation is depicted in Figure 1 (dot- ted lines). The set of skyline answers to query Q1 is {book(b1 , asimov)}, while for Q2 it is {educ(p1 ), educ(b1 ), educ(b2 )}. A 3-rank answer to Q1 is: ⟨book(b1 , asimov), book(b2 , asimov), book(b3 , tolkien)⟩. For Q2 , a 3-rank answer is ⟨educ(p1 ), educ(b1 ), educ(b2 )⟩. Finally, Q3 = puzzle(X) ∨ vidGame(X) yields {puzzle(p1 ), vidGame(v2 )} as skyline answers, and a 3-rank answer is {puzzle(p1 ), vidGame(v2 ), vidGame(v1 )}. Finally, we have a result relating k-rank to the traditional top-k queries (cf. [11]). Proposition 1. Let KB = (O, P ) be a PrefDatalog+/– ontology, Q be a DAQ or a CQ, and k > 0. If ≻P is a weak order, then k-rank answers are equal to top-k answers (modulo ordering of incomparable elements).  scifi_book(b1,a) scifi_book(b2,a) fant_book(b3,t) puzzle(p1) mmorpg(v1) actAdv(v2)  book(b1,a) 9 book(b a) 9 book(b t)  2, 3, vidGame(v1) vidGame(v2)  educ(b1) educ(b2) educ(b3) educ(p1) Fig. 1. Preference-augmented chase forest for the running example. Dashed arrows correspond to preference edges, and the marks in the upper left corner represent marked nodes for different queries; author names have been shortened to letters. Conjunctive Queries. Definition 3 can easily be extended to conjunctive queries— intuitively, substitutions in the set of answers must be for all qi ’s in the query instead of just one and, in the case of (non-atomic) conjunctive queries, the substitutions in answers no longer yield single atoms but rather sets of atoms; therefore, we must extend the preference specification framework to take into account sets of atoms instead of individual ones. One such approach was proposed in [12], where a mechanism to define a preference relation over tuple sets ≻PS : 2HPref × 2HPref is introduced. Though this is not the focus of this paper, in Section 4.2 we treat their complexity and briefly discuss how methods from the relational databases literature can be applied to answer them. 4 Preference-Based Query Answering In this section, we propose algorithms for answering queries to PrefDatalog+/– ontolo- gies; we start with DAQs and then move on to the conjunctive case. 4.1 Disjunctive Atomic Queries We begin by introducing a structure that is used in preference-based query answering. Next, we investigate algorithms for skyline and k-rank query answering for DAQs. The Preference-Augmented Chase (prefChase). In order to compute skyline and k- rank answers to queries over a PrefDatalog+/– ontology KB = (⟨D, Σ⟩, P ), we make use of a data structure called the chase forest for D, Σ, and query Q [5], which is defined as the directed graph consisting of (i) for every a ∈ D, one node labeled with a, and (ii) for every node labeled with a ∈ chase(D, Σ) and for every atom b ∈ chase(D, Σ) that is obtained from a and possibly other atoms by a one-step application of a TGD σ ∈ Σ with a as guard, one node labeled with b along with an edge from the node labeled with a. Here, we propose an augmented chase forest, comprised of the necessary fi- nite part of the chase forest relative to a given query that is augmented with an ad- ditional kind of edge that we call preference edges, which occur between nodes la- beled with a, b ∈ chase(D, Σ) iff a ≻P b. Finally, when an edge is introduced between nodes whose labels satisfy Q, the node with the incoming edge is marked. Figure 1 Algorithm 1: Skyline(KB = (⟨D, Σ⟩, P ), Q) Input: PrefDatalog+/– ontology KB and DAQ Q(X). Output: Set of skyline answers {a1 , ..., ak } to Q. 1. Initialize Res as an empty set of ground atoms; 2. C:= prefChase(KB , Q); 3. For each atom a labeling node v in C do begin 4. If a |= Q and v is unmarked then Res:= Res ∪ {a}; 5. end; 6. return Res. Fig. 2. An algorithm that computes the set of skyline answers (in atom form) to an atomic query. shows prefChase(KB , Q) for the PrefDatalog+/– ontology from Example 3; for illus- trative reasons, markings for three different queries have been included in the figure: book(X, Y ) (check mark), educ(X) (star), and vidGame(X) (diamond). Theorem 1. If KB = (⟨D, Σ⟩, P ) is a guarded PrefDatalog+/– ontology and Q a DAQ, prefChase(KB , Q) can be computed in time O(n2 ) in the data complexity. Skyline Queries. Algorithm Skyline (Fig. 2) begins by computing the preference- augmented chase forest relative to the input ontology and query and then simply ana- lyzing the set of marked nodes (i.e., those that are dominated by other tuples relative to the query). As we can see, the node markings have done almost all of the work towards answering the skyline query; the algorithm only needs to go through the structure and find the nodes whose labels satisfy the query and, if unmarked, add them to the output. Example 5. Consider again the setup in Example 4 and Figure 1. For query Q1 , the only unmarked atom is book(b1 , a), and so the skyline answer is the singleton set with this atom. For Q2 , we get all the educ atoms except educ(b3 ).  Theorem 2. Let KB be a PrefDatalog+/– ontology, Q be a DAQ, and k > 0. Then, (i) Algorithm Skyline correctly computes the skyline answers to Q, and (ii) Algo- rithm Skyline runs in time O(k) in the data complexity, where k is the time needed to compute prefChase(KB , Q) in the data complexity. Corollary 1. Let KB = (O, P ) be a PrefDatalog+/– ontology, Q be a DAQ, and k > 0. If O is a guarded Datalog+/– ontology, then the set of skyline answers to Q can be computed in polynomial time in the data complexity. k-Rank Queries. Algorithm k-Rank (Figure 3) also begins by computing the preference- augmented chase forest. The main while loop iterates through the process of comput- ing the skyline answers to Q by using the Skyline algorithm, updating the result by appending these answers in arbitrary order, and removing the nodes and edges involved in the result from the chase forest; the loop finally prepares for the next iteration by updating the markings in the forest. Once the loop is finished, the algorithm returns the first k results, since the last iteration might add superfluous elements. Algorithm 2: k-Rank(KB = (⟨D, Σ⟩, P ), Q, k) Input: PrefDatalog+/– ontology KB , DAQ Q(X), k > 0. Output: k-rank answer ⟨a1 , ..., ak′ ⟩ to Q. 1. Initialize Res as an empty vector of ground atoms; 2. C:= prefChase(KB , Q); 3. i:= k; 4. While i > 0 do begin 5. S:= Skyline(KB , Q); 6. Append S to Res (arbitrary order); 7. Remove S from C along with all associated edges; 8. Identify marked nodes that no longer have incoming preference edges and change them to unmarked; 9. i:= i − |S|; 10. End; 11. Return truncate(Res, k). Fig. 3. An algorithm for obtaining a k-rank answer (in atom form) to DAQ Q. Example 6. Consider the setup in Example 4 and the structure in Figure 1. The node book(b1 , a) is the first skyline answer; Algorithm k-Rank then proceeds to update Res with ⟨book(b1 , a)⟩ and to remove this node (and associated edges) from the forest; according to Line 8, node book(b2 , a) now becomes unmarked. The algorithm continues until the three book atoms are processed.  Theorem 3. Let KB be a PrefDatalog+/– ontology, Q be a DAQ, and k > 0. Then, (i) Algorithm k-Rank correctly computes a k-rank answer to Q, and (ii) Algorithm k- Rank runs in time O(k) in the data complexity, where k is the time needed to compute prefChase(KB , Q) in the data complexity. Corollary 2. Let KB = (O, P ) be a PrefDatalog+/– ontology, Q be a DAQ, and k > 0. If O is a guarded Datalog+/– ontology, then a k-rank answer to Q can be computed in polynomial time in the data complexity. 4.2 Conjunctive Queries As discussed at the end of Section 3, non-atomic conjunctive queries require a prefer- ence relation that is defined between sets of tuples—in this section we assume that we have an extension of the preference formulas formalism PS that yields such a relation; a detailed treatment of this extension is outside the scope of this paper. Unfortunately, the following result tells us that this change has a direct impact on the complexity of both skyline and k-rank query answering. Theorem 4. Let KB = (⟨D, Σ⟩, PS) be a guarded PrefDatalog+/– ontology, where PS describes a preference relation ≻PS : 2HPref × 2HPref such that membership can be tested in polynomial time, and Q be a CQ. If the relational schema and Σ are fixed, deciding if the set of skyline answers or a k-rank answer to Q are non-empty is Σ2p -complete. The heuristic algorithms presented in [12] for relational databases can be applied to PrefDatalog+/– by computing the prefChase relative to the query and thus materializ- ing the necessary part of the ontology into a database. The following result provides a way to leverage tools developed for relational databases for first order rewritable frag- ments; we use prefQ-DPM to denote either Skyline or k-Rank, and prefQ-RDB the corresponding algorithm for relational databases. Theorem 5. Let KB = (⟨D, Σ⟩, PS) be a PrefDatalog+/– ontology and Q a CQ. If ⟨D, Σ⟩ belongs to an FO-rewritable fragment of Datalog+/– and all F ∈ PS are such that cond(F ) are conjunctions of atoms, then we can compute PS′ and Q′ in PTIME such that prefQ-DPM(KB , Q) = prefQ-RDB(D, PS′ , Q′ ). Intuitively, the theorem is a consequence of the fact that for each F ∈ PS, cond(F ) can be treated like a BCQ and thus rewritten relative to ΣT . This result is important since it identifies conditions under which preference-based queries can be answered using highly optimized relational database management systems. 5 Summary and Outlook In this work, we have presented an extension of the Datalog+/– family of ontology lan- guages for preference-based query answering, a topic that has recently become central in dealing with data related to the Social Semantic Web. We focused on two well-known kinds of preference-based queries, namely skyline and k-rank (a generalization of tradi- tional top-k queries), and studied algorithms and complexity of computing their answers when the underlying classical queries are either DAQs or CQs. For DAQs, we proposed an augmented version of the chase forest, and showed how this structure can be used to answer both kinds of preference-based queries in polynomial time in the data complex- ity whenever classical query answering is PTIME in the underlying Datalog+/– ontol- ogy. For CQs, we showed that this desirable property is lost, and note that related work in relational databases can however be applied in order to leverage heuristics—we also provided a result for leveraging an RDBMS for FO-rewritable fragments of Datalog+/–. Current and future work involves implementing and testing the PrefDatalog+/– frame- work, and exploring specific heuristics and other approximation techniques that can be applied to answer preference-based CQs scalably. Acknowledgments. A slightly longer version of this paper appears in the proceed- ings of IJCAI 2013. This work was supported by UK EPSRC grant EP/J008346/1– “PrOQAW”, ERC Grant 246858–“DIADEM”, and a Yahoo! Research Fellowship. References 1. C. Beeri and M. Y. Vardi. The implication problem for data dependencies. In Proc. ICALP- 1981, volume 115 of LNCS, pages 73–85. Springer, 1981. 2. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Sci. Am., 284(5):34–43, 2002. 3. S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In Proc. ICDE-2001, pages 421–430, Washington, DC, USA, 2001. IEEE Computer Society. 4. A. Calı̀, G. Gottlob, and M Kifer. Taming the infinite chase: Query answering under expres- sive relational constraints. In Proc. KR-2008, pages 70–80. AAAI Press, 2008. 5. A. Calı̀, G. Gottlob, and T. Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem., 14:57–83, 2012. 6. J. Chomicki. Preference formulas in relational queries. ACM Trans. Database Syst., 28(4):427–466, December 2003. 7. K. Govindarajan, B. Jayaraman, and S. Mantha. Preference logic programming. In Proc. ICLP-1995, pages 731–745, 1995. 8. K. Govindarajan, B. Jayaraman, and S. Mantha. Preference queries in deductive databases. New Generat. Comput., 19(1):57–86, 2001. 9. M. Lacroix and P. Lavency. Preferences: Putting more knowledge into queries. In Proc. VLDB-1997, volume 87, pages 1–4, 1987. 10. T. Lukasiewicz and J. Schellhase. Variable-strength conditional preferences for ranking ob- jects in ontologies. J. of Web Sem., 5(3), 2007. 11. K. Stefanidis, G. Koutrika, and E. Pitoura. A survey on representation, composition and application of preferences in database systems. ACM Trans. Database Syst., 36(3):19:1– 19:45, August 2011. 12. X. Zhang and J. Chomicki. Preference queries over sets. In Proc. ICDE-2011, pages 1019– 1030, 2011.