Preference-Based Query Answering
                      in Datalog+/– Ontologies

         Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari

                 Department of Computer Science, University of Oxford, UK
                        firstname.lastname@cs.ox.ac.uk


       Abstract. The study of preferences has a long tradition in many disciplines, but
       it has only relatively recently entered the realm of data management through their
       application in answering queries to relational databases. The current revolution
       in data availability through the Web and, perhaps most importantly in the last
       few years, social media sites and applications, puts ontology languages at the
       forefront of data and information management technologies. In this paper, we
       propose the first (to our knowledge) integration of ontology languages with pref-
       erences as in relational databases by developing PrefDatalog+/–, an extension
       of the Datalog+/– family of languages with preference management formalisms
       closely related to those previously studied for relational databases. We focus on
       two kinds of answers to queries that are relevant to this setting, skyline and k-
       rank (a generalization of top-k queries), and develop algorithms for computing
       these answers to DAQs (disjunctions of atomic queries) as well as preliminary
       algorithms for CQs (conjunctive queries) based on related work in the databases
       and ontologies literature. We show that DAQ answering in PrefDatalog+/– can be
       done in polynomial time in the data complexity, as in relational databases, as long
       as query answering can also be done in polynomial time (in the data complexity)
       in the underlying classical ontology.


1   Introduction and Related Work
In the past two decades, there has been a rapid development in the research and devel-
opment of technologies to make an effective use of Web data. For instance, ontology
languages, along with associated query answering algorithms, have been extensively
studied with computational tractability in mind in order to guarantee their applicabil-
ity on large scale knowledge bases. However, there remains much work to be done in
central areas, such as search, where the state of the art is still centered in receiving key-
words from a user and returning a set of links to Web documents. The main drawback
to this paradigm is that the quality of the set of answers is much more important to
users than the quantity. The evolution that seeks to address these issues is called se-
mantic search, and is focused on identifying objects on the Web instead of documents
and keywords. The other revolution that has recently taken place in the Web is often re-
ferred to as the Social Web (as an evolution of the Semantic Web [2]) or Web 2.0, and is
centered around a system composed of platforms that users adopt to share information
and collaborate in a social environment. This makes the main tenets of semantic search
ever more relevant for virtually all activities on the Web, since the fundamentally hu-
man component of these systems makes each user’s personal preferences have a much
more prevalent role than what was observed prior to this paradigm shift. The current
challenge for Web search is thus inherently linked to leveraging the social components
of Web content in order to develop some form of semantic search and query answering
on the Web as a whole.

     Preferences have long been studied as an important part of people’s daily lives, and
have thus received much attention in many areas of study such as philosophy (as early
as Aristotle), choice theory (for instance, in rational decision making), and certain so-
cial sciences (such as certain areas dealing with voting, auctions, and other mechanisms
for social choice). Though these and many other avenues of study have naturally led to
the study of preferences in computer science, the most relevant to our work is that of
their incorporation into query answering mechanisms. To date (and to our knowledge),
the state of the art in this respect is centered around relational databases. The seminal
work in preference-based query answering was that of [9], in which the authors extend
the SQL language to incorporate user preferences, showing that the resulting formal-
ism could be translated into the domain relational calculus. Preference formulas were
introduced in [6], where a logical formalism is proposed that allows an embedding of
preference specifications into SQL through a winnow operator that is parameterized by
a preference formula; the winnow operator is a generalization of the skyline operator,
first introduced in [3]. Perhaps closest in nature to our approach is that of preference
Datalog programs [8], which are a restriction of preference logic programs [7] that con-
tain no uninterpreted function symbols and extend classical Datalog with constructs for
determining which predicates must be optimized along with the optimization criteria
(i.e., the set of preferences); the authors develop algorithms based on optimizations of
the basic bottom-up evaluation approach to query answering based on the set of input
preferences. Also, conditional preference bases [10] are closely related since they con-
sist of a DL knowledge base (in SHOIN (D) or SHIF(D)) and a preference model
built from variable-strength conditional preferences. For a recent survey of preference-
based query answering formalisms, see [11].

    In this paper, we propose a novel integration of ontology languages with preference
management mechanisms by developing an extension of the Datalog+/– family of on-
tology languages [5] with preference specification languages that are closely related to
those previously studied in the relational databases literature. The main contributions
of this paper are: (i) the introduction of the PrefDatalog+/– framework, the first (to
our knowledge) combination of ontology languages with preferences as in relational
databases; (ii) the development of algorithms for answering both skyline and k-rank
queries (the latter is a generalization of top-k queries) for disjunctions of atomic queries
(based on a novel augmented chase procedure) and conjunctive queries (based on the
classical chase procedure and leveraging algorithms from related work in relational
databases), along with proofs of correctness and running times showing that answering
disjunctions of atomic queries (DAQs) in PrefDatalog+/– can be done in polynomial
time in the data complexity, i.e., within the same complexity bounds as in relational
databases, as long as query answering in the underlying classical ontology can also be
done in polynomial time in the data complexity; and (iii) a complexity result showing
that CQ answering is intractable in general.
    The rest of this paper is organized as follows. In Section 2, we present prelimi-
nary concepts on classical Datalog+/– and preference specification formalisms from
relational databases. Section 3 introduces the syntax and semantics of PrefDatalog+/–,
along with the two kinds of queries that we focus on in this work. Section 4 presents
algorithms for skyline and k-rank queries to PrefDatalog+/– ontologies, focusing espe-
cially on DAQs; though an intractability result is proved for CQs, we also discuss how
algorithms from relational databases can be leveraged for these queries as well. Finally,
in Section 5, we conclude and discuss future work. Proofs of all stated results can be
found in the full version of this paper.


2   Preliminary Concepts
We now briefly recall some necessary background concepts.

2.1 Datalog+/–
First, we present some basics on Datalog+/– [5], namely, on relational databases, (Boolean)
conjunctive queries ((B)CQs), tuple-generating dependencies (TGDs), negative con-
straints, the chase, and ontologies in Datalog+/–.
Databases and Queries. We assume (i) an infinite universe of (data) constants ∆
(which constitute the “normal” domain of a database), (ii) an infinite set of (labeled)
nulls ∆N (used as “fresh” Skolem terms, which are placeholders for unknown values,
and can thus be seen as variables), and (iii) an infinite set of variables V (used in queries,
dependencies, and constraints). Different constants represent different values (unique
name assumption), while different nulls may represent the same value. We assume a
lexicographic order on ∆ ∪ ∆N , with every symbol in ∆N following all symbols in ∆.
We denote by X sequences of variables X1 , . . . , Xk with k > 0. We assume a relational
schema R, which is a finite set of predicate symbols (or simply predicates). A term t is
a constant, null, or variable. An atomic formula (or atom) a has the form P (t1 , ..., tn ),
where P is an n-ary predicate, and t1 , ..., tn are terms.
    A database (instance) D for a relational schema R is a (possibly infinite) set of
atoms with predicates from R and arguments from ∆. A conjunctive query (CQ) over
R has the form Q(X) = ∃Y Φ(X, Y), where Φ(X, Y) is a conjunction of atoms
(possibly equalities, but not inequalities) with the variables X and Y, and possibly
constants, but without nulls. A Boolean CQ (BCQ) over R is a CQ of the form Q(),
often written as the set of all its atoms, without quantifiers. Answers to CQs and BCQs
are defined via homomorphisms, which are mappings µ : ∆ ∪ ∆N ∪ V → ∆ ∪ ∆N ∪ V
such that (i) c ∈ ∆ implies µ(c) = c, (ii) c ∈ ∆N implies µ(c) ∈ ∆ ∪ ∆N , and (iii) µ is
naturally extended to atoms, sets of atoms, and conjunctions of atoms. The set of all
answers to a CQ Q(X) = ∃Y Φ(X, Y) over a database D, denoted Q(D), is the set
of all tuples t over ∆ for which there exists a homomorphism µ : X ∪ Y → ∆ ∪ ∆N
such that µ(Φ(X, Y)) ⊆ D and µ(X) = t. The answer to a BCQ Q() over a database D
is Yes, denoted D |= Q, iff Q(D) ̸= ∅.
     Given a relational schema R, a tuple-generating dependency (TGD) σ is a first-
order formula of the form ∀X∀Y Φ(X, Y) → ∃Z Ψ (X, Z), where Φ(X, Y) and Ψ (X,
Z) are conjunctions of atoms over R (without nulls), called the body and the head of σ,
denoted body(σ) and head (σ), respectively. Such σ is satisfied in a database D for R
iff, whenever there exists a homomorphism h that maps the atoms of Φ(X, Y) to atoms
of D, there exists an extension h′ of h that maps the atoms of Ψ (X, Z) to atoms of D.
All sets of TGDs are finite here. Since TGDs can be reduced to TGDs with only single
atoms in their heads, in the sequel, every TGD has w.l.o.g. a single atom in its head.
A TGD σ is guarded iff it contains an atom in its body that contains all universally
quantified variables of σ. The leftmost such atom is the guard atom (or guard) of σ.
     Query answering under TGDs, i.e., the evaluation of CQs and BCQs on databases
under a set of TGDs is defined as follows. For a database D for R, and a set of TGDs
Σ on R, the set of models of D and Σ, denoted mods(D, Σ), is the set of all (possibly
infinite) databases B such that (i) D ⊆ B and (ii) every σ ∈ Σ is satisfied in B. The set
of answers for a CQ Q to D and Σ, denoted ans(Q, D, Σ), is the set of all tuples a such
that a ∈ Q(B) for all B ∈ mods(D, Σ). The answer for a BCQ Q to D and Σ is Yes,
denoted D ∪ Σ |= Q, iff ans(Q, D, Σ) ̸= ∅. Note that query answering under general
TGDs is undecidable [1], even when the schema and TGDs are fixed [4]. Decidability
of query answering for the guarded case follows from a bounded tree-width property.
The data complexity of query answering in this case is P-complete.
     Negative constraints (or simply constraints) γ are first-order formulas ∀XΦ(X) → ⊥,
where Φ(X) (called the body of γ) is a conjunction of atoms (without nulls). Under the
standard semantics of query answering of BCQs in Datalog+/– with TGDs, adding neg-
ative constraints is computationally easy, as for each constraint ∀XΦ(X) → ⊥, we only
have to check that the BCQ Φ(X) evaluates to false in D under Σ; if one of these
checks fails, then the answer to the original BCQ Q is false, otherwise the constraints
can simply be ignored when answering the BCQ Q.
     The final component of the language corresponds to equality generating dependen-
cies; since they are not directly used here, we refer to [5] for their details. We usually
omit the universal quantifiers in TGDs and negative constraints, and we implicitly as-
sume that all sets of dependencies and/or constraints are finite.
The Chase. The chase was first introduced to enable checking implication of depen-
dencies, and later also for checking query containment. By “chase”, we refer both to
the chase procedure and to its output. The TGD chase works on a database via so-called
TGD chase rules.
    TGD Chase Rule. Let D be a database, and σ a TGD of the form Φ(X, Y) →
∃Z Ψ (X, Z). Then, σ is applicable to D if there exists a homomorphism h that maps the
atoms of Φ(X, Y) to atoms of D. Let σ be applicable to D, and h1 be a homomorphism
that extends h as follows: for each Xi ∈ X, h1 (Xi ) = h(Xi ); for each Zj ∈ Z,
h1 (Zj ) = zj , where zj is a “fresh” null, i.e., zj ∈ ∆N , zj does not occur in D, and zj
lexicographically follows all other nulls already introduced. The application of σ on D
adds to D the atom h1 (Ψ (X, Z)) if not already in D.
    The chase algorithm for a database D and a set of TGDs Σ consists of an ex-
haustive application of the TGD chase rule in a breadth-first (level-saturating) fashion,
which outputs a (possibly infinite) chase for D and Σ. Formally, the chase of level up
to 0 of D relative to Σ, denoted chase0 (D, Σ), is defined as D, assigning to every
atom in D the (derivation) level 0. For every k > 1, the chase of level up to k of D
relative to Σ, denoted chasek (D, Σ), is constructed as follows: let I1 , . . . , In be all
possible images of bodies of TGDs in Σ relative to some homomorphism such that (i)
I1 , . . . , In ⊆ chasek−1 (D, Σ) and (ii) the highest level of an atom in every Ii is k − 1;
then, perform every corresponding TGD application on chasek−1 (D, Σ), choosing the
applied TGDs and homomorphisms in a (fixed) linear and lexicographic order (resp.),
and assigning to every new atom the level k. The chase of D relative to Σ, denoted
chase(D, Σ), is defined as the limit of chasek (D, Σ) for k → ∞.
      The (possibly infinite) chase relative to TGDs is a universal model, i.e., there exists
a homomorphism from chase(D, Σ) onto every B ∈ mods(D, Σ) [5]. This implies that
BCQs Q over D and Σ can be evaluated on the chase for D and Σ, i.e., D ∪ Σ |= Q is
equivalent to chase(D, Σ) |= Q. For guarded TGDs Σ, such BCQs Q can be evaluated
on an initial fragment of chase(D, Σ) of constant depth k · |Q|, which is possible in
polynomial time in the data complexity.
Datalog+/– Ontologies. A Datalog+/– ontology KB = (D, Σ), where Σ = ΣT ∪ ΣNC ,
consists of a database D, a set of TGDs ΣT , and a set of negative constraints ΣNC . We
say KB is guarded (resp., linear) iff ΣT is guarded (resp., linear). Example 1 (used in
the sequel as a running example) illustrates a simple Datalog+/– ontology.
Example 1. Let O = ⟨D, Σ⟩ be a simple ontology describing gift ideas for friends:
Σ = { scifi book(T, A) → book(T, A), fant book(T, A) → book(T, A),
      mmorpg(X) → vidGame(X), actAdv(X) → vidGame(X),
      book(T, A) → educ(T ),         puzzle(N ) → educ(N )         },
and D = {scifi book(b1 , asimov), scifi book(b2 , asimov), fant book(b3 .tolkien), puzzle(p1 ),
mmorpg(v1 ), actAdv(v2 )}.
This ontology considers two kinds of book and video game genres, and states that books
and puzzles are categorized under an “educational” label. Database D provides some
instances for each of these genres.                                                  


   Finally, we would like to recall that different fragments of Datalog+/– have been
shown to generalize the EL and F-Logic Lite description logics, as well as the DL-Lite
family of tractable description logics [5].

2.2   Preference Relations via Preference Formulas
Though the PrefDatalog+/– formalism introduced in Section 3 allows essentially any
preference framework to be used, in this paper we adopt a formalism slightly general-
izing the preference formulas approach that was proposed in [6] in order to develop the
concrete approaches to query answering discussed in Section 4.
    In the following, let ∆Pref be a finite set of constants, RPref be a finite set of predicate
names, and VPref be an infinite set of variables; we denote with HPref the Herbrand
base relative to these sets, respectively. A preference relation is any binary relation
≻ ⊆ HPref × HPref . Note that, slightly generalizing the framework in [6], preference
relations can be defined between atoms corresponding to different predicates, and that
formulas are not “iff” statements.
Definition 1. Let a1 and a2 be atoms over RPref , VPref , and ∆Pref . A preference formula
pf is of the form “a1 ≻ a2 if C(a1 , a2 )”, where C(a1 , a2 ) is a first-order formula. We
call C(a1 , a2 ) the condition of pf, denoted cond(pf).
The smallest (with respect to set inclusion) preference relation defined via a set P of
preference formulas is denoted with ≻P .
Example 2. Following the same topic of Example 1, the following formulas might rep-
resent a specific friend’s preferences for children’s gifts:
C1 : educ(X) ≻ vidGame(Y ) if ⊤
C2 : book(T1 , A1 ) ≻ book(T2 , A2 ) if scifi book(T1 , A1 ) ∧ fant book(T2 , A2 )
C3 : book(T1 , A1 ) ≻ book(T2 , A2 ) if T1 = b1 ∧ T2 = b2
C4 : educ(X) ≻ educ(Y ) if puzzle(X) ∧ fant book(Y )
C5 : vidGame(X) ≻ vidGame(Y ) if actAdv(X) ∧ mmorpg(Y )
For instance, formula C1 states that educational gifts are preferable to video games
(unconditionally), while C4 states that an educational gift that is a puzzle is preferable
to one that is a fantasy book.                                                           


3    PrefDatalog+/–: Syntax and Semantics

Recall that we have (as discussed in Section 2), the logical languages for ontologies
and preferences. For the former, we have an infinite universe of constants ∆Ont , an
infinite set of variables VOnt , and a finite set of predicate names ROnt ; analogously for
the latter, we have a finite set of constants ∆Pref , an infinite set of variables VPref , and
a finite set of predicate names RPref . In the following, we assume w.l.o.g. that RPref ⊆
ROnt , ∆Pref ⊆ ∆Ont , and VPref ⊆ VOnt . These sets give rise to corresponding Herbrand
bases consisting of all possible ground atoms that can be formed, which we denote by
HOnt and HPref , respectively. Clearly, we have HPref ⊆ HOnt , meaning that preference
relations are defined over a subset of the possible ground atoms.

Definition 2. Let O be a Datalog+/– ontology and P be a set of preference formulas
with Herbrand bases HOnt and HPref , respectively. A preference-based Datalog+/– on-
tology (PrefDatalog+/– ontology, or knowledge base) is of the form KB = (O, P ),
where HPref ⊆ HOnt . If O is a guarded Datalog+/– ontology, we say that KB is a
guarded PrefDatalog+/– ontology.


Semantics. The semantics of PrefDatalog+/– arises as a direct combination of the se-
mantics of Datalog+/– and that of preference formulas.   ∨ Let KB = (O, P ): KB |=
a1 ≻P a2 iff (i) O |= a1 and O |= a2 ; and (ii) |= pfi ∈P cond(pfi )(a1 , a2 ). Intu-
itively, the consequences of a knowledge base KB = (O, P ) are computed in terms of
the chase for the classical Datalog+/– ontology O, and the formulas in P describe the
preference relation over pairs of atoms in HOnt . The following is a simple example.

Example 3. Let KB = (O, P ) be a PrefDatalog+/– ontology, with ontology O from
Example 1 and set of preference formulas P from Example 2. We can see that KB |=
educ(b1 ) ≻P vidGame(v1 ), since O |= educ(b1 ) and C1 ∈ P allows us to conclude that
educ(b1 ) ≻P vidGame(v1 ). On the other hand, we have KB ̸|= educ(b1 ) ≻P educ(b2 ),
since condition (ii) is not satisfied.                                             


Queries. In this paper, we are interested in two kinds of preference-based queries that
can be issued over PrefDatalog+/– ontologies: skyline and k-rank. In order to define
answers to such queries, we adopt the following definitions from classical logic. A
substitution is a function from variables to variables or constants. Two sets S and T
unify via a substitution θ iff θS = θT , where θA denotes the application of θ to all
variables in all elements of A (here, θ is a unifier). A most general unifier (mgu) is
a unifier θ such that for all other unifiers ω, there exists a substitution σ such that
ω = σ ◦ θ. We consider two kinds of classical queries: conjunctive queries (CQs) and
disjunctive atomic queries (disjunctions of atoms – DAQs).

Definition 3. Consider PrefDatalog+/– ontology KB = (O, P ), k > 0, and DAQ
Q(X) = q1 (X1 ) ∨ ... ∨ qn (Xn ) where the qi ’s{are atoms and X1 ∪ ... ∪ Xn = X.
                                                                             ′
 ′         ′
                                        } as: θqi ′| O |= θqi and @θ s.t. O |=
The set of skyline answers to Q is defined
θ qj and θ qj ≻P θqi , with 1 6 i, j 6 n , where θ, θ are ground substitutions for the
variables in Q(X). A k-rank answer to Q(X) is defined
                                                 ⟨      for transitive
                                                               ⟩       relations ≻P as a
sequence of maximal length of mgu’s for X: S = θ1 , . . . , θk′ such that O |= θi Q for
1 6 i 6 k ′ 6 k, and S is built by subsequently appending the skyline answers to Q,
removing these atoms from consideration, and repeating the process until either S = k
or no more answers to Q remain.

In Definition. 3, note that k-rank answers are only defined when the preference relation
is transitive—this is required for the answers to make sense, since the atoms in each
rank are supposed to be preferred to those in the next ranks. This kind of answer can be
seen as a generalization of traditional top-k answers [11] that are still defined when ≻P
is not a weak order, and their name arises from the concept of rank introduced in [6].
     Intuitively, for DAQs, both kinds of answers can be seen as atomic consequences of
O that satisfy the query: the skyline answers can be seen as sets of atoms that are not
dominated by any other such atom, while k-rank answers are k-tuples sorted according
to the preference relation. We refer to these as answers in atom form.

Example 4. Consider again ontology KB = (O, P ) from Example 3, Q1 (X, Y ) =
book(X, Y ), and Q2 (X) = educ(X); the ≻P relation is depicted in Figure 1 (dot-
ted lines). The set of skyline answers to query Q1 is {book(b1 , asimov)}, while for
Q2 it is {educ(p1 ), educ(b1 ), educ(b2 )}. A 3-rank answer to Q1 is: ⟨book(b1 , asimov),
book(b2 , asimov), book(b3 , tolkien)⟩. For Q2 , a 3-rank answer is ⟨educ(p1 ), educ(b1 ),
educ(b2 )⟩. Finally, Q3 = puzzle(X) ∨ vidGame(X) yields {puzzle(p1 ), vidGame(v2 )}
as skyline answers, and a 3-rank answer is {puzzle(p1 ), vidGame(v2 ), vidGame(v1 )}.

Finally, we have a result relating k-rank to the traditional top-k queries (cf. [11]).
Proposition 1. Let KB = (O, P ) be a PrefDatalog+/– ontology, Q be a DAQ or a
CQ, and k > 0. If ≻P is a weak order, then k-rank answers are equal to top-k answers
(modulo ordering of incomparable elements).
       
      scifi_book(b1,a)   scifi_book(b2,a)   fant_book(b3,t)   puzzle(p1)   mmorpg(v1)      actAdv(v2)
       

        book(b1,a)       9 book(b a)        9 book(b t)             
                                   2,                3,              vidGame(v1)        vidGame(v2)


                                            
           educ(b1)         educ(b2)            educ(b3)        educ(p1)


Fig. 1. Preference-augmented chase forest for the running example. Dashed arrows correspond
to preference edges, and the marks in the upper left corner represent marked nodes for different
queries; author names have been shortened to letters.


Conjunctive Queries. Definition 3 can easily be extended to conjunctive queries—
intuitively, substitutions in the set of answers must be for all qi ’s in the query instead
of just one and, in the case of (non-atomic) conjunctive queries, the substitutions in
answers no longer yield single atoms but rather sets of atoms; therefore, we must extend
the preference specification framework to take into account sets of atoms instead of
individual ones. One such approach was proposed in [12], where a mechanism to define
a preference relation over tuple sets ≻PS : 2HPref × 2HPref is introduced. Though this is
not the focus of this paper, in Section 4.2 we treat their complexity and briefly discuss
how methods from the relational databases literature can be applied to answer them.


4   Preference-Based Query Answering
In this section, we propose algorithms for answering queries to PrefDatalog+/– ontolo-
gies; we start with DAQs and then move on to the conjunctive case.

4.1 Disjunctive Atomic Queries
We begin by introducing a structure that is used in preference-based query answering.
Next, we investigate algorithms for skyline and k-rank query answering for DAQs.
The Preference-Augmented Chase (prefChase). In order to compute skyline and k-
rank answers to queries over a PrefDatalog+/– ontology KB = (⟨D, Σ⟩, P ), we make
use of a data structure called the chase forest for D, Σ, and query Q [5], which is defined
as the directed graph consisting of (i) for every a ∈ D, one node labeled with a, and (ii)
for every node labeled with a ∈ chase(D, Σ) and for every atom b ∈ chase(D, Σ) that
is obtained from a and possibly other atoms by a one-step application of a TGD σ ∈ Σ
with a as guard, one node labeled with b along with an edge from the node labeled
with a. Here, we propose an augmented chase forest, comprised of the necessary fi-
nite part of the chase forest relative to a given query that is augmented with an ad-
ditional kind of edge that we call preference edges, which occur between nodes la-
beled with a, b ∈ chase(D, Σ) iff a ≻P b. Finally, when an edge is introduced between
nodes whose labels satisfy Q, the node with the incoming edge is marked. Figure 1
 Algorithm 1: Skyline(KB = (⟨D, Σ⟩, P ), Q)
 Input: PrefDatalog+/– ontology KB and DAQ Q(X).
 Output: Set of skyline answers {a1 , ..., ak } to Q.
  1. Initialize Res as an empty set of ground atoms;
  2. C:= prefChase(KB , Q);
  3. For each atom a labeling node v in C do begin
  4. If a |= Q and v is unmarked then Res:= Res ∪ {a};
  5. end;
  6. return Res.

Fig. 2. An algorithm that computes the set of skyline answers (in atom form) to an atomic query.


shows prefChase(KB , Q) for the PrefDatalog+/– ontology from Example 3; for illus-
trative reasons, markings for three different queries have been included in the figure:
book(X, Y ) (check mark), educ(X) (star), and vidGame(X) (diamond).

Theorem 1. If KB = (⟨D, Σ⟩, P ) is a guarded PrefDatalog+/– ontology and Q a
DAQ, prefChase(KB , Q) can be computed in time O(n2 ) in the data complexity.

Skyline Queries. Algorithm Skyline (Fig. 2) begins by computing the preference-
augmented chase forest relative to the input ontology and query and then simply ana-
lyzing the set of marked nodes (i.e., those that are dominated by other tuples relative to
the query). As we can see, the node markings have done almost all of the work towards
answering the skyline query; the algorithm only needs to go through the structure and
find the nodes whose labels satisfy the query and, if unmarked, add them to the output.

Example 5. Consider again the setup in Example 4 and Figure 1. For query Q1 , the
only unmarked atom is book(b1 , a), and so the skyline answer is the singleton set with
this atom. For Q2 , we get all the educ atoms except educ(b3 ).                       


Theorem 2. Let KB be a PrefDatalog+/– ontology, Q be a DAQ, and k > 0. Then,
(i) Algorithm Skyline correctly computes the skyline answers to Q, and (ii) Algo-
rithm Skyline runs in time O(k) in the data complexity, where k is the time needed
to compute prefChase(KB , Q) in the data complexity.

Corollary 1. Let KB = (O, P ) be a PrefDatalog+/– ontology, Q be a DAQ, and k >
0. If O is a guarded Datalog+/– ontology, then the set of skyline answers to Q can be
computed in polynomial time in the data complexity.

k-Rank Queries. Algorithm k-Rank (Figure 3) also begins by computing the preference-
augmented chase forest. The main while loop iterates through the process of comput-
ing the skyline answers to Q by using the Skyline algorithm, updating the result by
appending these answers in arbitrary order, and removing the nodes and edges involved
in the result from the chase forest; the loop finally prepares for the next iteration by
updating the markings in the forest. Once the loop is finished, the algorithm returns the
first k results, since the last iteration might add superfluous elements.
 Algorithm 2: k-Rank(KB = (⟨D, Σ⟩, P ), Q, k)
 Input: PrefDatalog+/– ontology KB , DAQ Q(X), k > 0.
 Output: k-rank answer ⟨a1 , ..., ak′ ⟩ to Q.
  1. Initialize Res as an empty vector of ground atoms;
  2. C:= prefChase(KB , Q);
  3. i:= k;
  4. While i > 0 do begin
  5. S:= Skyline(KB , Q);
  6. Append S to Res (arbitrary order);
  7. Remove S from C along with all associated edges;
  8. Identify marked nodes that no longer have incoming
               preference edges and change them to unmarked;
  9. i:= i − |S|;
 10. End;
 11. Return truncate(Res, k).

        Fig. 3. An algorithm for obtaining a k-rank answer (in atom form) to DAQ Q.


Example 6. Consider the setup in Example 4 and the structure in Figure 1. The node
book(b1 , a) is the first skyline answer; Algorithm k-Rank then proceeds to update Res
with ⟨book(b1 , a)⟩ and to remove this node (and associated edges) from the forest;
according to Line 8, node book(b2 , a) now becomes unmarked. The algorithm continues
until the three book atoms are processed.                                            


Theorem 3. Let KB be a PrefDatalog+/– ontology, Q be a DAQ, and k > 0. Then,
(i) Algorithm k-Rank correctly computes a k-rank answer to Q, and (ii) Algorithm k-
Rank runs in time O(k) in the data complexity, where k is the time needed to compute
prefChase(KB , Q) in the data complexity.

Corollary 2. Let KB = (O, P ) be a PrefDatalog+/– ontology, Q be a DAQ, and k >
0. If O is a guarded Datalog+/– ontology, then a k-rank answer to Q can be computed
in polynomial time in the data complexity.


4.2   Conjunctive Queries

As discussed at the end of Section 3, non-atomic conjunctive queries require a prefer-
ence relation that is defined between sets of tuples—in this section we assume that we
have an extension of the preference formulas formalism PS that yields such a relation;
a detailed treatment of this extension is outside the scope of this paper. Unfortunately,
the following result tells us that this change has a direct impact on the complexity of
both skyline and k-rank query answering.

Theorem 4. Let KB = (⟨D, Σ⟩, PS) be a guarded PrefDatalog+/– ontology, where PS
describes a preference relation ≻PS : 2HPref × 2HPref such that membership can be tested
in polynomial time, and Q be a CQ. If the relational schema and Σ are fixed, deciding
if the set of skyline answers or a k-rank answer to Q are non-empty is Σ2p -complete.
    The heuristic algorithms presented in [12] for relational databases can be applied to
PrefDatalog+/– by computing the prefChase relative to the query and thus materializ-
ing the necessary part of the ontology into a database. The following result provides a
way to leverage tools developed for relational databases for first order rewritable frag-
ments; we use prefQ-DPM to denote either Skyline or k-Rank, and prefQ-RDB the
corresponding algorithm for relational databases.
Theorem 5. Let KB = (⟨D, Σ⟩, PS) be a PrefDatalog+/– ontology and Q a CQ. If
⟨D, Σ⟩ belongs to an FO-rewritable fragment of Datalog+/– and all F ∈ PS are such
that cond(F ) are conjunctions of atoms, then we can compute PS′ and Q′ in PTIME
such that prefQ-DPM(KB , Q) = prefQ-RDB(D, PS′ , Q′ ).
Intuitively, the theorem is a consequence of the fact that for each F ∈ PS, cond(F ) can
be treated like a BCQ and thus rewritten relative to ΣT . This result is important since
it identifies conditions under which preference-based queries can be answered using
highly optimized relational database management systems.


5   Summary and Outlook
In this work, we have presented an extension of the Datalog+/– family of ontology lan-
guages for preference-based query answering, a topic that has recently become central
in dealing with data related to the Social Semantic Web. We focused on two well-known
kinds of preference-based queries, namely skyline and k-rank (a generalization of tradi-
tional top-k queries), and studied algorithms and complexity of computing their answers
when the underlying classical queries are either DAQs or CQs. For DAQs, we proposed
an augmented version of the chase forest, and showed how this structure can be used to
answer both kinds of preference-based queries in polynomial time in the data complex-
ity whenever classical query answering is PTIME in the underlying Datalog+/– ontol-
ogy. For CQs, we showed that this desirable property is lost, and note that related work
in relational databases can however be applied in order to leverage heuristics—we also
provided a result for leveraging an RDBMS for FO-rewritable fragments of Datalog+/–.
Current and future work involves implementing and testing the PrefDatalog+/– frame-
work, and exploring specific heuristics and other approximation techniques that can be
applied to answer preference-based CQs scalably.
Acknowledgments. A slightly longer version of this paper appears in the proceed-
ings of IJCAI 2013. This work was supported by UK EPSRC grant EP/J008346/1–
“PrOQAW”, ERC Grant 246858–“DIADEM”, and a Yahoo! Research Fellowship.


References
 1. C. Beeri and M. Y. Vardi. The implication problem for data dependencies. In Proc. ICALP-
    1981, volume 115 of LNCS, pages 73–85. Springer, 1981.
 2. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Sci. Am., 284(5):34–43,
    2002.
 3. S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In Proc. ICDE-2001,
    pages 421–430, Washington, DC, USA, 2001. IEEE Computer Society.
 4. A. Calı̀, G. Gottlob, and M Kifer. Taming the infinite chase: Query answering under expres-
    sive relational constraints. In Proc. KR-2008, pages 70–80. AAAI Press, 2008.
 5. A. Calı̀, G. Gottlob, and T. Lukasiewicz. A general Datalog-based framework for tractable
    query answering over ontologies. J. Web Sem., 14:57–83, 2012.
 6. J. Chomicki. Preference formulas in relational queries. ACM Trans. Database Syst.,
    28(4):427–466, December 2003.
 7. K. Govindarajan, B. Jayaraman, and S. Mantha. Preference logic programming. In Proc.
    ICLP-1995, pages 731–745, 1995.
 8. K. Govindarajan, B. Jayaraman, and S. Mantha. Preference queries in deductive databases.
    New Generat. Comput., 19(1):57–86, 2001.
 9. M. Lacroix and P. Lavency. Preferences: Putting more knowledge into queries. In Proc.
    VLDB-1997, volume 87, pages 1–4, 1987.
10. T. Lukasiewicz and J. Schellhase. Variable-strength conditional preferences for ranking ob-
    jects in ontologies. J. of Web Sem., 5(3), 2007.
11. K. Stefanidis, G. Koutrika, and E. Pitoura. A survey on representation, composition and
    application of preferences in database systems. ACM Trans. Database Syst., 36(3):19:1–
    19:45, August 2011.
12. X. Zhang and J. Chomicki. Preference queries over sets. In Proc. ICDE-2011, pages 1019–
    1030, 2011.