<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Answering Ontological Ranking Queries Based on Subjective Reports</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Department of Computer Science</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>University of Oxford</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>thomas.lukasiewicz</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>vanina.martinez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>livia.predoiu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>gerardo.simarig@cs.ox.ac.uk</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIMES, Universita` della Calabria</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The use of preferences in query answering, both in traditional databases and in ontology-based data access, has recently received much attention, due to its many real-world applications. In this paper, we tackle the problem of query answering in Datalog+/- ontologies subject to the querying user's preferences and a collection of subjective reports (i.e., scores for a list of features) of other users, who have their own preferences as well. All these pieces of information are combined to rank the query results. We first focus on the problem of ranking atoms in a database by leveraging reports and customizing their content according to the user's preferences. Then, we extend this approach to deal with ontological query answering using provenance information. Though the general problem is shown to have an exponential-time data complexity upper bound, we propose a special case that has polynomial time data complexity.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The use of preferences in query answering, both in traditional databases and in
ontologybased data access, has recently received much attention due to its many real-world
applications. In particular, in recent times, there has been a huge change in the way data
is created and consumed, and users have largely moved to the Social Web, a system of
platforms used to socially interact by sharing data and collaborating on tasks.</p>
      <p>In this paper, we tackle the problem of preference-based query answering in
Datalog+/– ontologies assuming that the user must rely on subjective reports to get a
complete picture and make a decision. This kind of situation arises all the time on the Web;
for instance, when searching for a hotel, users provide some basic information and
receive a list of answers to choose from, each associated with a set of subjective reports
(often called reviews) written by other users to tell everyone about their experience. The
main problem with this setup, however, is that users are often overwhelmed and
frustrated, because they cannot decide which reviews to focus on and which ones to ignore,
since it is likely that, for instance, a very negative (or positive) review may have been
produced on the basis of a feature that is completely irrelevant to the querying user.</p>
      <p>We study a formalization of this process and its incorporation into preference-based
query answering in Datalog+/– ontologies, proposing a framework for user-tailored
query answers on the basis of the ontology reports and users’ preferences.
The following are the main contributions of this paper:
– We present an approach to preference-based query answering in Datalog+/–
ontologies, given a collection of subjective reports containing scores for a list of features.
– We first propose a basic approach to rank atoms in the ontology’s database, which
combines reports, their authors’ preferences among the features, and the querying
user’s preferences among the features.
– We then extend our framework to ontologies with dependencies and propose a
method for dealing with query answering by leveraging provenance information
for propagating reports from the database atoms to newly inferred ones.
– As we are using a kind of “how”-provenance modeled using semirings, we can
map the general semiring to more specific ones that capture different ways in which
the information contained in reports can be leveraged, Though the general case is
exponential, we explore one such mapping for which ranking query answers can be
done in polynomial time data complexity.</p>
      <p>The rest of this paper is organized as follows. In Section 2, we provide some
preliminaries on Datalog+/– and the used preference models. Section 3 then defines subjective
reports and proposes a basic framework to deal with simple (no dependencies) Datalog+/–
ontologies. Section 4 considers ontologies with dependencies and addresses query
answering. Section 5 discusses related work, and Section 6 concludes.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        First, we briefly recall some basics on Datalog+/– [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], namely, on relational databases,
(Boolean) conjunctive queries ((B)CQs), tuple-generating dependencies (TGDs),
negative constraints, the chase, and ontologies in Datalog+/–.
      </p>
      <p>We assume (i) an infinite universe of (data) constants dom (which constitute the
“normal” domain of a database), (ii) an infinite set of (labeled) nulls N (used as
“fresh” Skolem terms, which are placeholders for unknown values, and can thus be seen
as variables), and (iii) an infinite set of variables V (used in queries, dependencies, and
constraints). Different constants represent different values (unique name assumption),
while different nulls may represent the same value. We assume a lexicographic order
on dom [ N , with every symbol in N following all symbols in dom. We denote
by X sequences of variables X1; : : : ; Xk with k &gt; 0. We assume a relational schema R,
which is a finite set of predicate symbols (or simply predicates). A term t is a constant,
null, or variable. An atomic formula (or atom) a has the form P (t1; :::; tn), where P
is an n-ary predicate, and t1; :::; tn are terms. A database (instance) D for a relational
schema R is a (possibly infinite) set of atoms with predicates from R and arguments
from dom.</p>
      <p>Given a relational schema R, a tuple-generating dependency (TGD) is a
firstorder formula 8X8Y (X; Y) ! 9Z (X; Z), where (X; Y) and (X; Z) are
conjunctions of atoms over R (without nulls), called the body and the head of ,
denoted body ( ) and head ( ), respectively. Satisfaction of TGDs are defined via
homomorphisms, which are mappings : dom [ N [ V ! dom [ N [ V such that (i)
c 2 dom implies (c) = c, (ii) c 2 N implies (c) 2 dom [ N , and (iii) is
naturally extended to atoms, sets of atoms, and conjunctions of atoms. A TGD is satisfied
in a database D for R iff, whenever there exists a homomorphism h that maps the
atoms of (X; Y) to atoms of D, there exists an extension h0 of h that maps the atoms
of (X; Z) to atoms of D. A TGD is guarded iff it contains an atom in its body that
contains all universally quantified variables of . All sets of TGDs are finite here. Since
TGDs can be reduced to TGDs with only single atoms in their heads, in the sequel,
every TGD has w.l.o.g. a single atom in its head.</p>
      <p>
        A conjunctive query (CQ) over R has the form Q(X) = 9Y (X; Y), where (X; Y)
is a conjunction of atoms (possibly equalities, but not inequalities) with the variables X
and Y, and possibly constants, but without nulls. A Boolean CQ (BCQ) over R is a CQ
of the form Q(), often written as the set of all its atoms, without quantifiers. The set of
answers for a CQ Q to D and , denoted ans (Q; D; ), is the set of all tuples a such
that a 2 Q(B) for all B 2 mods (D; ). The answer for a BCQ Q to D and is Yes,
denoted D [ j= Q, iff ans (Q; D; ) 6= ;. Note that query answering under general
TGDs is undecidable [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], even when the schema and TGDs are fixed [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Decidability
of query answering for the guarded case follows from a bounded tree-width property.
The data complexity of query answering in this case is P-complete. We refer to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for
their details.
      </p>
      <p>
        The chase algorithm for a database D and a set of TGDs consists of an exhaustive
application of the TGDs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in a breadth-first (level-saturating) fashion, which outputs
a (possibly infinite) chase for D and . The (possibly infinite) chase relative to TGDs
is a universal model, i.e., there exists a homomorphism from chase (D; ) onto every
B 2 mods (D; ) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This implies that BCQs Q over D and can be evaluated on the
chase for D and , i.e., D [ j= Q is equivalent to chase (D; ) j= Q. For guarded
TGDs , such BCQs Q can be evaluated on an initial fragment of chase (D; ) of
constant depth k jQj, which is possible in polynomial time in the data complexity.
Datalog+/– Ontologies. A Datalog+/– ontology KB = (D; ), where = T [ E [
      </p>
      <p>
        NC, consists of a database D, a set of TGDs T , a set of EGDs E , and a set of
negative constraints NC. In order to ensure decidability and tractability of query
answering, we assume that EGDs are separable, which means that the interaction between
TGDs and EGDs is controlled—this condition can be ensured by the syntactic criterion
of non-conflicting keys; for details on these conditions, we refer the reader to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Finally, we say that KB is guarded iff T is guarded. The following example illustrates
a simple Datalog+/– ontology, used in the sequel as a running example.
Example 1. Consider the following ontology KB = (D;
):
D = fid1 : apthotel(h1); id2 : hotel(h2);
id3 : bb(bb1); id4 : hostel(hs1)g;
= f 1 : hotel(H ) ! accom(H );
3 : bb(B) ! accom(B);
5 : apthotel(A) ! apartment(A);
2 : apartment(A) ! accom(A);
4 : apthotel(A) ! hotel(A);
6 : hostel(H ) ! accom(H )g:
This ontology models a very simple accommodation booking domain, which can be
used as the underlying model in an online system. Accommodations can be either
hotels, apartments, B&amp;Bs, hostels, or aparthotel. Moreover, an aparthotel is both a hotel
and an apartment. Database D provides some example instances.
      </p>
      <p>Preference Models. A preference relation over a set S is a strict partial order (SPO)
over S, i.e., an irreflexive and transitive binary relation over S—we consider these to
be the minimal requirements for a useful preference relation. If a b, we say that a is
preferred to b. The indifference relation induced by is defined as follows: for any
a; b 2 S; a b iff a 6 b and b 6 a.</p>
      <p>A stratification of S relative to is an ordered sequence hS1; : : : ; Ski, where each
Si is a maximal subset of S such that for every a 2 Si there is no b 2 Sjk=i Sj with
b a. Intuitively, S1 contains the most preferred elements in S relative to , i.e., those
elements a 2 S for which there is no b 2 S such that b a. Then, S2 contains the
most preferred elements of S S1, and so on so forth. Notice that a stratification always
exists and is unique—moreover, it is a partition of S. Each Si is called a stratum.
3</p>
    </sec>
    <sec id="sec-3">
      <title>A Framework for Ranking Atoms based on Subjective Reports</title>
      <p>In this section, we propose a framework that allows us to produce a ranking of atoms
in a knowledgebase by leveraging subjective reports. For now, we restrict our attention
to the case where the set of dependencies is empty—we generalize our framework to
deal with dependencies and ontological query answering in the next section.</p>
      <p>More specifically, we consider the following setting. We are given a Datalog+/–
ontology KB = (D; ), with = ;, where ground atoms in D are associated with
reports provided by observers. A report is an evaluation for an entity of interest that
specifies a score for different features, which are the dimensions along which entities
can be evaluated. Also, each observer expresses a preference relation over the features.
Finally, a user looking at the ontology also has her own preference relation over the
features and wishes to rank the atoms in the ontology on the basis of the reports, her
preferences, as well as the observers’ preferences.</p>
      <p>Example 2. Consider KB = (D; ;) where D is the database from Example 1—recall
that predicate symbols refer to different types of accommodations. Accommodations
might be rated relative to location, cleanliness, price, breakfast, and internet; in this
case, these would be the features of interest. Observers can leave reports for the
accommodations in the knowledgebase, with each report specifying one score per feature.
Each observer has also a preference relation over the features specifying their
importance from the observer’s point of view. A user, who also has her own preference
relation over the features, wishes to rank the atoms in the knowledgebase (that is, all the
atoms in D), taking into account all the elements discussed above.</p>
      <p>Features and Reports. We assume the existence of an ordered sequence of features
hf1; : : : ; fmi, each of which has a domain dom (fi) = [0; 1] [ f g. We use F to denote
the set of features ff1; : : : ; fmg.3 The values in a feature domain are possible scores
that can be given for the feature, with “ ” meaning that no score is given. The binary
relation &gt; over dom (fi) is defined in the standard way with the additional requirement
that v &gt; for every v 2 [0; 1].
3 Throughout this section, we consider a single set of features shared by all predicate symbols;
this limitation is removed in the following section.</p>
      <p>A report is an element of dom(f1) dom(fm). We use Reports to denote
the set of all possible reports, i.e., Reports = dom(f1) dom(fm). A report
given by a certain observer for a ground atom A specifies m “scores” representing an
evaluation of A for each feature.</p>
      <p>Multiple reports, coming from different observers, can be associated with the same
ground atom—each report expresses the rating given to the ground atom by a specific
observer. Thus, we assume we have N observers and N partial functions i : D !
Reports, called report functions, for 1 6 i 6 N . Given an observer 1 6 i 6 N and a
ground atom A 2 D, if i(A) is defined, it denotes the report given by observer i to A;
if undefined, this means that observer i has no report for A (note that we assume that an
observer can have at most one report for a given ground atom).</p>
      <p>Example 3. Consider the accommodation booking domain of Example 2. Suppose the
features with respect to which accommodations are rated are hlocation; cleanliness;
price; breakfast; interneti; in the following, we abbreviate these features as loc; cl; pri;
br; net, respectively. Suppose we have 3 observers and that the i’s are as follows:
loc cl pri br net</p>
      <p>For instance, 1 says that the first observer assigned the report h0:8; 0; 0:4; ; 0:6i to
atom id1, which means that the score given to id1 relative to location (resp., cleanliness,
price, internet) is 0:8 (resp., 0, 0.4, 0.6), while no score is given for breakfast.
Comparing Reports. Given a report r = hs1; : : : ; smi, we use r[i] to refer to si. Given
two reports r1, r2 and a set of features F 0 F , we write
1. r1[F 0] = r2[F 0] iff r1[i] = r2[i] for every fi 2 F 0;
2. r1[F 0] &gt; r2[F 0] iff r1[i] &gt; r2[i] for every fi 2 F 0;
3. r1[F 0] &gt; r2[F 0] iff r1[F 0] &gt; r2[F 0] and r1[j] &gt; r2[j] for some fj 2 F 0.</p>
      <p>Now, a user inspecting the ontology might have preferences among features—for
instance, location and price might be the most preferred. We thus allow a user to
specify a preference relation over the features in order to influence the comparison of reports
and eventual ranking of atoms. The following definition formalizes the comparison
between two reports r1 and r2 relative to a preference relation over F . Intuitively, if
F1 is the first stratum of the stratification of F with respect to , and r1[F1] &gt; r2[F1],
then r1 &gt; r2. If r1[F1] = r2[F1], we compare r1 and r2 relative to the second stratum
F2 and if r1[F2] &gt; r2[F2], then r1 &gt; r2. If r1[F2] = r2[F2], then we compare r1 and
r2 over the third stratum, and so on.</p>
      <p>Definition 1. Consider two reports r1, r2 and a preference relation over F . Let
F1; : : : ; Fk be the stratification of F . We say r1 &gt; r2 relative to iff there exists
1 6 i 6 k such that: (i) r1[Fi] &gt; r2[Fi], and (ii) r1[Fj ] = r2[Fj ] for 1 6 j &lt; i.
 
loc
pri
cl
 
net
br
loc
net
br
cl
pri</p>
      <p>Example 4. Consider the reports of Example 3 and let F be the set of features therein.
Suppose a user has the preference relation u over F of Figure 1, where an edge from
a to b means that a u b. The stratification of F relative to u consists of three strata:
F1 = floc; netg, F2 = fclg, and F3 = fpri; brg. Consider now two reports: r1 = 1(id1)
and r2 = 1(id2). Then, we have r1 &gt; r2, since r1[F1] &gt; r2[F1].</p>
      <p>Each observer can have a preference relation over the set of features in much the
same way as the user looking at the ontology. Thus, we assume the existence of N such
preference relations, denoted with 1; : : : ; N (one for each observer).</p>
      <p>Ranking Atoms. Our goal is to obtain a ranking of the ground atoms in the
knowledgebase on the basis of: (i) N report functions 1; : : : ; N (one for each observer), (ii) N
preference relations 1; : : : ; N over F (one for each observer), and (iii) a preference
relation u over F (the user’s preferences). For this, we adopt a two-phase approach:
1. Generate a preference relation over D for each observer i on the basis of i and u,
thereby obtaining a preference relation over ground atoms that takes into account
the i-th observer’s scores and is “customized” according to the user’s preferences.
2. Then, the N preference relations obtained in the previous step are combined into
one by taking into account how “relevant” the observers’ feature preferences are
relative to the user’s preferences.</p>
      <p>Definition 2. Let i be a report function and u the user’s preference relation over F .
For all A1; A2 2 D, A1 &gt;i A2 iff (i) i(A1) is defined and i(A2) is undefined; or (ii)
both r1 = i(A1) and r2 = i(A2) are defined and r1 &gt; r2 relative to u.
The following example shows how atoms in the database can be ranked in our running
scenario by each observer when the user’s preferences are also incorporated.
Example 5. Consider the reports of Example 3 and the user’s preference relation u
over the features of Figure 1. Then, for each observer, we can determine an order among
ground atoms based on the observer’s reports, customizing it relative to the user’s
feature preferences. Specifically, we get:</p>
      <p>Observer 1: id1 &gt;1 id2; id4 &gt;1 id2; id4 &gt;1 id3;
Observer 2: id1 &gt;2 id2; id3 &gt;2 id4;</p>
      <p>
        Observer 3: id2 &gt;3 id1; id4 &gt;3 id1:
After obtaining a preference relation &gt;i over D for each observer i, we need to combine
them into a single one. In this second step, we want to take into account how “relevant”
the observers’ feature preferences are given the user’s feature preferences. Thus, we
assume the existence of a relevance function which takes as input two preference
relations over F (the user preference relation and the preference relation of an observer)
and gives as output a value in [0; 1], measuring how similar the two preference
relations are (e.g., various distance measures over fully and partially specified preference
structures have been proposed in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]).
      </p>
      <p>Below, we provide a general definition of an operator that combines a set of
preference relations where each is associated with a relevance value.</p>
      <p>Definition 3. A preference aggregation operator is a function that takes as input a set
P of pairs h&gt;i; ii where &gt;i is a preference relation and i 2 [0; 1], and returns a
preference relation &gt; such that if a &gt;i b for every &gt;i appearing in P then a &gt; b.
For instance, when combining the &gt;i’s we may want to give more importance to the
more relevant ones, i.e., those with higher i value, and give the same importance to
&gt;i’s with the same i value.</p>
      <p>Example 6. Consider again the setting from Example 5, and suppose the observers’
preference relations over the features are such that the relevance function yields 0.8
for observers 1 and 2, and 0.3 for observer 3. An example of preference aggregation
operator can be the one that first combines &gt;1 and &gt;2 using, e.g., Pareto composition,
thereby obtaining id1 &gt;1 2 id2 and id4 &gt;1 2 id2. Then, it combines &gt;1 2 with &gt;3
with prioritized composition giving more importance to &gt;1 2 (because of the higher
relevance of the first two observers) thereby obtaining a final order among atoms &gt; as
follows: id1 &gt; id2, id4 &gt; id2, and id4 &gt; id1.</p>
      <p>Definition 4. Given a Datalog+/– ontology KB = (D; ), N report functions 1; : : : ; N ,
N preference relations 1; : : : ; N over F , a preference relation u over F , a
relevance function , and a preference aggregation operator agg, we define a ranking for
KB as agg(fh&gt;1; ( 1; u)i; : : : ; h&gt;N ; ( N ; u)ig).</p>
      <p>Notice that in the previous definition each &gt;i is obtained as per Definition 2. Once
a partial order over the ground atoms is obtained, a total order over this set can easily
be derived by computing one of its topological sortings.</p>
      <p>Proposition 1. The worst-case time complexity of computing a ranking for a Datalog+/–
ontology (D; ) is km2 + N (mn2 + f + logN ) + faNgg , where m is the number of
features, n = jDj, N is the number of observers, k = j uj, f is the worst-case time
complexity of the relevance function, and faNgg is is the worst-case time complexity of
the adopted preference aggregation operator over N preference relations.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Ontological Query Answering based on Subjective Reports</title>
      <p>Up to now, we have considered simple knowledgebases (D; ;), without directly
addressing ontological query answering. We now generalize our framework to deal with
conjunctive query answering over ontologies containing tuple-generating dependencies.</p>
      <p>
        The application of such dependencies in can generate new atoms, to which we
need to “propagate” reports associated with the ground atoms in D that participated in
the creation. Therefore, besides being able to compute answers to queries in the
classical manner, we must also provide a way of relating query answers to the information
contained in the reports associated with atoms that “contributed” to them—this suggests
the need to use an adequate data provenance representation. There is a variety of
approaches that have been proposed in the formalization of provenance and lineage [
        <xref ref-type="bibr" rid="ref12 ref6">12,
6</xref>
        ]. In this work, we adopt a special case of the framework from [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], where
“how”provenance is formalized through semirings of polynomials; this is a very general and
flexible formalism that we adapt to our needs, as discussed next.
      </p>
      <p>A Data Provenance Model. Consider an ontology KB = (D; ). Recall that every
ground atom A 2 D has an id, denoted id(A); let X = fid(A) j A 2 Dg and S =
f0; 1g, we define a semiring of polynomials (S[X]; +; ; 0; 1), with variables from X
and coefficients from S, with operations + and being idempotent, associative, and
commutative, and distributing over +. Note that 1 (resp. 0) is the identity element
of (resp. +). The provenance information is modeled as annotations to atoms in H,
defined by a function Ann : H 7! S[X], i.e., Ann maps ground atoms to polynomials in
the semiring. This approach allows us to record the provenance information by means
of symbolic expressions over the ids of atoms using the semiring operations, which
in turn allows us to model not only information about which atoms contributed to the
answer’s computation but also how they contributed.</p>
      <p>
        The Guarded Chase Forest. Note that, as shown in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], commutative semirings are not
enough to correctly model the propagation of provenance annotations that arise from
derivations in Datalog; this is the case because derivations in Datalog (and therefore in
Datalog+/– ontologies) can be infinite, which requires semirings with infinite sums. For
this reason, we focus on guarded Datalog+/– ontologies, and instead of computing the
provenance annotations over the chase itself, we compute them over the necessary finite
part of the guarded chase forest (introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). It is sufficient to consider only the
finite part of the chase, as for every derivation outside, there is an isomorphic one inside
the finite part (which follows from the results in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). In the following, we show how
the guarded chase forest can be extended to consider provenance annotations.
      </p>
      <p>
        The guarded chase forest for D and contains (i) a node nA for each atom A 2 D,
having label(nA) = A, and (ii) for any two atoms A; B 2 chase(D; ) there are two
nodes nA and nB with label(nA) = A and label(nB) = B along with an arrow from
nA to nB iff B is obtained from A and possibly other atoms by a one-step application
of a TGD 2 with A as guard. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], it has been shown that whenever homomorphic
images of a CQ Q(X) are contained in chase(D; ), then they are also contained in
a finite, initial portion of the guarded chase forest, whose size is determined only by
the query and the schema. Furthermore, for guarded Datalog+/– ontologies, the whole
derivation of the query atoms is also contained in such a portion of the forest. This
means that we can construct the provenance annotations based on this finite portion
of the chase forest. For this, we associate with each node n a provenance annotation
ann ch(n); the forest is computed in the following way: first, for all nodes n labeled
with atoms A 2 D, we have ann ch(n) = id(A). Then, each time the chase rule is
applied via a TGD : (X; Y) ! 9Z (X; Z) to atoms labeling the nodes of the
guarded chase forest constructed thus far (let these nodes be n1; : : : ; nk), the
annotation of the new node n0 labeled with the atom to which (X; Z) is mapped is defined
as ann ch(n0) = Q16i6k ann ch(ni).
      </p>
      <p>The guarded depth of an atom A in the guarded chase forest for D and , denoted
depth(A), is the minimum length of a path from D to a node labeled with A in the forest.
Then, the guarded chase of level up to k &gt; 0 for D and , denoted g chasek(KB ),
is the set of all atoms in the guarded chase forest of depth at most k. Given a CQ
Q, we denote with g chase (KB ; Q) the part of the chase forest needed to answer
Q; as pointed out above, it has been shown that for guarded Datalog+/– ontologies
g chase (KB ; Q) is finite and its size (i.e., ) depends only on Q and KB .
Definition 5. Let KB = (D; ) be a Datalog+/– ontology and (S[X]; +; ; 0; 1) a
commutative doubly idempotent semiring of polynomials. We define the annotation
function Ann : H ! S[X] as follows: for each atom A 2 H, if KB j= A then Ann(A) =
Pn2g chase (KB;A)s:t: label(n)=A ann ch(n), otherwise Ann(A) = 0.</p>
      <p>Note that H can be infinite as can the guarded chase forest; however, the support
for function Ann is defined as supp(Ann) = fA j Ann(A) 6= 0g, which is clearly finite
in g chase (KB ; Q). Note that the annotations depend on the query and the schema.
A Note about Idempotent Operators. Though in general the assumption of idempotent
+ and operators is not made in the context of lineage expressions, for our purposes
it makes sense to adopt it since: (i) The + operator is used to express the fact that
an answer can be derived in more than one way—if two terms in the polynomial are
identical, we only need to keep track of one of them since they both represent the
same information regarding which reports should be considered (cf. the derivation of
atom accom(h1) in Example 7). (ii) The operator is used to keep track of which
atoms participated in a single derivation of an answer—if identical atoms appear in
an expression, we only need to keep track of the atom’s presence since, as before, we
only need to know which reports are associated with the atom and not how many times
the atom participated in the derivation; thus, we have that a a = a. Therefore, all
exponents and coefficients in our lineage expressions will be equal to 1.
Example 7. Consider the ontology KB = (D; ) obtained from the one of Example 1
by adding the following atoms to the database:
id5 : pub(p1); id8 : locIn(h1; oxford; st giles);
id6 : pub(p2); id9 : locIn(p1; oxford; st giles);
id7 : pub(p3); id10 : locIn(bb1; oxford; queen st);
id11 : locIn(p2; oxford; st giles);
id12 : locIn(p3; oxford; queen st):
Then, each ground atom in D is annotated with its id, i.e., Ann(apthotel(h1)) = id1,
Ann(hotel(h2)) = id2, etc. The following atoms, along with their annotations, are
derived from D and :</p>
      <p>Ann(accom(h1)) = id1 + id1 = id1
Ann(accom(bb1)) = id3
Ann(hotel(h1)) = id1</p>
      <p>Ann(accom(h2)) = id2
Ann(accom(hs1)) = id4
Ann(apartment(h1)) = id1
Query answers with lineage. Given the formalism described above, we can now
associate useful information about how query answers are computed.</p>
      <p>Definition 6. Let KB = (D; ) be a Datalog+/– ontology and Q be a CQ of the form
q(X) = 9Y Vik=1 pi(Xi; Yi). The annotation of a query answer A 2 ans(Q; KB ) is
defined as follows:</p>
      <p>Ann(A) =</p>
      <p>X
k</p>
      <p>Y Ann( pi(Xi; Yi))
2subs(Q;KB)^A= q(X) i=1
, where subs(Q; KB ) denotes the set of all substitutions such that KB j= pi(Xi; Yi)
for all i 2 f1; : : : ; kg.</p>
      <p>Example 8. Consider again the ontology from Example 7 and the query myaccom(X) =
9S; Y accom(X) ^ locIn(X; oxford; S) ^ pub(Y ) ^ locIn(Y; oxford; S), asking for
accommodations in Oxford such that there is a pub on the same street. There are two
query answers, namely myaccom(h1) with annotation (id1 id8 id5 id9) + (id1
id8 id6 id11) and myaccom(bb1) with annotation (id3 id10 id7 id12).</p>
      <p>Thus, each query answer A is associated with an annotation computed as follows:
Ann = Pin=1 Qjk=1 Annij . Each expression Qjk=1 Annij (1 6 i 6 n) keeps track
of one way of deriving A; then, all possible ways are combined with a summation.
We define S-Ann(Ann) = Qjk=1 Annij j 1 6 i 6 n . We also define the set of
annotated answers as ans(Q; KB ) = fhA1; Ann(A1)i; : : : ; hAn; Ann(An)ig, where
ans(Q; KB ) = fA1; : : : ; Ang. The set of possible set of answers is defined as:
fhA1; Ann01i; : : : ; hAn; Ann0nig j Ann0i 2 S-Ann(Ann(Ai)) for 1 6 i 6 n
Example 9. Returning to Example 8, There are two possible sets of answers, namely:
hmyaccom(h1); (id1 id8 id5 id9)i; hmyaccom(bb1); (id3 id10 id7 id12)i
hmyaccom(h1); (id1 id8 id6 id11)i; hmyaccom(bb1); (id3 id10 id7 id12)i :
We can now apply the framework proposed in the previous section to each possible
set of answers, and then combine the results obtained for all of them. But before that, we
need a way of determining the features of the query answers and the reports associated
with them. Of course, these should be obtained on the basis of the query structure and
the available provenance information.</p>
      <p>Each predicate symbol can be associated with a sequence of features. In the
following, given a conjunctive query Q, we assume there is an arbitrary but fixed criterion to
determine a sequence of features FQ for the query answers and assume a way of
generating a report for each query answer A on the basis of Ann(A) (in Propositions 2 and 3,
we assume that both can be accomplished in polynomial time). A concrete approach is
illustrated in the following example.</p>
      <p>Example 10. Consider the set of possible answers of Example 9:
hmyaccom(h1); (id1
id8
id5
id9)i; hmyaccom(bb1); (id3
id10
id7
id12)i
Suppose the features of accom are loc; cl; pri; br; net, while those of pub are food;
drink. The features of the query answers might be defined as the union of the
aforementioned ones, i.e., loc; cl; pri; br; net, food; drink—in general, there can be different
reasonable ways of combining the features of the predicate symbols in the query (in
our case, we considered only predicates accom and pub and took the union of their
features). The reports for the query answers may be derived by merging the reports of the
accom- and pub-atoms that contributed to each query answer. For instance, suppose we
have two observers and their reports are as follows:
1(id1) = h1; 0:7; 0:5; 1; i
1(id5) = h0:5; 1i
1(id3) = h0:8; 0:3; 0:2; 0:5; 0:7i
1(id7) not defined
: : :
2(id1) = h0:3; 0:4; 1; ; i
2(id5) = h0:7; 0:2i
2(id3) = h0:4; 0:5; 0:5; 0:6; 0:1i
2(id7) = h0:1; 0:8i
: : :
Thus, the reports of the first and second observers for the query answers are:
1(myaccom(h1)) = h1; 0:7; 0:5; 1; ; 0:5; 1i
1(myaccom(bb1)) = h0:8; 0:3; 0:2; 0:5; 0:7; ; i
2(myaccom(h1)) = h0:3; 0:4; 1; ; ; 0:7; 0:2i
2(myaccom(bb1)) = h0:4; 0:5; 0:5; 0:6; 0:1; 0:1; 0:8i
At this point, we can apply the framework of the previous section.</p>
      <p>Once we get a preference relation for each possible set of answers, they can be
combined using a preference aggregation operator, and a total order over this set can
easily be derived (e.g., as before, via topological sorting). Thus, eventually, we obtain a
ranking over the query answers.</p>
      <p>The following proposition provides an upper bound on the (data) complexity of
computing query answer rankings. The exponential time upper bound is due to the fact
that the number of possible sets of answers can be exponential in the worst case.
Proposition 2. Computing a query answer ranking can be done in exponential time in
the data complexity.</p>
      <p>Alternatively, we can use the annotations in a different way. For instance, we can
use a function that maps the id of an atom to a value in the [0; 1] interval such that it
compiles all reports (from different observers) associated with the atom into a single
score (in Proposition 3 below, we assume that such a function can be computed in
polynomial time); this score may represent, for instance, the overall relevance of the
atom for the user based on the reports and the relative importance of each dimension.
The scores can be combined through the min (resp., max) operator when (resp., +)
is encountered in the annotation. This yields the application of a commutative semiring
(R+61, min, max, 1, 0). This evaluation of reports is called extensional.</p>
      <p>The following proposition provides an upper bound on the (data) complexity of
computing query answer rankings under the extensional evaluation of reports.
Proposition 3. Computing a query answer ranking under the extensional evaluation of
reports can be done in polynomial time in the data complexity.</p>
      <p>The polynomial time complexity of the extensional approach follows from the fact
that each of the following tasks can be accomplished in polynomial time: computing
the provenance information, mapping atoms’ ids to scores, and combining the scores.</p>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>
        The study of preferences has been carried out in many disciplines; in computer science,
the developments that are most relevant to our work is in the incorporation of
preferences into query answering mechanisms. To date (and to our knowledge), the state
of the art in this respect is centered around relational databases and, recently, in
ontological languages for the Semantic Web [15]. The seminal work in preference-based
query answering was that of [14], in which the authors extend the SQL language to
incorporate user preferences. The preference formula formalism was introduced in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
as a way to embed a user’s preferences into SQL. An important development in this
line of research is the well-known skyline operator, which was first introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
A recent survey of preference-based query answering formalisms is provided in [17].
The main difference is that in this paper we have ontologies, i.e., roughly speaking, we
are combining preferences in DBs with ontology-based data access. The problem of
evaluating ranked top-k queries in the context of ontology-based access over relational
databases was considered in [18]. Studies of preferences related to our approach have
also been done in classical logic programming [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] as well as answer set programming
frameworks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The present work can be considered as a further development of the PrefDatalog+/–
framework presented in [15], where we develop algorithms to answer skyline queries,
and their generalization to k-rank queries, over classical Datalog+/– ontologies. The
main difference between PrefDatalog+/– and the work presented here is that
PrefDatalog+/– assumes that a model of the user’s preferences is given at the time the query
is issued. On the other hand, we make no such assumption here; instead, we assume
that the user only provides some very basic information regarding her preferences over
certain features, and has access to a set of reports provided by other users in the past. In
a sense, this approach is akin to building an ad hoc model on the fly at query time and
using it to provide a ranked list of results.</p>
      <p>
        Related to our approach is also the one presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], where preferences are
created from provenance annotations of RDF data. There, the usage of different
provenance annotation dimensions is considered, each yielding a total preference order
represented by a semiring. These preferences are then aggregated according to common
social choice theory preference aggregation methods. Another line of related research
are constraint-based formalisms for modeling preferences based on semirings, like the
one presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        This work is also related to the study and use of provenance in information systems
and, in particular, the Semantic Web and social media [
        <xref ref-type="bibr" rid="ref1 ref10">16, 10, 1</xref>
        ]. Here, we use a kind
of “how”-provenance [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to study how to propagate report annotations of atoms in an
ontology to create preference-based ranked results of ontological queries. Representing
the “how”-provenance in this manner allows us to map the semiring into others
depending on the manipulations that we wish to perform during the propagation of reports. To
our knowledge, this is the first study of a direct application of provenance of reports of
this kind found in online reviews to query answering.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Summary and Outlook</title>
      <p>In this paper, we have studied the problem of preference-based query answering in
Datalog+/– ontologies under the assumption that the user’s preferences are informed by
a set of subjective reports representing opinions of others—such reports model the kind
of information found, for instance, in online reviews of products, places, and services.</p>
      <p>We first introduced a basic approach to rank atoms in an ontology by combining
reports, report authors’ preferences, and the user’s preferences. Then, we extended our
framework to deal with dependencies and query answering using provenance
information to keep track of which reports should be considered to evaluate query answers,
as well as new information derived from dependencies. Representing the provenance
by means of a semiring enabled us to adapt our framework depending on the kind of
report propagation that we wish to carry out. One direction for future work involves
investigating further semirings for manipulating reports during their propagation.</p>
      <p>Much work remains to be done in this line of research, such as expressing general
reports that apply to sets of tuples, and exploring the application of existing techniques
to gather reports from actual information available in Web reviews. We also plan to
experimentally evaluate our framework over synthetic and real-world data.
Acknowledgments. This work was supported by EPSRC grant EP/J008346/1 (“PrOQAW”), an
EU (FP7/2007-2013) Marie-Curie Intra-European Fellowship, the ERC grant 246858
(“DIADEM”), and a Yahoo! Research Fellowship.
14. Lacroix, M., Lavency, P.: Preferences: Putting more knowledge into queries. In: Proc. VLDB.</p>
      <p>vol. 87, pp. 1–4 (1987)
15. Lukasiewicz, T., Martinez, M.V., Simari, G.I.: Preference-based query answering in</p>
      <p>Datalog+/– ontologies. In: Proc. IJCAI. pp. 1017–1023 (2013)
16. Moreau, L.: The foundations for provenance on the Web. Found. Trends Web Sci. 2(2/3),
99–241 (2010)
17. Stefanidis, K., Koutrika, G., Pitoura, E.: A survey on representation, composition and
application of preferences in database systems. TODS 36(3), 19:1–19:45 (2011)
18. Straccia, U.: On the top-k retrieval problem for ontology-based access to databases. In:
Flexible Approaches in Data, Information and Knowledge Management, pp. 95–114 (2013)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barbier</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gundecha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , H.:
          <article-title>Provenance Data in Social Media</article-title>
          . Morgan and Claypool (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Beeri</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vardi</surname>
          </string-name>
          , M.Y.:
          <article-title>The implication problem for data dependencies</article-title>
          .
          <source>In: Proc. ICALP</source>
          . pp.
          <fpage>73</fpage>
          -
          <lpage>85</lpage>
          (
          <year>1981</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bistarelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pini</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venable</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          :
          <article-title>Bipolar preference problems: Framework, properties and solving techniques</article-title>
          .
          <source>In: Recent Adv. in Constraints</source>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>92</lpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Bo¨rzso¨nyi,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Kossmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Stocker</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          :
          <article-title>The skyline operator</article-title>
          .
          <source>In: Proc. ICDE</source>
          . pp.
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Brewka</surname>
          </string-name>
          , G.:
          <article-title>Preferences, contexts and answer sets</article-title>
          .
          <source>In: Proc. ICLP</source>
          . p.
          <volume>22</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Buneman</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>The providence of provenance</article-title>
          .
          <source>In: Proc. BNCOD</source>
          . pp.
          <fpage>7</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Cal`ı,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Kifer</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Taming the infinite chase: Query answering under expressive relational constraints</article-title>
          .
          <source>In: Proc. KR</source>
          . pp.
          <fpage>70</fpage>
          -
          <lpage>80</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Cal`ı,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Lukasiewicz</surname>
          </string-name>
          ,
          <string-name>
            <surname>T.</surname>
          </string-name>
          :
          <article-title>A general Datalog-based framework for tractable query answering over ontologies</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>14</volume>
          ,
          <fpage>57</fpage>
          -
          <lpage>83</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chomicki</surname>
          </string-name>
          , J.:
          <article-title>Preference formulas in relational queries</article-title>
          .
          <source>TODS</source>
          <volume>28</volume>
          (
          <issue>4</issue>
          ),
          <fpage>427</fpage>
          -
          <lpage>466</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dividino</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Gro¨ner, G.,
          <string-name>
            <surname>Scheglmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thimm</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ranking RDF with provenance via preference aggregation</article-title>
          .
          <source>In: Proc. EKAW'12</source>
          . pp.
          <fpage>154</fpage>
          -
          <lpage>163</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Govindarajan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jayaraman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mantha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Preference queries in deductive databases</article-title>
          .
          <source>New Generat. Comput</source>
          .
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>57</fpage>
          -
          <lpage>86</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Green</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karvounarakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Provenance semirings</article-title>
          .
          <source>In: Proc. PODS</source>
          . pp.
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ha</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haddawy</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Toward case-based preference elicitation: Similarity measures on preference structures</article-title>
          .
          <source>In: Proc. UAI</source>
          . pp.
          <fpage>193</fpage>
          -
          <lpage>201</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>