=Paper= {{Paper |id=Vol-2400/paper-05 |storemode=property |title=Flexible Score Aggregation (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-2400/paper-05.pdf |volume=Vol-2400 |authors=Paolo Ciaccia,Davide Martinenghi |dblpUrl=https://dblp.org/rec/conf/sebd/CiacciaM19 }} ==Flexible Score Aggregation (Extended Abstract)== https://ceur-ws.org/Vol-2400/paper-05.pdf
                           Flexible Score Aggregation
                                            (Discussion Paper)


                             Paolo Ciaccia¹ and Davide Martinenghi²
               ¹ Università di Bologna, Italy, email: paolo.ciaccia@unibo.it
               ² Politecnico di Milano, Italy, email: davide.martinenghi@polimi.it

        Abstract. Ranking objects according to different criteria is a central
        issue in many data-intensive applications. Yet, no existing solution deals
        with the case of partially specified score aggregation functions (e.g., a
        weighted sum with no precisely known weight values).
        We address multi-source top-k queries with constraints (rather than pre-
        cise values) on the weights. Our solution is instance optimal and provides
        increased flexibility with negligible overhead wrt classical top-k queries.

1     Introduction
Looking for information relevant to assessing the quality of items of interest
(hotels, restaurants, products, services, etc.) has become a common activity,
supported by so-called multi-source top-k queries [6]. For instance, the problem
of choosing where to eat (possibly with constraints on price, food type, and
location) may be addressed by aggregating different pieces of information from
several sources. It is customary to assume that each such source provides a ranked
list of results (from best to worst), each coming with a (local) score. Accordingly,
objects can be either retrieved sequentially via a sorted access or by providing
the object’s identifier and obtaining the corresponding local score (random ac-
cess). Then, for each object o in the dataset, a monotone scoring function f is
used to aggregate the local scores yielding the (overall) score f (o).
                   R1: a:9  h:8  j:8  d:7  f:7  b:6  e:6  i:5  c:4  g:4
                   R2: i:10 a:9  g:8  j:8  c:7  h:6  f:5  d:5  e:5  b:4
                   R3: c:9  f:8  e:7  i:7  d:6  a:5  b:5  g:5  h:4  j:3

                           Fig. 1: A set of ten restaurants ranked by three sources.

Example 1. Consider the scenario depicted in Figure 1, where ten restaurants,
a, . . . , j, are ordered by d = 3 rankings. Assume that the overall score is a
weighted sum of the local scores, i.e., f(o) = Σ_{i=1}^d wi · o[i], where o[i] is the score
assigned by Ri to object o (e.g., c[2] = 7), and that the k = 2 best restaurants
are sought. In particular, using the weight vector W1 = (1/3, 1/3, 1/3), the top-2
restaurants are a and i, with f (a) ≈ 7.67 and f (i) ≈ 7.33. A different choice of
weights, say W2 = (0.2, 0.3, 0.5), results in i and c being the top-2 restaurants.
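To make the example concrete, the query above can be sketched in a few lines of Python (the score vectors are transcribed from Figure 1; this is an illustrative sketch, not the paper's implementation):

```python
# Local scores of the ten restaurants, transcribed from Figure 1 (R1, R2, R3).
SCORES = {
    "a": (9, 9, 5), "b": (6, 4, 5), "c": (4, 7, 9), "d": (7, 5, 6),
    "e": (6, 5, 7), "f": (7, 5, 8), "g": (4, 8, 5), "h": (8, 6, 4),
    "i": (5, 10, 7), "j": (8, 8, 3),
}

def f(o, weights):
    """Overall score: the weighted sum of o's local scores."""
    return sum(w * s for w, s in zip(weights, SCORES[o]))

def top_k(k, weights):
    """The k objects with the highest overall score (ties broken by name)."""
    return sorted(SCORES, key=lambda o: (-f(o, weights), o))[:k]

print(top_k(2, (1/3, 1/3, 1/3)))  # ['a', 'i']
print(top_k(2, (0.2, 0.3, 0.5)))  # ['i', 'c']
```

Changing only the weight vector changes the answer set, which is precisely the sensitivity that motivates constraining weights instead of fixing them.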

   Choosing precise weights in the scoring function is admittedly difficult, and
might be the result of learning from users, e.g., via crowdsourcing [4]. A more
    Copyright © 2019 for the individual papers by the papers' authors. Copying permit-
    ted for private and academic purposes. This volume is published and copyrighted by
    its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy.
viable alternative is to specify weights only partially through a set of constraints.
With reference to Example 1, instead of having fixed weights, it might be in-
teresting to “explore” all the alternatives arising when weights vary around W2 ,
by, say, ±0.1. Then, only a, c, f, and i can be the possible (top-2) answers.
    We study the problem of answering multi-source top-k queries with partially
specified weights. This leads to a family F of scoring functions for ranking ob-
jects. Existing solutions to the problem with a single scoring function are Fagin’s
algorithm (FA) [6] and the threshold algorithm (TA) [7]. The latter was shown
to perform better and to achieve optimal I/O performance. Our proposal, FSA,
solves the general case, reduces to TA for classical top-k queries (i.e., when
F = {f }) and is instance optimal for any F. When no constraints are present,
FSA behaves like FA and returns the so-called k-skyband [9], the case k = 1 being
the skyline of the dataset [1].
    This paper summarizes the main contributions presented in [3], namely 1) a
provably correct and instance optimal algorithm, FSA, for answering multi-source
top-k queries, 2) several optimization opportunities for improving the execution
time of FSA, 3) experimental evidence that FSA enjoys the flexibility of partially
specified weights while incurring little overhead wrt top-k queries.

2    Problem statement
Consider a dataset D = {o1 , . . . , oN } where each object o ∈ D has d ≥ 2 partial
scores, (o[1], . . . , o[d]). For any scoring function f and dataset D, a top-k query
on D using f returns a set of k objects with the highest scores according to
f (ties arbitrarily broken). Given objects o and o′, if o[i] ≥ o′[i] holds for all
i = 1, . . . , d, with strict inequality for at least one i, then o dominates o′,
written o ≻ o′. Notice that o ≻ o′ implies f(o) ≥ f(o′) for all f ∈ M (M being
the set of all monotone scoring functions). The skyline of D is the set of
undominated objects in D:
                     Sky(D) = {o ∈ D | ¬∃o′ ∈ D. o′ ≻ o},                          (1)
but can also be viewed as the set of objects that can be top-1 [8], i.e.:
           o ∈ Sky(D) ⟺ ∃f ∈ M. ∀o′ ∈ D. o′ ≠ o → f(o) > f(o′).
     The k-skyband of D, written Skyk (D), is the set of all objects in D that are
dominated by less than k objects [9], and thus includes all possible top-k sets.
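A brute-force sketch of dominance and the k-skyband, again over the Figure 1 scores (transcribed here as an assumption), may help fix the definitions:

```python
# Figure 1 scores, transcribed as (R1, R2, R3) vectors.
D = {
    "a": (9, 9, 5), "b": (6, 4, 5), "c": (4, 7, 9), "d": (7, 5, 6),
    "e": (6, 5, 7), "f": (7, 5, 8), "g": (4, 8, 5), "h": (8, 6, 4),
    "i": (5, 10, 7), "j": (8, 8, 3),
}

def dominates(u, v):
    """u dominates v: no worse everywhere, strictly better at least once."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def skyband(data, k):
    """Sky_k(D): objects dominated by fewer than k others; k = 1 is the skyline."""
    return {o for o, v in data.items()
            if sum(dominates(u, v) for u in data.values()) < k}

print(sorted(skyband(D, 1)))  # ['a', 'c', 'f', 'i']  -- the skyline
```

On this dataset the skyline happens to coincide with the four objects that Section 1 identifies as the possible top-2 answers under the constrained weights.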
     As a generalization of dominance for a family F of monotone scoring functions
[2], we say that object o F-dominates object o′, o ≠ o′, denoted by o ≻F o′,
iff ∀f ∈ F. f(o) ≥ f(o′) and ∃f ∈ F. f(o) > f(o′).
     We define the non-dominated restricted skyband of D wrt F, denoted ndk(D; F),
as the set of objects in D that are F-dominated by less than k objects:
       ndk(D; F) = {o ∈ D | ¬∃o1, . . . , ok ∈ D. o1 ≻F o ∧ . . . ∧ ok ≻F o}      (2)
    Clearly, ndk(D; F) ⊆ Skyk(D). When F reduces to a single function f, then
ndk(D; {f}) includes all possible top-k results for f (whereas, in case of ties,
top-k queries may nondeterministically discard tuples). From (2) it is evident
that if o 6∈ ndk (D; F), then no scoring function f ∈ F can make o top-k. We
now address the problem of efficiently computing ndk (D; F) in a multi-source
scenario, for any given dataset D and set of monotone scoring functions F.
3   Flexible score aggregation with FSA

Fagin’s algorithm FA was the first to address multi-source top-k queries for ar-
bitrary scoring functions. It consists of three phases: 1) sorted accesses are exe-
cuted; 2) random accesses are executed on all objects met in phase 1; 3) scores
of all fetched objects are computed using a scoring function f and the top-k
result determined. The stopping condition for phase 1 is to have seen at least k
objects on all the d rankings. With the scenario depicted in Figure 1, FA would
stop at depth 7 (after 7 sorted accesses per ranking), after seeing a and f on
all rankings. The FA algorithm can be adapted to compute Skyk (D), since it
stops phase 1 independently of f . Therefore, the set S of objects seen when the
algorithm stops includes the top-k objects according to any possible monotone
scoring function (and, thus, Skyk (D), which, in turn, includes ndk (D; F)).
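Phase 1 of FA can be sketched as follows (rankings transcribed from Figure 1; illustrative only):

```python
# Figure 1 as three ranked lists of (object, local score), best first.
RANKINGS = [
    [("a",9),("h",8),("j",8),("d",7),("f",7),("b",6),("e",6),("i",5),("c",4),("g",4)],
    [("i",10),("a",9),("g",8),("j",8),("c",7),("h",6),("f",5),("d",5),("e",5),("b",4)],
    [("c",9),("f",8),("e",7),("i",7),("d",6),("a",5),("b",5),("g",5),("h",4),("j",3)],
]

def fa_phase1(rankings, k):
    """Sorted accesses until at least k objects have been seen on all rankings.

    Returns the stopping depth and the objects seen everywhere.
    """
    seen = [set() for _ in rankings]
    for depth in range(len(rankings[0])):
        for s, r in zip(seen, rankings):
            s.add(r[depth][0])          # one sorted access per ranking
        everywhere = set.intersection(*seen)
        if len(everywhere) >= k:
            return depth + 1, everywhere
    return len(rankings[0]), set.intersection(*seen)

depth, everywhere = fa_phase1(RANKINGS, 2)
print(depth)  # 7
```

As stated above, FA stops at depth 7, with a and f being the first two objects seen in all three rankings.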
     The TA algorithm uses a threshold value f (τ ) computed by applying the
specific scoring function f to the tuple τ of last scores seen by sorted access on
each ranking, which we here call the threshold point. A further difference with
respect to FA is that random accesses are not delayed to a separate phase: after
performing d sorted accesses (one per ranking), all the necessary random accesses
are executed. The algorithm stops when at least k seen objects have a global
score no worse than f (τ ). Thus, for each object o in the result, f (o) ≥ f (τ )
holds. In Figure 1, with k = 2 and weights W2 = (0.2, 0.3, 0.5), TA would stop at
depth 4, where f(τ) = 7.3, since f(i) = 7.5 and f(c) = 7.4.
     Being tailored to a specific f , the stopping criterion of TA is (much) more
efficient than the one used by FA, but now, when TA stops, S is only guaranteed
to contain the top-k objects according to f , but not necessarily according to
other scoring functions.
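TA on the running example can be sketched as below; Fraction arithmetic is used to keep threshold comparisons exact (data transcribed from Figure 1; a sketch, not the authors' code):

```python
from fractions import Fraction as Fr

# Figure 1 as ranked lists (sorted access) plus score tables (random access).
RANKINGS = [
    [("a",9),("h",8),("j",8),("d",7),("f",7),("b",6),("e",6),("i",5),("c",4),("g",4)],
    [("i",10),("a",9),("g",8),("j",8),("c",7),("h",6),("f",5),("d",5),("e",5),("b",4)],
    [("c",9),("f",8),("e",7),("i",7),("d",6),("a",5),("b",5),("g",5),("h",4),("j",3)],
]
TABLES = [dict(r) for r in RANKINGS]   # random access: object -> local score

def ta(k, weights):
    """Threshold algorithm: stop when k seen objects score no worse than f(τ)."""
    f = lambda v: sum(w * x for w, x in zip(weights, v))
    seen = {}
    for depth in range(len(RANKINGS[0])):
        tau = []
        for r in RANKINGS:
            o, s = r[depth]            # sorted access
            tau.append(s)
            if o not in seen:          # complete o's scores by random access
                seen[o] = f([t[o] for t in TABLES])
        if sum(v >= f(tau) for v in seen.values()) >= k:
            top = sorted(seen, key=seen.get, reverse=True)[:k]
            return depth + 1, top

W2 = (Fr(1, 5), Fr(3, 10), Fr(1, 2))
print(ta(2, W2))  # (4, ['i', 'c'])
```

This reproduces the stop at depth 4 described above for W2 = (0.2, 0.3, 0.5).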
     Let us say that o weakly F-dominates o′, o ≠ o′, iff ∀f ∈ F. f(o) ≥ f(o′).
Then, we can unify FA and TA according to the following observation.

Observation 1 FA stops executing sorted accesses when k seen objects o1 , . . . , ok
weakly M-dominate τ . TA stops when k seen objects weakly {f }-dominate τ .

    Our proposal, FSA, applies the notion of F-dominance wrt τ , which includes
(M-)dominance and {f }-dominance as special cases. The extraction performed
by FSA stops as soon as k objects are seen that F-dominate τ , as all unseen
objects are also necessarily F-dominated by these k objects. Notice that FSA
uses F-dominance rather than weak F-dominance so as not to discard ties from
ndk(D; F). The pseudocode of FSA is presented in Algorithm 1.
    Note that the descent performed by FSA alternates a single sorted access on
the i-th ranking with d − 1 random accesses on the other rankings. This differs
from the strategy adopted by TA, which, at each depth, first performs d sorted
accesses (one per ranking), thereby retrieving up to d unseen objects, and then
completes the scores for these objects with up to d · (d − 1) random accesses.
However, both algorithms stop at the same depth. Also note that FA performs
all random accesses at the end; however, such an approach would not be possible
for TA or FSA, since {f }-dominance and F-dominance generally require knowing
all the scores of the objects being compared.
    Algorithm 1: FSA algorithmic pattern for computing ndk.
 Input: Rankings R1, . . . , Rd for D, family of scoring functions F, integer k > 0.
 Output: ndk(D; F).
  1. let S := ∅ // Seen objects
  2. do
  3.   for i in 1 . . . d
  4.      let ⟨o, σ⟩ := sortedAccess(Ri) // Do a sorted access in each ranking Ri
  5.      τ[i] := σ // Update score for threshold τ
  6.      if o ∉ S then // Object o is new
  7.          o[i] := σ // Save score for o
  8.          for j in 1 . . . d // Extract all other scores for o via random access
  9.             if j ≠ i then o[j] := randomAccess(Rj, o)
 10.          if ¬∃o1, . . . , ok ∈ S. o1 ≻F o ∧ . . . ∧ ok ≻F o then handle(o)
 11. while ¬∃o1, . . . , ok ∈ S. o1 ≻F τ ∧ . . . ∧ ok ≻F τ // Stop if k objects ≻F τ
 12. clean() // Remove from S all objects F-dominated by ≥ k others
 13. return S
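For illustration, the pattern of Algorithm 1 can be sketched in Python using the vertex-based F-dominance test described in Section 4, with the vertices of Example 2 taken as given (a sketch under these assumptions, not the authors' implementation):

```python
from fractions import Fraction as Fr

# Figure 1 as ranked lists (sorted access) plus a score table (random access).
RANKINGS = [
    [("a",9),("h",8),("j",8),("d",7),("f",7),("b",6),("e",6),("i",5),("c",4),("g",4)],
    [("i",10),("a",9),("g",8),("j",8),("c",7),("h",6),("f",5),("d",5),("e",5),("b",4)],
    [("c",9),("f",8),("e",7),("i",7),("d",6),("a",5),("b",5),("g",5),("h",4),("j",3)],
]
TABLES = [dict(r) for r in RANKINGS]

# Vertices of W(C) for Example 2: weights (0.2, 0.3, 0.5) ± 0.1 per weight.
VERTICES = [(Fr(1,10), Fr(3,10), Fr(3,5)), (Fr(1,10), Fr(2,5), Fr(1,2)),
            (Fr(1,5),  Fr(2,5),  Fr(2,5)), (Fr(3,10), Fr(3,10), Fr(2,5)),
            (Fr(1,5),  Fr(1,5),  Fr(3,5)), (Fr(3,10), Fr(1,5),  Fr(1,2))]

def vscores(point):
    """One weighted sum per vertex of W(C)."""
    return tuple(sum(w * x for w, x in zip(V, point)) for V in VERTICES)

def fdom(u, v):
    """F-dominance, tested on vertex scores: all >=, at least one >."""
    return all(a >= b for a, b in zip(u, v)) and u != v

def fsa(k):
    S = {}                                 # seen, retained objects -> vertex scores
    tau = [None] * len(RANKINGS)
    depth = 0
    while True:
        depth += 1
        for i, r in enumerate(RANKINGS):
            o, s = r[depth - 1]            # sorted access on R_i
            tau[i] = s
            if o not in S:
                vs = vscores([t[o] for t in TABLES])   # random accesses
                if sum(fdom(u, vs) for u in S.values()) < k:
                    S[o] = vs              # handle(o)
        if sum(fdom(u, vscores(tau)) for u in S.values()) >= k:
            break                          # k seen objects F-dominate τ
    return {o for o, vs in S.items()       # clean()
            if sum(fdom(u, vs) for u in S.values()) < k}, depth

result, depth = fsa(2)
print(sorted(result), depth)  # ['a', 'c', 'f', 'i'] 5
```

On the running example this stops at depth 5 and returns nd2(D; F) = {a, c, f, i}, matching Example 2 below.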




    Let I = ⟨D, F, k⟩ indicate an instance of the problem of computing ndk(D; F)
wrt a set of monotone scoring functions F, and let I indicate the class of all such
instances. An algorithm A is correct (for the computation of ndk) if, for each
instance I = ⟨D, F, k⟩ ∈ I, A returns ndk(D; F).
Theorem 1. FSA is correct for the computation of ndk .
   Performance in top-k contexts is commonly characterized by the number of
objects accessed by sorted access by an algorithm A on an instance I, which
we indicate as sumDepths(A, I). Algorithm A is instance optimal if there exist
constants c1 and c2 such that sumDepths(A, I) ≤ c1 · sumDepths(A′, I) + c2 for
every correct algorithm A′ and every instance I ∈ I.
Theorem 2. FSA is instance optimal.


4     Algorithmic variants
We now discuss efficient implementations of FSA and focus on the case in which
each scoring function is of the form f(o) = Σ_{i=1}^d wi gi(o[i]), where the wi's are
weights subject to a set C of linear constraints and the gi's are monotone. This
form includes, e.g., weighted linear sums and weighted Lp norms. The constraints
may be of several kinds (see [5] for an overview), such as
 1. weak rankings: w1 ≥ w2 ≥ . . . ≥ wd (relative importance of attributes);
 2. ratio bounds: w̄i (1−ε) ≤ wi ≤ w̄i (1+ε), 1 ≤ i ≤ d (spread around a weight).
    Without loss of generality, we consider the space of normalized weight vectors
W ⊆ [0, 1]^d, such that, for each W = (w1, . . . , wd) ∈ W, we have Σ_{i=1}^d wi = 1;
W(C) denotes the subset of W of weight vectors satisfying constraints C.
    F-dominance test. There are two main approaches to efficiently check F-
dominance [2]. One approach requires solving a linear programming problem
encoding the notion of F-dominance. This may be expensive, since every F-
dominance test requires solving a different LP problem. A better approach, which
we choose for FSA, is based on the F-dominance region of an object o, i.e., the
set of all points that o F-dominates. Such a region needs to be computed only
once for all the tests in which o is the candidate F-dominant object. Testing
membership to the F-dominance region is done by i) computing the vertices of
the convex polytope W(C), and ii) checking the following inequalities:
          Σ_{i=1}^d wi^(ℓ) gi(o[i]) ≥ Σ_{i=1}^d wi^(ℓ) gi(o′[i]),      ℓ ∈ {1, . . . , q},     (3)

where each W^(ℓ) = (w1^(ℓ), . . . , wd^(ℓ)), for 1 ≤ ℓ ≤ q, is a vertex of W(C), and at
least one inequality is strict. Computing vertices may be expensive, but needs
to be done only once per dataset.
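A minimal sketch of this membership test, assuming the vertices of Example 2 and identity gi's:

```python
from fractions import Fraction as Fr

# Assumed vertices of W(C) (those of Example 2); each g_i is the identity.
VERTICES = [(Fr(1,10), Fr(3,10), Fr(3,5)), (Fr(1,10), Fr(2,5), Fr(1,2)),
            (Fr(1,5),  Fr(2,5),  Fr(2,5)), (Fr(3,10), Fr(3,10), Fr(2,5)),
            (Fr(1,5),  Fr(1,5),  Fr(3,5)), (Fr(3,10), Fr(1,5),  Fr(1,2))]

def vertex_scores(o):
    """The left-hand side of (3): one weighted sum per vertex W^(l)."""
    return [sum(w * x for w, x in zip(V, o)) for V in VERTICES]

def f_dominates(o, p):
    """o F-dominates p iff (3) holds at every vertex, strictly at least once."""
    vo, vp = vertex_scores(o), vertex_scores(p)
    return all(a >= b for a, b in zip(vo, vp)) and vo != vp

print(f_dominates((5, 10, 7), (8, 6, 4)))  # True: i F-dominates h
print(f_dominates((9, 9, 5), (5, 10, 7)))  # False: a does not F-dominate i
```

The two checks reproduce facts used in Example 2 below: i ≻F h, while a and i are mutually F-incomparable.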
    Memoization of the vertex scores. We call the lhs of (3) the vertex
score of o for W^(ℓ), denoted by vℓ(o) (similarly for o′ in the rhs). Each vertex
score, once computed, is remembered for subsequent tests (3) involving the same
object. We adopt this technique for FSA and call it memoization.
    Sorting. In FSA, we keep the seen objects topologically sorted wrt the F-
dominance relation, so that, in a sequence o1 , . . . , on , no object oi can be F-
dominated by an object oj if j > i. We do so by sorting according to weights
in the centroid of W(C). Then, when checking whether an object o should be
added to the set S of seen objects (line 10 of Algorithm 1), we only look for k
objects F-dominating o among those preceding o in the topological sort.
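The centroid-based sort can be sketched as follows (vertices of Example 2 assumed; the resulting order matches the one reported in Example 2):

```python
from fractions import Fraction as Fr

# Assumed vertices of W(C) (those of Example 2) and the four retained objects.
VERTICES = [(Fr(1,10), Fr(3,10), Fr(3,5)), (Fr(1,10), Fr(2,5), Fr(1,2)),
            (Fr(1,5),  Fr(2,5),  Fr(2,5)), (Fr(3,10), Fr(3,10), Fr(2,5)),
            (Fr(1,5),  Fr(1,5),  Fr(3,5)), (Fr(3,10), Fr(1,5),  Fr(1,2))]
SEEN = {"a": (9, 9, 5), "c": (4, 7, 9), "f": (7, 5, 8), "i": (5, 10, 7)}

# Centroid of W(C): the coordinate-wise mean of its vertices.
centroid = tuple(sum(V[i] for V in VERTICES) / len(VERTICES) for i in range(3))

# Sorting by the centroid's score yields a topological order wrt F-dominance:
# an object can only be F-dominated by objects that precede it.
order = sorted(SEEN, key=lambda o: -sum(w * x for w, x in zip(centroid, SEEN[o])))
print(order)  # ['i', 'c', 'a', 'f']
```

The centroid here is (0.2, 0.3, 0.5), as stated in Example 2, and the order i, c, a, f agrees with the insertion order reported there.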
    Pruning. In FSA, object removal can occur within either handle or clean.
With lazy pruning, we do not remove objects in handle and wait to do it when
clean is called. With eager pruning, when an object o is inserted in S, handle
also removes all objects in S that are F-dominated by o and were already F-
dominated by k − 1 other objects; clean does nothing.


 ℓ | vℓ(a) vℓ(i) vℓ(c) vℓ(h) vℓ(f) vℓ(j) vℓ(g) vℓ(e) vℓ(d) | vℓ(τ3)  Vℓ           | vℓ(τ4)  Vℓ           | vℓ(τ5) Vℓ
 1 |  6.6   7.7   7.9   5.0   7.0   5.0   5.8   6.3   5.8  |  7.4    [f, a]       |  7.3    [f, a]       |  6.4   []
 2 |  7.0   8.0   7.7   5.2   6.7   5.5   6.1   6.1   5.7  |  7.5    [a, f]       |  7.4    [a, f]       |  6.5   []
 3 |  7.4   7.8   7.2   5.6   6.6   6.0   6.0   6.0   5.8  |  7.6    [a, c, f]    |  7.4    [c, f]       |  6.6   []
 4 |  7.4   7.3   6.9   5.8   6.8   6.0   5.6   6.1   6.0  |  7.6    [a, i, c, f] |  7.3    [c, f]       |  6.6   []
 5 |  6.6   7.2   7.6   5.2   7.2   5.0   5.4   6.4   6.0  |  7.4    [i, f, a]    |  7.2    [a]          |  6.4   []
 6 |  7.0   7.0   7.1   5.6   7.1   5.5   5.3   6.3   6.1  |  7.5    [c, f, a, i] |  7.2    [c, f, a, i] |  6.5   []
Table 1: Vertex scores for Examples 1 and 2, along with the threshold point's vertex scores and vertex
lists at depths 3, 4, and 5.




    Vertex lists. We observe that τ is F-dominated by an object o iff all vertex
scores of o are no less than the corresponding vertex scores of τ, and at least
one is greater. We then maintain, for each vertex W^(ℓ), a list Vℓ of seen ob-
jects whose vertex score is still lower than τ's vertex score vℓ(τ). An object o is
added to Vℓ within the call to handle(o) and, in Vℓ, the objects are kept sorted
by vertex score. Additionally, for each object o, we maintain a mask of q bits
B1(o), . . . , Bq(o) such that Bℓ(o) = 1 iff vℓ(o) ≥ vℓ(τ), plus one extra bit B>(o)
that is set to 1 iff at least one inequality holds strictly. Analogously, we also main-
tain a list V> of those objects o such that B>(o) = 0. In this way, when checking
which objects F-dominate τ, it suffices to consider, for each ℓ ∈ {1, . . . , q}, only
the objects in Vℓ ∪ V> and only if vℓ(τ) has changed since the last check.

Example 2. Consider again the scenario of Example 1, with k = 2 and weight
vector (0.2, 0.3, 0.5) ±0.1 on each weight. The vertices of W(C) are W (1) =
(0.1, 0.3, 0.6), W (2) = (0.1, 0.4, 0.5), W (3) = (0.2, 0.4, 0.4), W (4) = (0.3, 0.3, 0.4),
W (5) = (0.2, 0.2, 0.6), W (6) = (0.3, 0.2, 0.5), with centroid (0.2, 0.3, 0.5). At
depth 3, the seen objects are a, i, c, f (which will all appear in the result)
and h, j, g, e, which are all F-dominated by both i and c and thus not re-
tained. The objects are inserted sortedly in S according to the centroid, with
the order: i, c, a, f. Table 1 shows the vertex scores, computed as
Σ_{i=1}^d wi^(ℓ) o[i] (b is not even accessed). To see that, e.g., i ≻F h, we observe that
vℓ(i) ≥ vℓ(h) for 1 ≤ ℓ ≤ q (and v1(i) > v1(h)). At this point, the seen objects
S already coincide with nd2 , but FSA cannot stop yet. The threshold point is
τ3 = (8, 8, 7); its vertex scores are shown in Table 1, along with the correspond-
ing vertex lists (of objects that still have a vertex score lower than τ3 ’s). The
bit masks B>(o)B1 (o) . . . B6 (o) are computed accordingly; e.g., we have 1110010
for c since c has a worse vertex score than τ3 on vertices W (3) , W (4) , and W (6) ,
and thus is present in vertex lists V3 , V4 , and V6 . At depth 4, τ4 = (7, 8, 7), but
still no object F-dominates it. Finally, at depth 5, τ5 = (7, 7, 6), which is now
F-dominated by all of i, c, a, f, and then FSA can stop. The subsequent clean
procedure does not eliminate anything, since none of the remaining objects is
F-dominated by k = 2 other objects, thus nd2 (D; F) = {i, c, a, f}.


5     Experiments
In this section we evaluate the performance of our implementations of FSA. In
particular, we quantify the overhead incurred by the increased flexibility of FSA
when compared to the computation of plain k-skybands and top-k queries. For
our experiments, we consider three kinds of data distribution (anticorrelated,
uniform, and real)1 and scoring functions taken from weighted L1 norms with
ratio bounds constraints applied on the weights (1/d(1 − ε) ≤ wi ≤ 1/d(1 + ε)).
We report on both effectiveness and efficiency in different scenarios varying by
number of rankings d, spread ε, and k, with defaults 4, 20%, and 10, resp.
    In order to evaluate the effectiveness of our approach, we compare it to k-
skybands, which also retrieve non(-F)-dominated objects but in a way that is
agnostic with respect to the set of scoring functions F. The introduction of F
drastically helps to reduce the result set. Figure 2 shows this effect for NBA.
In Figure 2a, we vary the spread parameter ε used in ratio bounds constraints
and keep default values for all other parameters. As can be seen, the result sets
returned by ndk and Skyk coincide (985 objects) only when the spread is so high
that each weight can vary in the entire [0, 1] interval (“full”).
1
    NBA: 190862 tuples reporting measures of basketball players' performance, available
    at www.quantifan.com; ANT and UNI have 100K tuples.
For all other values
of ε, the result set of ndk is much smaller (from 71 objects when ε = 50%, to only
12 for ε = 1%). Note that, when ε = 0% (“none”), all weights equal 1/d exactly
and, thus, there is only one scoring function in F; in that case, the cardinality of
the result set is exactly k = 10. Figure 2b also reports the cardinality of the result
sets, but for different values of k (defaults otherwise). The result set returned
by Skyk is always more than an order of magnitude larger than that of ndk, the
gap widening as k increases (from 157 vs. 2 when k = 1 to 6076 vs. 191 when
k = 100). Finally, Figure 2c shows the same comparison when the number of rankings d varies
(defaults otherwise). As d increases, the ratio between the cardinalities of the
results increases (from 89/15 when d = 2 to 27403/63 when d = 8). Similar
trends can be observed for the UNI and ANT datasets (omitted in the interest
of space). Overall, this shows that, thanks to F, ndk is able to keep the result
set within a reasonable cardinality, even for larger or more challenging problem
instances, while the cardinality of Skyk soon diverges.



 [Figure 2: three plots, not recoverable from the extracted text; panels: (a) Spread ε varies. (b) k varies. (c) Number of rankings d varies.]
 Fig. 2: Cardinality of results for the NBA dataset. Varying parameter: ε in (a), k in (b), d in (c).



   We consider two variants of the FSA pattern: VEL+ and VEE+, using lazy and,
resp., eager pruning, and test them in terms of execution time.

Effect of number of rankings d. Both variants exhibit sub-second execution
times in default conditions. The “vertex lists” optimization grants savings of
up to one order of magnitude, especially for more complex problem instances.
Indeed, for d > 4 the number q of vertices increases dramatically, from q = 6
with d = 4 to q = 70 with d = 8. Yet, our optimization allows skipping many
(already verified) vertex score comparisons between objects and the threshold
point, thus leading to a linear behavior as d varies.

Effect of k and spread ε. Figure 3 shows execution times as k and ε vary
in all datasets. Clearly, in all scenarios, the problem becomes more challeng-
ing as k increases and as ε increases, since, in both cases, more objects need
to be extracted and compared. VEL+ and VEE+ show similar performance, with
VEE+ slightly prevailing on more challenging instances (e.g., when k = 100 or
ε =“full”). In such cases, many objects are kept by a lazy strategy and thus the
call to the clean procedure weighs more heavily than the eager pruning of ob-
jects. In the other cases, the extra cost of pruning objects early is balanced by
the reduced cost of calling clean. We also observe that, in most scenarios, VEE+
and VEL+ incur sub-second execution times, and only exceed 2s when ε =“full”
(a case in which the result coincides with the k-skyband) in the ANT dataset.


 [Figure 3: six plots, not recoverable from the extracted text; panels: (a) UNI: k varies. (b) ANT: k varies. (c) NBA: k varies. (d) UNI: spread ε varies. (e) ANT: spread ε varies. (f) NBA: spread ε varies.]
 Fig. 3: CPU times for computing ndk. Datasets: UNI in (a),(d); ANT in (b),(e); NBA in (c),(f).




6              Conclusions
We have presented a framework for multi-source top-k queries defined by scoring
functions with partially specified weights. This provides the flexibility required
in many application scenarios (e.g., online services and crowdsourcing). Our so-
lution, FSA, is a provably instance optimal algorithmic pattern that encompasses
both top-k queries (attaining comparable performance) and k-skyband queries.

References
1. S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages
   421–430, 2001.
2. P. Ciaccia and D. Martinenghi. Reconciling skyline and ranking queries. PVLDB,
   10(11):1454–1465, 2017.
3. P. Ciaccia and D. Martinenghi. FA + TA < FSA: Flexible score aggregation. In
   CIKM 2018, pages 57–66, 2018.
4. E. Ciceri et al. Crowdsourcing for top-k query processing over uncertain data.
   TKDE, 28(1):41–53, 2016.
5. Y. S. Eum et al. Establishing dominance and potential optimality in multi-criteria
   analysis with imprecise weight and value. Computers & Op. Res., 28:397–409, 2001.
6. R. Fagin. Combining fuzzy information from multiple systems. In PODS, pages
   216–226, 1996.
7. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware.
   In PODS, 2001.
8. N. Meneghetti et al. Output-sensitive evaluation of prioritized skyline queries. In
   SIGMOD, pages 1955–1967, 2015.
9. D. Papadias et al. Progressive skyline computation in database systems. TODS,
   30(1):41–82, 2005.