<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combination of TA- and MD-algorithm for Efficient Solving of Top-K Problem according to User's Preferences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matu´ˇs Ondreiˇcka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaroslav Pokorny´</string-name>
          <email>jaroslav.pokorny@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Software Engineering, Faculty of Mathematics and Physics, Charles University</institution>
          ,
          <addr-line>Malostransk ́e n ́am. 25, 118 00 Praha 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>142</fpage>
      <lpage>153</lpage>
      <abstract>
        <p>In this article we focus on efficient solving of searching the best K objects in more attributes according to user's preferences. Local preferences are modelled with one of four types of fuzzy function. Global preferences are modelled concurrently with an aggregation function. We focused on searching the best K objects according to various user's preferences without accessing all objects. Therefore we deal with the use of TA-algorithm and MD-algorithm. Because of local preferences we used B+-trees during computing of Fagin's TA-algorithm. For searching the best K objects MD-algorithm uses multidimensional B-tree, which is also composed of B+-trees. We developed an MXT-algorithm and a new data structure, in which MXT-algorithm can effectively find the best K objects by user's preferences without accessing all the objects. We show that MXT-algorithm in some cases achieves better results in the number of accessed objects than TA-algorithm and MD-algorithm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Nowdays, huge amounts of data in various web systems grow exponentially.
Users try to find various objects in this data, such as laptops, cars, apartments,
holidays, etc. These objects have various attributes. According to the values of
attributes, users search for the best objects for them. However, each user prefers
objects with other values of attributes [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Most users look for only a few objects that are best suited to their preferences.
Therefore, it is advantageous according to the user’s preferences directly to find
the best K objects. Making it possible to find the best K objects from the set
of objects X, it must be possible to judge which objects are better or worse for
a user U . Every user prefers objects with own preferences, which are modelled
locally by means of a fuzzy function and globally by means of an aggregation
function [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this paper we assume that the set of objects of the same type
is stored in one data structure. The problem of searching the best K objects
according to values of different attributes in the same time is indicated as a
top-K problem [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        In the next, text we describle using B+-tree [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for sorting objects acording
to fuzzy function for supporting the local preferences [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        We deal also with methods and data structures for effective solution of
topK problem [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We discuss the problem of finding the best K objects without
accessing all the objects via Fagin’s TA-algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and MD-algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For
application of local preferences TA- and MD-algorithm are using data structures
based on B+-trees [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These structures are independent from user’s preferences.
Moreover, it is possible to update these structures easily and quickly.
      </p>
      <p>We developed a MXT-algorithm and a new data structure based on
B+trees, in which MXT-algorithm can effectively find the best K objects by user’s
preferences without accessing all the objects. We will show advantages of the
new MXT-algorithm and a comparison with the results of other algorithms. We
show that MXT-algorithm in some cases achieves better results in the number
of accessed objects than TA- and MD-algorithm.</p>
      <p>A background concerning modelling user preferences is contained in
Section 2. Section 3 presents use of B+-tree for approaching the top-K problem.
Sections 4 and 5 are devoted to explaining principles of Fagin’s TA-algorithm
and MD-algorithm. In Section 6 we describe our new MXT-algorithm and new
data structure, which is used during computation of MXT-algorithm. Section
7 presents the results of the tests and compares MD-algorithm with Fagin’s
algorithms. Finally, Section 8 provides some suggestions for a future work.
2</p>
      <p>User’s preferences
In this article, we suppose a set of objects X with m attributes A1, ..., Am.
Every object x ∈ X has m values aix, ..., axm of corresponding attributes. When
searching for the best K objects, a user U chooses user’s preferences, which
determine suitability of the object x ∈ X in dependence with its m values of
attributes a1 , ..., axm for user U . In accordance to user U preferences, it is possible
x
to find the best K object for user U . We are using a model of user preferences,
which is based on concatenation of local preferences and global preferences. At
first, user U decides how he prefers objects according to each of attributes Ai,
i = 1, ..., m, and then chooses a relationship between these attributes.</p>
      <p>
        Local preferences reflect how the object is preferred according to only one
attribute. In this work, we used a mapping fi: dom(Ai) → [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ]. In general, we
differentiate two possible attribute types.
      </p>
      <p>
        Nominal attribute has a finite range of possible values, usually strings. For a
nominal attribute Ai user U has to rate each value of the attribute. For example,
it is mark of some products. Local preference of user U for the nominal attribute
Ai is represented as mapping fiU (x): aix → [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], where i = 1, ..., m.
      </p>
      <p>
        Ordinal attribute has some natural value ordering, other than lexical ordering.
Typical examples are integer numbers. Domain of ordinal attribute is subset of
continuous interval. The local preference of user U for the ordinal attribute Ai
is represented by fuzzy function [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We denoted it also as fiU (x) the user’s fuzzy
function for i-th attribute. In this case, fuzzy functions have a scheme fiU (x):
aix → [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], where i = 1, ..., m.
      </p>
      <p>For the worst object x ∈ X, fiU (x) = 0 holds and for the best one, fiU (x)
= 1. Comparing fiU (x1) and fiU (x2) of two objects x1 and x2, it is possible to
decide which of them is suitable for the user U according to the value of i-th
attribute. When fiU (x1) &gt; fiU (x2), then x1 is more suitable. When fiU (x1) =
fiU (x2), then x1 and x2 are equally suitable.</p>
      <p>
        Article [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] describes four types of user’s fuzzy functions, which are sufficient
for entering user’s preference of ordinal attributes. It is showed in Figure 1. There
are nondecreasing function, nonincreasing function, function in the form of hill,
and function in the form of valley. In next text we suppose only these four types
of user’s fuzzy functions.
      </p>
      <p>
        Global preferences express how the user U prefers objects from X according
to all attributes A1, ..., Am. On the set of objects X, we consider aggregation
function @ with m variables p1, ..., pm specified as @(p1, ..., pm) : [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ]m → [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ].
For the user U with his user fuzzy functions f1U , ..., fmU , a user rating function
@U originates by means of substitution of pi = fiU (x), i = 1, ..., m. Then, for
every x ∈ X @U (x) = @(f1U (x), ..., fmU (x)). With @U (x) it is possible to evaluate
global rating of each object x ∈ X.
      </p>
      <p>With aggregation function, a user U can define the mutual relations of the
attributes. For example, we can use arithmetic average. For implementation of
user’s influence to the aggregate function, it is possible to use weighted average,
where weights u1, ..., um of single attributes A1, ..., Am determine how the user
prefers single attributes, @(x) = u1·p1+...+um·pm . When the user does not care
u1+...+um
about i-th attribute Ai when rating the objects, he can then put 0 into the
aggregate function, i.e. ui = 0.</p>
      <p>In this work, every user U express his preference as functions f1U , ..., fmU and
@U . Every object x ∈ X has its own special global rating @U (x), a measure of
pertinence for the user U . In this way, it is possible to find the best K objects
for the user U . When there are more objects with the same rating as rating of
the best K-th object, a random object is chosen.
3</p>
      <p>
        Usage of B+-tree
For application of local user’s preferences we need a data structure, from which it
is possible to obtain objects according to user’s fuzzy function f U in descending
order according to f U . Let the objects of the set X be indexed in the B+-tree
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] according to the values of attribute A [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Every object x ∈ X is indexed by the key k ∈ dom(A), a value of which is
equal to a value of attribute ax of object x, i.e. k = ax. For a key k, in general,
more objects with the same A attribute value can exist in X. Therefore, we use
the object array, in which are stored objects with the same value of the key k.
For every key k there is a pointer to corresponding object array.</p>
      <p>
        Since the leaf nodes of the B+-tree are linked in two directions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is
possible to cross the B+-tree through the leaf level and to get all the keys.
According to course of user’s fuzzy function f U , which is one of four types [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
we can obtain objects in the following different ways (Figure 2).
1. Nondecreasing function. In this case, we have to cross the leaf level of the
B+tree from the right to the left. It is possible to get the pairs (x, f U (x)) in the
descending order according to the user’s preference f U , because ax ≤ ay ⇒
f U (x) ≤ f U (y) holds.
2. Nonincreasing function. In this case, we have to cross the leaf level of the
B+tree from the left to the right. It is possible to get the pairs (x, f U (x)) in the
descending order according to the user’s preference f U , because ax ≥ ay ⇒
f U (x) ≤ f U (y) holds.
3. Function in the form of hill. In this case, we must find key k in leaf level
of B+-tree. Key k is the nearest point from maximum of fuzzy function f U .
From the key k we cross the leaf level in two ways concurrently from k to
both borders of the B+-tree as in previous two cases.
4. Function in the form of valley. In this case, we have to cross the leaf level
of the B+-tree in two ways concurrently from both borders to the key k in
to inside of leaf level. Key k is the nearest point from minimum of f U .
      </p>
      <p>
        In general, f U cannot be monotone in its domain. In this case, the domain
can be divided into continuous intervals (also leaf level of B+-tree), where f U
is monotone on each of these intervals. From the intervals, objects are obtained
concurrently according to f U as well as for monotone fuzzy function [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
4
      </p>
      <p>
        Fagin’s TA-algorithm
Fagin et al. describe in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] top-K algorithms, which solve top-K problem without
searching of all objects in set X. Algoritms assume that the objects from the set
X are, together with values of their m attributes A1, ..., Am, stored in m lists
L1, ..., Lm. Each i-th list Li contains pairs (x, aix). Lists L1, ..., Lm are sorted
in descending order according to the values of attributes a1 , ..., axm of objects
x
x ∈ X. The aggregate function @ must be monotone according to the ordering
in lists [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], e.g. weighted average. Then Fagin’s algorithms can find the best
K objects with regard to existing aggregate function @ without making |X| · m
accesses into the lists L1, ..., Lm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Fagin’s algorithm TA (threshold algorithm) needs to access the lists L1, ...,
Lm sequentially and also directly. With the sequential access, TA searches the
lists step by step from the top to the bottom and obtains pairs (x, aix). For every
object x, which is detected for the first time in obtained pair (x, aix), TA obtains
the missing attribute values of the object x by a direct access to the other lists.</p>
      <p>
        TA uses the list TK , in which it keeps the best actual K objects ordered
according to their value of @(x). During the run of TA a threshold Th is counted,
which is calculated by means of inserting the last seen values of attributes
last, ..., almast in the lists L1, ..., Lm, respectively, at the sequential access to the
a1
aggregate function Th = @(al1ast, ..., almast). Rating of the K-th best object in TK
we denote MK . When Th ≤ MK , TA is able to stop and returns TK as the list
of the best K objects. TA can finish before it comes to the end of all the lists
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. TA-algorithm is described by the following pseudo-code.
      </p>
      <p>
        Input: Lists L1, ..., Lm, Aggregation @, int K;
Output: List TK;
var List TK; {temporary list of objects}
begin
while(|TK| &lt; K or Th &gt; MK )do
(x, aix) = getNextPair(L1, ..., Lm); {it obtains next pair from
aliast = aix; one of the lists L1, ..., Lm} [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
Th = @(al1ast, ..., almast);
if(x ∈/ TK)then
get the missing attribute values of the object x;
if(|TK| &lt; K)then
      </p>
      <p>insert object x to the list TK on the right place according to @(x);
else
if(@(x) &gt; MK )then
begin
delete K-th object from the list TK;
insert object x to the list TK on the right place by @(x);
end;
endwhile;
end.
4.1</p>
      <p>Application of local preferences
Original Fagin’s algorithms offer the possibility of rating objects only globally
with an aggregate function @ and to find the best K object for the user U only
according to his global preference. For the support of the local preferences, it
is necessary that every i-th list Li contains pairs (x, fiU (x)) in descending order
according to user’s fuzzy function fiU (x).</p>
      <p>
        For application of local user’s preferences, we use B+-tree (see Section 3).
We use as the lists L1, ..., Lm a list of m B+-trees B1, ..., Bm, where in
B+tree Bi all objects are indexed by values of i-th attribute Ai. Moreover, this
structure is common for all users. A particular user U only specifies his local
preferences with fuzzy functions f1U , ..., fmU and global preferences with function
@U . Then the algorithm sequentially obtains pairs (x, fiU (x)) from B+-trees
B1, ..., Bm according to preferences of user U [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Algorithm TA also uses the direct access to the lists L1, ..., Lm, where for
object x it is needed to obtain its unknown value aix from Li. B+-tree is not able
to make this operation, because it is not possible to obtain the value directly.
For a realization of direct access we can use, for example, an associative array.
5</p>
      <p>
        Multidimensional B-tree and MD-algorithm
In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] we describe a top-K algorithm, which solves top-K problem with using
the multidimensional B-tree [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (MDB-tree) without searching all the objects.
MDB-tree with m levels allows to index set of objects X by attributes A1, ...,
Am. Each level of MDB-tree is composed from B+-trees containing key values
from domain of one attribute. It is showed in Figure 3. Ordering of attributes in
levels of MDB-tree is very important [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (see Section 6).
      </p>
      <p>We use the mark ρ(ki) for the pointer of the key k in B+-tree in i-th level of
MDB-tree. If i &lt; m, then ρ(ki) refers to B+-tree in (i + 1)-th level of MDB-tree.
If i = m, i.e. B+-tree is in the last level of MDB-tree, then ρ(ki) refers to object
array, where objects with the same values of all the m attributes are stored.</p>
      <p>
        For explicit identification of B+-tree in MDB-tree, we use the sequence of
keys, which is called tree identifier [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An identifier of B+-tree in i-th level is
(k1, ..., ki−1). Tree identifier (∅) identifies B+-tree in the first level of MDB-tree.
      </p>
      <p>
        Furthermore, in MDB-tree we use the best rating B(S) of B+-tree S [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. By
the best rating B(S) of B+-tree S with identifier (k1, ..., ki−1) in the i-th level
of MDB-tree, we understand a rating of not yet known object x, calculated with
@(a1x, ..., axm), where the first i − 1 attribute’s values of object x are k1, ..., ki−1
and values of other attributes are 1, i. e. B(S) = @(k1, ..., ki−1, 1, ..., 1).
      </p>
      <p>
        MD-algorithm
Obtaining all the objects from MDB-tree is possible with a recursive procedure,
which searches MDB-tree in depth-first search. The work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] includes a sequence
of statements with proofs, which shows that it is possible to find best K objects
in MDB-tree with the recursive procedure findTopK according to a monotone
aggregate function @ and without getting all the objects.
      </p>
      <p>Let the keys from the B+-tree S with the identifier (k1, ..., ki−1) be the keys
obtained one by one by a run of procedure findTopK. The pointer ρ(p) refers
to B+-tree P in the next level or to the object array P . Let MK be the rating
of the K-th best object in temporary list TK of best K objects. If B(P ) ≤ MK
holds, then the procedure findTopK(MDB-tree, (k1, ..., ki−1)) can stop in S.</p>
      <p>The following pseudo-code describes MD-algorithm.</p>
      <p>Input: MDBtree MDB-tree, Aggregation @, int K;
Output: List TK;
var List TK; {temporary list of objects}
begin</p>
      <p>findTopK(MDB-tree, (∅), @, K); return TK;
end.
procedure findTopK(MDBtree MDB-tree, Identifier (k1, ..., ki−1),</p>
      <p>Aggregation @, int K);
while(there is next key in B+-tree)do {with identifier (k1, ..., ki−1)}
ki = getNextKey(MDB-tree, (k1, ..., ki−1));</p>
      <p>{ρ(ki) refers to B+-tree P or it refers to the object array P }
if(|TK| = K and B(P ) ≤ MK )then return;
if(P is B+-tree)then</p>
      <p>findTopK(MDB-tree, (k1, ..., ki−1, ki), @, K);
if( P is object array)then
while(there is the next object x in P )do
if(|TK| &lt; K )then</p>
      <p>insert object x to TK to the right place according to @(x);
else
if(@(x) &gt; MK )then begin
delete K-th object from the list TK;
insert object x to the list TK on the right place by @(x);
end;
endwhile;
endwhile;
end.
procedure getNextKey(MDBtree MDB-tree, Identifier (k1, ..., ki−1));
choose the next key ki with next highest value of Ai
in B+-tree of MDB-tree with identifier (k1, ..., ki−1);
return ki ;
end.</p>
    </sec>
    <sec id="sec-2">
      <title>Application of local preferences</title>
      <p>From B+-tree we can obtain keys in descending order according to user’s fuzzy
function fiU (x). Because MDB-tree is composed from B+-trees, it is possible to
use application of the local user preferences directly in MD-algorithm by finding
the K best objects in MDB-tree. The following procedure getNextKey changes
the original MD-algorithm.
procedure getNextKey(MDBtree MDB-tree, Identifier (k1, ..., ki−1),</p>
      <p>FuzzyFunction fiU );
choose the next key ki with next highest value of i-th fuzzy function
in B+-tree of MDB-tree with identifier (k1, ..., ki−1);
return ki;
end.
6</p>
      <p>Combination of TA and MD-algorithm
Here we describe a new top-K algorithm, which is based on combination of
MDalgorithm and Fagin’s TA-algorithm.</p>
      <p>
        MD-algorithm has the best results, when the objects stored in MDB-tree are
distributed regularly (uniform distribution) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. When attributes of objects have
different size of their actual domains, order of attributes in levels of MDB-tree is
very important for efficiency of MD-algorithm. It is better for MD-algorithm to
build MDB-tree with smaller actual domains in its higher levels and attributes
with bigger actual domains in its lower levels. When most of the object’s
attributes have their actual domains of big sizes, the usage of MD-algorithm is not
suitable solution of top-K problem. In this case, the usage of TA-algorithm is
more suitable. In general, we can suppose according to size of attribute’s domains
of a known object set, which of the algorithms will achieve better results.
      </p>
      <p>Example 1. We have set of 8 822 flats for rent in Prague. Flats have four
important attributes for users, District, Type, Area, and Price. These attributes
have sizes of domains, kdom(District)k = 10, kdom(T ype)k = 10, kdom(Area)k
= 229, kdom(P rice)k = 411, respectively. When a user prefers attributes District
and Type, then it is better to store flats in MDB-tree and to use MD-algorithm.
On the other hand, when a user prefers attributes Area and Price, then it is
better to use TA-algorithm and to store flats in Fagin’s lists.</p>
      <p>In general, the attribute with a small domain size is nominal attribute and
attribute with big domain size is ordinal attribute (see Section 2). It is valid also
in Example 1, attributes District and Type are nominal attributes, Area and
Price are ordinal attributes.
6.1</p>
    </sec>
    <sec id="sec-3">
      <title>MXT-algorithm</title>
      <p>Therefore we developed a new top-K algorithm, MXT-algorithm, which is based
on combination of MD-algorithm and Fagin’s TA-algorithm. MXT-algorithm
uses a new data structure, which is composed of MDB-tree and Fagin’s sorted
lists. This data structure is shown in Figure 4.</p>
      <p>We suppose a set of objects X with m attributes A1, ..., Am. Attributes
A1, ..., An are nominal attributes and An+1, ..., Am are ordinal attributes.
Attributes A1, ..., An are stored in MDB-tree with n levels. Instead of the following
m − n levels of MDB-tree, groups of m − n Fagin’s sorted lists are used. These
lists contain pairs (x, aix) with values of attributes An+1, ..., Am.</p>
      <p>MXT-algorithm is developed on the base of MD-algorithm. First n attributes
A1, ..., An are searched in the same way as during the computation of
MDalgorithm. In every B+-tree there are references into the groups of m − n Fagin’s
sorted lists. In each of these groups a new instance of TA-algorithm is run.</p>
      <p>The efficiency of MXT-algorithm is based on idea that we do not need to
obtain the best K objects from each the group of m − n Fagin’s sorted lists.
We need to obtain only objects with better rating as MK (rating of the K-th
best object in global list TK ). There, it is sufficient that local threshold Thlocal
(in each the group of m − n Fagin’s sorted lists) with MK is compared. Local
TA-algorithm is able to stop earlier in group of Fagin’s sorted lists, Thlocal ≤ MK
holds earlier than in original TA-algorithm. (see Figure 4, dotted line).</p>
      <p>The following pseudo-code describes the MXT-algorithm, where procedure
getNextKey is the same as in the MD-algorithm (see Section 5).
Input: MDBtree MDB-tree, Aggregation @, int K;
Output: List TK;
var List TK; {temporary list of objects}
begin
findTopK(MDB-tree, (∅), @, K);
return TK;
end.
procedure findTopK(MDBtree MDB-tree, Identifier (k1, ..., ki−1),</p>
      <p>Aggregation @, int K);
while(there is next key in B+-tree)do {with identifier (k1, ..., ki−1)}
ki = getNextKey(MDB-tree, (k1, ..., ki−1));</p>
      <p>{ρ(ki) refers to B+-tree P or it to group of Fagin’s lists P }
if(|TK| = K and B(P ) ≤ MK )then return;
if(P is B+-tree)then</p>
      <p>findTopK(MDB-tree, (k1, ..., ki−1, ki), @, K);
if( P is group of Fagin’s lists)then
while(|TK| &lt; K or Thlocal &gt; MK )do
(x, aix) = getNextPair(L1, ..., Lm); {it obtains next pair from
aliast = aix; one of the lists L1, ..., Lm}
Thlocal = @(al1ast, ..., almast);
if(x ∈/ TK)then
get the missing attribute values of the object x;
if(|TK| &lt; K)then</p>
      <p>insert object x to the list TK on the right place by @(x);
else
if(@(x) &gt; MK )then
begin
delete K-th object from the list TK;
insert object x to the list TK oaccording to @(x);
end;
endwhile;
endwhile;
end.
6.2</p>
      <p>Application of local preferences
Analogously to MD-algorithm in attributes A1, ..., An it is possible to use
application of the local user preferences. Procedure getNextKey changes
MXTalgorithm in the same way as in the MD-algorithm (see Section 5.1).</p>
      <p>In attributes An+1, ..., Am it is also possible to use application of the local user
preferences. Analogously to TA-algorithm, we use as the group of m − n Fagin’s
lists Ln+1, ..., Lm, a list of m − n B+-trees Bn+1, ..., Bm, and for a realization of
direct access we can use, for example, an associative array.
7 Implementation and Experiments
We implemented top-K TA-algorithm, MD-algorithm and MXT-algorithm using
lists based on B+-trees. The implementation has been developed in Java and it
uses data structures created in memory. Important for us was the number of
accesses into used data structures during calculation of top-K algorithms.</p>
      <p>We tested TA-, MD- and MXT-algorithm. During the tests, we used uniform
preferences. We used user’s fuzzy functions with course ”f (x)=x” as user’s local
preferences, and we used the arithmetic average as user’s global preference.</p>
      <p>At first, we tested two sets of 100 000 objects with 5 attributes with normal
and regular (uniform) distribution of attribute values. We used TA-algorithm,
MD-algorithm and three variants of MXT-algorithm, i.e. MXT 3, MXT 2, MXT
1, because attribute types were not important there. For example, variant MXT
3 uses first 3 attributes as nominal attributes, which are stored as MDB-tree
with 3 levels and other 2 attributes are stored as groups of 2 Fagin’s sorted lists.
Figure 5 shows results of this test.</p>
      <p>The best results have been achieved with MXT 3 and MD-algorithm for the
set of objects with the regular distribution of the attributes values. Test for sets
of objects with normal distribution of attribute values has shown that the new
MXT-algorithm can in some cases also achieve worst results.</p>
      <p>Afterwards, we tested the sets of 8 822 flats for rent in Prague (see Section 6,
Example 1). There were two nominal attributes with a small domain size and two
ordinal attributes with a big domain size. We used TA-algorithm, MD-algorithm
and the most suitable variant of MXT-algorithm (as MXT 2 in previous test).
Figure 6 shows results of this test for uniform preferences, and for real user’s
preferences, where user prefers flats of some types in specific districts, with
smaller prices and bigger areas. These two ordinal attributes were preferred
according to non-monotone fuzzy functions.</p>
      <p>The best results has been achieved with MXT-algorithm for both of used
preferences. The test has shown that MXT-algorithm is most efficient solution of
top-K problem for some cases. Especially it is efficient for a set of objects, which
has nominal attributes with a small domain size and several ordinal attributes
with a big domain size.
We developed a new MXT-algorithm, which can effectively find the best K
objects by user’s preferences without accessing all the objects. We implemented
top-K algorithms TA-, MD- and MXT-algorithm with support of user’s
preferences. Results of MXT-algorithm have shown to be comparable with those
obtained by others used top-K algorithms.</p>
      <p>MXT-algorithm is based on combination TA- and MD-algorithm. According
to type of object attributes it is possible to store a set of objects in MDB-tree, in
Fagin’s lists or in data structure, which MXT-algorithm uses. Moreover,
MXTalgorithm can find the best K objects in each of these data structures. In this
meaning, TA- and MD-algorithm are extreme cases of MXT-algorithm.</p>
      <p>
        Motivation of future research can be to find application of developed
algorithms in various data structures. The article [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] deals with solving of the
top-K problem in XML. Application of our algorithms in XML might be also
interesting.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ondreiˇcka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pokorny</surname>
          </string-name>
          ´ J.:
          <article-title>Extending Fagin's algorithm for more users based on multidimensional B-tree</article-title>
          .
          <source>In: Proc. of ADBIS</source>
          <year>2008</year>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Atzeni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caplinskas</surname>
          </string-name>
          , and H.
          <string-name>
            <surname>Jaakkola</surname>
          </string-name>
          (Eds.),
          <source>LNCS 5207</source>
          , Springer-Verlag Berlin Heidelberg,
          <year>2008</year>
          , pp.
          <fpage>199214</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bayer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCreight</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Organization and Maintenance of Large Ordered Indices</article-title>
          ,
          <source>Acta Informatica</source>
          , Vol.
          <volume>1</volume>
          ,
          <issue>Fasc</issue>
          . 3, 1972 pp.
          <fpage>173</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Eckhardt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Pokorny´,
          <string-name>
            <surname>J.</surname>
          </string-name>
          , Vojt´aˇs,
          <string-name>
            <surname>P.:</surname>
          </string-name>
          <article-title>A system recommending top-k objects for multiple users preference</article-title>
          .
          <source>In Proc. of 2007 IEEE International Conference on Fuzzy Systems, July 24-26</source>
          ,
          <year>2007</year>
          , London, England, pp.
          <fpage>1101</fpage>
          -
          <lpage>1106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lotem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimal aggregation algorithms for middleware</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>66</volume>
          (
          <year>2003</year>
          ), pp.
          <fpage>614</fpage>
          -
          <lpage>656</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Gursky´,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Lencses</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          , Vojt´aˇs,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Algorithms for user dependent integration of ranked distributed information</article-title>
          .
          <source>Proceedings of TED Conference on e-Government (TCGOV</source>
          <year>2005</year>
          ), Pages
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bruno</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gravano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Marian</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Evaluating top-k queries over web-accessible databases</article-title>
          .
          <source>in: Proc. of International Conference on Data Engineering (ICDE)</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>369</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chaudhuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gravano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marian</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimizing Top-k Selection Queries over Multimedia Repositories</article-title>
          .
          <source>IEEE Trans. On Knowledge and Data Engineering</source>
          ,
          <year>August 2004</year>
          (Vol.
          <volume>16</volume>
          , No. 8) pp.
          <fpage>992</fpage>
          -
          <lpage>1009</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Scheuerman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ouksel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multidimensional B-trees for associative searching in database systems</article-title>
          .
          <source>Information systems</source>
          , Vol.
          <volume>34</volume>
          , No.
          <volume>2</volume>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Vojt´aˇs,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Fuzzy logic aggregation for semantic web search for the best (top-k) answer</article-title>
          ,
          <source>Capturing Intelligence</source>
          , Volume
          <volume>1</volume>
          ,
          <string-name>
            <surname>Chapter</surname>
            <given-names>17</given-names>
          </string-name>
          ,
          <year>2006</year>
          , Pages
          <fpage>341</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gursky</surname>
            ´ P., Vanekov´a
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>Pribolov´a J.: Fuzzy User Preference Model for Topk Search</article-title>
          .
          <source>Proceedings of IEEE World Congress on Computational Intelligence (WCCI)</source>
          ,
          <source>Hong Kong, FS0377</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Marian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amer-Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koudas</surname>
          </string-name>
          , N.:
          <article-title>Adaptive Processing of Top-k Queries in XML</article-title>
          .
          <source>Data Engineering</source>
          ,
          <year>2005</year>
          .
          <source>ICDE 2005. Proceedings. 21st International Conference on 05-08 April 2005 Page(s)</source>
          :
          <fpage>162</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>