<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Diversity in spatio-textual data objects</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andreas Alamanos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akrivi Vlachou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Communication Systems Engineering (ICSD), University of the Aegean</institution>
          ,
          <addr-line>Karlovasi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, more and more services rely on the user's location to provide relevant information. Many location-based services (LBS) allow users to search for points of interest (POIs), such as restaurants and hotels, based on their preferences and their distance from the user. In such applications, the queries posed by the users include both spatial and textual information. In this paper, we address the problem of finding the most popular points of interest based on a set of user queries, while ensuring that the result set is highly diverse so that it represents the preferences of all users. We first provide an appropriate problem definition, so that the selected points of interest are dissimilar to each other but also popular among the users. Our experimental evaluation shows that, in all tested settings, our approach retrieves objects of high diversity.</p>
      </abstract>
      <kwd-group>
        <kwd>Spatio-textual Queries</kwd>
        <kwd>Diversity</kwd>
        <kwd>Top-k Queries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, many applications, such as location-based
services (LBS), allow users to pose queries based on
their location. Usually, in order to avoid overwhelming the
users with many object suggestions, a restricted set of <italic>k</italic>
objects is presented to the user. In many applications, users
express their preferences by providing a textual description
(keywords), and the <italic>k</italic> best objects are retrieved and sorted based
on their distance to the user and their textual similarity. Such
queries are known as spatial-keyword search queries, and
location-based services process them to let users search for
points of interest (POIs), such as restaurants and hotels.</p>
      <p>Example. Consider a tourist looking for
a “nearby Italian restaurant that serves pizza”. Figure 1
depicts a spatial area containing user locations (query
points) and restaurants (points of interest). Each restaurant
has textual information in the form of keywords extracted
from its menu, such as pizza or steak, which describe
additional characteristics of the restaurant. The tourist
also specifies a spatial constraint (depicted in the figure
as a range around the tourist's location) to restrict the distance of
the restaurants from that position. The best option for
tourist 2, who poses the aforementioned query, is
restaurant 3, because it is within the given range and contains
the given keywords. On the other hand, for tourist 1
the best option is restaurant 1. In the general case, many different
users with different locations and different preferences,
expressed as keywords, use location-based services.</p>
      <p>
        Even though spatial-keyword queries have been studied
before [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], in this paper we address a different problem.
Our focus is to provide an approach for analyzing
the preferences of a set of users, which are indirectly expressed
by the queries they have posed. The points of interest
that have been retrieved and presented to the users are
popular among these users. Since many different queries may
have been posed by the same or different users, we focus on
selecting a restricted set of points of interest that are
popular and at the same time diverse, in order to cover
the preferences of as many users as possible.
      </p>
      <p>In this paper, we assume that a query log file exists
and we formulate a novel problem, the Diversified <italic>m</italic> Selection
Problem, which retrieves a set of <italic>m</italic> points of interest
of high diversity. The concept of diversification has been
introduced in several systems to avoid presenting
similar objects that may fail to trigger the user's interest.
Our primary objective is to select a set of spatio-textual
objects that are popular based on the preferences of a set
of users, but at the same time cover their interests. Thus,
we consider as candidate objects all objects that have been
retrieved and presented to the users through the queries
they have posed. These objects are considered popular, since
they match the users' preferences. Then, based on
the similarity of the candidate objects, our goal is to select
those objects that are dissimilar to each other, i.e., to maximize
the diversity of the retrieved set.</p>
      <p>To this end, we first define the notions of similarity and
diversity for spatio-textual objects (Section 3), then
formulate our problem statement (Section 3.3) and provide an
appropriate algorithm (Section 4). In our experimental
evaluation (Section 5), we study the performance of our approach
while varying the number of queries, the number of retrieved
objects of interest (<italic>k</italic>) and the number of query keywords. We
compare our approach against a naive approach which takes
into account only the popularity of the objects. Our
experimental evaluation shows that in all cases the solution to the Diversified
<italic>m</italic> Selection Problem succeeds in retrieving objects of
high diversity.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The spatial keyword query has been studied extensively. Surveys
of the different proposed approaches are provided in [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the indexing methods are categorised, according to
whether the underlying structures prioritise the spatial or the textual part,
as text-first or spatial-first
combinations of spatial and text indices. The authors of [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] introduce
the spatial-first IR<sup>2</sup>-Tree indexing method, which
embeds superimposed text signatures in a plain IR-Tree [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for
answering top-<italic>k</italic> spatial keyword queries. The authors
of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] address efficiency issues of the IR<sup>2</sup>-Tree
by providing a spatial inverted index. A text-first index for
top-<italic>k</italic> spatial keyword queries, named S2I, was introduced
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. It maps each keyword to a different
aggregate R-tree, which stores the objects that contain the given
term. An experimental study of 12 geo-textual indexing
structures was carried out in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Its results offer usage
guidelines for the examined indexes.
      </p>
      <p>
        A variant of spatial-keyword search is proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
The process, called top-<italic>k</italic> MULTI query, involves expanding
the top-k query to encompass additional data types beyond
text and spatial categories. This is done in a manner that
ensures the original top-k spatial-keyword search is a specific
case within this broader process.
      </p>
      <p>
        Diversification of the results of top-k spatial queries
has drawn notable attention in the
literature, with several solutions focusing on retrieving
the results incrementally. Diversification is mostly
defined in terms of content dissimilarity, novelty
and coverage [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Two greedy algorithms for query
result diversification are introduced in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: Greedy Marginal
Contribution (GMC) and Greedy Randomized with
Neighborhood Expansion (GNE). The former incrementally builds
the result set by selecting the element with the highest
maximum marginal contribution, whilst the latter diversifies
the results by choosing a random element among the
top-ranked ones.
      </p>
      <p>
        The methodology employed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] applies diversification to top-k
results over a graph representation. The authors propose a set of new functions aiming at
more efficient search on big graphs. The objectives outlined
in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] align with the aforementioned goals; nevertheless,
the authors place particular emphasis on querying data
initially presented as a knowledge graph and subsequently
diversifying the sub-graphs.
      </p>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] aim to diversify
the results based on the users’ known preferences. To this end,
a diversity measure is defined in which each
object is represented by its reverse top-<italic>k</italic> result set, and
the objects that maximize the diversity value are retrieved.
In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the authors evaluate their diversified results using normalized
relevance, coverage, and execution time as
metrics. They introduce the PrefDiv algorithm, an incremental
method that eliminates similar items from those originally retrieved
until it reaches a given threshold.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] a diversification framework is presented that
considers spatial and contextual similarity together with
contextual and spatial proportionality. Proportionality is
obtained by evaluating the characteristics of the retrieved
items within designated categories and ensuring a
proportional representation of these objects. In [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] an alternative
approach is followed combining spatial proximity with
social diversity.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Statement</title>
      <p>In this section, we provide the necessary definitions and our
novel problem statement.</p>
      <sec id="sec-3-1">
        <title>3.1. Preliminaries</title>
        <p>
          Let <italic>O</italic> = {<italic>o</italic><sub>1</sub>, <italic>o</italic><sub>2</sub>, ..., <italic>o<sub>n</sub></italic>} be a set of spatio-textual objects, each
defined by a spatial and a textual part. The primary objective
of spatial-keyword search is to retrieve the <italic>k</italic> objects
that match the given preferences, ranked by their
distance from a given query location. Different query
types have been proposed in the related literature [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Our
diversity approach can be applied to any query type, but
for the sake of simplicity we assume that the retrieved objects
are described by at least one query term, while the ranking
is based on spatial distance.
        </p>
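        <p>Definition 1 (Keyword <italic>k</italic>NN Query). A Keyword <italic>k</italic>NN query
<inline-formula><tex-math>q = \langle x, y, W, k \rangle</tex-math></inline-formula> takes four parameters, where <italic>q.x</italic>, <italic>q.y</italic> define
the coordinates of a spatial point, <italic>q.W</italic> is a set of keywords, and
<italic>q.k</italic> is the number of objects to retrieve. The result <italic>R</italic> of the
query is a set of <italic>k</italic> objects such that
<disp-formula><tex-math>\forall o \in R \; \big( \nexists\, o' \in (O - R) : d(o', q) \le d(o, q) \wedge q.W \cap o'.W \neq \emptyset \big),</tex-math></disp-formula>
where <italic>d</italic> is a spatial distance function.</p>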
        <p>Thus, the result set of a Keyword <italic>k</italic>NN query is a set <italic>R</italic> of <italic>k</italic> objects
such that at least one query term in <italic>q.W</italic> is contained in
the textual part of each object, and the objects are ranked
according to their distance to the query location.</p>
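        <p>The query semantics described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the <monospace>SpatialObject</monospace> class, the function names and the toy data are hypothetical, the scan is linear instead of index-based, and Euclidean distance stands in for the spatial distance function.</p>
        <preformat><![CDATA[
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialObject:
    """Hypothetical spatio-textual object: an id, a location and a keyword set."""
    oid: int
    x: float
    y: float
    keywords: frozenset


def euclidean(obj, x, y):
    """Distance between an object and the query point (assumed metric)."""
    return math.hypot(obj.x - x, obj.y - y)


def keyword_knn(objects, x, y, keywords, k, dist=euclidean):
    """Return up to k objects that share a keyword with the query, nearest first."""
    query_terms = frozenset(keywords)
    # Keep only objects whose textual part contains at least one query term.
    matching = [o for o in objects if o.keywords & query_terms]
    # Rank by distance to the query location; the id breaks ties deterministically.
    matching.sort(key=lambda o: (dist(o, x, y), o.oid))
    return matching[:k]


# Toy data, invented for illustration only.
objects = [
    SpatialObject(1, 0.0, 0.0, frozenset({"pizza", "italian"})),
    SpatialObject(2, 1.0, 1.0, frozenset({"steak"})),
    SpatialObject(3, 2.0, 0.0, frozenset({"pizza"})),
    SpatialObject(4, 5.0, 5.0, frozenset({"italian"})),
]
result = keyword_knn(objects, 0.5, 0.0, {"pizza", "italian"}, k=2)
print([o.oid for o in result])
```
]]></preformat>
        <p>In this toy run, object 2 is skipped although it is close to the query point, because it contains none of the query keywords.</p>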
        <p>Given a set of |<italic>Q</italic>| query result sets {<italic>R</italic><sub>1</sub>, <italic>R</italic><sub>2</sub>, ..., <italic>R</italic><sub>|<italic>Q</italic>|</sub>}
that correspond to the preferences of the users who posed
the queries, the goal is to retrieve <italic>m</italic> objects that are interesting
for all these users. Thus, the candidate objects are the objects
retrieved by the queries, and a subset of <italic>m</italic> candidates is selected
based on their dissimilarity in order to provide a set of high
diversity.</p>
        <p>
          Even though in the following we assume that Keyword <italic>k</italic>NN
queries are posed, our approach can support any
spatio-textual query [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], since only the query result sets need
to be stored and not the queries themselves. This is also
a benefit as far as privacy is concerned, since the
actual query <italic>q</italic>, which may contain sensitive information such
as the user location, does not need to be stored.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Motivating Example</title>
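        <p>Table 1 depicts a small dataset (https://www.kaggle.com/datasets/shrutimehta/zomato-restaurants-data/) containing 12
restaurants in Athens, GA, USA. Each tuple corresponds to a
restaurant and has an artificial ID, a unique restaurant
identifier (RID), the restaurant’s name, its longitude and
latitude, and a set of keywords that describe the cuisine of
the restaurant. In the example, we refer to the
object with ID <italic>i</italic> as <italic>o<sub>i</sub></italic>. Given a query <italic>q<sub>j</sub></italic> and its result set <italic>R<sub>j</sub></italic>, we
denote by rank(<italic>o<sub>i</sub></italic>, <italic>R<sub>j</sub></italic>) the ranking position of object <italic>o<sub>i</sub></italic> in <italic>R<sub>j</sub></italic>.</p>
        <p>Let us assume that a user at location (−83.35,
33.95) looks for the 3 closest restaurants that serve
American cuisine. Thus, the user poses the query <italic>q</italic><sub>1</sub> = (−83.35,
33.95, {American}, 3) and the result set is
<italic>R</italic><sub>1</sub> = {DePalma’s Italian Cafe - East Side (9), DePalma’s
Italian Cafe - Downtown (8), Last Resort Grill (1)}. Thus,
it holds that rank(<italic>o</italic><sub>9</sub>, <italic>R</italic><sub>1</sub>) = 1, rank(<italic>o</italic><sub>7</sub>, <italic>R</italic><sub>1</sub>) = 2 and rank(<italic>o</italic><sub>1</sub>, <italic>R</italic><sub>1</sub>) = 3.
Table 2 shows the result sets of three different queries.</p>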
        <p>In our scenario, the aim is to select <italic>m</italic> restaurants that
are popular but also diverse, in the sense that they cover the
interests of all users, whose preferences are expressed by
the queries they have posed. Assuming <italic>m</italic> = 2, based on
the result sets in Table 2 we could conclude that the most
popular objects are {DePalma’s Italian Cafe - East Side (9),
DePalma’s Italian Cafe - Downtown (7)}, but this selection
fails to take diversity into account.</p>
        <p>Thus, a naive way is to find the <italic>m</italic> most popular
objects by selecting the <italic>m</italic> objects <italic>o<sub>i</sub></italic> that have the highest score
<inline-formula><tex-math>\sum_{\forall R_j \ni o_i} \big( |R_j| - \mathrm{rank}(o_i, R_j) \big)</tex-math></inline-formula>. This means selecting the objects
that are ranked high in as many query result
sets as possible. Even though this approach selects popular objects,
it fails to handle diversity. Objects that are highly
ranked in similar queries may be selected, while other users
may not be represented by the selected objects. In our example,
the third user is not interested in any restaurant of the
selected set {DePalma’s Italian Cafe - East Side (9), DePalma’s
Italian Cafe - Downtown (8)}.</p>
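        <p>Definition 2 (Naive <italic>m</italic> Selection Query). Given
a set of queries {<italic>q<sub>j</sub></italic>} with result sets {<italic>R<sub>j</sub></italic>} and an integer <italic>m</italic>, the naive selection result <inline-formula><tex-math>\mathcal{N}</tex-math></inline-formula> is a
set that satisfies the following two conditions:
(1) <inline-formula><tex-math>\mathcal{N} \subseteq \bigcup_j R_j</tex-math></inline-formula> and <inline-formula><tex-math>|\mathcal{N}| = m</tex-math></inline-formula>;
(2) for any <inline-formula><tex-math>o \in \mathcal{N}</tex-math></inline-formula> and <inline-formula><tex-math>o' \in \bigcup_j R_j - \mathcal{N}</tex-math></inline-formula>,
<inline-formula><tex-math>\mathrm{pop}(o) \ge \mathrm{pop}(o')</tex-math></inline-formula>, where <inline-formula><tex-math>\mathrm{pop}(o) = \sum_{\forall R_j \ni o} \big( |R_j| - \mathrm{rank}(o, R_j) \big)</tex-math></inline-formula> is the popularity score of <italic>o</italic>.</p>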
        <p>The naive selection query does not take diversity into
account. In the following, we address this problem.</p>
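        <p>The naive selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each result set is given as a list of object identifiers ordered by rank, the function names are hypothetical, and the three logged result sets are invented (they are not the contents of Table 2).</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def popularity_scores(result_sets):
    """Sum |R| - rank over every ranked result set that contains the object."""
    scores = defaultdict(int)
    for ranked in result_sets:
        for position, oid in enumerate(ranked, start=1):
            scores[oid] += len(ranked) - position
    return dict(scores)


def naive_selection(result_sets, m):
    """Return the m objects with the highest popularity score."""
    scores = popularity_scores(result_sets)
    # Sort by descending score; the identifier breaks ties deterministically.
    ordered = sorted(scores, key=lambda oid: (-scores[oid], oid))
    return ordered[:m]


# Invented query log: two similar queries and one unrelated query.
logged_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["e", "f", "g"],
]
print(popularity_scores(logged_results))
print(naive_selection(logged_results, 2))
```
]]></preformat>
        <p>In this toy log the two selected objects both come from the first two, similar queries, so the user behind the third query is not represented. This is the weakness that the diversified selection addresses.</p>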
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Problem Statement</title>
        <p>
          Diversifying query result sets is an important problem [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
since in many real-life applications the users only inspect a
small set of objects. In the following, we first define the
notion of similarity of two spatio-textual objects <italic>o<sub>i</sub></italic>, <italic>o<sub>j</sub></italic>.
        </p>
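        <p>Definition 3 (Similarity of spatio-textual objects <italic>o<sub>i</sub></italic>, <italic>o<sub>j</sub></italic>).
The similarity of two spatio-textual objects <italic>o<sub>i</sub></italic>, <italic>o<sub>j</sub></italic>
is defined as
<disp-formula><tex-math>sim(o_i, o_j) = \lambda \cdot sim_s(o_i, o_j) + (1 - \lambda) \cdot sim_t(o_i, o_j),</tex-math></disp-formula>
where <inline-formula><tex-math>sim_s(o_i, o_j)</tex-math></inline-formula> measures the spatial similarity and
is defined as
<disp-formula><tex-math>sim_s(o_i, o_j) = \frac{d_{max} - d(o_i, o_j)}{d_{max}},</tex-math></disp-formula>
<inline-formula><tex-math>d(o_i, o_j)</tex-math></inline-formula> is a spatial distance function and <inline-formula><tex-math>d_{max}</tex-math></inline-formula> is
the maximum distance between any two objects in <italic>O</italic>.
<inline-formula><tex-math>sim_t(o_i, o_j)</tex-math></inline-formula> measures the textual similarity, for instance with the
Jaccard index. Parameter <inline-formula><tex-math>\lambda</tex-math></inline-formula> defines the relative importance
of the spatial and textual similarity; unless stated otherwise, it is set to 0.5.</p>
        <p>Example. Given the objects <italic>o</italic><sub>1</sub> and <italic>o</italic><sub>2</sub> of Table 1, the
spatial similarity is computed based on the Haversine
distance. The Haversine distance between <italic>o</italic><sub>1</sub> and <italic>o</italic><sub>2</sub>
is <inline-formula><tex-math>d(o_1, o_2) = 1.4</tex-math></inline-formula> km, while the maximum distance
among all objects in the example set is <inline-formula><tex-math>d_{max} = 10.06</tex-math></inline-formula>
km. Consequently, the spatial similarity is
<inline-formula><tex-math>sim_s(o_1, o_2) = \frac{10.06 - 1.4}{10.06}</tex-math></inline-formula>, resulting in a value of 0.86. Similarly,
the textual similarity is calculated from the
intersection and union of the two keyword sets (<italic>o</italic><sub>1</sub>.<italic>W</italic>, <italic>o</italic><sub>2</sub>.<italic>W</italic>),
where the term "Southern" is the only common element.
The textual similarity component is then
<inline-formula><tex-math>sim_t(o_1, o_2) = \frac{|o_1.W \cap o_2.W|}{|o_1.W \cup o_2.W|} = \frac{1}{3} \approx 0.333</tex-math></inline-formula>.
Subsequently, the overall similarity is computed as the weighted
sum of the spatial and textual similarity components:
<inline-formula><tex-math>sim(o_1, o_2) = \lambda \cdot 0.86 + (1 - \lambda) \cdot 0.333 \approx 0.59</tex-math></inline-formula> (<inline-formula><tex-math>\lambda = 0.5</tex-math></inline-formula>).</p>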
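        <p>The pairwise similarity can be sketched in Python as follows. The sketch makes several assumptions that are not part of the definition: the function names are hypothetical, the weight is called <monospace>lam</monospace>, the Haversine formula uses a mean Earth radius of 6371 km, and the keyword sets in the usage lines are invented so that they share exactly one of three distinct terms, as in the example above.</p>
        <preformat><![CDATA[
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def jaccard(a, b):
    """Jaccard index of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def spatial_similarity(distance, d_max):
    """Spatial similarity: 1 for co-located objects, 0 at the maximum distance."""
    return (d_max - distance) / d_max


def similarity(distance, d_max, keywords_a, keywords_b, lam=0.5):
    """Weighted sum of the spatial and textual similarity components."""
    return lam * spatial_similarity(distance, d_max) + (1 - lam) * jaccard(keywords_a, keywords_b)


# Distances taken from the example; the keyword sets are hypothetical.
sim = similarity(1.4, 10.06, {"American", "Southern"}, {"Southern", "Seafood"})
print(round(sim, 3))
```
]]></preformat>
        <p>Passing the distance as a plain number keeps the similarity function independent of the distance metric, so <monospace>haversine_km</monospace> can be swapped for any other spatial distance.</p>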
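        <p>We extend the notion of similarity of spatio-textual objects
to a set of objects <italic>S</italic> = {<italic>o</italic><sub>1</sub>, <italic>o</italic><sub>2</sub>, ..., <italic>o<sub>n</sub></italic>}.</p>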
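        <p>Definition 4 (Similarity of points <italic>o</italic><sub>1</sub>, <italic>o</italic><sub>2</sub>, ..., <italic>o<sub>n</sub></italic>). We define
the similarity of a set of <italic>n</italic> objects <italic>S</italic> = {<italic>o</italic><sub>1</sub>, <italic>o</italic><sub>2</sub>, ..., <italic>o<sub>n</sub></italic>} as
<disp-formula><tex-math>sim(S) = \lambda \cdot sim_s(S) + (1 - \lambda) \cdot sim_t(S), \qquad sim_s(S),\, sim_t(S) \in [0, 1],</tex-math></disp-formula>
where <inline-formula><tex-math>sim_s(S) = \frac{d_{max} - avg(d(o_1, \ldots, o_n))}{d_{max}}</tex-math></inline-formula>,
<inline-formula><tex-math>avg(d(o_1, \ldots, o_n))</tex-math></inline-formula> denotes the average pairwise
distance between the <italic>n</italic> objects, and <inline-formula><tex-math>sim_t(S)</tex-math></inline-formula> is a textual
similarity function such as the extended version of the
Jaccard index [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>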
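        <p>In order to avoid overwhelming the users with many
object suggestions, the majority of applications present only a restricted
set of objects to the user. The concept of
diversification has been introduced in several systems to
avoid presenting similar objects that may fail
to trigger the user's interest. In this paper, our primary
objective is to select a set of spatio-textual objects that are
popular based on the preferences of a set of users, but at
the same time cover all their interests. Thus, we consider
as candidate objects the objects in a collection of spatio-textual query
result sets that express the user preferences, and the selected
objects maximize their dissimilarity, i.e., diversity.</p>
        <p>Definition 5 (Diversity of a set <italic>S</italic> = {<italic>o</italic><sub>1</sub>, <italic>o</italic><sub>2</sub>, ..., <italic>o<sub>n</sub></italic>}). We
define the diversity of a set <italic>S</italic> = {<italic>o</italic><sub>1</sub>, <italic>o</italic><sub>2</sub>, ..., <italic>o<sub>n</sub></italic>} as
<disp-formula><tex-math>div(S) = 1 - sim(\{o_1, o_2, \ldots, o_n\}).</tex-math></disp-formula></p>
        <p>Example. To assess the similarity of objects <italic>o</italic><sub>1</sub>,
<italic>o</italic><sub>2</sub> and <italic>o</italic><sub>3</sub> (Table 1), we first calculate their average pairwise
Haversine distance, which yields 4.7 km. The spatial similarity
is then computed as <inline-formula><tex-math>sim_s(\{o_1, o_2, o_3\}) = \frac{10.06 - 4.7}{10.06}</tex-math></inline-formula>,
resulting in 0.52. Similarly, the textual similarity is
calculated from the intersection and union of the
three keyword sets. The intersection is empty, thus
<inline-formula><tex-math>sim_t(\{o_1, o_2, o_3\}) = 0</tex-math></inline-formula>. This indicates complete textual
dissimilarity, as no keyword is shared by all three sets.
Subsequently, the overall similarity is <inline-formula><tex-math>sim(\{o_1, o_2, o_3\}) = 0.26</tex-math></inline-formula>.
Thus, the diversity is <inline-formula><tex-math>div(\{o_1, o_2, o_3\}) = 1 - 0.26 = 0.74</tex-math></inline-formula>.</p>
        <p>Definition 6 (Diversified <italic>m</italic> Selection Problem).
Given a set of spatio-textual points <inline-formula><tex-math>C = \{o_1, o_2, \ldots\}</tex-math></inline-formula> and an
integer <italic>m</italic> where <inline-formula><tex-math>1 \le m \le |C|</tex-math></inline-formula>, the result set of the diversified <italic>m</italic> selection
problem, denoted as <inline-formula><tex-math>\mathcal{M}</tex-math></inline-formula>, is a set of objects that
satisfies the following two conditions:
(1) <inline-formula><tex-math>\mathcal{M} \subseteq C</tex-math></inline-formula> and <inline-formula><tex-math>|\mathcal{M}| = m</tex-math></inline-formula>;
(2) <inline-formula><tex-math>\nexists\, \mathcal{M}' \subseteq C</tex-math></inline-formula> such that <inline-formula><tex-math>\mathcal{M}' \neq \mathcal{M}</tex-math></inline-formula>,
<inline-formula><tex-math>|\mathcal{M}'| = m</tex-math></inline-formula> and <inline-formula><tex-math>div(\mathcal{M}') &gt; div(\mathcal{M})</tex-math></inline-formula>.</p>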
        <p>The above definition ensures that the diversity
<inline-formula><tex-math>div(\mathcal{M})</tex-math></inline-formula> is maximised over all other subsets of
size <italic>m</italic>.</p>
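        <p>To make the definition concrete, the following Python sketch computes the diversity of a set and solves the selection problem by exhaustive enumeration. It is a brute-force baseline for small inputs, not the algorithm proposed in this paper. The assumptions are: objects are hypothetical pairs of a planar point and a keyword set, Euclidean distance replaces the Haversine distance, the textual similarity of a set is the size of the common intersection over the size of the union, and at least two objects are selected.</p>
        <preformat><![CDATA[
```python
import math
from itertools import combinations


def avg_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points (needs two or more points)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def diversity(objs, d_max, lam=0.5):
    """Diversity of a list of ((x, y), keywords) objects: 1 - sim."""
    points = [point for point, _ in objs]
    keyword_sets = [set(kw) for _, kw in objs]
    spatial = (d_max - avg_pairwise_distance(points)) / d_max
    union = set().union(*keyword_sets)
    common = set.intersection(*keyword_sets)
    textual = len(common) / len(union) if union else 0.0
    return 1.0 - (lam * spatial + (1 - lam) * textual)


def exhaustive_selection(candidates, m, d_max, lam=0.5):
    """Return the size-m subset with maximum diversity by trying every subset."""
    return max(combinations(candidates, m), key=lambda s: diversity(list(s), d_max, lam))


# Invented candidate objects for illustration.
A = ((0.0, 0.0), frozenset({"pizza"}))
B = ((0.0, 1.0), frozenset({"pizza", "pasta"}))
C = ((10.0, 0.0), frozenset({"sushi"}))
D = ((5.0, 8.0), frozenset({"burger"}))
candidates = [A, B, C, D]
d_max = max(math.dist(a[0], b[0]) for a, b in combinations(candidates, 2))

best_pair = exhaustive_selection(candidates, 2, d_max)
best_triple = exhaustive_selection(candidates, 3, d_max)
print(best_pair, best_triple)
```
]]></preformat>
        <p>The enumeration inspects every subset of the requested size, so its cost grows combinatorially with the number of candidates. It is useful mainly as a reference point for heuristic solutions.</p>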
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithm</title>
      <p>In order to solve the Diversified <italic>m</italic> Selection Problem,
we propose a greedy algorithm (Algorithm 1).
Given the set of candidate objects <inline-formula><tex-math>C = \bigcup_j R_j</tex-math></inline-formula> (i.e., the
objects that belong to the query result sets), our algorithm first
inspects all pairs of candidate objects and computes their
diversity (<italic>div</italic>). From those pairs it selects the pair with
the highest diversity. Thereafter, it inspects the triples
that contain the two selected objects and one of the
remaining candidate objects, and selects the object that results in
the highest diversity. This process is repeated until <italic>m</italic> objects
are selected.</p>
      <p>Algorithm 1 takes as input the result sets of the users’
queries and the parameter <italic>m</italic> that defines the number of
returned objects. In the initiation step (lines 2–9), all objects
are compared pairwise with
respect to their dissimilarity and the most promising pair of
objects is selected. Thereafter (lines 10–19), one point is selected
in each iteration in such a way that the dissimilarity is
maximized. This process is repeated until <italic>m</italic> points are
selected and, finally, the selected set ℳ is returned (line 20).</p>
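      <p>The greedy procedure described above can be sketched in Python as follows. The sketch is an illustration and not a transcription of Algorithm 1: objects are hypothetical pairs of a planar point and a keyword set, Euclidean distance replaces the Haversine distance, the set-level textual similarity is the common intersection over the union, candidates are assumed to be distinct, and at least two objects are requested.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations


def diversity(objs, d_max, lam=0.5):
    """Diversity (1 - similarity) of a list of ((x, y), keywords) objects."""
    pairs = list(combinations([point for point, _ in objs], 2))
    avg_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    keyword_sets = [set(kw) for _, kw in objs]
    union = set().union(*keyword_sets)
    textual = len(set.intersection(*keyword_sets)) / len(union) if union else 0.0
    return 1.0 - (lam * (d_max - avg_dist) / d_max + (1 - lam) * textual)


def greedy_selection(candidates, m, d_max, lam=0.5):
    """Pick the most diverse pair, then add one object at a time."""
    if not 2 <= m <= len(candidates):
        raise ValueError("m must be between 2 and the number of candidates")
    # Initiation step: compare all pairs and keep the most diverse one.
    first_pair = max(combinations(candidates, 2),
                     key=lambda pair: diversity(list(pair), d_max, lam))
    selected = list(first_pair)
    remaining = [c for c in candidates if c not in selected]
    # Extension steps: add the candidate that maximises the diversity of the set.
    while len(selected) < m:
        best = max(remaining, key=lambda c: diversity(selected + [c], d_max, lam))
        selected.append(best)
        remaining.remove(best)
    return selected


# Invented candidate objects for illustration.
A = ((0.0, 0.0), frozenset({"pizza"}))
B = ((0.0, 1.0), frozenset({"pizza", "pasta"}))
C = ((10.0, 0.0), frozenset({"sushi"}))
D = ((5.0, 8.0), frozenset({"burger"}))
candidates = [A, B, C, D]
d_max = max(math.dist(a[0], b[0]) for a, b in combinations(candidates, 2))

selected = greedy_selection(candidates, 3, d_max)
print(selected)
```
]]></preformat>
      <p>Because each step only extends the current selection, the procedure is a heuristic: the returned set is not guaranteed to be the subset of maximum diversity.</p>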
      <p>
        For the initiation step, our algorithm performs |<italic>C</italic>|<sup>2</sup>
comparisons, while the remaining steps require
<italic>m</italic> · |<italic>C</italic>| comparisons, which results in a complexity of <italic>O</italic>(|<italic>C</italic>|<sup>2</sup>). In order
to reduce this cost, the algorithm could
skip the first step and pick a random point as the first
selected candidate. Obviously, the diversity of the selected
objects is expected to be smaller in this case. Alternatively, a spatio-textual
index structure [
        <xref ref-type="bibr" rid="ref19 ref3 ref8">8, 3, 19</xref>
        ] could be used to speed up the
comparisons and to prune pairs of candidates that cannot lead to
high diversity.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Evaluation</title>
      <p>In the following, we first present our experimental setup
and then describe our experimental results.</p>
      <sec id="sec-5-1">
        <title>5.1. Experimental Setup</title>
        <p>Given a dataset, we generate a set of queries and store the
results of those queries. Note that the size of the dataset
does not directly influence the performance of our approach;
only the size of the candidate set <italic>C</italic> does. Thus, we evaluate the parameters
that influence this size.</p>
        <p>Figure 2: Diversity of the selected objects when (a) varying <italic>k</italic>, (b) varying |<italic>q.W</italic>|, (c) varying |<italic>Q</italic>|.</p>
        <p>Dataset: The dataset contains real data, which were
obtained from factual.com and describe restaurants in 13 US
states (≈ 79K objects). In more detail, we collected
restaurants that are annotated with their location. Moreover, for
the collected restaurants we added a textual description of the
served food, referred to as “cuisine”. The number of distinct
keywords for the cuisine is around 130 and each
restaurant description may contain one or more keywords.</p>
        <p>Evaluated approaches: We compare the following
approaches:
(1) Random, which selects <italic>m</italic> points of interest
randomly;
(2) Naive <italic>m</italic> Selection Query, a naive
approach that takes into account only the ranking position
of the points of interest;
(3) Diversified <italic>m</italic> Selection Problem, our
approach, which selects <italic>m</italic> diverse points of interest.</p>
        <p>Query generation: For each experiment we generate |<italic>Q</italic>|
Keyword <italic>k</italic>NN queries by using the following
approach: for each query we pick, uniformly at random, a
latitude and longitude that fall in the area defined
by the minimum and maximum coordinates of
the points of interest in our dataset. The value of <italic>k</italic> is set to a given
value per experiment. In order to make sure that at least
one point of interest exists with the given keywords, the
keywords are generated by picking a point of interest uniformly at random
and selecting at most |<italic>q.W</italic>| keywords of the
selected point of interest. All queries in each experiment
have the same <italic>k</italic> and |<italic>q.W</italic>| parameters, while the location and
the keywords vary per query.</p>
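        <p>The query generation procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the generator used in the experiments: points of interest are hypothetical dictionaries with <monospace>lat</monospace>, <monospace>lon</monospace> and a non-empty <monospace>keywords</monospace> set, and the function name, the <monospace>seed</monospace> parameter and the sample data are invented.</p>
        <preformat><![CDATA[
```python
import random


def generate_queries(pois, num_queries, k, max_keywords, seed=None):
    """Generate keyword kNN queries inside the bounding box of the POIs."""
    rng = random.Random(seed)
    lats = [p["lat"] for p in pois]
    lons = [p["lon"] for p in pois]
    queries = []
    for _ in range(num_queries):
        # Uniform location inside the min/max coordinates of the dataset.
        lat = rng.uniform(min(lats), max(lats))
        lon = rng.uniform(min(lons), max(lons))
        # Borrow keywords from a random POI so that at least one object matches.
        source = rng.choice(pois)
        available = sorted(source["keywords"])
        chosen = rng.sample(available, min(max_keywords, len(available)))
        queries.append({"lat": lat, "lon": lon, "keywords": set(chosen), "k": k})
    return queries


# Invented points of interest for illustration.
pois = [
    {"lat": 33.95, "lon": -83.38, "keywords": {"American", "Southern"}},
    {"lat": 33.96, "lon": -83.37, "keywords": {"Italian", "Pizza", "Pasta"}},
    {"lat": 33.94, "lon": -83.40, "keywords": {"Sushi"}},
]
queries = generate_queries(pois, num_queries=50, k=5, max_keywords=2, seed=7)
print(len(queries), queries[0])
```
]]></preformat>
        <p>Fixing the seed makes a generated workload reproducible, which is convenient when the same query log has to be reused across the compared approaches.</p>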
        <p>Experimental parameters: We vary the parameters of
the Keyword <italic>k</italic>NN queries in our
experiments, namely <italic>k</italic>, the number of retrieved data
objects, and |<italic>q.W</italic>|, the number of given terms per query. In
addition, for the Diversified <italic>m</italic> Selection Problem we
vary the number of queries |<italic>Q</italic>| that are stored
in the log file, while the parameter <italic>m</italic> is set to 3. Table 3
gives an overview of the parameters; the default values are
shown in bold. Finally, the Haversine distance is used as
the spatial distance in the experimental evaluation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experimental Result</title>
        <p>In Figure 3, we illustrate the time required to identify the
<italic>m</italic> = 3 points of interest using our algorithm. As
expected, increasing the number <italic>k</italic> of retrieved objects increases the time
for computing the result, since the number of
candidate objects increases.</p>
        <p>In the next set of experiments, we compare the three
approaches based on the diversity (<italic>div</italic>) of the <italic>m</italic> selected
objects. We depict the diversity of the retrieved result set
for the three different approaches, where 1 is the
maximum value and indicates the highest diversity.</p>
        <p>In the first experiment we vary the parameter <italic>k</italic> (|<italic>Q</italic>| =
50, |<italic>q.W</italic>| = 3) and measure the diversity, which should be as high
as possible since we aim at diverse result sets. Figure 2a
depicts the results of this experiment. We notice that the Random
and the Naive <italic>m</italic> Selection approaches have similar results, as neither of them takes
similarity into account during the selection process. More
interestingly, it seems that the Naive <italic>m</italic> Selection approach retrieves even less
diverse points than Random. This is because the naive approach favours
objects that appear in the result sets of popular or
similar queries. Even though these objects are popular,
they are interesting only for a fraction of the users. On the other
hand, our approach achieves much higher diversity values and,
as expected, is not influenced by the value of <italic>k</italic>.</p>
        <p>In the second experiment we vary |<italic>q.W</italic>| (<italic>k</italic> = 5, |<italic>Q</italic>| = 50)
and the results are depicted in Figure 2b. We notice that
the diversity decreases as |<italic>q.W</italic>| increases. The main reason is that the
<italic>k</italic> objects retrieved by a Keyword <italic>k</italic>NN query
have a smaller spatial distance, since more objects satisfy
the keyword criteria. Thus, the candidate objects are closer
in space, leading to smaller values of diversity, i.e.,
higher similarity.</p>
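        <p>Finally, Figure 2c shows the results for varying |<italic>Q</italic>| (<italic>k</italic> = 5,
|<italic>q.W</italic>| = 3). Again, our approach outperforms the other approaches
and is not influenced by the value of |<italic>Q</italic>|, while the diversity
of the Random and Naive <italic>m</italic> Selection approaches is smaller in all cases.</p>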
        <p>To summarise, the experimental evaluation shows that in
all cases our approach manages to retrieve a set of <italic>m</italic> points
of interest with high diversity.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In this paper, we address the challenge of identifying the
most popular objects according to a query log file, while also
ensuring that the selected objects demonstrate a significant level
of diversity. To this end, we propose a novel approach
(Diversified <italic>m</italic> Selection Problem) that retrieves a
set of <italic>m</italic> points of interest that are popular and diverse at
the same time. In our experimental evaluation, we study
the performance of our approach while varying the number of
queries (|<italic>Q</italic>|), the number of retrieved objects of interest (<italic>k</italic>) and
the number of query keywords (|<italic>q.W</italic>|). We compare the diversity
of the selected points against a set of randomly selected
points and a naive approach which takes into account only
the popularity of the objects. Our experimental evaluation
shows that in all cases our solution to the Diversified <italic>m</italic> Selection Problem
succeeds in retrieving objects of high diversity.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Yiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mamoulis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vaitis</surname>
          </string-name>
          ,
          <article-title>Top-k spatial preference queries</article-title>
          ,
          <source>in: 2007 IEEE 23rd International Conference on Data Engineering</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>1076</fpage>
          -
          <lpage>1085</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Cong,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <article-title>Location- and keyword-based querying of geo-textual data: a survey</article-title>
          ,
          <source>The VLDB Journal</source>
          <volume>30</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Cong,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Spatial keyword query processing: An experimental evaluation</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>6</volume>
          (
          <year>2013</year>
          )
          <fpage>217</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Spatial keyword search: a survey</article-title>
          ,
          <source>GeoInformatica</source>
          <volume>24</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>De Felipe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hristidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rishe</surname>
          </string-name>
          ,
          <article-title>Keyword search on spatial databases</article-title>
          ,
          <source>in: 2008 IEEE 24th International conference on data engineering</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>656</fpage>
          -
          <lpage>665</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Ooi</surname>
          </string-name>
          ,
          <article-title>Collective spatial keyword querying</article-title>
          ,
          <source>in: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>373</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <article-title>Fast nearest neighbor search with keywords</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>26</volume>
          (
          <year>2013</year>
          )
          <fpage>878</fpage>
          -
          <lpage>888</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Rocha-Junior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gkorgkas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jonassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nørvåg</surname>
          </string-name>
          ,
          <article-title>Efficient processing of top-k spatial keyword queries</article-title>
          ,
          <source>in: Advances in Spatial and Temporal Databases, Lecture Notes in Computer Science</source>
          , Berlin, Heidelberg,
          <year>2011</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-Y.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <article-title>Scalable and efficient processing of top-k multiple-type integrated queries</article-title>
          ,
          <source>World Wide Web</source>
          <volume>19</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Drosou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          ,
          <article-title>Search result diversification</article-title>
          ,
          <source>ACM SIGMOD Record</source>
          <volume>39</volume>
          (
          <year>2010</year>
          )
          <fpage>41</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Razente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C. N.</given-names>
            <surname>Barioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hadjieleftheriou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Traina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          ,
          <article-title>On query result diversification</article-title>
          ,
          <source>in: 2011 IEEE 27th International Conference on Data Engineering</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1163</fpage>
          -
          <lpage>1174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Diversifying top-k results</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>5</volume>
          (
          <year>2012</year>
          )
          <fpage>1124</fpage>
          -
          <lpage>1135</lpage>
          . doi:10.14778/2350229.2350233.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.-C.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Diversified top-k subgraph querying in a large graph</article-title>
          ,
          <source>in: Proceedings of the 2016 International Conference on Management of Data</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1167</fpage>
          -
          <lpage>1182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Gkorgkas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doulkeridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nørvåg</surname>
          </string-name>
          ,
          <article-title>Finding the most diverse products using preference queries</article-title>
          ,
          <source>in: 18th International Conference on Extending Database Technology</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chrysanthis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Labrinidis</surname>
          </string-name>
          ,
          <article-title>Preferential diversity</article-title>
          ,
          <source>in: Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kalamatianos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Fakas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mamoulis</surname>
          </string-name>
          ,
          <article-title>Proportionality in spatial keyword search</article-title>
          ,
          <source>in: Proceedings of the 2021 International Conference on Management of Data</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>885</fpage>
          -
          <lpage>897</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Maropaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doulkeridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nørvåg</surname>
          </string-name>
          ,
          <article-title>Diversifying top-k point-of-interest queries via collective social reach</article-title>
          ,
          <source>in: Proceedings of International Conference on Information and Knowledge Management</source>
          , ACM,
          <year>2020</year>
          , pp.
          <fpage>2149</fpage>
          -
          <lpage>2152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L. da Fontoura</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <article-title>Further generalizations of the Jaccard index</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Tsatsanifos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachou</surname>
          </string-name>
          ,
          <article-title>On processing top-k spatiotextual preference queries</article-title>
          ,
          <source>in: 18th International Conference on Extending Database Technology</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>433</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>