<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Current Approaches to Search Result Diversification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enrico Minack</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Demartini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Nejdl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>L3S Research Center, Leibniz Universität Hannover</institution>
          ,
          <addr-line>30167 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the growth of the Web and the variety of search engine users, Web search effectiveness and user satisfaction can be improved by diversification. This paper surveys recent approaches to search result diversification in both full-text and structured content search. We identify commonalities in the proposed methods and describe an overall framework for result diversification. We discuss different diversity dimensions and measures as well as possible ways of handling the relevance / diversity trade-off. We also summarise existing efforts to evaluate diversity in search. Moreover, for each of these steps, we point out aspects that are missing in current approaches as possible directions for future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, the Web has become the largest and most consulted public
source of information, and Web search has emerged as the primary technique for
finding relevant information on the Web. Search engines usually provide a long
list of results that contains thousands of entries, where the most relevant results
tend to be quite similar [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In particular for informational queries [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], users
reading through a list of relevant but redundant pages quickly stop as they
do not expect to learn more. This phenomenon of saturating user satisfaction
is well understood and extensively studied in economics, where it is known as the law of
diminishing marginal returns [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        The amount of data on the Web is growing exponentially, and so is the
number of relevant results for a query. Given that most search engine users only
look at the first page of available results, to improve user satisfaction, this search
result list should be optimised to contain both relevant and diverse results [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
fairly representing the thousands of relevant results. This task is also known as
search result diversification.
      </p>
      <p>For an ambiguous query like “Jaguar”, a search result list should contain
results about the car, the animal, the operating system and other senses. In the case
of an unambiguous query like “nuclear power plant”, the list should be diverse
in the contained information: objective and opinionated sites, supportive and
opposing thoughts, related topics and subtopics. It is easy to see how this can
be a computationally expensive process that is difficult to run at query time.</p>
      <p>
        The goal of this paper is to survey recent approaches in this area,
identifying commonalities and differences between these works. We also present
open questions not yet addressed by state-of-the-art techniques. Here, we
focus on the field of search result diversification; however, we want to point to
other fields where similar problems have been addressed and whose solutions might be
adaptable. For example, recommender systems provide a list of items which are
interesting (i.e., relevant) and novel (i.e., diverse from the ones the user already
knows) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Another example is image or video search where near-duplicate
results are removed [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or multiple senses of ambiguous queries are covered [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Dynamic clustering algorithms on image features are used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to provide
visually diverse result sets. In general, clustering algorithms may provide adaptable
(dis)similarity measures that are used to create sets of items with high intra-set
and low inter-set similarity [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>In this paper, we compare current work in search result diversification. To
the best of our knowledge, there is no such recent comparison. First, we identify
common aspects and different notions of diversity in all proposed approaches.
We show how approaches handle the trade-off between relevance and diversity, which is
an NP-hard optimisation problem. We then review how search effectiveness is evaluated
not only in terms of relevance but also of diversity. Finally, we point out open
problems and areas which can be improved.</p>
      <p>The rest of the paper is structured as follows. In Section 2, we define the
problem of search result diversification. Section 3 presents dimensions and types
of diversity, and how approaches measure them. In Section 4, we show
the strategies and algorithms for efficiently balancing relevance and diversity.
The evaluations of the effectiveness of current approaches are described
in Section 5. We conclude by discussing open research questions in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Search Result Diversification: Problem Definition</title>
      <p>
        Search result diversification is an optimisation problem aiming to find the k items
among all relevant results that together are both most relevant
and most diverse. Usually, increasing the diversity of the subset leads
to a decrease in relevance; therefore, the optimal trade-off between relevance
and diversity needs to be found. Looking at previous work on search result
diversification, it is possible to notice that, in order to achieve the optimisation
goal, three components are usually adopted. Here, we follow the notion and
structure of a general result diversification approach presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
      </p>
      <p>Relevance Measure: It provides a relevance score for each result, which
creates an initial ranking of the items.</p>
      <p>Diversity Measure: This measure reflects the dissimilarity between two given
items, or the overall dissimilarity of a set of results.</p>
      <p>Diversification Objective: The objective defines the way both measures are
merged into a single score that has to be maximised.</p>
      <p>
        The first step of result diversification is to rank the items by a relevance
score as in a normal retrieval task. In Information Retrieval (IR), several models
and relevance measures have been developed. In result diversification systems, such
standard techniques have been used to rank items by their relevance. For
example, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] uses a vector space model to represent items and queries, while [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
exploits language models and KL-divergence as relevance functions.
      </p>
      <p>The second component, which actually introduces diversity, is the diversity measure.
Such a measure provides a means to represent the dissimilarity of two results
or the dissimilarity within a whole set of results with a single value. Different
types of diversity and proposed diversity measures will be described in Section 3.</p>
      <p>
        The third component, the diversification objective, formalises the strategy to
find a trade-off between the two measures in order to diversify a result set.
This optimisation is known to be NP-hard [
        <xref ref-type="bibr" rid="ref10 ref3">3,10</xref>
        ], so there is a need to develop
efficient algorithms. In Section 4, we will see what diversification objectives and
algorithms current approaches employ to efficiently diversify search results.
      </p>
      <p>Finally, the quality of the result set has to be evaluated using standardised
metrics, repeatable experiments and publicly available datasets. In Section 5, we
give detailed information about the evaluation efforts of the reviewed works.</p>
    </sec>
    <sec id="sec-3">
      <title>Notions of Diversity</title>
      <p>We first introduce some properties of diversity and take a look at the various
kinds of diversity known to exist in information sources. We then review notions
of diversity considered in recent work.</p>
      <sec id="sec-3-1">
        <title>Dimensions of Diversity</title>
        <p>
          Considering Web search, two levels of diversity can be found [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]: (1) query
terms may be ambiguous, which is word sense diversity, and (2) for a specific
word sense, the available information sources may be diverse. Known causes
of diversity in such information sources are, e.g., educational,
cultural, spatio-temporal [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], or simply the goal of communication. These become
manifest in an orthogonal dimension, the type of diversity: e.g., conflicting
information [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], opposing opinions and sentiment [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], ideological perspectives [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
or text genre [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Further, the usage of the term diversity is itself diverse:
diversity is studied from different perspectives in fields like ecology, geography,
psychology, linguistics, sociology, economics, and communication [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          This diversity in information sources should not be ignored or avoided.
Instead, it should be seen as a rich feature that, when handled explicitly and
exploited, could lead to better ways to deal with diverse information sources [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Measures of Diversity</title>
        <p>We saw that there are many dimensions of diversity that can be considered
for diversification. We will now investigate which notions of diversity current
approaches consider and how they are measured. Note that similarity
and dissimilarity are used interchangeably, as each determines the other:
dissimilarity = 1 − similarity, where similarity ∈ [0, 1].</p>
        <p>
          Semantic Distance. Gollapudi et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] reuse the well-known min-hashing
sketching scheme, which produces sketches similar to random term
samples using a number of different hash functions. They derive the
dissimilarity measure from the Jaccard similarity between those sketches, i.e., one
minus the ratio of the cardinality of the intersection to that of the union of the two
sketches. This dissimilarity measure diversifies based on content dissimilarity.
        </p>
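        <p>The sketch-based dissimilarity described above can be illustrated with a minimal Python sketch. It is not the implementation of [10]: the function names, the use of seeded BLAKE2b hashes, and the number of hash functions are assumptions made here for illustration only.</p>
        <preformat><![CDATA[
```python
import hashlib


def term_hash(term, seed):
    """Deterministic 64-bit hash of a term under a given seed (illustrative choice)."""
    digest = hashlib.blake2b(f"{seed}:{term}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def min_hash_sketch(terms, num_hashes=32):
    """For each seeded hash function, keep the term with the smallest hash value.

    The result behaves like a small random sample of the document's terms.
    """
    terms = set(terms)
    if not terms:
        return set()
    return {min(terms, key=lambda t: term_hash(t, seed)) for seed in range(num_hashes)}


def jaccard_dissimilarity(sketch_a, sketch_b):
    """One minus |intersection| / |union| of the two sketches."""
    union = sketch_a.union(sketch_b)
    if not union:
        return 0.0  # two empty sketches are treated as identical
    return 1.0 - len(sketch_a.intersection(sketch_b)) / len(union)
```
]]></preformat>
        <p>Two results with identical terms obtain a dissimilarity of 0, while results without any shared term obtain 1; the sketch size trades accuracy against the cost of comparing results.</p>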
        <p>
          Categorical Distance. Additionally, [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] presents a categorical distance
where dissimilarity is based on the distance between the category of the results
within a taxonomy. As a distance measure, the weighted tree distance measure
is used. In case of multiple categories being assigned, the shortest distance from
each category of one result to the categories of the other result is added up after
weighting with the minimal probability that any of the respective two categories
is assigned. This measure emphasises word sense diversification.
        </p>
        <p>
          Agrawal et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] also use categories, derived from query click logs. However,
they abstain from using an inter-result dissimilarity measure. They directly use
the information about the categories in their diversification objective.
        </p>
        <p>
          Vee et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] introduce a diversity order for relational databases, i.e., an
order among attributes (e.g., for cars: Make ≺ Model ≺ Colour ≺ …). This
order expresses that certain attributes have higher priority to be diversified than
others (e.g., first Make is diversified, then Model). They show how result tuples
can be seen as paths in a tree of values, where the paths satisfy the diversity
order. Tuples that have a longer path from the root in common are more similar
than others. Therefore, this measure is similar to a tree distance measure.
        </p>
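        <p>The idea that tuples sharing a longer path from the root are more similar can be sketched as follows. The normalised prefix-based dissimilarity is a hypothetical illustration, not the measure defined in [21]; tuples are assumed to list their attribute values in diversity order.</p>
        <preformat><![CDATA[
```python
def common_prefix_length(tuple_a, tuple_b):
    """Number of leading attribute values the two tuples share."""
    length = 0
    for value_a, value_b in zip(tuple_a, tuple_b):
        if value_a != value_b:
            break
        length += 1
    return length


def prefix_dissimilarity(tuple_a, tuple_b):
    """Hypothetical tree-style dissimilarity in [0, 1].

    Tuples are ordered by the diversity order (e.g. Make, Model, Colour);
    the longer the shared path from the root, the smaller the value.
    """
    depth = max(len(tuple_a), len(tuple_b))
    if depth == 0:
        return 0.0
    return 1.0 - common_prefix_length(tuple_a, tuple_b) / depth
```
]]></preformat>
        <p>Under this sketch, two cars of the same make and model are closer to each other than two cars that only share the make, which matches the priority expressed by the diversity order.</p>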
        <p>
          Novel Information. In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], unigram language models are used to represent
results. The authors define functions that use the KL-divergence to quantify the novel
information a new result conveys in addition to one or more existing results. This
measure diversifies in a general sense regarding content dissimilarity.
        </p>
        <p>
Conclusion. The diversity measure used by a system defines the kind of diversity
the system can handle. However, none of the presented works focuses on its
diversity measure. The measures are mentioned very briefly and without motivation.
        </p>
        <p>Looking at these diversity measures, two groups can be observed. One group
measures dissimilarity based on content similarity, whereas the other group uses
metadata about the content (e.g., the categories), which are not extracted from
the content but taken from additional information sources (e.g., user click logs).
Still, no measure exploits intrinsic properties of the results, e.g., the genre (blog
post, news article, manual) or the sentiment regarding the query topic.
Therefore, these kinds of diversity are not yet exploited explicitly for search
result diversification.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Relevance / Diversity Optimisation Problem</title>
      <p>The relevance and diversity of a search result set can be maximised using various
strategies. The main challenge for all these strategies is to select those results
that add more diversity to the set, possibly at the cost of relevance. Finding a
good compromise is the primary goal.</p>
      <sec id="sec-4-1">
        <title>Diversification Objectives</title>
        <p>
          Gollapudi et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] combine the relevance measure and the dissimilarity in three
different ways: max-sum, max-min, and an average-dissimilarity-like measure.
These set selection functions are to be maximised.
        </p>
        <p>
          Max-sum Diversification. The first objective in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] combines the sum of the relevance scores and the sum of the
dissimilarities within the set as a weighted sum.
        </p>
        <p>Max-min Diversification. The second objective aims at maximising the
sum of the minimum relevance and minimum dissimilarity within the set.</p>
        <p>Average Dissimilarity Diversification. Their third objective adds the
original relevance of a result to its average dissimilarity with respect to all other
results in the set. The sum over the whole set is to be maximised.</p>
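        <p>The three objectives can be sketched in Python as set-scoring functions. This is a simplified reading of the descriptions above, not the exact formulation of [10]: the single trade-off weight <italic>lam</italic>, the function names, and the exhaustive search helper are assumptions, and each function expects a set of at least two results.</p>
        <preformat><![CDATA[
```python
from itertools import combinations


def max_sum(subset, rel, dis, lam=1.0):
    """Sum of relevance scores plus weighted sum of pairwise dissimilarities."""
    relevance = sum(rel[u] for u in subset)
    diversity = sum(dis(u, v) for u, v in combinations(subset, 2))
    return relevance + lam * diversity


def max_min(subset, rel, dis, lam=1.0):
    """Minimum relevance plus weighted minimum pairwise dissimilarity."""
    return min(rel[u] for u in subset) + lam * min(
        dis(u, v) for u, v in combinations(subset, 2)
    )


def avg_dissimilarity(subset, rel, dis, lam=1.0):
    """Per result: relevance plus weighted average dissimilarity to the others."""
    total = 0.0
    for u in subset:
        others = [v for v in subset if v != u]
        total += rel[u] + lam * sum(dis(u, v) for v in others) / len(others)
    return total


def best_subset(items, k, objective, rel, dis, lam=1.0):
    """Exhaustive search over all k-subsets; only feasible for tiny inputs."""
    return max(combinations(items, k), key=lambda s: objective(s, rel, dis, lam))
```
]]></preformat>
        <p>The exhaustive search makes the cost of exact optimisation visible: the number of candidate sets grows combinatorially with the number of relevant results, which is why approximation algorithms are needed in practice.</p>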
        <p>
          Max-sum of max-score Diversification. Similarly to max-sum
diversification, [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] maximises the sum of dissimilarity of the result set, but it only
produces sets that have the maximal relevance sum. Therefore, it does not find
sets with higher diversity scores but slightly lower relevance sum.
        </p>
        <p>
          Max-product Diversification. Based on the already chosen results, Zhai
et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] select the next result by maximising the parameterised product of the
relevance of the next result and its dissimilarity to the chosen results.
        </p>
        <p>
          Categorical Diversification. Agrawal et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] use a relevance measure
that considers the categories of a document and query. The result set is diversified
so that its results cover all categories, weighted by their probability to occur.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Diversification Algorithms</title>
        <p>
          The problem of search result diversification is NP-hard [
          <xref ref-type="bibr" rid="ref10 ref3">3,10</xref>
          ]. Therefore,
approximation algorithms have to exploit inherent structural properties of the solution
space to achieve adequate system response times. IR systems based on inverted
lists are proven to be unable to directly provide diverse results [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. In the
following, we present algorithms used to efficiently find top-k diverse search results.
        </p>
        <p>
          Gollapudi et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] show that their max-sum and max-min diversification
objectives can be cast as facility dispersion problems, for which approximation
algorithms exist. Agrawal et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] use a greedy algorithm that starts with an
empty list of results and selects the next result with the highest marginal utility
until k results are selected. The marginal utility measures the probability that
the result satisfies a category the current result set does not yet satisfy. Similarly,
Zhai et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] use the same greedy algorithm, but with their function that
represents the novel information being introduced by the next document. Vee
et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] cluster results into buckets based on their diversity order and select
results from those buckets in order to retrieve balanced diverse results.
        </p>
        <p>
Conclusion. Most approaches find a solution for the diversification
problem using greedy approximation algorithms. All optimisation algorithms
work online on the relevant results provided by the retrieval phase. The
presented works do not investigate the applicability of offline pre-computation
or special data structures that could improve online performance.
        </p>
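        <p>The greedy selection by marginal utility can be sketched as follows, in the spirit of the category-based approach of [3]. The data layout is an assumption: <italic>category_prob</italic> is taken to hold the probability of each category for the query, and <italic>quality</italic> the probability that a result satisfies a category; all names are hypothetical.</p>
        <preformat><![CDATA[
```python
def marginal_utility(doc, unsatisfied, quality):
    """Probability mass of not-yet-satisfied categories that doc would satisfy."""
    return sum(
        weight * quality[doc].get(category, 0.0)
        for category, weight in unsatisfied.items()
    )


def greedy_diversify(candidates, category_prob, quality, k):
    """Pick up to k results, each time taking the highest marginal utility."""
    unsatisfied = dict(category_prob)  # copy, so the caller's dict is untouched
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda d: marginal_utility(d, unsatisfied, quality))
        selected.append(best)
        remaining.remove(best)
        # Discount every category by the chance that `best` already satisfied it.
        for category in unsatisfied:
            unsatisfied[category] *= 1.0 - quality[best].get(category, 0.0)
    return selected
```
]]></preformat>
        <p>For a query such as “Jaguar” with a dominant car sense, the first pick is a strong car result; the car category is then discounted, so the second pick tends to cover the animal sense even if another car result has higher relevance.</p>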
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluating Diversity in Search</title>
      <p>This section presents methods for evaluating diversity-aware search techniques.
We describe the datasets used and the evaluation metrics designed for this purpose.</p>
      <sec id="sec-5-1">
        <title>Datasets for Diversity-aware Search</title>
        <p>
          In previous works, different types of datasets have been used. Gollapudi et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
use Wikipedia disambiguation pages as ground truth for the word senses. They
also use a structured dataset in the context of product disambiguation
evaluating the goodness of a measure based on a product taxonomy. In [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], the authors
use 10,000 queries and the top 50 retrieved results from a commercial search engine,
judgements obtained with the Amazon Mechanical Turk<sup>1</sup>, and the Open
Directory Project (ODP)<sup>2</sup> taxonomy to classify results. Zhai et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] use topics from
the Text REtrieval Conference (TREC) Interactive Track, where assessors
identify a list of subtopics for each topic and mark the relevance of retrieved results
with respect to each subtopic. Vee et al. [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] base their experiments on
a structured dataset from Yahoo! Autos. They generate
keyword and structured queries and measure response times for different cases.
Real and synthetic structured data are used in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. They create feature vectors
they want to retrieve back as a set of diverse results.
        </p>
        <p>As we have seen, previous works use different and non-standard datasets. In
order to create a benchmark for diversity in search, the new Diversity Task was
started in the Web Track at TREC 2009. We notice that the notion of diversity
used there is rather topical. This leaves open the evaluation of other
dimensions such as, e.g., diversity of opinions (see Section 3.1).</p>
        <p>Conclusion. As we can see, in most cases two main types of datasets have been
used: classical textual documents to be ranked (i.e., TREC-like tasks) and
structured datasets (i.e., for database-like search tasks). In both cases, the goal is to
provide the user with a smaller set of relevant and diverse results. While we
have also seen that standard benchmarks are being created, there is still a need
for creating benchmarks for specific diversification tasks.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Diversity-aware Evaluation Measures</title>
        <p>In order to evaluate the effectiveness of proposed diversity-aware search
approaches, new metrics need to be designed. In most cases, existing metrics
have been adapted.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], an evaluation framework for novelty and diversity is proposed. The authors
see information needs and results as sets of information nuggets, and relevance
is defined as a function of the nuggets contained in the user’s need and
previous results. Moreover, as graded relevance seems a reasonable assumption for
such a task, they propose α-NDCG, an adaptation of the well-known NDCG
metric proposed in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. They experiment on past TREC collections, showing the
feasibility of the proposed approach.
        </p>
        <p><sup>1</sup> Amazon Mechanical Turk: http://www.mturk.com/</p>
        <p><sup>2</sup> ODP Open Directory Project: http://www.dmoz.org/</p>
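        <p>The nugget-based gain behind α-NDCG can be sketched as follows, as we understand the measure from [4]: a nugget that already appeared in earlier results contributes a gain reduced by a factor of (1 − α) per earlier occurrence, and gains are discounted logarithmically by rank. The normalisation by an ideal ordering is omitted here, and the function and parameter names are assumptions.</p>
        <preformat><![CDATA[
```python
import math


def alpha_dcg(ranking, nuggets_of, alpha=0.5, depth=None):
    """Unnormalised alpha-DCG of a ranking.

    nuggets_of maps a result to the set of nuggets it contains;
    results missing from the mapping are treated as containing none.
    """
    seen = {}  # nugget -> number of earlier results that contained it
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        gain = 0.0
        for nugget in nuggets_of.get(doc, ()):
            # Repeated nuggets are worth less with every earlier occurrence.
            gain += (1.0 - alpha) ** seen.get(nugget, 0)
            seen[nugget] = seen.get(nugget, 0) + 1
        score += gain / math.log2(rank + 1)
    return score
```
]]></preformat>
        <p>With α = 0 the measure ignores redundancy and reduces to a plain discounted gain over nugget counts; larger values of α penalise rankings that repeat the same nuggets near the top.</p>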
        <p>
          In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], S-recall at k is defined as the percentage of subtopics covered by
at least one of the first k results. Values of S-recall at k cannot be directly compared
among topics having a different number of subtopics, that is, this metric does not
account for the difficulty of a certain topic. For this reason, they define S-precision
at recall r, which is the ratio of the minimal rank at which an optimal system reaches
S-recall r to the minimal rank at which the evaluated system reaches it. Additionally, for
penalising redundancy (i.e., low diversity) in the ranking, they define weighted
S-precision at recall r, taking into account the cost of presenting a result to the
user as well as the cost of processing a subtopic in a result.
        </p>
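        <p>S-recall at k follows directly from its definition and can be sketched as below. The data layout (a mapping from results to the subtopics they cover) and the helper <italic>min_rank</italic>, which returns the rank needed to reach a given S-recall, are assumptions for illustration; computing the optimal ranking required for S-precision is not attempted here.</p>
        <preformat><![CDATA[
```python
def s_recall_at_k(ranking, subtopics_of, num_subtopics, k):
    """Fraction of the topic's subtopics covered by the first k results."""
    covered = set()
    for doc in ranking[:k]:
        covered.update(subtopics_of.get(doc, ()))
    return len(covered) / num_subtopics


def min_rank(ranking, subtopics_of, num_subtopics, recall):
    """Smallest k whose S-recall reaches `recall`, or None if it never does."""
    for k in range(1, len(ranking) + 1):
        if s_recall_at_k(ranking, subtopics_of, num_subtopics, k) >= recall:
            return k
    return None
```
]]></preformat>
        <p>A redundant second result leaves S-recall unchanged, so a ranking that interleaves results on different subtopics reaches full S-recall at a smaller rank.</p>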
        <p>
          In [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], the authors propose an adaptation of common metrics that takes
the user intent into account. They consider ambiguous queries to belong to different
categories (i.e., senses) and relevance to be rated differently for different
categories. They take into account the popularity of each query’s category (e.g.,
for the query “Jaguar” the car sense might be more prominent than the animal
sense) by computing a distribution over the categories for a query.
        </p>
        <p>
          In the database query scenario, the evaluation is usually based on comparing
the approximation computed by the system against the optimal result (see, e.g.,
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]) which can be computed (but this computation is NP-hard).
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion and Conclusion</title>
      <p>
        In this paper, we surveyed recent advances in search result diversification. We
found that all approaches fit well in the notation and structure of a general
diversification system as given in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Quite a number of diversity measures and
diversification objectives are already available. However, the reviewed notions
of diversity are still limited to content or category similarity, though a range
of more specific diversity types exists. Further, no new (dis)similarity measures
were developed, but rather existing metrics (e.g., sketching, KL-divergence)
were reused. Here we see potential for further advances.
      </p>
      <p>
        Moreover, it would be interesting to design ranking functions that directly
focus on diversity rather than treating diversification as a re-ranking step. Even if
Vee et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] show that no inverted-list-based system can produce a relevant
and diverse ranking of results, we still believe that the retrieval of diverse and
relevant results may benefit from an integrated retrieval phase, as well as data
structures supporting result diversification.
      </p>
      <p>Finally, regarding the evaluation metrics, there have been adaptations of
widely used and well understood metrics such as NDCG. Standard benchmarks
created for other purposes or proprietary datasets are used, but no dataset for
diversity in search is available yet. We believe that different datasets for different
notions of diversity (e.g., opinions, topics, or genre) should be constructed.</p>
      <p>Acknowledgment. This work was supported by the European Seventh Framework
Programme FP7 (Grant 231126, Project LivingKnowledge).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Carbonell, J.,
          <string-name>
            <surname>Goldstein</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries</article-title>
          .
          <source>In: Proceedings of SIGIR '98</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>1998</year>
          )
          <fpage>335</fpage>–<lpage>336</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Broder</surname>
            ,
            <given-names>A.Z.</given-names>
          </string-name>
          :
          <article-title>A Taxonomy of Web Search</article-title>
          .
          <source>SIGIR Forum</source>
          <volume>36</volume>
          (
          <issue>2</issue>
          ) (
          <year>2002</year>
          )
          <fpage>3</fpage>–<lpage>10</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollapudi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halverson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ieong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Diversifying Search Results</article-title>
          .
          <source>In: Proceedings of WSDM '09</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2009</year>
          )
          <fpage>5</fpage>–<lpage>14</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cormack</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vechtomova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ashkan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Büttcher</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , MacKinnon, I.:
          <article-title>Novelty and Diversity in Information Retrieval Evaluation</article-title>
          .
          <source>In: Proceedings of SIGIR '08</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2008</year>
          )
          <fpage>659</fpage>–<lpage>666</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>17</volume>
          (
          <issue>6</issue>
          ) (
          <year>2005</year>
          )
          <fpage>734</fpage>–<lpage>749</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauptmann</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngo</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          :
          <article-title>Practical Elimination of Near-Duplicates from Web Video Search</article-title>
          .
          <source>In: Proceedings of MULTIMEDIA '07</source>, ACM (<year>2007</year>) <fpage>218</fpage>–<lpage>227</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slaney</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Zwol</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Resolving Tag Ambiguity</article-title>
          .
          <source>In: Proceeding of MM '08</source>
          ,
          <publisher-name>ACM</publisher-name>
          (
          <year>2008</year>
          )
          <fpage>111</fpage>–<lpage>120</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>van Leuken</surname>
            ,
            <given-names>R.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pueyo</surname>
            ,
            <given-names>L.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olivares</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Zwol</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Visual Diversification of Image Search Results</article-title>
          .
          <source>In: Proceedings of WWW '09</source>
          . (
          <year>2009</year>
          )
          <fpage>341</fpage>–<lpage>350</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murty</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flynn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Data Clustering: A Review</article-title>
          .
          <source>ACM Computing Surveys</source>
          <volume>31</volume>
          (
          <issue>3</issue>
          ) (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gollapudi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An Axiomatic Approach for Result Diversification</article-title>
          .
          <source>In: Proceedings of WWW '09</source>
          ,
          <publisher-name>ACM</publisher-name>
          (
          <year>2009</year>
          )
          <fpage>381</fpage>–<lpage>390</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarda</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haritsa</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Providing Diversity in K-Nearest Neighbor Query Results</article-title>
          .
          <source>In: Proceedings of PAKDD '04</source>
          . (May 26–28
          <year>2004</year>
          )
          <fpage>404</fpage>–<lpage>413</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval</article-title>
          .
          <source>In: Proceedings of SIGIR '03</source>
          , ACM
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abouammoh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paramita</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multiple Approaches to Analysing Query Diversity</article-title>
          .
          <source>In: Proceedings of SIGIR '09</source>
          , ACM
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Giunchiglia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maltese</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madalli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denecke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skoutas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marenzi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Foundations for the Representation of Diversity, Evolution, Opinion and Bias</article-title>
          .
          <source>Report D1.1</source>
          , Living Knowledge European Project
          (to appear in
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soderland</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weld</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          :
          <article-title>Open Information Extraction from the Web</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>51</volume>
          (
          <issue>12</issue>
          ) (
          <year>2008</year>
          )
          <fpage>68</fpage>–<lpage>74</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Opinion Mining and Sentiment Analysis</article-title>
          .
          <source>In: Foundations and Trends in Information Retrieval</source>
          . Volume
          <volume>2</volume>
          . (
          <year>2008</year>
          )
          <fpage>1</fpage>–<lpage>135</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          :
          <article-title>Identifying Ideological Perspectives in Text and Video</article-title>
          .
          <source>PhD thesis</source>
          ,
          Language Tech. Inst., School of Comp. Sci., Carnegie Mellon University (Oct
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Biber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The Multi-Dimensional Approach to Linguistic Analyses of Genre Variation: An Overview of Methodology and Findings</article-title>
          .
          <source>Computers and the Humanities</source>
          <volume>26</volume>
          (
          <year>1993</year>
          )
          <fpage>331</fpage>–<lpage>345</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimmick</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The Conceptualization and Measurement of Diversity</article-title>
          .
          <source>Communication Research</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ) (
          <year>2003</year>
          )
          <fpage>60</fpage>–<lpage>79</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Giunchiglia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Managing Diversity in Knowledge</article-title>
          . In
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dapoigny</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , eds.:
          <source>IEA/AIE 2006</source>
          , LNAI 4031, Springer-Verlag Berlin Heidelberg (
          <year>2006</year>
          )
          <fpage>1</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Vee</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shanmugasundaram</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhat</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amer-Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Efficient Computation of Diverse Query Results</article-title>
          .
          <source>In: Proceedings of ICDE '08</source>
          .
          <fpage>228</fpage>–<lpage>236</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Järvelin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kekäläinen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Cumulated Gain-Based Evaluation of IR Techniques</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          <volume>20</volume>
          (
          <issue>4</issue>
          ) (
          <year>2002</year>
          )
          <fpage>422</fpage>–<lpage>446</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>