<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Current Approaches to Search Result Diversification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Enrico</forename><surname>Minack</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">L3S Research Center</orgName>
								<orgName type="institution">Leibniz Universität Hannover</orgName>
								<address>
									<postCode>30167</postCode>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gianluca</forename><surname>Demartini</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">L3S Research Center</orgName>
								<orgName type="institution">Leibniz Universität Hannover</orgName>
								<address>
									<postCode>30167</postCode>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Wolfgang</forename><surname>Nejdl</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">L3S Research Center</orgName>
								<orgName type="institution">Leibniz Universität Hannover</orgName>
								<address>
									<postCode>30167</postCode>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Current Approaches to Search Result Diversification</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E4A3B432F04F5F1797A11D2B0882F4E2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T16:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the growth of the Web and the variety of search engine users, Web search effectiveness and user satisfaction can be improved by diversification. This paper surveys recent approaches to search result diversification in both full-text and structured content search. We identify commonalities in the proposed methods and describe an overall framework for result diversification. We discuss different diversity dimensions and measures, as well as possible ways of handling the relevance / diversity trade-off. We also summarise existing efforts to evaluate diversity in search. Moreover, for each of these steps, we point out aspects that are missing in current approaches as possible directions for future work.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years, the Web has become the largest and most consulted public source of information, and Web search has emerged as the primary technique for finding relevant information on the Web. Search engines usually return a long list containing thousands of results, where the most relevant results tend to be quite similar <ref type="bibr" target="#b0">[1]</ref>. In particular for informational queries <ref type="bibr" target="#b1">[2]</ref>, users reading through a list of relevant but redundant pages quickly stop, as they do not expect to learn more. This saturation of user satisfaction is well understood and extensively studied in economics as the law of diminishing marginal returns <ref type="bibr" target="#b2">[3]</ref>.</p><p>The amount of data on the Web is growing exponentially, and so does the number of relevant results for a query. Given that most search engine users only look at the first page of results, this list should be optimised to contain results that are both relevant and diverse <ref type="bibr" target="#b3">[4]</ref>, so that it fairly represents the thousands of relevant results and improves user satisfaction. This task is known as search result diversification.</p><p>For an ambiguous query like Jaguar, a search result list should contain results about the car, the animal, the operating system, and other senses. For an unambiguous query like nuclear power plant, the list should be diverse in the information it contains: objective and opinionated sites, supportive and opposing views, related topics and subtopics. It is easy to see that this can be a computationally expensive process that is difficult to run at query time.</p><p>The goal of this paper is to survey recent approaches in this area, identifying commonalities and differences between these works.
We also present open questions not yet addressed by state-of-the-art techniques. Although we focus on search result diversification, we also point to other fields where similar problems have been addressed and whose solutions might be adaptable. For example, recommender systems provide a list of items that are interesting (i. e., relevant) and novel (i. e., different from the ones the user already knows) <ref type="bibr" target="#b4">[5]</ref>. Another example is image or video search, where near-duplicate results are removed <ref type="bibr" target="#b5">[6]</ref> or multiple senses of ambiguous queries are covered <ref type="bibr" target="#b6">[7]</ref>.</p><p>Dynamic clustering algorithms on image features are used in <ref type="bibr" target="#b7">[8]</ref> to provide visually diverse result sets. In general, clustering algorithms may provide adaptable (dis)similarity measures that are used to create sets of items with high intra-set and low inter-set similarity <ref type="bibr" target="#b8">[9]</ref>.</p><p>In this paper, we compare current work in search result diversification. To the best of our knowledge, no such recent comparison exists. First, we identify common aspects and different notions of diversity across the proposed approaches.</p><p>We then show how these approaches solve the trade-off between relevance and diversity, which is an NP-hard optimisation problem. Next, we review how search effectiveness is evaluated in terms of both relevance and diversity. Finally, we point out open problems and areas that can be improved.</p><p>The rest of the paper is structured as follows. In Section 2, we define the problem of search result diversification. Section 3 presents dimensions and types of diversity, and how current approaches measure them. Section 4 presents strategies and algorithms for efficiently balancing relevance and diversity. Section 5 describes how the effectiveness of current approaches is evaluated.
We conclude by discussing open research questions in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Search Result Diversification: Problem Definition</head><p>Search result diversification is an optimisation problem that aims to find, among all relevant results, the subset of k items that is both most relevant and most diverse. Usually, increasing the diversity of the subset decreases its relevance; therefore, the optimal trade-off between relevance and diversity needs to be found. Previous work on search result diversification usually adopts three components to achieve this optimisation goal. Here, we follow the notation and structure of a general result diversification approach presented in <ref type="bibr" target="#b9">[10]</ref>:</p><p>Relevance Measure: It assigns a relevance score to each result, which yields an initial ranking of the items.</p><p>Diversity Measure: This measure reflects the dissimilarity between two given items, or the overall dissimilarity of a set of results.</p><p>Diversification Objective: The objective defines how both measures are merged into a single score that has to be maximised.</p><p>The first step of result diversification is to rank the items by a relevance score, as in a normal retrieval task. In Information Retrieval (IR), several models and relevance measures have been developed, and result diversifying systems use such standard techniques to rank items by their relevance. For example, <ref type="bibr" target="#b10">[11]</ref> uses a vector space model to represent items and queries, while <ref type="bibr" target="#b11">[12]</ref> exploits language models and KL-divergence as relevance functions.</p><p>The second and actually diversifying component is the measure of diversity. Such a measure represents the dissimilarity of two results, or the dissimilarity within a whole set of results, with a single value.
Different types of diversity and proposed diversity measures are described in Section 3.</p><p>The third component, the diversification objective, formalises the strategy for finding a trade-off between the two measures in order to diversify a result set. This optimisation is known to be NP-hard <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b9">10]</ref>, so efficient algorithms need to be developed. In Section 4, we will see which diversification objectives and algorithms current approaches employ to diversify search results efficiently.</p><p>Finally, the quality of the result set has to be evaluated using standardised metrics, repeatable experiments, and publicly available datasets. In Section 5, we give detailed information about the evaluation efforts of the reviewed works.</p></div>
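<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the combinatorial nature of this optimisation concrete, the following minimal sketch selects k items by exhaustively scoring every k-subset of the candidates. The objective in `combined_score` (summed relevance and summed pairwise dissimilarity, traded off by a hypothetical weight `lam`) is an illustrative stand-in for a diversification objective, not one taken from the reviewed works, and the 0/1 sense-based dissimilarity is likewise an assumption. Since the number of subsets grows with the binomial coefficient of n and k, such exhaustive search is only feasible for very small candidate sets.</p>
```python
from itertools import combinations
from typing import Callable, Sequence


def combined_score(subset: tuple[int, ...], relevance: Sequence[float],
                   dissimilarity: Callable[[int, int], float], lam: float) -> float:
    """Hypothetical objective: (1 - lam) * summed relevance + lam * summed pairwise dissimilarity."""
    rel = sum(relevance[i] for i in subset)
    div = sum(dissimilarity(i, j) for i, j in combinations(subset, 2))
    return (1 - lam) * rel + lam * div


def exact_diversify(relevance: Sequence[float],
                    dissimilarity: Callable[[int, int], float],
                    k: int, lam: float = 0.5) -> tuple[int, ...]:
    """Score every k-subset of the candidates and return the best one (exponential in general)."""
    candidates = combinations(range(len(relevance)), k)
    return max(candidates, key=lambda s: combined_score(s, relevance, dissimilarity, lam))


# Toy "Jaguar" example: results 0 and 1 are about the car, result 2 is about the animal.
senses = ["car", "car", "animal"]
relevance = [0.9, 0.85, 0.5]


def same_sense_distance(i: int, j: int) -> float:
    return 0.0 if senses[i] == senses[j] else 1.0


print(exact_diversify(relevance, same_sense_distance, k=2, lam=0.5))  # (0, 2)
print(exact_diversify(relevance, same_sense_distance, k=2, lam=0.0))  # (0, 1)
```
<p>Setting `lam` to 0 reduces the objective to pure relevance, which returns the two near-duplicate car results; a moderate weight on dissimilarity swaps in the animal result.</p></div>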
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Notions of Diversity</head><p>We first introduce some properties of diversity and take a look at the various kinds of diversity known to exist in information sources. We then review the notions of diversity considered in recent work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dimensions of Diversity</head><p>Considering Web search, two levels of diversity can be found <ref type="bibr" target="#b12">[13]</ref>: (1) query terms may be ambiguous, which is word sense diversity, and (2) for a specific word sense, the available information sources may be diverse. Known causes of diversity in such information sources are, e. g., educational, cultural, or spatio-temporal <ref type="bibr" target="#b13">[14]</ref>, or simply the goal of communication. These become manifest in an orthogonal dimension, the type of diversity: e. g., conflicting information <ref type="bibr" target="#b14">[15]</ref>, opposing opinions and sentiment <ref type="bibr" target="#b15">[16]</ref>, ideological perspectives <ref type="bibr" target="#b16">[17]</ref>, or text genre <ref type="bibr" target="#b17">[18]</ref>. Further, as the usage of the term diversity is itself diverse, diversity is studied from different perspectives in fields like ecology, geography, psychology, linguistics, sociology, economics, and communication <ref type="bibr" target="#b18">[19]</ref>.</p><p>This diversity in information sources should not be ignored or avoided. Instead, it should be seen as a rich feature that, if handled explicitly and exploited, could lead to better ways of dealing with diverse information sources <ref type="bibr" target="#b19">[20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Measures of Diversity</head><p>We saw that many dimensions of diversity can be considered for diversification. We now investigate which notions of diversity current approaches consider and how they measure them. Note that similarity and dissimilarity describe the same concept from opposite sides and can be used interchangeably:</p><formula xml:id="formula_0">dissimilarity = 1 − similarity, where similarity ∈ [0, 1].</formula><p>Semantic Distance. Gollapudi et al. <ref type="bibr" target="#b9">[10]</ref> reuse the well-known min-hashing sketching scheme, which produces sketches similar to random term samples using a number of different hash functions. They use the Jaccard distance between those sketches as the dissimilarity measure, i. e., one minus the ratio of the cardinality of the intersection to the cardinality of the union of the two sketches. This dissimilarity measure diversifies based on content dissimilarity.</p><p>Categorical Distance. Additionally, <ref type="bibr" target="#b9">[10]</ref> presents a categorical distance, where dissimilarity is based on the distance between the categories of the results within a taxonomy. The weighted tree distance is used as the distance measure. If multiple categories are assigned, the shortest distances from each category of one result to the categories of the other result are added up, each weighted by the minimal probability that either of the two respective categories is assigned. This measure emphasises word sense diversification.</p><p>Agrawal et al. <ref type="bibr" target="#b2">[3]</ref> also use categories, derived from query click logs. However, they do not use an inter-result dissimilarity measure; instead, they directly use the information about the categories in their diversification objective.</p><p>Vee et al. <ref type="bibr" target="#b20">[21]</ref> introduce a diversity order for relational databases, which is an order among attributes (e. g., for cars: Make ≺ Model ≺ Colour ≺ . . .). This order expresses that certain attributes have a higher priority to be diversified than others (e. g., first Make is diversified, then Model). They show how result tuples can be seen as paths in a tree of values, where the paths satisfy the diversity order. Tuples that share a longer path from the root are more similar than others. Therefore, this measure is similar to a tree distance measure.</p><p>Novel Information. In <ref type="bibr" target="#b11">[12]</ref>, unigram language models are used to represent results. The authors define functions that use the KL-divergence to quantify the novel information a new result conveys in addition to the existing result(s). This measure diversifies with respect to content dissimilarity in a general sense.</p></div>
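<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two content-based measures above, the min-hash sketch distance of Gollapudi et al. and the KL-based novelty of Zhai et al., can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation used in either work: the seeded SHA-1 hash family, the sketch size of 64, the position-by-position agreement used to estimate Jaccard similarity from the sketches, and the linear smoothing with a uniform background (weight `mu`) in the hypothetical `unigram_model` helper are all choices made for this example. Higher `novelty` values indicate that a new result conveys more information that an existing result does not explain.</p>
```python
import hashlib
import math
from collections import Counter


def seeded_hash(seed: int, term: str) -> int:
    """Deterministic 64-bit hash of a term; each seed acts as a different hash function."""
    digest = hashlib.sha1(f"{seed}:{term}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_sketch(terms: set[str], num_hashes: int = 64) -> list[int]:
    """Keep the minimum hash value of the (non-empty) term set under each hash function."""
    return [min(seeded_hash(seed, t) for t in terms) for seed in range(num_hashes)]


def sketch_distance(a: list[int], b: list[int]) -> float:
    """One minus the Jaccard similarity, estimated as the fraction of agreeing positions."""
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - agree / len(a)


def unigram_model(text: str, vocab: set[str], mu: float = 0.1) -> dict[str, float]:
    """Maximum-likelihood unigram model, smoothed with a uniform background distribution."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: (1 - mu) * counts[w] / total + mu / len(vocab) for w in vocab}


def novelty(new: dict[str, float], old: dict[str, float]) -> float:
    """KL-divergence D(new || old): how poorly the old model explains the new result."""
    return sum(p * math.log(p / old[w]) for w, p in new.items() if p > 0)


docs = ["jaguar car dealer price", "jaguar car price review", "jaguar big cat habitat"]
sketches = [minhash_sketch(set(d.split())) for d in docs]
print(sketch_distance(sketches[0], sketches[1]), sketch_distance(sketches[0], sketches[2]))

vocab = set(" ".join(docs).split())
models = [unigram_model(d, vocab) for d in docs]
print(novelty(models[1], models[0]), novelty(models[2], models[0]))
```
<p>The exact term-set Jaccard distances from the first document are 0.4 (second car result) and about 0.86 (animal result), which the sketch values estimate; the animal result likewise has the higher KL-based novelty.</p></div>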
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion.</head><p>The diversity measure used by a system defines the kind of diversity the system can handle. However, none of the presented works focuses on its diversity measure; the measures are mentioned only briefly and without motivation.</p><p>Looking at these diversity measures, two groups can be observed. One group measures dissimilarity based on content similarity, whereas the other group uses metadata about the content (e. g., categories) that is not extracted from the content but taken from additional information sources (e. g., user click logs). Still, no measure exploits intrinsic properties of the results, e. g., the genre (a blog post, a news article, a manual) or the sentiment regarding the query topic. Therefore, these kinds of diversity are not yet exploited explicitly for search result diversification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">The Relevance / Diversity Optimisation Problem</head><p>The relevance and diversity of a search result set can be maximised using various strategies. The main challenge for all these strategies is to select those results that add more diversity to the set, possibly at the cost of relevance. Finding a good compromise is the primary goal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Diversification Objectives</head><p>Gollapudi et al. <ref type="bibr" target="#b9">[10]</ref> combine the relevance measure and the dissimilarity measure in three different ways: max-sum, max-min, and a measure based on average dissimilarity. These set selection functions are to be maximised.</p><p>Max-sum Diversification. The first objective in <ref type="bibr" target="#b9">[10]</ref> combines the sum of the relevance scores and the sum of the pairwise dissimilarities as a weighted sum.</p><p>Max-min Diversification. The second objective aims at maximising the sum of the minimum relevance and the minimum dissimilarity within the set.</p><p>Average Dissimilarity Diversification. The third objective adds the relevance of each result to its average dissimilarity to all other results in the set; the sum of these scores over the whole set is to be maximised.</p><p>Max-sum of max-score Diversification. Similarly to max-sum diversification, <ref type="bibr" target="#b20">[21]</ref> maximises the sum of dissimilarities within the result set, but it only produces sets that have the maximal relevance sum. Therefore, it does not find sets with higher diversity scores but a slightly lower relevance sum.</p><p>Max-product Diversification. Based on the already chosen results, Zhai et al. <ref type="bibr" target="#b11">[12]</ref> select the next result by maximising the parameterised product of the relevance of the next result and its dissimilarity to the chosen results.</p><p>Categorical Diversification. Agrawal et al. <ref type="bibr" target="#b2">[3]</ref> use a relevance measure that considers the categories of a document and a query. The result set is diversified so that its results cover all categories, weighted by the probability of each category.</p></div>
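<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration, the max-sum and max-min objectives of Gollapudi et al. can be sketched as functions that score a candidate set. The scaling factors in `max_sum_objective` (k - 1 on the relevance sum and 2 on the dissimilarity sum, which balance k relevance terms against k(k - 1)/2 pairwise terms) and the trade-off weight `lam` are assumptions on our part; the original formulation should be consulted for the exact definitions. The function names and the toy data are hypothetical.</p>
```python
from itertools import combinations
from typing import Callable, Sequence

Distance = Callable[[int, int], float]


def max_sum_objective(subset: Sequence[int], rel: Sequence[float],
                      dist: Distance, lam: float) -> float:
    """Weighted sum of summed relevance and summed pairwise dissimilarity (assumed scaling)."""
    k = len(subset)
    relevance_part = (k - 1) * sum(rel[u] for u in subset)
    diversity_part = 2 * lam * sum(dist(u, v) for u, v in combinations(subset, 2))
    return relevance_part + diversity_part


def max_min_objective(subset: Sequence[int], rel: Sequence[float],
                      dist: Distance, lam: float) -> float:
    """Minimum relevance plus lam times the minimum pairwise dissimilarity (needs 2+ items)."""
    min_rel = min(rel[u] for u in subset)
    min_dist = min(dist(u, v) for u, v in combinations(subset, 2))
    return min_rel + lam * min_dist


rel = [0.9, 0.85, 0.5]
senses = ["car", "car", "animal"]


def dist(u: int, v: int) -> float:
    return 0.0 if senses[u] == senses[v] else 1.0


for subset in [(0, 1), (0, 2)]:
    print(subset, max_sum_objective(subset, rel, dist, 1.0), max_min_objective(subset, rel, dist, 1.0))
```
<p>With `lam` set to 1.0, both objectives prefer the diverse pair (0, 2) over the near-duplicate pair (0, 1); maximising either over all k-subsets remains NP-hard, which motivates the algorithms in the next subsection.</p></div>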
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Diversification Algorithms</head><p>The problem of search result diversification is NP-hard <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b9">10]</ref>. Therefore, approximation algorithms have to exploit inherent structural properties of the solution space to achieve adequate system response times. Moreover, IR systems based on inverted lists have been proven unable to directly provide diverse results <ref type="bibr" target="#b20">[21]</ref>. In the following, we present algorithms used to efficiently find top-k diverse search results.</p><p>Gollapudi et al. <ref type="bibr" target="#b9">[10]</ref> show that their max-sum and max-min diversification objectives can be cast as a facility dispersion problem, for which approximation algorithms exist. Agrawal et al. <ref type="bibr" target="#b2">[3]</ref> use a greedy algorithm that starts with an empty list of results and repeatedly selects the result with the highest marginal utility until k results are selected. The marginal utility measures the probability that the result satisfies a category that the current result set does not yet satisfy. Similarly, Zhai et al. <ref type="bibr" target="#b11">[12]</ref> use the same greedy algorithm, but with their function representing the novel information introduced by the next document. Vee et al. <ref type="bibr" target="#b20">[21]</ref> cluster results into buckets based on their diversity order and select results from those buckets in order to retrieve balanced, diverse results.</p></div>
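<div xmlns="http://www.tei-c.org/ns/1.0"><p>The greedy strategy described for Agrawal et al. can be sketched as follows, in our reading of it: each category c starts with an unsatisfied mass equal to its probability P(c | q); a result's marginal utility is the sum over categories of that remaining mass times the result's quality for the category; after a result is chosen, each category's remaining mass is multiplied by one minus the chosen result's quality for it. The dictionary-based inputs, the function name `greedy_category_select`, and the toy probabilities are hypothetical.</p>
```python
def greedy_category_select(docs: list[str], k: int,
                           p_cat: dict[str, float],
                           quality: dict[tuple[str, str], float]) -> list[str]:
    """Greedily pick k docs that maximise the chance of covering not-yet-satisfied categories.

    p_cat:   category -> probability that the query refers to it
    quality: (doc, category) -> probability that doc satisfies the category (missing = 0)
    """
    unsatisfied = dict(p_cat)  # probability mass of each category not yet covered
    remaining = list(docs)
    selected: list[str] = []

    def marginal_utility(doc: str) -> float:
        return sum(mass * quality.get((doc, c), 0.0) for c, mass in unsatisfied.items())

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=marginal_utility)
        selected.append(best)
        remaining.remove(best)
        # Discount each category by how well the chosen document already satisfies it.
        for c in unsatisfied:
            unsatisfied[c] *= 1.0 - quality.get((best, c), 0.0)
    return selected


p_cat = {"car": 0.7, "animal": 0.3}
quality = {("d1", "car"): 0.9, ("d2", "car"): 0.8, ("d3", "animal"): 0.9}
print(greedy_category_select(["d1", "d2", "d3"], k=2, p_cat=p_cat, quality=quality))  # ['d1', 'd3']
```
<p>A purely relevance-ordered list would return the two car results d1 and d2; the greedy step instead moves on to the animal sense once the more popular car sense is largely satisfied.</p></div>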
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion.</head><p>Most approaches solve the diversification problem using greedy approximation algorithms. All optimisation algorithms work online on the relevant results provided by the retrieval phase. Therefore, the presented works do not investigate the applicability of offline pre-computation or special data structures that could improve online performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Evaluating Diversity in Search</head><p>This section presents methods for evaluating diversity-aware search techniques.</p><p>We describe datasets used and evaluation metrics designed for this purpose.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Datasets for Diversity-aware Search</head><p>Previous works have used different types of datasets. Gollapudi et al. <ref type="bibr" target="#b9">[10]</ref> use Wikipedia disambiguation pages as ground truth for the word senses. They also use a structured dataset in the context of product disambiguation, evaluating the goodness of a measure based on a product taxonomy. In <ref type="bibr" target="#b2">[3]</ref>, the authors use 10,000 queries and the top 50 retrieved results from a commercial search engine, judgements obtained with Amazon Mechanical Turk<ref type="foot">1</ref>, and the Open Directory Project (ODP)<ref type="foot">2</ref> taxonomy to classify results. Zhai et al. <ref type="bibr" target="#b11">[12]</ref> use topics from the Text REtrieval Conference (TREC) Interactive Track, where assessors identify a list of subtopics for each topic and mark the relevance of retrieved results with respect to each subtopic. Vee et al. <ref type="bibr" target="#b20">[21]</ref> base their experiments on a structured dataset from Yahoo! Autos. They generate keyword and structured queries and measure response times for different cases.</p><p>Real and synthetic structured data are used in <ref type="bibr" target="#b10">[11]</ref>, where the authors create feature vectors that they want to retrieve as a set of diverse results.</p><p>As we have seen, previous works use different, non-standard datasets. To create a benchmark for diversity in search, a new Diversity Task was started in the Web Track at TREC 2009. We note that the notion of diversity used there is mainly topical diversity. This leaves open the evaluation of other dimensions, e. g., diversity of opinions (see Section 3.1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion.</head><p>As we can see, two main types of datasets have been used in most cases: classical textual documents to be ranked (i. e., TREC-like tasks) and structured datasets (i. e., database-like search tasks). In both cases, the goal is to provide the user with a smaller set of relevant and diverse results. While standard benchmarks are being created, there is still a need for benchmarks for specific diversification tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Diversity-aware Evaluation Measures</head><p>In order to evaluate the effectiveness of the proposed diversity-aware search approaches, new metrics need to be designed. In most cases, existing metrics have been adapted.</p><p>In <ref type="bibr" target="#b3">[4]</ref>, an evaluation framework for novelty and diversity is proposed. The authors see information needs and results as sets of information nuggets, and relevance is defined as a function of the nuggets contained in the user's need and in previous results. Moreover, as graded relevance seems a reasonable assumption for such a task, they propose α-NDCG: an adaptation of the well-known NDCG metric proposed in <ref type="bibr" target="#b21">[22]</ref>. They experiment on past TREC collections, showing the feasibility of the proposed approach.</p><note place="foot" n="1">Amazon Mechanical Turk: http://www.mturk.com/</note><note place="foot" n="2">ODP Open Directory Project: http://www.dmoz.org/</note><p>In <ref type="bibr" target="#b11">[12]</ref>, S-Recall at k is defined as the percentage of subtopics covered by at least one of the first k results. Values of S-Recall at k cannot be directly compared among topics with a different number of subtopics, that is, this metric does not account for the difficulty of a topic. For this reason, they define S-Precision at recall r as the ratio between the minimal rank at which an optimal system achieves S-Recall r and the minimal rank at which the evaluated system achieves it. Additionally, to penalise redundancy (i. e., low diversity) in the ranking, they define weighted S-Precision at recall r, which takes into account the cost of presenting a result to the user as well as the cost of processing a subtopic in a result.</p><p>In <ref type="bibr" target="#b2">[3]</ref>, the authors propose an adaptation of common metrics that takes the user intent into account. They consider ambiguous queries to belong to different categories (i. e., senses) and relevance to be rated differently for different categories.
They take into account the popularity of each category of a query (e. g., for the query Jaguar, the car sense might be more prominent than the animal sense) by computing a distribution over the categories of the query.</p><p>In the database query scenario, the evaluation is usually based on comparing the approximation produced by the system against the optimal result (see, e. g., <ref type="bibr" target="#b10">[11]</ref>), which can be computed, although this computation is NP-hard.</p></div>
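<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make two of these metrics concrete, the sketch below computes S-Recall at k as defined above and an α-NDCG at k in the style of the nugget-based framework, where a nugget's gain is discounted by a factor (1 - α) for every earlier result that already contained it and each rank is discounted by log2(1 + rank). Assumptions of this sketch: judgements are binary nugget memberships, and the ideal ranking used for normalisation is built greedily rather than exactly, since computing the exact ideal is itself expensive; the function names and toy judgements are hypothetical.</p>
```python
import math
from collections import Counter


def s_recall(ranking: list[str], subtopics: dict[str, set[int]], n_subtopics: int, k: int) -> float:
    """Fraction of all subtopics covered by at least one of the first k results."""
    covered: set[int] = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / n_subtopics


def alpha_dcg(ranking: list[str], nuggets: dict[str, set[int]], alpha: float, k: int) -> float:
    """Each nugget's gain shrinks by (1 - alpha) per earlier result that contained it."""
    seen: Counter = Counter()
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        doc_nuggets = nuggets.get(doc, set())
        gain = sum((1 - alpha) ** seen[n] for n in doc_nuggets)
        score += gain / math.log2(rank + 1)
        seen.update(doc_nuggets)
    return score


def alpha_ndcg(ranking: list[str], nuggets: dict[str, set[int]], alpha: float, k: int) -> float:
    """Normalise by a greedily built ideal ranking (an approximation of the true ideal)."""
    pool, seen, ideal = list(nuggets), Counter(), []
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda d: sum((1 - alpha) ** seen[n] for n in nuggets[d]))
        ideal.append(best)
        pool.remove(best)
        seen.update(nuggets[best])
    best_score = alpha_dcg(ideal, nuggets, alpha, k)
    return alpha_dcg(ranking, nuggets, alpha, k) / best_score if best_score else 0.0


# Hypothetical judgements: documents a and b cover subtopic 1, document c covers subtopic 2.
judged = {"a": {1}, "b": {1}, "c": {2}}
for run in (["a", "b"], ["a", "c"]):
    print(run, s_recall(run, judged, n_subtopics=2, k=2), alpha_ndcg(run, judged, alpha=0.5, k=2))
```
<p>Both runs return two relevant documents, but only the second covers both subtopics; S-Recall rises from 0.5 to 1.0 and α-NDCG reaches 1.0, while the redundant run is penalised.</p></div>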
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion and Conclusion</head><p>In this paper, we surveyed recent advances in search result diversification. We found that all approaches fit well into the notation and structure of a general diversification system as given in <ref type="bibr" target="#b9">[10]</ref>. Quite a number of diversity measures and diversification objectives are already available. However, the reviewed notions of diversity are still limited to content or category similarity, although a range of more specific diversity types exists. Further, no new (dis)similarity measures were developed; rather, existing metrics (e. g., sketching, KL-divergence) were reused. Here we see potential for further advances.</p><p>Moreover, it would be interesting to design ranking functions that directly focus on diversity rather than treating diversification as a re-ranking step. Even though Vee et al. <ref type="bibr" target="#b20">[21]</ref> show that no system based on inverted lists can produce a relevant and diverse ranking of results, we still believe that the retrieval of diverse and relevant results may benefit from an integrated retrieval phase, as well as from data structures supporting result diversification.</p><p>Finally, regarding evaluation metrics, widely used and well-understood metrics such as NDCG have been adapted. Standard benchmarks created for other purposes or proprietary datasets are used, but no dataset for diversity in search is available yet. We believe that different datasets for different notions of diversity (e. g., opinions, topics, or genre) should be constructed.</p></div>		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment.</head><p>This work was supported by the European Seventh Framework Programme FP7 (Grant 231126, Project LivingKnowledge).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries</title>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Goldstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR &apos;98</title>
				<meeting>SIGIR &apos;98</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="335" to="336"/>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A Taxonomy of Web Search</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Z</forename><surname>Broder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGIR Forum</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="3" to="10"/>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Diversifying Search Results</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gollapudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halverson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ieong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of WSDM &apos;09</title>
				<meeting>WSDM &apos;09</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="5" to="14"/>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Novelty and Diversity in Information Retrieval Evaluation</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Clarke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kolla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">V</forename><surname>Cormack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vechtomova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ashkan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Büttcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Mackinnon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR &apos;08</title>
				<meeting>SIGIR &apos;08</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="659" to="666"/>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions</title>
		<author>
			<persName><forename type="first">G</forename><surname>Adomavicius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tuzhilin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="734" to="749"/>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Practical Elimination of Near-Duplicates from Web Video Search</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Hauptmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>Ngo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of MULTIMEDIA &apos;07</title>
				<meeting>MULTIMEDIA &apos;07</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<biblScope unit="page" from="218" to="227"/>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Resolving Tag Ambiguity</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Slaney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van Zwol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceeding of MM &apos;08</title>
				<meeting>eeding of MM &apos;08</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page">111-120</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Visual Diversification of Image Search Results</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">H</forename><surname>Van Leuken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">G</forename><surname>Pueyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Olivares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van Zwol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of WWW &apos;09</title>
				<meeting>WWW &apos;09</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page">341-350</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Data Clustering: a Review</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Murty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Flynn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">An Axiomatic Approach for Result Diversification</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gollapudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sharma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of WWW &apos;09</title>
				<meeting>WWW &apos;09</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page">381-390</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Providing Diversity in K-Nearest Neighbor Query Results</title>
		<author>
			<persName><forename type="first">A</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sarda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Haritsa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of PAKDD &apos;04</title>
				<meeting>PAKDD &apos;04</meeting>
		<imprint>
			<date type="published" when="2004">May 26-28, 2004</date>
			<biblScope unit="page">404-413</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">X</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lafferty</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR &apos;03</title>
				<meeting>SIGIR &apos;03</meeting>
		<imprint>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Multiple Approaches to Analysing Query Diversity</title>
		<author>
			<persName><forename type="first">P</forename><surname>Clough</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sanderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Abouammoh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Navarro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Paramita</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR &apos;09</title>
				<meeting>SIGIR &apos;09</meeting>
		<imprint>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Giunchiglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Maltese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Madalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baldry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wallner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Denecke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Skoutas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Marenzi</surname></persName>
		</author>
		<title level="m">Foundations for the Representation of Diversity, Evolution, Opinion and Bias. Report D1.1, Living Knowledge European Project</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
	<note>to appear in</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Open Information Extraction from the Web</title>
		<author>
			<persName><forename type="first">O</forename><surname>Etzioni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Banko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soderland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page">68-74</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Opinion Mining and Sentiment Analysis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page">1-135</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Identifying Ideological Perspectives in Text and Video</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>Lin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008-10">Oct 2008</date>
		</imprint>
		<respStmt>
			<orgName>Language Tech. Inst., School of Comp. Sci., Carnegie Mellon University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The Multi-Dimensional Approach to Linguistic Analyses of Genre Variation: An Overview of Methodology and Findings</title>
		<author>
			<persName><forename type="first">D</forename><surname>Biber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers and the Humanities</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page">331-345</biblScope>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">The Conceptualization and Measurement of Diversity</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>McDonald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dimmick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communication Research</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">60-79</biblScope>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Managing Diversity in Knowledge</title>
		<author>
			<persName><forename type="first">F</forename><surname>Giunchiglia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEA/AIE 2006</title>
		<title level="s">LNAI</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Ali</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Dapoigny</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">4031</biblScope>
			<biblScope unit="page">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Efficient Computation of Diverse Query Results</title>
		<author>
			<persName><forename type="first">E</forename><surname>Vee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shanmugasundaram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Yahia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICDE &apos;08</title>
				<meeting>ICDE &apos;08</meeting>
		<imprint>
			<biblScope unit="page">228-236</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Cumulated Gain-Based Evaluation of IR Techniques</title>
		<author>
			<persName><forename type="first">K</forename><surname>Järvelin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kekäläinen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">422-446</biblScope>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
