<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Use of Query Similarity for Improving Presentation of News Verticals</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Annie</forename><surname>Louis</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Eric</forename><surname>Crestan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Pennsylvania</orgName>
								<address>
									<postCode>19104</postCode>
									<settlement>Philadelphia</settlement>
									<region>PA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Youssef</forename><surname>Billawala</surname></persName>
							<email>billawal@yahoo-inc.com</email>
						</author>
						<author>
							<persName><forename type="first">Rao</forename><surname>Shen</surname></persName>
							<email>raoshen@yahoo-inc.com</email>
						</author>
						<author>
							<persName><forename type="first">Fernando</forename><surname>Diaz</surname></persName>
							<email>fdiaz@yahoo-inc.com</email>
						</author>
						<author>
							<persName><forename type="first">Jean-François</forename><surname>Crespo</surname></persName>
							<email>jfcrespo@yahoo-inc.com</email>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">Microsoft</orgName>
								<address>
									<postCode>81669</postCode>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">Yahoo! Labs Sunnyvale</orgName>
								<address>
									<postCode>94089</postCode>
									<region>CA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">Yahoo! Labs Sunnyvale</orgName>
								<address>
									<postCode>94089</postCode>
									<region>CA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="institution">Yahoo! Labs Sunnyvale</orgName>
								<address>
									<postCode>94089</postCode>
									<region>CA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff5">
								<orgName type="institution">Yahoo! Labs Sunnyvale</orgName>
								<address>
									<postCode>94089</postCode>
									<region>CA</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Use of Query Similarity for Improving Presentation of News Verticals</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C059325A646622E6CC2B353B5BA0D0CD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Users often issue web queries related to current news events. For such queries, it is useful to predict the news intent automatically and highlight the news documents on the search result page. An example query would be "election results" issued during the time of elections. These highlighted displays are called news verticals. Prior work has proposed several features for predicting whether a query has news intent. However, most approaches treat each query individually. So on a given day, very similar queries can be assigned opposite predictions. In our work, we explore how a system can utilize query similarity information to improve the quality of news verticals along two dimensions: prediction and presentation. We show via a study of actual search traffic that the accuracy of predicting queries into newsworthy and not newsworthy categories can be improved using query similarity. Further, we present a method to identify a canonical variant for a newsworthy query such that using the canonical query would retrieve better results from the news backend to show in the display. Use of the canonical query also has the advantage of creating a consistent presentation of results for query variants related to the same news event.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Consider the query "Michelle Obama visits Spain" issued to web search engines during the time of the First Lady's visit to Spain. If this query is issued on a news website, the results would contain all news articles about the event. On the other hand, a user may also issue such a query to a generic web search engine. If the search engine can predict the query as related to currently newsworthy events, the latest news can be retrieved for the query and explicitly highlighted on the result page.</p><p>* Work conducted at Yahoo! Labs. This article was presented at the workshop Very Large Data Search (VLDS) 2011. Copyright 2011.</p><p>Such news displays, also called news verticals, now appear on the result pages of major search engines such as Google, Yahoo! and Bing. An example news vertical on the search result page of a generic search engine is shown in Figure <ref type="figure" target="#fig_0">1</ref>. Prior work has introduced several features for news intent prediction. These features characterize the bursty nature of a query during a particular time period <ref type="bibr">[8,</ref><ref type="bibr" target="#b16">16,</ref><ref type="bibr" target="#b9">9]</ref>. 
In this work, we show how to improve the presentation of news displays by making use of query similarity information.</p><p>We consider two properties related to news displays which can further benefit from information about similar queries.</p><p>Triggering accuracy: When two queries q and q′ are related to the same news event, they have the same news intent. For example, the queries "Michelle Obama Spain visit" and "Spain vacation Michelle Obama" refer to the same event and both are newsworthy during that time. If a system can identify q and q′ as similar, it can make more accurate predictions. Besides accuracy, it is desirable that a consistent system trigger a news display for both queries regardless of the actual lexical realization.</p><p>Retrieval of results for the display: Query similarity is also useful when q and q′ are both predicted to be newsworthy. The lexical form of q could retrieve very relevant content from the news backend while q′ might be a specific query with few matches. Knowing that they are similar helps us choose, from q, q′ and other similar queries, the variant which would retrieve the most relevant results for the event that they represent. This setup could also lead to uniformity in the results shown for different variants belonging to the same event. So query similarity can help in presentation quality and consistency.</p><p>The work on news intent prediction that is closest to ours is by Diaz and Arguello <ref type="bibr">[8,</ref><ref type="bibr" target="#b9">9]</ref> who seek to improve predictions by incorporating user feedback. In their method, user clicks on a query's news display are tracked and used as feedback not only for that query but also for similar ones. They compute similarity between queries based on the similarity of their search results. In this work, we show how to adjust the predictions themselves using similarity information. 
Their method also requires the results of the search, which are expensive to compute. The selection of a canonical variant and the issue of presentation consistency have not been explored so far.</p><p>In this paper, we augment a news intent predictor with query similarity information.</p><p>(a) We show that a simple lexical match method for finding similar queries is highly accurate and useful for improving news intent predictions. The improvement is pronounced even in a model which uses a rich set of features for the baseline predictions.</p><p>(b) We validate, through a user study on actual search traffic, that the predictions improve when we add query similarity information.</p><p>(c) We propose a method to identify a canonical variant for similar queries. We show that we can outperform the baseline approaches significantly on this task and achieve a better display of news results for different query variants.</p><p>Our study is insightful for both the quality of the news display and the search user experience. Inconsistent triggering lowers accuracy. Recent work <ref type="bibr" target="#b4">[4,</ref><ref type="bibr" target="#b22">22]</ref> recognizes that long and rare queries are also important components of the user experience but do not always return (informative) results, so these approaches have made use of query similarity to address the needs of tail queries. Our work continues along these lines for a different search task. From a user experience perspective, particularly with explicitly highlighted displays, it would be jarring if the choice to show a display and the presentation style were markedly different for closely related queries. It would be particularly difficult for users to find a certain article again using a different but equivalent query (referred to as the task of refinding <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b7">7]</ref>). 
We seek to improve the accuracy and quality of the displays with the aid of query similarity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">RELATED WORK</head><p>Prior work on news intent prediction has introduced a number of features <ref type="bibr">[8,</ref><ref type="bibr" target="#b16">16]</ref>. Three major resources are utilized: query logs, the news index and blogs. When a query appears with greater frequency in the documents belonging to a time period compared to the recent past, it can be called newsworthy. Work by Diaz and Arguello <ref type="bibr">[8,</ref><ref type="bibr" target="#b9">9]</ref> extends beyond features and also incorporates user feedback to adjust predictions. They obtain a prior news intent prediction based on features and then correct it as user feedback is obtained.</p><p>In most prior approaches, the vector of features for a query does not depend on other queries issued the same day. Even when two queries are closely related, they could have varying frequencies in query logs and news documents, so their predictions could be quite different. We want to incorporate query similarity to create more accurate predictions. Diaz and Arguello share user feedback from one query with those which are similar, so that similar queries can also take advantage of the feedback information. In this work, we seek to produce more accurate baseline predictions using query similarity. Our approach to similarity-aware prediction is similar to techniques introduced to handle tail queries for advertisement display <ref type="bibr" target="#b5">[5,</ref><ref type="bibr" target="#b4">4,</ref><ref type="bibr" target="#b22">22]</ref> and query suggestions <ref type="bibr" target="#b1">[1,</ref><ref type="bibr" target="#b20">20,</ref><ref type="bibr" target="#b23">23,</ref><ref type="bibr" target="#b17">17,</ref><ref type="bibr" target="#b21">21]</ref>. 
There is also recent work which creates specialized ranking models for a query by making use of similar queries in the training data <ref type="bibr" target="#b11">[11,</ref><ref type="bibr" target="#b18">18]</ref>.</p><p>Our approach to canonical query selection is reminiscent of query modification methods used to obtain better matches with documents or advertisements. Here, when queries do not match the bid phrases or documents, some substitution <ref type="bibr" target="#b15">[15]</ref>, deletion <ref type="bibr" target="#b14">[14]</ref>, or other modification <ref type="bibr" target="#b12">[12]</ref> is necessary. Our situation is different in that we desire a concise query which can be issued to get the most relevant results from the news index. So we need to judge the candidate queries based on both their match to the original query and the news documents they retrieve from the news index. The idea is similar to work by Baeza-Yates et al. <ref type="bibr" target="#b1">[1]</ref> who ranked query suggestions not only by similarity but also by the attractiveness, or quality, of the suggested query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">NEWS INTENT PREDICTOR</head><p>We consider a simple news predictor and then show how we add query similarity information to it. We predict newsworthy queries in an offline process. To identify newsworthy queries for day d, we use the query logs from day d − 1. The queries from the immediately previous day are likely to be issued again. (In fact, one-third are repeated on average.) Each query from day d − 1 is associated with a binary prediction (newsworthy or not) and the newsworthy queries are entered into a whitelist. On day d, if an issued query is in the whitelist, we trigger a news display. The list is refreshed daily. So the goal of the predictor is to identify queries that will continue to have news intent on the next day.</p><p>This setup obviously cannot handle new queries that were not seen the previous day. However, for our goal of evaluating the use of query similarity, it is simple and appropriate, and it also enabled us to deploy the system for a user study. Although the system is simple, the offline setup enables us to use a rich set of features which would be prohibitively costly to compute at query issue time, so we are evaluating the power of query similarity over an already strong feature set. We therefore focus on this baseline model for our experiments; further approaches, such as adding user feedback and handling new queries, would improve the capabilities of the model we use.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data</head><p>The gold standard news intent judgements for queries were obtained empirically, as in prior work. We logged query statistics in early 2008 from a commercial web search engine. A small portion of search users were shown news displays for every query they issued, provided there was any matching content in the news backend. On average, 800,000 unique queries were issued per day by these users and 30,000 of them had matches in the news index. For these 30,000 queries, displays were shown without any attempt to verify their news intent. The display consisted of the titles of the three most relevant news documents. The click through rate (CTR) on the display for each query was logged and we consider these values as the gold standard. A click on any of the three titles within the display is counted as a click on the display. The 30,000 queries per day collected over a two week period form our data set. We train a model to predict the CTR of a query and then apply a threshold to obtain a binary (newsworthy or not) prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Prediction model</head><p>As already outlined, our method predicts newsworthy queries for day d from the query traffic logged for day d − 1. So the target for training is the next day CTR of a query. Only queries observed on two consecutive days can be used for training. We divide the data described above into training (around 100,000 unique queries) and test (60,000) sets.</p><p>Overall, we use around 70 features in our model. Most of them are inspired by prior work by König et al. <ref type="bibr" target="#b16">[16]</ref> and Diaz <ref type="bibr">[8]</ref> and use query logs and the news index. Some of the distinguishing log-based features are: the ratio between the frequency of the query on this day and that during the previous week/days, the number of times the query was issued on a news search engine, the output from a navigational intent predictor, and the observed click rate on search results from news domains versus other webpages. The following features are among the most discriminative ones based on the news index: matches with titles and abstracts of news documents, the maximum score of the news documents retrieved, matches in different categories such as business, health and sports, and the number of relevant documents in the backend compared to the number observed the previous week.</p><p>A Gradient Boosted Decision Trees (GBDT) <ref type="bibr" target="#b10">[10]</ref> training approach is used to predict the real-valued next day CTR. Then we apply a cutoff on the predicted value to obtain a binary decision (newsworthy or not). To decide on the cutoff, we consider the sets of queries above different points on the predicted value. For each of these sets, we compute the average value of their gold standard next day CTR. The set with an average value of 20% is taken and the corresponding threshold is chosen as the cutoff. 
This choice guarantees that on average if this threshold is applied on the predicted value, the queries chosen as newsworthy will have an average next day CTR value of 20% (a reasonable CTR expectation for news events).</p><p>Next we explain how we add query similarity information to the predictions.</p></div>
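This cutoff selection can be sketched as follows. It is an illustrative reconstruction under our own naming (the paper does not give code): scan the queries in decreasing order of predicted CTR and keep the lowest cutoff whose selected set still averages the target gold next-day CTR.

```python
def pick_cutoff(preds, gold_ctr, target_avg=0.20):
    """Return the lowest cutoff on predicted CTR such that queries predicted
    at or above it average >= target_avg gold next-day CTR.

    preds and gold_ctr are parallel lists: predicted and gold next-day CTR
    for the same queries (hypothetical inputs for this sketch)."""
    # Walk queries from highest to lowest predicted CTR, tracking the
    # running average of the gold CTR of everything included so far.
    pairs = sorted(zip(preds, gold_ctr), reverse=True)
    total, best = 0.0, None
    for i, (cutoff, ctr) in enumerate(pairs, start=1):
        total += ctr
        if total / i >= target_avg:
            best = cutoff  # this prefix still meets the 20% average
    return best
```

With a 20% target, the function returns the loosest threshold that still guarantees the selected queries average at least 20% gold next-day CTR.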
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">COMPUTING SIMILAR QUERIES</head><p>We use a simple metric that measures the match between bigrams in the two queries. Such a lexical measure is inadequate for query variants such as "Jennifer Lopez babies" and "Marc Antony twins". So we also tried other methods which offer more flexibility for matching queries. These methods included the similarity in the URLs, titles and abstracts of the top results <ref type="bibr" target="#b2">[2,</ref><ref type="bibr" target="#b19">19,</ref><ref type="bibr" target="#b23">23,</ref><ref type="bibr">8]</ref>, tracking user sessions to find rephrased queries <ref type="bibr" target="#b23">[23,</ref><ref type="bibr" target="#b3">3]</ref> and a dictionary-based identification of similar named entities (e.g., Jennifer Lopez, Mariah Carey). We found that these looser notions (and their combinations) introduce several false positives for query similarity, while lexically similar queries (with bigram match) had very high precision. For our task, particularly canonicalization, high precision is necessary, so we choose lexical match as our measure.</p><p>The words in the query are stemmed and each bigram of stems is a feature. We also include skip bigrams with gap 1, i.e., we form bigrams by ignoring at most one intervening word. The feature value is the frequency of the bigram in the query multiplied by its inverse document frequency (idf) computed over all the queries seen that day (recall that our processing is offline and all queries on day d − 1 are available). The vectors from two queries are compared using cosine similarity. A cutoff is applied on the similarity value and we consider those query pairs above the threshold as similar. A threshold value of 0.01 was picked by manually examining query pairs from different threshold ranges. We also limit the number of similar queries to at most 10. 
Some example matches are shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
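A minimal sketch of this bigram similarity measure follows. It is our own rendering, with stemming omitted and the day's idf table passed in as a plain dictionary (unseen bigrams default to weight 1.0, an assumption of this sketch):

```python
import math
from collections import Counter

def bigram_vector(query, idf):
    """tf-idf vector over bigrams, including skip bigrams with gap 1.
    A real system would stem each token first; we use raw lowercase tokens."""
    toks = query.lower().split()
    grams = list(zip(toks, toks[1:]))    # adjacent bigrams
    grams += list(zip(toks, toks[2:]))   # skip bigrams ignoring one word
    tf = Counter(grams)
    return {g: c * idf.get(g, 1.0) for g, c in tf.items()}

def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similar(q1, q2, idf, threshold=0.01):
    """Two queries are similar if their bigram vectors exceed the cutoff."""
    return cosine(bigram_vector(q1, idf), bigram_vector(q2, idf)) >= threshold
```

The skip bigrams let "Michelle Obama Spain visit" and "Spain vacation Michelle Obama" share features even though the word order differs.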
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">PREDICTION RESCORING</head><p>Now, we add query similarity to the news intent predictions using a post-processing approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Algorithm</head><p>Let us call the predicted CTR for a query q its score(q). Next, we gather the most similar queries to q, kNN(q), using lexical similarity and applying the threshold described earlier. Now, we want to modify score(q) such that if the majority of neighbours in kNN(q) are newsworthy, it is likely that q is newsworthy too, and vice versa.</p><p>Some ways to modify score(q) are: assign the weighted average of the scores of the queries in kNN(q), or assign the score of the most similar query in kNN(q), etc. During development testing, we found that the predicted scores from the baseline model were more accurate for high frequency queries. So setting score(q) to that of its variant in kNN(q) with maximum frequency (issued the most times) provided the best CTR adjustments, reliably moving queries across the boundary of the newsworthy/not newsworthy decision. (We evaluate this aspect using the true category of queries where the system proposed a change in category.) We also found that the queries predicted with high confidence in the newsworthy category were quite accurate, and demoting scores led to a number of false negatives. So we only focus on adding queries to the whitelist.</p><p>Our rescoring method can be summarized as follows. Let T be the threshold on score(q) to mark newsworthy queries. Then modScore(q) = score(q), if |kNN(q)| = 0 or score(q) &gt; T; otherwise modScore(q) = score(arg max_{q′ ∈ kNN(q)} frequency(q′)). (Human newsworthiness judgements are shown in Table <ref type="table">2</ref>.)</p></div>
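The rescoring rule above can be sketched as follows; this is a hypothetical rendering in which score, freq and the neighbour list are assumed to be precomputed lookups:

```python
def rescore(query, score, knn, freq, threshold):
    """Similarity-aware rescoring (promotes only, never demotes):
    if the query has no lexical neighbours, or is already above the
    newsworthiness cutoff T, keep its own score; otherwise borrow the
    score of its most frequently issued neighbour."""
    if not knn or score[query] > threshold:
        return score[query]
    top = max(knn, key=lambda q: freq.get(q, 0))  # highest-frequency variant
    return score[top]
```

Because the rule fires only when score(q) is at or below T, it can pull a missed query variant into the whitelist but never removes one, matching the design decision to avoid false negatives from demotion.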
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Evaluation</head><p>We employed both human judgements and system deployment to evaluate our approach. The human evaluation was done on a sample of our data. The deployment was done for a fraction of the users of a commercial search engine and serves as a large scale study in the target setting. Both confirm the usefulness of our approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Human evaluation</head><p>We ran our rescoring method on one day of full web traffic in 2010 containing over 20 million unique queries. The default whitelist had 33,000 queries. When the scores were modified, the size increased to 57,000. We selected a random sample of 250 queries from those added to the whitelist and obtained annotations for their newsworthiness. To also confirm our choice not to remove queries from the whitelist, we created a list of queries that would have been demoted had we kept that option. A random sample of 250 queries from this list was also included for annotation.</p><p>The queries were presented to annotators on the next day. We did not use the same click data as for development because it is difficult for people to judge the newsworthiness of events from the past. We asked the annotators to rate the news intent of queries on a 4-level scale: a) very newsworthy, b) newsworthy, c) somewhat newsworthy, d) not newsworthy. The distribution of judgements is shown in Table <ref type="table">2</ref>.</p><p>From the results, we see that 60% of the queries added to the whitelist are clearly newsworthy. An additional 10% of the added queries are on the borderline. So a number of queries missed by the default predictor are added to the whitelist. We also see from the annotations for the removed queries that removing queries from the whitelist is not a good option: 30% of those removed are actually rated 'very newsworthy'. However, it should be remembered that these judgements are only on a small sample. We also had 50 queries simultaneously annotated by 3 judges. The pairwise annotator agreement (Kappa) is only 0.29 to 0.36. Collapsing the levels into two categories ('very newsworthy' and 'newsworthy' into one, the other two into a second) increases the agreement to only the 0.44 to 0.60 range. 
So it is rather difficult for human annotators to agree on the newsworthiness of events, even for the most recent time frame. We therefore use this evaluation to validate our design and rely on system deployment to test whether there are significant improvements in the user experience.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">User study</head><p>We studied two small fractions of search users. There were 37 million unique queries issued by each set of users on a given day. During a two week period, news verticals were shown for one set of users as per the baseline predictions. For the other, the rescored predictions were used. It is important to note that both systems were deployed simultaneously during the two weeks, so users would be querying for the same events in both tracks. The CTR and coverage (the proportion of queries that were shown displays) for the two branches are shown in Table <ref type="table" target="#tab_2">3</ref>.</p><p>After rescoring, there was a 2.36% improvement in CTR compared to the baseline approach. At the same time, the system was also able to show displays for 3.13% more queries than the baseline. Approximately the same number of users interacted with the baseline and new systems during the two week period. The click rate of each user was used as a data point and a Wilcoxon test was used to compare these values between the two systems. Both CTR and coverage are significantly different, with p-values lower than 0.0001.</p><p>These results show that after rescoring, we were able to show news displays for more queries while at the same time improving the average user engagement on the displays.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">QUERY CANONICALIZATION</head><p>Making accurate predictions is only a first step towards display quality. The results retrieved for newsworthy queries should also be informative. Here again, we can make use of information from similar queries. So we propose an approach to query canonicalization. For each newsworthy query, we identify a canonical variant which retrieves the most relevant results from the backend. This canonical query might be the original query itself or a semantically equivalent variant. In this way, quality results can be shown regardless of the query variant actually issued. Further, this approach ensures that similar queries have some uniformity and consistency in what results get highlighted for that news event.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Data and setup</head><p>This experiment is based on the data described in Section 3.1. We try to find a canonical variant for each query in the whitelist. Queries not predicted as newsworthy will not trigger news displays and need not be considered. For each query in the whitelist, we obtain a set of candidates for canonicalization. We limit these candidates to other queries in the whitelist which are highly similar (using the same threshold on lexical similarity as before). The original query is always added as a candidate.</p><p>The task is to pick out the candidate that will have the highest success as a canonical variant. The whitelist queries are compiled to be used on the next day. So we define the most successful query as the one with the highest next day CTR (only queries overlapping between consecutive days were used). However, we should avoid choosing very specific and rare queries as canonical. For example, a query such as "Obama Texas debate opposition response" is rather specific and could have a CTR value of 1, but might have been issued only once. We do not wish to canonicalize queries such as "Obama debate" and "Obama Clinton texas debate" to such a specific query. So we filter out candidates that were issued fewer than two times that day. (The original query is always retained as a candidate regardless of its frequency.) We then also combine the number of clicks with the CTR value while ranking the candidates: {next day CTR * log(next day clicks + 1)}. The top variant in this ranking is considered canonical. Some examples from our data (original query, lexical variants, canonical query) are shown in Table <ref type="table" target="#tab_3">4</ref>. We can see that the canonical variant chosen in this way is a crisp characterization of the event. Since we use lexical similarity, we can expect the candidate set to be of high precision. The candidates have at least one bigram match with the original query, so these matching keywords will still appear in the search results. We divide the data into a training set of 2876 examples and a test set of 1659. The breakdown of the number of candidates for the queries in our test set is: (a) 2 candidates: 800, (b) 3 to 5 candidates: 636, (c) 6 and above: 223.</p><p>Our definition of canonical queries holds for a period of one day only. For the whitelist queries, we identify and store the mappings from original to canonical queries. On the next day, when a query from this list is issued, the canonical variant can be used for retrieving news results in place of the original one. The list is refreshed at the end of the day. It is also worth noting that we use canonical results for the news display only, so that the relevant news updates are provided for all query variants. The regular search results would still provide more fine-grained distinctions (if any) as per the original query.</p></div>
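Under the definitions above, the frequency filter and ranking target can be sketched as follows (names are ours; the CTR, click and frequency lookups are assumed to come from the next day's logs):

```python
import math

def canonical(query, candidates, next_day_ctr, next_day_clicks, freq):
    """Pick a canonical variant: drop candidates issued fewer than twice
    that day (the original query is always kept), then rank the rest by
    next-day CTR * log(next-day clicks + 1) and return the top one."""
    pool = [q for q in candidates if freq.get(q, 0) >= 2]
    if query not in pool:
        pool.append(query)  # original query survives the frequency filter

    def target(q):
        # CTR alone would favour rare, very specific queries; the log-click
        # factor rewards variants that actually attract traffic.
        return next_day_ctr.get(q, 0.0) * math.log(next_day_clicks.get(q, 0) + 1)

    return max(pool, key=target)
```

In the paper's example, a one-click query with CTR 1 scores log(2) ≈ 0.69, so a broader variant with moderate CTR but many clicks wins the ranking.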
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Canonicalization methods</head><p>We test several baselines for this task: a) a random query from the candidate set, b) the original query itself (no canonicalization), c) the query with the maximum frequency that day, d) the query with the maximum score from the GBDT model (the model predicts the next day CTR, and high CTR queries could be considered more representative), e) the maximum and f) the minimum length query.</p><p>We also propose a ranking approach to canonicalization. The target value {next day CTR * log(next day clicks + 1)} is used to train a ranker to predict the ordering of queries within a candidate set. For this purpose, we employ SVM-Rank <ref type="bibr" target="#b13">[13]</ref>, whose training approach seeks to minimize the number of pairwise wrong orderings over the training set. The regularization parameter was chosen using cross validation on the training set. The output of the ranker is a real-valued score for each candidate which can be used to rank them. The highest ranked query is marked as the canonical one. The ranker uses the following groups of features.</p><p>Content-based. These features are based on the relevance of the news documents present for a candidate query in the index. In addition, the readability of the titles of these news documents is also an important factor for the quality of the news display. We retrieve the top 3 titles for the query from the news index and compute the following features: the lengths of the titles, whether the query is a subsequence/substring of one of the titles (similarly for the first title), and whether there is a title string spelled fully in capital letters.</p><p>Informativeness of the query. The words used in queries can make some queries more informative than others. So we add features based on query word frequency and word types. We compute the frequency of each query word and bigram over all the candidates in a set. 
Then for each query, we indicate whether it contains the top, second, or third most frequent unigram or bigram of that set. Numbers or years are also mentioned in some queries, for example "grammy awards 2008". To check the usefulness of such information, we add the features: query contains a number/year, query's prefix/suffix is a number/year. We also indicate the presence of prepositions.</p></div>
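A minimal sketch of the informativeness features described above; the feature names and the `informativeness_features` helper are our own, not the paper's, and the preposition feature is omitted for brevity.

```python
# Hypothetical feature extractor for query informativeness.
import re
from collections import Counter

def informativeness_features(query, candidate_set):
    words = query.split()
    bigrams = list(zip(words, words[1:]))
    # Unigram/bigram frequencies over all candidates in the set.
    uni = Counter(w for q in candidate_set for w in q.split())
    bi = Counter(b for q in candidate_set
                 for b in zip(q.split(), q.split()[1:]))
    top_uni = [w for w, _ in uni.most_common(3)]
    top_bi = [b for b, _ in bi.most_common(3)]
    return {
        # Does the query contain one of the set's 3 most frequent n-grams?
        "has_top_unigram": any(w in top_uni for w in words),
        "has_top_bigram": any(b in top_bi for b in bigrams),
        # Number/year indicators, e.g. "grammy awards 2008".
        "contains_year": bool(re.search(r"\b(19|20)\d{2}\b", query)),
        "suffix_is_number": bool(words) and words[-1].isdigit(),
    }

feats = informativeness_features(
    "grammy awards 2008",
    ["grammy awards", "grammy awards 2008", "2008 grammys"])
print(feats["contains_year"])   # True
```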
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Evaluation</head><p>The accuracies for predicting the canonical query are shown in Table <ref type="table" target="#tab_4">5</ref> for the different baselines and the ranking approach. We also provide a breakdown of the results by the size of the candidate set (columns 3-5): choosing the canonical variant from a larger set of options is bound to be a harder task.</p><p>Among the baselines, using the maximum-frequency query produces the best results. It has an accuracy of 53% on all test examples, 10% higher than random choice. The maximum-frequency heuristic also consistently outperforms the other baselines on larger sets. When a set contains only two candidates (the original query and one other variant), random choice, the original query, and the top offline-score query are equally effective, each giving about 60% accuracy. The accuracy of these baselines degrades much more as the candidate set size increases.</p><p>The ranking approach outperforms the baselines for candidate sets of all sizes. There is a 3% improvement when all the test examples are considered. On closer inspection, one can see that the benefit of the ranking approach is more pronounced when there are more candidates: it gives a 10% improvement over the maximum-frequency baseline for sets of more than 5 queries. The very popular queries are likely to have several variants and therefore a large candidate set, so the ranking approach would be the preferred setup for canonicalization in that case.</p></div>
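The size-bucketed accuracy breakdown in Table 5 can be sketched as below. The bucket boundaries (2, 3 to 5, above 5) come from the text; the `accuracy_by_size` helper and the toy examples are hypothetical.

```python
# Sketch: accuracy of canonical-query prediction, bucketed by the
# number of candidates, mirroring the columns of Table 5.
def bucket(n):
    return "2 cand." if n == 2 else "3 to 5" if n <= 5 else "above 5"

def accuracy_by_size(examples):
    # examples: list of (candidate_set, gold_canonical, predicted).
    hits, totals = {}, {}
    for cands, gold, pred in examples:
        b = bucket(len(cands))
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + (pred == gold)
    return {b: hits[b] / totals[b] for b in totals}

# Made-up evaluation data for illustration only.
examples = [
    (["a", "b"], "a", "a"),
    (["a", "b", "c"], "b", "c"),
    (["a", "b", "c", "d", "e", "f"], "f", "f"),
]
print(accuracy_by_size(examples))
```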
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">CONCLUSION</head><p>In this work, we have shown that our system, which used query similarity, achieved better user engagement and also increased the recall of newsworthy queries. There are several issues that we can address in future work. One is purging non-newsworthy queries from the whitelist. Since similarity-only rescoring was not accurate enough to do this, we have not addressed this aspect. Perhaps a combination of query similarity to add queries and user feedback to quickly learn how to demote queries would be a good approach. We will explore these ideas in future work.</p><p>In our approach, we have considered not only the prediction accuracy but also the presentation of the display. We have introduced the notion of a canonical query for news intent. As a simple baseline for the canonical query, the maximum-frequency query is a good choice, and the selection is further improved using query-match and content features. However, we have only worked with CTR data in this paper. We plan to conduct a user study to strengthen the results and help us understand whether canonical results are preferred by users.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A news display above search results</figDesc><graphic coords="1,350.17,254.88,172.38,144.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Lexically similar variants for a query (similarity value within parentheses)</figDesc><table><row><cell>2008 long beach grand prix</cell><cell>long beach grand prix (0.57), f1 grand prix (0.03), long beach (0.02)</cell></row><row><cell>2008 nfl draft order</cell><cell>2008 nfl draft (0.30), 2008 draft (0.19), nfl draft (0.12), nfl draft 2008 (0.06)</cell></row><row><cell>bad beef recall</cell><cell>beef recall (0.15), recalled beef (0.02), massive beef recall (0.01), california beef recall (0.01)</cell></row><row><cell>carly smithson controversy</cell><cell>carly smithson (0.10), carly smithson american idol (0.01), carly hennessy smithson (0.01)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Results from user study</figDesc><table><row><cell cols="3">Measure baseline rescored</cell></row><row><cell>CTR</cell><cell>10.43</cell><cell>10.67</cell></row><row><cell>Coverage</cell><cell>2.32</cell><cell>2.39</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Example original and canonical queries</figDesc><table><row><cell>1. jennifer lopez marc anthony</cell><cell>marc anthony, jennifer lopez, jennifer lopez baby, jennifer lopez babies, jennifer lopez twins</cell><cell>jennifer lopez twins</cell></row><row><cell>2. presidential candidates</cell><cell>republican presidential candidates, 2008 presidential candidates</cell><cell>2008 presidential candidates</cell></row><row><cell>3. shooting down satellite</cell><cell>satellite shoot down, US shoots down satellite, spy satellite shot down</cell><cell>satellite shoot down</cell></row><row><cell>4. nfl draft combine</cell><cell>nfl draft combine, nfl combine, nfl draft, 2008 nfl draft, nfl draft 2008, nfl scouting combine, nfl mock draft</cell><cell>nfl combine</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Accuracies for canonical query selection for different candidate set sizes</figDesc><table><row><cell></cell><cell cols="4">All 2 cand. 3 to 5 above 5</cell></row><row><cell>Random</cell><cell>0.43</cell><cell>0.59</cell><cell>0.30</cell><cell>0.10</cell></row><row><cell>Self</cell><cell>0.43</cell><cell>0.59</cell><cell>0.32</cell><cell>0.17</cell></row><row><cell>Max. frequency</cell><cell>0.53</cell><cell>0.58</cell><cell>0.53</cell><cell>0.34</cell></row><row><cell>Max. offline score</cell><cell>0.44</cell><cell>0.59</cell><cell>0.33</cell><cell>0.19</cell></row><row><cell>Max. length</cell><cell>0.34</cell><cell>0.49</cell><cell>0.23</cell><cell>0.13</cell></row><row><cell>Min. length</cell><cell>0.42</cell><cell>0.56</cell><cell>0.32</cell><cell>0.16</cell></row><row><cell cols="2">Ranking approach 0.56</cell><cell>0.60</cell><cell>0.55</cell><cell>0.44</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>


<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Query recommendation using query logs in search engines</title>
		<author>
			<persName><forename type="first">R</forename><surname>Baeza-Yates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hurtado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mendoza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Current Trends in Database Technology -EDBT</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Agglomerative clustering of a search engine query log</title>
		<author>
			<persName><forename type="first">D</forename><surname>Beeferman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Berger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of KDD</title>
				<meeting>of KDD</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="407" to="416" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Query similarity by projecting the query-flow graph</title>
		<author>
			<persName><forename type="first">I</forename><surname>Bordino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Castillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Donato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gionis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="515" to="522" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Online expansion of rare queries for sponsored search</title>
		<author>
			<persName><forename type="first">A</forename><surname>Broder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ciccolo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Josifovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Metzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yuan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="511" to="520" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Robust classification of rare queries using web knowledge</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Z</forename><surname>Broder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fontoura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Josifovski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="231" to="238" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Information behaviour that keeps found things found</title>
		<author>
			<persName><forename type="first">H</forename><surname>Bruce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dumais</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Research</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="10" to="11" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Using web search engines to find and refind information</title>
		<author>
			<persName><forename type="first">R</forename><surname>Capra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pérez-Quiñones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="36" to="42" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Integration of news content into web results</title>
		<author>
			<persName><forename type="first">F</forename><surname>Diaz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WSDM</title>
				<meeting>of WSDM</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="182" to="191" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Adaptation of offline vertical selection predictions in the presence of user feedback</title>
		<author>
			<persName><forename type="first">F</forename><surname>Diaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Arguello</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="323" to="330" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Greedy function approximation: a gradient boosting machine</title>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of Statistics</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1189" to="1232" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Query dependent ranking using k-nearest neighbor</title>
		<author>
			<persName><forename type="first">X</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Arnold</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-Y</forename><surname>Shum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="115" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A unified and discriminative model for query refinement</title>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="379" to="386" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
	<title level="a" type="main">Training linear SVMs in linear time</title>
		<author>
			<persName><forename type="first">T</forename><surname>Joachims</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of KDD</title>
				<meeting>of KDD</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="217" to="226" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Query word deletion prediction</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Fain</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="435" to="436" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Generating query substitutions</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Madani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Greiner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="387" to="396" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Click-through prediction for news queries</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>König</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of SIGIR</title>
				<meeting>of SIGIR</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="347" to="354" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Query suggestion using hitting time</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Mei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Church</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of CIKM</title>
				<meeting>of CIKM</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="469" to="478" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning to select a ranking function</title>
		<author>
			<persName><forename type="first">J</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Macdonald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ounis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ECIR</title>
				<meeting>of ECIR</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="114" to="126" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A web-based kernel function for measuring the similarity of short text snippets</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sahami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Heilman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="377" to="386" />
		</imprint>
	</monogr>
</biblStruct>


<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Optimal rare query suggestion with implicit user feedback</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="901" to="910" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Improving recommendation for long-tail queries via templates</title>
		<author>
			<persName><forename type="first">I</forename><surname>Szpektor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gionis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Maarek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="47" to="56" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Mining search engine query logs for query recommendation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Nasraoui</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of WWW</title>
				<meeting>of WWW</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="1039" to="1040" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
