
Using Intent Information to Model User Behavior in Diversified Search (Abstract)

Aleksandr Chuklin (chuklin@yandex-team.ru), Pavel Serdyukov, Maarten de Rijke

ISLA, University of Amsterdam, The Netherlands

Yandex, Moscow, Russia

A result page of a modern commercial search engine often contains documents of different types targeted to satisfy different user intents (news, blogs, multimedia). When evaluating system performance and making design decisions we need to better understand user behavior on such result pages. To address this problem various click models have previously been proposed. In this paper we focus on result pages containing fresh results and propose a way to model user intent distribution and bias due to different document presentation types. To the best of our knowledge this is the first work that successfully uses intent and layout information to improve existing click models.

Keywords: click models, diversity, user behavior

INTRODUCTION

The idea of search result diversification appeared several years ago in the work by Radlinski and Dumais [8]. Since then all major commercial search engines have addressed the problem of ambiguous queries, either through federated / vertical search (see, e.g., [2]) or by making result diversification a part of the ranking process [1, 9]. In this work we focus on one particular vertical: fresh results, i.e., recently published webpages (news, blogs, etc.). Fig. 1 shows part of a search engine result page (SERP) in which fresh results are mixed with ordinary results in response to the query “Chinese islands”. We say that every document has a presentation type, in our example “fresh” (the first two documents in the figure) or “web” (the third, ordinary search result item). We will further refer to the list of presentation types for the current result page as a layout. We assume that each query has a number of categories or intents associated with it. In our case these are “fresh” and “web”.

The full version of this paper appears in ECIR 2013 [4].

The main problem that we address in this paper is modeling user behavior in the presence of vertical results. In order to better understand user behavior in a multi-intent environment we propose to exploit intent and layout information in a click model so as to improve its performance. Unlike previous click models, our proposed model uses additional information that is already available to search engines. We assume that the system already knows the probability distribution of intents / categories corresponding to the query. This is a typical setup for the TREC diversity track as well as for commercial search systems. We also know the presentation type of each document. We argue that this presentation may introduce a bias in user behavior, and that taking it into account may improve the click model’s performance.

CLICK MODELS

Click data has always been an important source of information for web search engines. It is an implicit signal because we do not always understand how user behavior correlates with user satisfaction: users’ clicks are biased. Following Joachims et al. [7], who conducted eye-tracking experiments, there was a series of papers that model user behavior using probabilistic graphical models. The most influential works in this area include the UBM model by Dupret and Piwowarski [6], the Cascade Model by Craswell et al. [5] and the DBN model by Chapelle and Zhang [3].

A click model can be described as follows. When a user submits a query $q$ to a search engine she gets back 10 results: $u_1, \dots, u_{10}$. Given a query $q$, we define a session as the set of events experienced by the user from issuing the query until abandoning the result page or issuing another query. Note that one session corresponds to exactly one query. The minimal set of random variables used in all models to describe user behavior consists of the examination of the $k$-th document ($E_k$) and the click on the $k$-th document ($C_k$):

$E_k$ indicates whether the user looked at the document at rank $k$ (hidden variables).

$C_k$ indicates whether the user clicked on the $k$-th document (observed variables).

In order to define a click model we need to specify the dependencies between these variables. For example, for the UBM model we define

$$P(E_k = 1 \mid C_1, \dots, C_{k-1}) = \gamma_{kd} \tag{1}$$

$$E_k = 0 \Rightarrow C_k = 0 \tag{2}$$

$$P(C_k = 1 \mid E_k = 1) = a_{u_k} \tag{3}$$

where $\gamma_{kd}$ is a function of two integer parameters: the current position $k$ and the distance to the rank of the previous click, $d = k - \mathrm{PrevClick} = k - \max\{j \mid 0 \le j < k \ \&\ C_j = 1\}$ (we assume $C_0 = 1$). Furthermore, $a_{u_k}$ is a variable responsible for the attractiveness of the document $u_k$ for the query $q$. If we know the $a$ and $\gamma$ parameters, we can predict click events. The better we predict clicks, the better the click model is.
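To make Eqs. (1)-(3) concrete, the following minimal sketch computes the UBM click probability at each rank of one session. It assumes the parameters have already been estimated; the function name `ubm_click_probs`, the dictionary layout of `gamma` and `attractiveness`, and all numeric values are illustrative assumptions, not part of the original work.

```python
from typing import Dict, List, Tuple


def ubm_click_probs(
    docs: List[str],
    clicks: List[int],
    gamma: Dict[Tuple[int, int], float],
    attractiveness: Dict[str, float],
) -> List[float]:
    """Return P(C_k = 1 | C_1, ..., C_{k-1}) for ranks k = 1..len(docs)."""
    probs = []
    prev_click = 0  # C_0 = 1 by convention, so PrevClick starts at rank 0
    for k, (doc, clicked) in enumerate(zip(docs, clicks), start=1):
        d = k - prev_click                    # distance to the previous click
        p_exam = gamma[(k, d)]                # Eq. (1): examination probability
        probs.append(p_exam * attractiveness[doc])  # Eqs. (2) and (3) combined
        if clicked:
            prev_click = k
    return probs


# Hypothetical parameters for a three-result session (values are made up).
gamma = {(1, 1): 0.9, (2, 2): 0.6, (3, 1): 0.7}
attractiveness = {"u1": 0.5, "u2": 0.4, "u3": 0.2}
probs = ubm_click_probs(["u1", "u2", "u3"], [0, 1, 0], gamma, attractiveness)
print(probs)
```

Because examination in UBM depends only on observed clicks, the click probability at rank $k$ reduces to the product $\gamma_{kd} \cdot a_{u_k}$, which is what the loop computes.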

We propose a modification to existing click models that exploits information about user intent and the result page layout. As a basic model to modify we use the UBM click model by Dupret and Piwowarski [6]. However, our extensions can equally well be applied to other click models. We focus on HTML results that look very similar to the standard 10 blue links. We do not know beforehand whether the user notices any differences between special (vertical) results and ordinary ones.

We add one hidden variable $I$ and a set of observed variables $\{G_k\}$ to the two sets of variables $\{E_k\}$ and $\{C_k\}$ commonly used in click models:

$I = i$ indicates that the user performing the session has intent $i$, i.e., relevance with respect to the category $i$ is much more important for the user.

$G_k = l$ indicates that the result at position $k$ uses a presentation specific to the results with dominating intent $l$. For example, for the result page shown in Fig. 1 we have $G_1 = \text{fresh}$, $G_2 = \text{fresh}$, $G_3 = \text{web}$. We will further refer to a list of presentation types $\{G_1, \dots, G_{10}\}$ for a current session as a layout.

A typical user scenario can be described as follows. First, the user looks at the whole result page and decides whether to examine the $k$-th document or not. We assume that the examination probability $P(E_k)$ does not depend on the document itself, but depends on the user intent, her previous interaction with other results, the document rank $k$ and the SERP layout. If she decides to examine the document ($E_k = 1$) we assume that she is focused on that particular document. This implies that the click probability $P(C_k = 1 \mid E_k = 1)$ depends only on the user intent $I$ and the relevance / attractiveness of the current document, and not on the layout or the document position $k$. After clicking (or not clicking) the document the user moves to another document, following the same “examine-then-click” scenario.
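The scenario above can be sketched in code. This abstract does not give the exact parameterization, so the sketch assumes one plausible form: an examination parameter indexed by rank, distance to the previous click, the presentation type at that rank, and the intent, plus an intent-specific attractiveness. The names `session_likelihood`, `gamma`, `attractiveness`, `intent_prior`, and all numbers are hypothetical. Since the intent $I$ is hidden, the session likelihood is obtained by summing over intents weighted by the known intent distribution.

```python
from typing import Dict, List, Tuple


def session_likelihood(
    docs: List[str],
    layout: List[str],
    clicks: List[int],
    intent_prior: Dict[str, float],
    gamma: Dict[Tuple[int, int, str, str], float],
    attractiveness: Dict[Tuple[str, str], float],
) -> float:
    """Likelihood of the observed clicks, marginalized over the hidden intent."""
    total = 0.0
    for intent, prior in intent_prior.items():
        likelihood = 1.0
        prev_click = 0  # C_0 = 1 by convention
        for k, (doc, g, c) in enumerate(zip(docs, layout, clicks), start=1):
            d = k - prev_click
            # Assumed form: examination depends on rank, distance,
            # presentation type and intent; click depends on document and intent.
            p = gamma[(k, d, g, intent)] * attractiveness[(doc, intent)]
            likelihood *= p if c else 1.0 - p
            if c:
                prev_click = k
        total += prior * likelihood
    return total


# Hypothetical two-result session: a "fresh" block followed by a "web" result.
intent_prior = {"fresh": 0.3, "web": 0.7}
gamma = {
    (1, 1, "fresh", "fresh"): 0.9, (1, 1, "fresh", "web"): 0.5,
    (2, 1, "web", "fresh"): 0.4, (2, 1, "web", "web"): 0.8,
}
attractiveness = {
    ("u1", "fresh"): 0.6, ("u1", "web"): 0.2,
    ("u2", "fresh"): 0.1, ("u2", "web"): 0.5,
}
lik = session_likelihood(
    ["u1", "u2"], ["fresh", "web"], [1, 0], intent_prior, gamma, attractiveness
)
print(lik)
```

Conditioning on the intent first and mixing afterwards keeps the per-intent computation identical to plain UBM, which is one way such an extension could be layered on top of other click models as well.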

RESULTS

We used the UBM model as our baseline and ran experiments in order to answer the following research questions: How do intent and layout information help in building click models? How does the performance change when we use only one type of information or both of them? How does the best variation of our model compare to other existing click models?

The main contribution of our work is a framework of intent-aware click models, which incorporates both layout and intent information. Our intent-aware modification can be applied to any click model to improve its perplexity. One interesting feature of an intent-aware click model is that it allows us to infer separate relevances for different intents from clicks. These relevances can be further used as features for specific vertical ranking formulas. Another important property of intent-aware additions to click models is that by analyzing examination probabilities we can see how user patience depends on his/her intent and the search engine result page layout. Put differently, it allows us to use a click model as an ad-hoc analytic tool.
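Perplexity is not defined in this abstract. The sketch below assumes the definition commonly used in the click-model literature: for a given rank, perplexity is 2 raised to the negative average log2-likelihood of the observed clicks under the predicted click probabilities. The function name `perplexity_at_rank` and the clamping constant `eps` are illustrative assumptions.

```python
import math
from typing import List


def perplexity_at_rank(
    clicks: List[int], predicted: List[float], eps: float = 1e-10
) -> float:
    """Perplexity of predicted click probabilities at one rank over N sessions.

    clicks[j] is the observed click (0 or 1) in session j and predicted[j]
    is the model's click probability for the same session and rank.
    """
    if not clicks or len(clicks) != len(predicted):
        raise ValueError("clicks and predicted must be non-empty and equal length")
    log_sum = 0.0
    for c, q in zip(clicks, predicted):
        q = min(max(q, eps), 1.0 - eps)  # clamp to avoid log2(0)
        log_sum += math.log2(q) if c else math.log2(1.0 - q)
    return 2.0 ** (-log_sum / len(clicks))


# A model that always predicts 0.5 is maximally uncertain: perplexity 2.
coin_flip = perplexity_at_rank([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(coin_flip)
```

Under this definition a perfect model has perplexity 1 and lower values are better, so an improvement means a decrease.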

As to future work, we see a number of directions. One is to study other specific verticals in order to check that our method is also applicable to other verticals/intents. For instance, the mobile arena provides interesting research opportunities.

Sometimes intents are very specific. For instance, the query “jaguar” has at least two intents: finding information about cars and finding information about animals. It is very unlikely that a search engine has a special vertical for these intents. However, we believe that knowledge of the user’s intent can still be used in order to better understand his/her behavior. Applying our ideas to these minor intents is an interesting direction for future work.

REFERENCES

[1] Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: WSDM. p. 5. ACM (2009)

[2] Arguello, J., Diaz, F., Callan, J., Crespo, J.: Sources of evidence for vertical selection. In: SIGIR. pp. 315-322. ACM (2009)

[3] Chapelle, O., Zhang, Y.: A dynamic bayesian network click model for web search ranking. In: WWW. ACM (2009)

[4] Chuklin, A., Serdyukov, P., de Rijke, M.: Using Intent Information to Model User Behavior in Diversified Search. In: ECIR. Springer (2013)

[5] Craswell, N., Zoeter, O., Taylor, M., Ramsey, B.: An experimental comparison of click position-bias models. In: WSDM. p. 87. ACM (2008)

[6] Dupret, G., Piwowarski, B.: A user browsing model to predict search engine click data from past observations. In: SIGIR. pp. 331-338. SIGIR '08, ACM (2008)

[7] Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as implicit feedback. In: SIGIR. p. 154. ACM (2005)

[8] Radlinski, F., Dumais, S.: Improving personalized web search using result diversification. In: SIGIR. ACM (2006)

[9] Styskin, A., Romanenko, F., Vorobyev, F., Serdyukov, P.: Recency ranking by diversification of result set. In: CIKM. pp. 1949-1952. ACM (2011)