=Paper=
{{Paper
|id=None
|storemode=property
|title=Using Intent Information to Model User Behavior in Diversified Search
|pdfUrl=https://ceur-ws.org/Vol-986/paper_21.pdf
|volume=Vol-986
|dblpUrl=https://dblp.org/rec/conf/dir/ChuklinSR13
}}
==Using Intent Information to Model User Behavior in Diversified Search==
Using Intent Information to Model User Behavior in Diversified Search (Abstract)∗

Aleksandr Chuklin<sup>1,2</sup>, Pavel Serdyukov<sup>1</sup>, Maarten de Rijke<sup>2</sup>

<sup>1</sup> Yandex, Moscow, Russia

<sup>2</sup> ISLA, University of Amsterdam, The Netherlands

chuklin@yandex-team.ru, pavser@yandex-team.ru, derijke@uva.nl

ABSTRACT

A result page of a modern commercial search engine often contains documents of different types targeted to satisfy different user intents (news, blogs, multimedia). When evaluating system performance and making design decisions, we need to better understand user behavior on such result pages. To address this problem, various click models have previously been proposed. In this paper we focus on result pages containing fresh results and propose a way to model the user intent distribution and the bias due to different document presentation types. To the best of our knowledge, this is the first work that successfully uses intent and layout information to improve existing click models.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Retrieval Models

General Terms

Algorithms, Experiment, Theory

Keywords

Click models, Diversity, User Behavior

1. INTRODUCTION

The idea of search result diversification appeared several years ago in the work by Radlinski and Dumais [8]. Since then, all major commercial search engines have addressed the problem of ambiguous queries, either by federated / vertical search (see, e.g., [2]) or by making result diversification part of the ranking process [1, 9]. In this work we focus on one particular vertical: fresh results, i.e., recently published webpages (news, blogs, etc.). Fig. 1 shows part of a search engine result page (SERP) in which fresh results are mixed with ordinary results in response to the query "Chinese islands". We say that every document has a presentation type; in our example this is "fresh" (the first two documents in the figure) or "web" (the third, ordinary search result item). We will refer to the list of presentation types for the current result page as a layout. We assume that each query has a number of categories or intents associated with it; in our case these are "fresh" and "web".

Figure 1: Group of fresh results at the top followed by an ordinary search result item.

The main problem that we address in this paper is modeling user behavior in the presence of vertical results. To better understand user behavior in a multi-intent environment, we propose to exploit intent and layout information in a click model so as to improve its performance. Unlike previous click models, our proposed model uses additional information that is already available to search engines. We assume that the system already knows the probability distribution of intents / categories corresponding to the query. This is a typical setup for the TREC diversity track as well as for commercial search systems. We also know the presentation type of each document. We argue that this presentation may bias user behavior, and that taking it into account may improve the click model's performance.

2. CLICK MODELS

Click data has always been an important source of information for web search engines. It is an implicit signal: we do not always know how user behavior correlates with user satisfaction, and users' clicks are biased. Following the eye-tracking experiments of Joachims et al. [7], a series of papers has modeled user behavior using probabilistic graphical models. The most influential works in this area include the UBM model by Dupret and Piwowarski [6], the Cascade Model by Craswell et al. [5] and the DBN model by Chapelle and Zhang [3].

∗The full version of this paper appears in ECIR 2013 [4].

Copyright is held by the author/owner(s).
DIR'13, Delft, The Netherlands.

A click model can be described as follows. When a user submits a query q to a search engine, she gets back 10 results: u<sub>1</sub>, …, u<sub>10</sub>. Given a query q, we define a session as the set of events experienced by the user from issuing the query until abandoning the result page or issuing another query. Note that one session corresponds to exactly one query. The minimal set of random variables used in all models to describe user behavior consists of examination of the k-th document (E<sub>k</sub>) and click on the k-th document (C<sub>k</sub>):
* E<sub>k</sub> indicates whether the user looked at the document at rank k (hidden variables).
* C<sub>k</sub> indicates whether the user clicked on the k-th document (observed variables).

In order to define a click model, we need to specify the dependencies between these variables. For example, for the UBM model we define

:<math>P(E_k = 1 \mid C_1, \ldots, C_{k-1}) = \gamma_{kd}</math> (1)
:<math>E_k = 0 \Rightarrow C_k = 0</math> (2)
:<math>P(C_k = 1 \mid E_k = 1) = a_{u_k}</math> (3)

where <math>\gamma_{kd}</math> is a function of two integer parameters: the current position k and the distance to the rank of the previous click, <math>d = k - \mathit{PrevClick} = k - \max\{\, j \mid 0 \le j < k \wedge C_j = 1 \,\}</math> (we assume C<sub>0</sub> = 1). Furthermore, <math>a_{u_k}</math> is a variable responsible for the attractiveness of the document u<sub>k</sub> for the query q. If we know the a and γ parameters, we can predict click events. The better we predict clicks, the better the click model is.

We propose a modification to existing click models that exploits information about user intent and the result page layout. As a basic model to modify, we use the UBM click model by Dupret and Piwowarski [6]; however, our extensions can equally well be applied to other click models. We focus on HTML results that look very similar to the standard 10 blue links, so we do not know beforehand whether the user notices any differences between special (vertical) results and ordinary ones.

We add one hidden variable I and a set of observed variables {G<sub>k</sub>} to the two sets of variables {E<sub>k</sub>} and {C<sub>k</sub>} commonly used in click models:
* I = i indicates that the user performing the session has intent i, i.e., relevance with respect to category i is much more important for the user.
* G<sub>k</sub> = l indicates that the result at position k uses a presentation specific to the results with dominating intent l. For example, for the result page shown in Fig. 1 we have G<sub>1</sub> = fresh, G<sub>2</sub> = fresh, G<sub>3</sub> = web. We will refer to the list of presentation types {G<sub>1</sub>, …, G<sub>10</sub>} for the current session as a layout.

A typical user scenario can be described as follows. First, the user looks at the whole result page and decides whether to examine the k-th document. We assume that the examination probability P(E<sub>k</sub>) does not depend on the document itself, but depends on the user intent, her previous interaction with other results, the document rank k and the SERP layout. If she decides to examine the document (if E<sub>k</sub> = 1), we assume that she is focused on that particular document. This implies that the click probability P(C<sub>k</sub> = 1 | E<sub>k</sub> = 1) depends only on the user intent I and the relevance / attractiveness of the current document, and neither on the layout nor on the document position k. After clicking (or not clicking) the document, the user moves to another document, following the same "examine-then-click" scenario.

3. RESULTS

We used the UBM model as our baseline and ran experiments in order to answer the following research questions:
* How do intent and layout information help in building click models? How does the performance change when we use only one type of information or both of them?
* How does the best variation of our model compare to other existing click models?

The main contribution of our work is a framework of intent-aware click models, which incorporates both layout and intent information. Our intent-aware modification can be applied to any click model to improve its perplexity. One interesting feature of an intent-aware click model is that it allows us to infer separate relevances for different intents from clicks. These relevances can be further used as features for specific vertical ranking formulas. Another important property of intent-aware additions to click models is that by analyzing examination probabilities we can see how user patience depends on his/her intent and the search engine result page layout. Put differently, it allows us to use a click model as an ad-hoc analytic tool.

As to future work, we see a number of directions, especially concerning specific verticals, in order to check that our method is also applicable to other verticals/intents. For instance, the mobile arena provides interesting research opportunities.

Sometimes intents are very specific; for instance, for the query "jaguar" there are at least two intents: finding information about cars and finding information about animals. It is very unlikely that a search engine has a special vertical for these intents. However, we believe that knowledge of the user's intent can still be used in order to better understand his/her behavior. Applying our ideas to these minor intents is an interesting direction for future work.

4. REFERENCES

[1] Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: WSDM. p. 5. ACM (2009)
[2] Arguello, J., Diaz, F., Callan, J., Crespo, J.: Sources of evidence for vertical selection. In: SIGIR. pp. 315–322. ACM (2009)
[3] Chapelle, O., Zhang, Y.: A dynamic Bayesian network click model for web search ranking. In: WWW. ACM (2009)
[4] Chuklin, A., Serdyukov, P., de Rijke, M.: Using Intent Information to Model User Behavior in Diversified Search. In: ECIR. Springer (2013)
[5] Craswell, N., Zoeter, O., Taylor, M., Ramsey, B.: An experimental comparison of click position-bias models. In: WSDM. p. 87. ACM (2008)
[6] Dupret, G., Piwowarski, B.: A user browsing model to predict search engine click data from past observations. In: SIGIR. pp. 331–338. ACM (2008)
[7] Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as implicit feedback. In: SIGIR. p. 154. ACM (2005)
[8] Radlinski, F., Dumais, S.: Improving personalized web search using result diversification. In: SIGIR. ACM (2006)
[9] Styskin, A., Romanenko, F., Vorobyev, F., Serdyukov, P.: Recency ranking by diversification of result set. In: CIKM. pp. 1949–1952. ACM (2011)
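The UBM equations (1)–(3), extended with the hidden intent I and the observed presentation types G<sub>k</sub>, can be illustrated with a small Python sketch. It is not the authors' implementation, and every name in it (`IntentAwareUBM`, the `attractiveness` and `gamma` tables, the 0.5 defaults, the document ids and the intent prior) is hypothetical. The abstract says only that examination depends on intent, previous clicks, rank and layout, and that clicks given examination depend on intent and the document. As a simplifying assumption, the sketch indexes γ by (intent, G<sub>k</sub>, k, d), i.e. by the current result's presentation type rather than the whole layout. Because I is fixed for a whole session, the session likelihood is a mixture over intents, P(C<sub>1..n</sub>) = Σ<sub>i</sub> P(I = i) · P(C<sub>1..n</sub> | I = i), with the prior P(I = i) assumed known, as the text states. Parameter estimation (e.g. EM) is not shown.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Sequence, Tuple

# Hypothetical parameter tables (an assumed parametrization, not the paper's):
#   attractiveness[(intent, doc_id)]  ~ P(C_k = 1 | E_k = 1, I = intent)
#   gamma[(intent, g_k, k, d)]        ~ P(E_k = 1 | C_1..C_{k-1}, I = intent, G_k = g_k)
AttrTable = Dict[Tuple[str, str], float]
GammaTable = Dict[Tuple[str, str, int, int], float]


@dataclass
class IntentAwareUBM:
    attractiveness: AttrTable = field(default_factory=dict)
    gamma: GammaTable = field(default_factory=dict)
    default_attr: float = 0.5   # placeholder for unseen (intent, doc) pairs
    default_gamma: float = 0.5  # placeholder for unseen (intent, g_k, k, d) cells

    def likelihood_given_intent(self, docs: Sequence[str], layout: Sequence[str],
                                clicks: Sequence[int], intent: str) -> float:
        """P(C_1..C_n | I = intent, layout), following UBM equations (1)-(3)."""
        prob = 1.0
        prev_click = 0  # rank of the previous click; C_0 = 1 by convention
        for k, (doc, g_k, c_k) in enumerate(zip(docs, layout, clicks), start=1):
            d = k - prev_click
            p_exam = self.gamma.get((intent, g_k, k, d), self.default_gamma)
            p_attr = self.attractiveness.get((intent, doc), self.default_attr)
            # Eq. (2): no click without examination, so P(C_k = 1) = gamma * a.
            p_click = p_exam * p_attr
            prob *= p_click if c_k else 1.0 - p_click
            if c_k:
                prev_click = k
        return prob

    def likelihood(self, docs, layout, clicks, intent_dist: Dict[str, float]) -> float:
        """Marginalize the hidden session-level intent I with the known prior."""
        return sum(p_i * self.likelihood_given_intent(docs, layout, clicks, i)
                   for i, p_i in intent_dist.items())

    def intent_posterior(self, docs, layout, clicks,
                         intent_dist: Dict[str, float]) -> Dict[str, float]:
        """P(I = i | clicks) by Bayes' rule (the E-step quantity an EM fit would need)."""
        joint = {i: p_i * self.likelihood_given_intent(docs, layout, clicks, i)
                 for i, p_i in intent_dist.items()}
        total = sum(joint.values())
        return {i: v / total for i, v in joint.items()}


# Toy SERP shaped like Fig. 1: two fresh results, then a web result (ids hypothetical).
docs, layout = ["u1", "u2", "u3"], ["fresh", "fresh", "web"]
prior = {"fresh": 0.3, "web": 0.7}  # hypothetical P(I = i) for the query
model = IntentAwareUBM(
    attractiveness={("fresh", "u1"): 0.6, ("web", "u1"): 0.2},
    gamma={("fresh", "fresh", 1, 1): 0.9},
)

# Sanity check: the probabilities of all 2^3 click vectors sum to one.
total = sum(model.likelihood(docs, layout, list(c), prior)
            for c in product([0, 1], repeat=len(docs)))
print(round(total, 9))  # prints 1.0
print(model.intent_posterior(docs, layout, [1, 0, 0], prior))
```

In this toy setup, a click on u1, which is more attractive under the "fresh" intent, raises the posterior of "fresh" above its prior of 0.3. This mirrors how an intent-aware model can attribute clicks to intents and hence learn separate per-intent relevances.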
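The text claims that the intent-aware modification improves a click model's perplexity but does not define the metric. The sketch below assumes a definition commonly used in the click-model literature. For a fixed rank, over N sessions with predicted click probabilities q<sub>s</sub> and observed clicks c<sub>s</sub>, the perplexity is 2<sup>−(1/N) Σ<sub>s</sub> [c<sub>s</sub> log<sub>2</sub> q<sub>s</sub> + (1 − c<sub>s</sub>) log<sub>2</sub>(1 − q<sub>s</sub>)]</sup>. It is then averaged over ranks. Lower is better: a perfect predictor scores 1 and a fair coin scores 2. The function names and the toy predictions are hypothetical.

```python
import math
from typing import Sequence


def click_perplexity(predicted: Sequence[float], observed: Sequence[int],
                     eps: float = 1e-12) -> float:
    """Perplexity of click predictions at one rank over N sessions (assumed definition):
    2 ** (-(1/N) * sum_s [c_s * log2(q_s) + (1 - c_s) * log2(1 - q_s)])."""
    if len(predicted) == 0 or len(predicted) != len(observed):
        raise ValueError("need equally long, non-empty sequences")
    log_sum = 0.0
    for q, c in zip(predicted, observed):
        q = min(max(q, eps), 1.0 - eps)  # clamp so log2 never receives 0
        log_sum += math.log2(q) if c else math.log2(1.0 - q)
    return 2.0 ** (-log_sum / len(predicted))


def mean_perplexity(predicted_by_rank: Sequence[Sequence[float]],
                    observed_by_rank: Sequence[Sequence[int]]) -> float:
    """Average the per-rank perplexities (e.g. over ranks 1..10)."""
    values = [click_perplexity(p, o) for p, o in zip(predicted_by_rank, observed_by_rank)]
    return sum(values) / len(values)


# Hypothetical predictions for four sessions at one rank.
observed = [1, 0, 0, 1]
coin = click_perplexity([0.5, 0.5, 0.5, 0.5], observed)    # a fair coin: exactly 2.0
better = click_perplexity([0.9, 0.1, 0.2, 0.8], observed)  # sharper, correct predictions
overall = mean_perplexity([[0.5] * 4, [0.9, 0.1, 0.2, 0.8]], [observed, observed])
print(coin, better, overall)
```

The predicted probabilities fed in here can come from any click model, for example the per-rank click probabilities of a UBM-style model conditioned on earlier clicks in the session.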