=Paper= {{Paper |id=None |storemode=property |title=Using Intent Information to Model User Behavior in Diversified Search |pdfUrl=https://ceur-ws.org/Vol-986/paper_21.pdf |volume=Vol-986 |dblpUrl=https://dblp.org/rec/conf/dir/ChuklinSR13 }} ==Using Intent Information to Model User Behavior in Diversified Search== https://ceur-ws.org/Vol-986/paper_21.pdf
Using Intent Information to Model User Behavior in Diversified Search (Abstract)∗

Aleksandr Chuklin (1, 2), Pavel Serdyukov (1), Maarten de Rijke (2)
(1) Yandex, Moscow, Russia
(2) ISLA, University of Amsterdam, The Netherlands
chuklin@yandex-team.ru, pavser@yandex-team.ru, derijke@uva.nl


ABSTRACT
A result page of a modern commercial search engine often contains documents of different types targeted to satisfy different user intents (news, blogs, multimedia). When evaluating system performance and making design decisions we need to better understand user behavior on such result pages. To address this problem, various click models have previously been proposed. In this paper we focus on result pages containing fresh results and propose a way to model the user intent distribution and the bias due to different document presentation types. To the best of our knowledge, this is the first work that successfully uses intent and layout information to improve existing click models.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval Models

General Terms
Algorithms, Experiment, Theory

Keywords
Click models, Diversity, User Behavior

∗The full version of this paper appears in ECIR 2013 [4].

1. INTRODUCTION
The idea of search result diversification appeared several years ago in the work by Radlinski and Dumais [8]. Since then, all major commercial search engines have addressed the problem of ambiguous queries, either by the technique called federated / vertical search (see, e.g., [2]) or by making result diversification a part of the ranking process [1, 9]. In this work we focus on one particular vertical: fresh results, i.e., recently published webpages (news, blogs, etc.). Fig. 1 shows part of a search engine result page (SERP) in which fresh results are mixed with ordinary results in response to the query "Chinese islands". We say that every document has a presentation type, in our example "fresh" (the first two documents in the figure) or "web" (the third, ordinary search result item). We will further refer to the list of presentation types for the current result page as a layout. We assume that each query has a number of categories or intents associated with it. In our case these will be "fresh" and "web".

Figure 1: Group of fresh results at the top followed by an ordinary search result item.

The main problem that we address in this paper is modeling user behavior in the presence of vertical results. To better understand user behavior in a multi-intent environment, we propose to exploit intent and layout information in a click model so as to improve its performance. Unlike previous click models, our proposed model uses additional information that is already available to search engines. We assume that the system already knows the probability distribution of intents / categories corresponding to the query. This is a typical setup for the TREC diversity track as well as for commercial search systems. We also know the presentation type of each document. We argue that this presentation may bias user behavior, and that taking this bias into account may improve the click model's performance.

2. CLICK MODELS
Click data has always been an important source of information for web search engines. It is an implicit signal: we do not always know how user behavior correlates with user satisfaction, and users' clicks are biased. Following the eye-tracking experiments of Joachims et al. [7], a series of papers has modeled user behavior using probabilistic graphical models. The most influential works in this area include the UBM model by Dupret and Piwowarski [6], the Cascade Model by Craswell et al. [5] and the DBN model by Chapelle and Zhang [3].
A click model can be described as follows. When a user submits a query $q$ to a search engine, she gets back 10 results: $u_1, \ldots, u_{10}$. Given a query $q$, we define a session as the set of events experienced by the user from issuing the query until abandoning the result page or issuing another query. Note that one session corresponds to exactly one query. The minimal set of random variables used in all models to describe user behavior consists of the examination of the $k$-th document ($E_k$) and the click on the $k$-th document ($C_k$):

 • $E_k$ indicates whether the user looked at the document at rank $k$ (hidden variables).
 • $C_k$ indicates whether the user clicked on the $k$-th document (observed variables).

In order to define a click model we need to specify the dependencies between these variables. For example, for the UBM model we define

$$P(E_k = 1 \mid C_1, \ldots, C_{k-1}) = \gamma_{kd} \tag{1}$$
$$E_k = 0 \Rightarrow C_k = 0 \tag{2}$$
$$P(C_k = 1 \mid E_k = 1) = a_{u_k}, \tag{3}$$

where $\gamma_{kd}$ is a function of two integer parameters: the current position $k$ and the distance to the rank of the previous click, $d = k - \mathit{PrevClick} = k - \max\{j \mid 0 \le j < k,\ C_j = 1\}$ (we assume $C_0 = 1$). Furthermore, $a_{u_k}$ is a variable responsible for the attractiveness of the document $u_k$ for the query $q$. Combining (1)–(3), the probability of a click at rank $k$ given the earlier clicks is $P(C_k = 1 \mid C_1, \ldots, C_{k-1}) = \gamma_{kd}\, a_{u_k}$. Hence, if we know the $a$ and $\gamma$ parameters, we can predict click events. The better we predict clicks, the better the click model is.

We propose a modification to existing click models that exploits information about user intent and the result page layout. As the basic model to modify we use the UBM click model by Dupret and Piwowarski [6]. However, our extensions can equally well be applied to other click models. We focus on HTML results that look very similar to the standard 10 blue links. We do not know beforehand whether the user notices any differences between special (vertical) results and ordinary ones.

We add one hidden variable $I$ and a set of observed variables $\{G_k\}$ to the two sets of variables $\{E_k\}$ and $\{C_k\}$ commonly used in click models:

 • $I = i$ indicates that the user performing the session has intent $i$, i.e., relevance with respect to category $i$ is much more important to the user than relevance with respect to other categories.
 • $G_k = l$ indicates that the result at position $k$ uses a presentation specific to results with dominating intent $l$. For example, for the result page shown in Fig. 1 we have $G_1 = \text{fresh}$, $G_2 = \text{fresh}$, $G_3 = \text{web}$. We will further refer to the list of presentation types $\{G_1, \ldots, G_{10}\}$ for the current session as a layout.

A typical user scenario can be described as follows. First, the user looks at the whole result page and decides whether or not to examine the $k$-th document. We assume that the examination probability $P(E_k)$ does not depend on the document itself, but depends on the user intent, her previous interaction with other results, the document rank $k$ and the SERP layout. If she decides to examine the document ($E_k = 1$), we assume that she is focused on that particular document. This implies that the click probability $P(C_k = 1 \mid E_k = 1)$ depends only on the user intent $I$ and the relevance / attractiveness of the current document, but neither on the layout nor on the document position $k$. After clicking (or not clicking) the document, the user moves to another document following the same "examine-then-click" scenario.

3. RESULTS
We used the UBM model as our baseline and ran experiments in order to answer the following research questions:

 • How do intent and layout information help in building click models? How does the performance change when we use only one type of information or both of them?
 • How does the best variation of our model compare to other existing click models?

The main contribution of our work is a framework of intent-aware click models, which incorporates both layout and intent information. Our intent-aware modification can be applied to any click model to improve its perplexity. One interesting feature of an intent-aware click model is that it allows us to infer separate relevances for different intents from clicks. These relevances can be further used as features for specific vertical ranking formulas. Another important property of intent-aware additions to click models is that, by analyzing examination probabilities, we can see how user patience depends on his/her intent and on the search engine result page layout. Put differently, it allows us to use a click model as an ad-hoc analytic tool.

As to future work, we see a number of directions, especially concerning specific verticals, in order to check that our method is also applicable to other verticals/intents. For instance, the mobile arena provides interesting research opportunities.

Sometimes intents are very specific. For instance, for the query "jaguar" there are at least two intents: finding information about cars and finding information about animals. It is very unlikely that a search engine has a special vertical for these intents. However, we believe that knowledge of the user's intent can still be used to better understand his/her behavior. Applying our ideas to these minor intents is an interesting direction for future work.

4. REFERENCES
[1] Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: WSDM. p. 5. ACM (2009)
[2] Arguello, J., Diaz, F., Callan, J., Crespo, J.: Sources of evidence for vertical selection. In: SIGIR. pp. 315–322. ACM (2009)
[3] Chapelle, O., Zhang, Y.: A dynamic Bayesian network click model for web search ranking. In: WWW. ACM (2009)
[4] Chuklin, A., Serdyukov, P., de Rijke, M.: Using Intent Information to Model User Behavior in Diversified Search. In: ECIR. Springer (2013)
[5] Craswell, N., Zoeter, O., Taylor, M., Ramsey, B.: An experimental comparison of click position-bias models. In: WSDM. p. 87. ACM (2008)
[6] Dupret, G., Piwowarski, B.: A user browsing model to predict search engine click data from past observations. In: SIGIR. pp. 331–338. ACM (2008)
[7] Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as implicit feedback. In: SIGIR. p. 154. ACM (2005)
[8] Radlinski, F., Dumais, S.: Improving personalized web search using result diversification. In: SIGIR. ACM (2006)
[9] Styskin, A., Romanenko, F., Vorobyev, F., Serdyukov, P.: Recency ranking by diversification of result set. In: CIKM. pp. 1949–1952. ACM (2011)

Copyright is held by the author/owner(s).
DIR'13, Delft, The Netherlands.
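As a reading aid, here is a minimal Python sketch (not the authors' implementation) of how the click likelihood of one session can be computed under the UBM baseline of Eqs. (1)–(3) and under the intent-aware extension of Section 2. Under UBM, $P(C_k = 1 \mid C_1, \ldots, C_{k-1}) = \gamma_{kd}\, a_{u_k}$, so the likelihood of a session is a product over ranks. The intent-aware version then sums over the hidden intent $I$, weighting by the known intent distribution. Several pieces are assumptions made only for illustration: the names `gamma`, `attractiveness` and `intent_prior`; letting a single callable `gamma` see the intent and the layout (this abstract does not say how γ is parametrized by them); and every toy number.

```python
from typing import Callable, Mapping, Sequence, Tuple

# gamma(intent, layout, k, d) -> P(E_k = 1 | C_1..C_{k-1}, I = intent, layout).
# Passing intent and layout to gamma is an assumption of this sketch;
# plain UBM (Eq. 1) uses only the rank k and the distance d.
Gamma = Callable[[str, Sequence[str], int, int], float]
# attractiveness[(doc, intent)] -> P(C_k = 1 | E_k = 1, I = intent); hypothetical layout.
Attractiveness = Mapping[Tuple[str, str], float]


def likelihood_given_intent(
    docs: Sequence[str],
    clicks: Sequence[int],
    layout: Sequence[str],
    intent: str,
    gamma: Gamma,
    attractiveness: Attractiveness,
) -> float:
    """P(C_1, ..., C_n | I = intent) for one session, UBM-style (Eqs. 1-3)."""
    prob = 1.0
    prev_click = 0  # rank of the most recent click; C_0 = 1 by convention
    for k, (doc, clicked) in enumerate(zip(docs, clicks), start=1):
        d = k - prev_click  # distance to the previous click
        # Eqs. (1)-(3): P(C_k = 1 | earlier clicks) = gamma_kd * a_{u_k}
        p_click = gamma(intent, layout, k, d) * attractiveness[(doc, intent)]
        prob *= p_click if clicked else 1.0 - p_click
        if clicked:
            prev_click = k
    return prob


def session_likelihood(
    docs: Sequence[str],
    clicks: Sequence[int],
    layout: Sequence[str],
    intent_prior: Mapping[str, float],
    gamma: Gamma,
    attractiveness: Attractiveness,
) -> float:
    """Marginalize the hidden session intent: sum_i P(I = i) * P(clicks | I = i)."""
    return sum(
        p_i * likelihood_given_intent(docs, clicks, layout, i, gamma, attractiveness)
        for i, p_i in intent_prior.items()
    )


if __name__ == "__main__":
    # Toy SERP shaped like Fig. 1; all numbers below are made up.
    serp_docs = ["d1", "d2", "d3"]
    serp_layout = ["fresh", "fresh", "web"]
    prior = {"fresh": 0.3, "web": 0.7}  # assumed P(I = i) for the query

    def toy_gamma(intent: str, lay: Sequence[str], k: int, d: int) -> float:
        # Hypothetical: examination decays with rank and distance to the last click,
        # and is boosted when the result's presentation matches the user's intent.
        base = 1.0 / (k + d)
        return min(1.0, base * (1.5 if lay[k - 1] == intent else 1.0))

    toy_attr = {(doc, i): 0.5 for doc in serp_docs for i in prior}
    print(session_likelihood(serp_docs, [1, 0, 0], serp_layout, prior, toy_gamma, toy_attr))
```

With a single intent of probability 1 and a `gamma` that ignores its intent and layout arguments, `session_likelihood` reduces to plain UBM, which mirrors the claim that the intent-aware addition can sit on top of an existing click model. Estimating γ and $a$ from click logs (UBM is typically trained with EM) is outside the scope of this sketch.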