<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Intent Information to Model User Behavior in Diversified Search (Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksandr Chuklin</string-name>
          <email>chuklin@yandex-team.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Serdyukov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISLA, University of Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maarten de Rijke</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Yandex</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A result page of a modern commercial search engine often contains documents of different types targeted to satisfy different user intents (news, blogs, multimedia). When evaluating system performance and making design decisions we need to better understand user behavior on such result pages. To address this problem various click models have previously been proposed. In this paper we focus on result pages containing fresh results and propose a way to model user intent distribution and bias due to different document presentation types. To the best of our knowledge this is the first work that successfully uses intent and layout information to improve existing click models.</p>
      </abstract>
      <kwd-group>
        <kwd>Click models</kwd>
        <kwd>Diversity</kwd>
        <kwd>User Behavior</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The idea of search result diversification appeared several years
ago in the work by Radlinski and Dumais [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Since then all major
commercial search engines have addressed the problem of ambiguous
queries either by a technique called federated / vertical search
(see, e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) or by making result diversification a part of the
ranking process [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ]. In this work we focus on one particular
vertical: fresh results, i.e., recently published webpages (news, blogs,
etc.). Fig. 1 shows part of a search engine result page (SERP) in
which fresh results are mixed with ordinary results in response to
the query “Chinese islands”. We say that every document has a
presentation type, in our example “fresh” (the first two documents
in the figure) or “web” (the third, ordinary search result item). We
will further refer to the list of presentation types for the current
result page as a layout. We assume that each query has a number of
categories or intents associated with it. In our case these will be
“fresh” and “web”.
      </p>
      <p>
        The full version of this paper appears in ECIR 2013 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The main problem that we address in this paper is
modeling user behavior in the presence of vertical results. To
better understand user behavior in a multi-intent environment, we
propose to exploit intent and layout information in a click model so
as to improve its performance. Unlike previous click models, our
proposed model uses additional information that is already
available to search engines. We assume that the system already knows
the probability distribution of intents / categories corresponding to
the query. This is a typical setup for the TREC diversity track as
well as for commercial search systems. We also know the
presentation type of each document. We argue that the presentation type may
bias user behavior and that taking it into account
may improve the click model’s performance.</p>
    </sec>
    <sec id="sec-2">
      <title>CLICK MODELS</title>
      <p>
        Click data has always been an important source of information
for web search engines. It is an implicit signal because we do not
always understand how user behavior correlates with user
satisfaction: users’ clicks are biased. Following Joachims et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
who conducted eye-tracking experiments, a series of
papers has modeled user behavior using probabilistic graphical models.
The most influential works in this area include the User Browsing Model (UBM) by
Dupret and Piwowarski [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the Cascade Model by Craswell et al.
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the Dynamic Bayesian Network (DBN) model by Chapelle and Zhang [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>A click model can be described as follows. When a user
submits a query q to a search engine, she gets back 10 results: u<sub>1</sub>, …,
u<sub>10</sub>. Given a query q, we define a session as the set of events
experienced by the user from issuing the query until abandoning the
result page or issuing another query. Note that one session
corresponds to exactly one query. The minimal set of random variables
used in all models to describe user behavior consists of the examination of the
k-th document (E<sub>k</sub>) and the click on the k-th document (C<sub>k</sub>):</p>
      <p>E<sub>k</sub> indicates whether the user looked at the document at rank
k (hidden variable).</p>
      <p>C<sub>k</sub> indicates whether the user clicked on the k-th document
(observed variable).</p>
      <p>In order to define a click model we need to specify the dependencies
between these variables. For example, for the UBM model we
define</p>
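      <p>
        <disp-formula><label>(1)</label><tex-math><![CDATA[P(E_k = 1 \mid C_1, \dots, C_{k-1}) = \gamma_{kd}]]></tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math><![CDATA[E_k = 0 \Rightarrow C_k = 0]]></tex-math></disp-formula>
        <disp-formula><label>(3)</label><tex-math><![CDATA[P(C_k = 1 \mid E_k = 1) = a_{u_k},]]></tex-math></disp-formula>
where γ<sub>kd</sub> is a function of two integer parameters: the current
position k and the distance to the rank of the previous click, d = k −
PrevClick = k − max{j | 0 ≤ j &lt; k &amp; C<sub>j</sub> = 1} (we assume
C<sub>0</sub> = 1). Furthermore, a<sub>u<sub>k</sub></sub> is a variable responsible for the
attractiveness of the document u<sub>k</sub> for the query q. If we know the a and
γ parameters, we can predict click events. The better we predict
clicks, the better the click model is.</p>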
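      <p>The following Python sketch illustrates how Eqs. (1)-(3) combine into a click prediction. Because a document can only be clicked if it is examined, the probability of a click at rank k given the earlier clicks is the product of the examination parameter γ and the attractiveness a. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the dictionary-based parameter storage and all numeric values are hypothetical, and parameter estimation is not shown.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def prev_click_distance(clicks, k):
    """Distance d = k - PrevClick for the 1-based rank k.

    PrevClick is the rank of the latest click above k, or 0 when there is
    none (the convention C_0 = 1).
    """
    prev = 0
    for j in range(1, k):
        if clicks[j - 1]:
            prev = j
    return k - prev


def click_probabilities(docs, clicks, gamma, attractiveness):
    """P(C_k = 1 | C_1..C_{k-1}) = gamma[(k, d)] * attractiveness[u_k]."""
    probs = []
    for k, doc in enumerate(docs, start=1):
        d = prev_click_distance(clicks, k)
        probs.append(gamma[(k, d)] * attractiveness[doc])
    return probs


def session_log_likelihood(docs, clicks, gamma, attractiveness):
    """Log-probability of the observed click pattern of one session."""
    probs = click_probabilities(docs, clicks, gamma, attractiveness)
    return sum(math.log(p if c else 1.0 - p) for p, c in zip(probs, clicks))


# Toy parameters, invented for illustration only (not estimated from data).
gamma = {(k, d): 1.0 / d for k in range(1, 4) for d in range(1, k + 1)}
attractiveness = {"a": 0.5, "b": 0.2, "c": 0.4}
docs = ["a", "b", "c"]
clicks = [1, 0, 0]  # the user clicked only the first result

probs = click_probabilities(docs, clicks, gamma, attractiveness)
log_lik = session_log_likelihood(docs, clicks, gamma, attractiveness)
print(probs, log_lik)
```
]]></preformat>
      <p>In this toy session the third document is two ranks below the last click, so its examination parameter is looked up with d = 2 rather than d = 1.</p>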
      <p>
        We propose a modification to existing click models that exploits
information about user intent and the result page layout. As the
base model we use the UBM click model by Dupret and
Piwowarski [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, our extensions can equally well be
applied to other click models. We focus on HTML results that look
very similar to the standard 10 blue links. We do not know
beforehand whether the user notices any differences between special (vertical)
results and ordinary ones.
      </p>
      <p>We add one hidden variable I and a set of observed variables
{G<sub>k</sub>} to the two sets of variables {E<sub>k</sub>} and {C<sub>k</sub>} commonly used
in click models:</p>
      <p>I = i indicates that the user performing the session has intent
i, i.e., relevance with respect to the category i is much more
important for the user.</p>
      <p>G<sub>k</sub> = l indicates that the result at position k uses a
presentation specific to the results with dominating intent l. For
example, for the result page shown in Fig. 1 we have G<sub>1</sub> =
fresh, G<sub>2</sub> = fresh, G<sub>3</sub> = web. We will further refer to the list
of presentation types {G<sub>1</sub>, …, G<sub>10</sub>} for the current session as
a layout.</p>
      <p>A typical user scenario can be described as follows. First, the user
looks at the whole result page and decides whether or not to examine the
k-th document. We assume that the examination
probability P(E<sub>k</sub>) does not depend on the document itself, but depends
on the user intent, her previous interaction with other results, the
document rank k and the SERP layout. If she decides to
examine the document (E<sub>k</sub> = 1), we assume that she is focused on
that particular document. This implies that the click probability
P(C<sub>k</sub> = 1 | E<sub>k</sub> = 1) depends only on the user intent I and the
relevance / attractiveness of the current document, and
not on the layout or the document position k. After clicking
(or not clicking) the document, the user moves to another document
following the same “examine-then-click” scenario.</p>
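      <p>The scenario above can be sketched in Python as follows. Given an intent, a session is scored with a UBM-style chain in which the examination parameter additionally depends on the presentation type and the intent, while the attractiveness depends on the document and the intent. Bayes' rule then yields a posterior over intents. This is an illustration, not the exact parametrization of the full paper: reducing the layout dependence to the presentation type G<sub>k</sub> at the current rank is a simplifying assumption, and all names and numbers are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def likelihood_given_intent(docs, layout, clicks, intent, gamma, attract):
    """P(C_1..C_n | I = intent) under a UBM-style chain."""
    likelihood = 1.0
    prev = 0  # rank of the previous click; 0 encodes the convention C_0 = 1
    for k, (doc, g, c) in enumerate(zip(docs, layout, clicks), start=1):
        d = k - prev
        # Examination: depends on rank, click distance, presentation type
        # and intent, but not on the document itself.
        p_exam = gamma[(k, d, g, intent)]
        # Click given examination: depends on the document and the intent only.
        p_click = p_exam * attract[(doc, intent)]
        likelihood *= p_click if c else 1.0 - p_click
        if c:
            prev = k
    return likelihood


def intent_posterior(docs, layout, clicks, prior, gamma, attract):
    """Return (P(I = i | clicks) for every intent i, P(clicks))."""
    joint = {
        i: p * likelihood_given_intent(docs, layout, clicks, i, gamma, attract)
        for i, p in prior.items()
    }
    evidence = sum(joint.values())
    return {i: v / evidence for i, v in joint.items()}, evidence


# Toy setup, invented for illustration only.
intents = ["fresh", "web"]
docs = ["n1", "n2", "w1"]
layout = ["fresh", "fresh", "web"]  # G_1, G_2, G_3
clicks = [1, 0, 0]
prior = {"fresh": 0.5, "web": 0.5}  # assumed known intent distribution
gamma = {
    (k, d, g, i): (0.8 if g == i else 0.4) / d
    for k in range(1, 4)
    for d in range(1, k + 1)
    for g in intents
    for i in intents
}
doc_type = {"n1": "fresh", "n2": "fresh", "w1": "web"}
attract = {
    (doc, i): (0.6 if doc_type[doc] == i else 0.1)
    for doc in docs
    for i in intents
}

posterior, evidence = intent_posterior(docs, layout, clicks, prior, gamma, attract)
print(posterior, evidence)
```
]]></preformat>
      <p>With these toy numbers, a single click on the first fresh result shifts the posterior toward the fresh intent. The intent is marginalized at the level of the whole session rather than per rank, because earlier clicks carry information about the hidden intent.</p>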
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
      <p>We used the UBM model as our baseline and ran experiments in
order to answer the following research questions:</p>
      <p>How do intent and layout information help in building click
models? How does the performance change when we use
only one type of information or both of them?</p>
      <p>How does the best variation of our model compare to other
existing click models?</p>
      <p>The main contribution of our work is a framework of
intent-aware click models, which incorporates both layout and intent
information. Our intent-aware modification can be applied to any
click model to improve its perplexity. One interesting feature of
an intent-aware click model is that it allows us to infer separate
relevances for different intents from clicks. These relevances can
then be used as features in ranking formulas for specific verticals.
Another important property of intent-aware additions to click
models is that, by analyzing examination probabilities, we can see how
user patience depends on the user’s intent and the search engine result
page layout. Put differently, it allows us to use a click model as an
ad hoc analytic tool.</p>
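      <p>Perplexity is not defined in this abstract. The sketch below assumes the definition commonly used in the click-model literature: for each rank, 2 raised to the negative average log2-probability that the model assigns to the observed click or skip, with the per-rank values then averaged. Lower is better, and a model that always predicts 0.5 scores exactly 2. The function names and the data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def perplexity_at_rank(clicks, predicted):
    """2 ** (-(1/N) * sum of log2-probabilities of the observed outcomes)."""
    total = sum(
        math.log2(q if c else 1.0 - q) for c, q in zip(clicks, predicted)
    )
    return 2.0 ** (-total / len(clicks))


def average_perplexity(click_rows, predicted_rows):
    """Mean of the per-rank perplexities; each row is one session."""
    n_ranks = len(click_rows[0])
    per_rank = []
    for k in range(n_ranks):
        clicks_k = [row[k] for row in click_rows]
        predicted_k = [row[k] for row in predicted_rows]
        per_rank.append(perplexity_at_rank(clicks_k, predicted_k))
    return sum(per_rank) / n_ranks


# Three toy sessions with two ranks each (made-up data).
click_rows = [[1, 0], [0, 0], [1, 1]]
coin_flip = [[0.5, 0.5]] * 3  # uninformed model
informed = [[0.8, 0.2], [0.2, 0.2], [0.8, 0.8]]  # hypothetical better model

print(average_perplexity(click_rows, coin_flip))
print(average_perplexity(click_rows, informed))
```
]]></preformat>
      <p>Comparing two click models on the same held-out sessions then amounts to comparing these two numbers, with the smaller one indicating better click prediction.</p>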
      <p>As to future work, we see a number of directions. In particular, we
want to check whether our method is also
applicable to other verticals / intents. For instance, the mobile arena
provides interesting research opportunities.</p>
      <p>Sometimes intents are specific to a single query. For instance, the
query “jaguar” has at least two intents: finding information
about cars and finding information about animals. It is very
unlikely that a search engine has a special vertical for these intents.
However, we believe that knowledge of the user’s intent can still
be used to better understand his/her behavior. Applying
our ideas to these minor intents is an interesting direction for future
work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollapudi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halverson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ieong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Diversifying search results</article-title>
          .
          <source>In: WSDM</source>
          . p.
          <fpage>5</fpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Arguello</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crespo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Sources of evidence for vertical selection</article-title>
          .
          <source>In: SIGIR</source>
          . pp.
          <fpage>315</fpage>
          -
          <lpage>322</lpage>
          . ACM (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chapelle</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A dynamic bayesian network click model for web search ranking</article-title>
          .
          <source>In: WWW</source>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Chuklin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
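          <string-name>
            <surname>Serdyukov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :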
          <article-title>Using Intent Information to Model User Behavior in Diversified Search</article-title>
          .
          <source>In: ECIR</source>
          .
          <publisher-name>Springer</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Craswell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zoeter</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramsey</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>An experimental comparison of click position-bias models</article-title>
          .
          <source>In: WSDM</source>
          . p.
          <fpage>87</fpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Dupret</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piwowarski</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A user browsing model to predict search engine click data from past observations</article-title>
          .
          <source>In: SIGIR</source>
          . pp.
          <fpage>331</fpage>
          -
          <lpage>338</lpage>
          . SIGIR '08,
          <publisher-name>ACM</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granka</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hembrooke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gay</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Accurately interpreting clickthrough data as implicit feedback</article-title>
          .
          <source>In: SIGIR</source>
          . p.
          <fpage>154</fpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Radlinski</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Improving personalized web search using result diversification</article-title>
          .
          <source>In: SIGIR</source>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Styskin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanenko</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vorobyev</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serdyukov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Recency ranking by diversification of result set</article-title>
          .
          <source>In: CIKM</source>
          . pp.
          <fpage>1949</fpage>
          -
          <lpage>1952</lpage>
          . ACM (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>