<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Reusing Historical Interaction Data for Faster Online Learning to Rank for IR (Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anne Schuth a.g.schuth@uva.nl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISLA, University of Amsterdam</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Katja Hofmann</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Maarten de Rijke</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Shimon Whiteson</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>26</volume>
      <issue>2013</issue>
      <abstract>
        <p>We summarize the findings from Hofmann et al. [6]. Online learning to rank for information retrieval (IR) holds promise for allowing the development of “self-learning” search engines that can automatically adjust to their users. With the large amount of e.g., click data that can be collected in web search settings, such techniques could enable highly scalable ranking optimization. However, feedback obtained from user interactions is noisy, and developing approaches that can learn from this feedback quickly and reliably is a major challenge. In this paper we investigate whether and how previously collected (historical) interaction data can be used to speed up learning in online learning to rank for IR. We devise the first two methods that can utilize historical data (1) to make feedback available during learning more reliable and (2) to preselect candidate ranking functions to be evaluated in interactions with users of the retrieval system. We evaluate both approaches on 9 learning to rank data sets and find that historical data can speed up learning, leading to substantially and significantly higher online performance. In particular, our preselection method proves highly effective at compensating for noise in user feedback. Our results show that historical data can be used to make online learning to rank for IR much more effective than previously possible, especially when feedback is noisy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>In recent years, learning to rank methods have become popular
in information retrieval (IR) as a means of tuning retrieval systems.
However, most current approaches work offline, meaning that
manually annotated data needs to be collected beforehand, and that, once
deployed, the system cannot continue to adjust to user needs, unless
it is retrained with additional data. An alternative setting is online
learning to rank, where the system learns directly from interactions
with its users. These approaches are typically based on
reinforcement learning techniques, meaning that the system tries out new
ranking functions (also called rankers), and learns from feedback
inferred from users’ interactions with the presented rankings. In
contrast to offline learning to rank approaches, online approaches
do not require any initial training material, but rather automatically
improve rankers while they are being used.</p>
      <p>A main challenge that online learning to rank for IR approaches
have to address is to learn as quickly as possible from the limited
quality and quantity of feedback that can be inferred from user
interactions. In this paper we address this challenge by proposing the
first two online learning to rank algorithms that can reuse previously
collected (historical) interaction data to make online learning more
reliable and faster.
2.</p>
    </sec>
    <sec id="sec-2">
      <title>METHOD</title>
      <p>We model online learning to rank for IR as a cycle of
interactions between users and retrieval system. Users submit queries to
which the system responds with ranked result lists. The user
interacts with the result lists, and these interactions allow the search
engine to update its ranking model to improve performance over
time. We address the problem of learning a ranking function that
generalizes over queries and documents, and assume that queries
are independent of each other, and of previously presented results.</p>
      <p>
        Learning in this setting is implemented as stochastic gradient
descent to learn a weight vector w for a linear combination of
ranking features. Ranking features X encode the relationship between
a query and the documents in a document collection (e.g., tf-idf,
PageRank, etc.). Given a weight vector w, and ranking features
X candidate documents are scored using s = wX. Sorting the
documents by these scores results in a result list for the given w.
Our baseline method learns weight vectors using the dueling bandit
gradient descent (DBGD, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) algorithm. This algorithm maintains
a current best weight vector, and learns by generating candidate
weight vectors that are compared to the current best. When a
candidate is found to improve over the current best weight vector, the
weights are updated.
      </p>
      <p>
        User feedback is interpreted using interleaved comparison
methods [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These methods can infer unbiased relative feedback about
ranker quality from implicit feedback, such as user clicks. In
particular, they combine the result lists produced by the two rankers
into one result ranking, which is then shown to the user. Clicks on
the documents contributed by each ranker can then be interpreted
as votes for that ranker. Our baseline interleaved comparison
methods are Balanced Interleave (BI) and Team Draft (TD) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Our
extensions for reusing historical data are enabled by Probabilistic
Interleave (PI) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Based on DBGD and PI, we can now define our two approaches
for reusing historical data to speed up online learning to rank.</p>
      <p>
        Reliable Historical Comparison (RHC). RHC is based on the
intuition that repeating comparisons on historical data should
provide additional information to complement live comparisons, which
can make estimates of relative performance more reliable. This is
expected to reduce noise and lead to faster learning. Reusing
historical interaction data for additional comparisons is possible using
PI, but estimates may be biased. To remove bias, we use importance
sampling as proposed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We combine the resulting historical
estimates with the original live estimate using the Graybill-Deal
estimator [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This combined estimator weights the two estimates
by the ratio of their variances.
      </p>
      <p>Candidate Pre-Selection (CPS). Our second approach for reusing
historical data to speed up online learning to rank for IR uses
historical data to improve candidate generation. Instead of randomly
BI
TD</p>
      <p>PI
RHC</p>
      <p>CPS
0 200 400 600 800 1000
Figure 1: Offline performance in NDCG (vertical axis,
computed on held-out test queries after each learning step) on
NP2003 data set, for the informational click model over 1K queries.
generating a candidate ranker to test in each comparison, it
generates a pool of candidate rankers, and selects the most promising one
using historical data. We hypothesize that historical data can be
used to identify promising rankers, and that the increased quality of
candidate rankers can speed up learning.</p>
    </sec>
    <sec id="sec-3">
      <title>EXPERIMENTS AND RESULTS</title>
      <p>Our experiments are designed to investigate whether online
learning to rank for IR can be sped up by using historical data. They are
based on an existing simulation framework, which combines fully
annotated learning to rank data sets with probabilistic user models
to simulate user interactions with a search engine that learns online.</p>
      <p>
        We conduct our experiments on the 9 data sets provided as
LETOR 3.0 and 4.0. These data sets implement retrieval tasks
that range from navigational (e.g., home page finding) to
informational (e.g., literature search). They range in size from 50 to 1700
queries, 45 to 64 features, and up to 1000 judged documents per
topic. Starting a data set, we simulate user queries by uniform
sampling from the provided queries. After the retrieval system returns a
ranked result list, user feedback is generated using the Dependent
Click Model (DCM) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], an extension of the Cascade Model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that
has been shown to be effective in explaining users’ click behavior
in web search. We instantiate the user model with three levels of
noise. The perfect click model provides reliable feedback. The
navigational and informational model reflect two types of search tasks.
Our experiments compare and contrast three baseline runs (BI:
balanced interleave, TD: team draft, and PI: probabilistic interleave)
and our proposed methods for reusing historical interaction data,
RHC and CPS. Over all data sets, we find that the performance of
the baseline methods substantially degrades with noise as expected.
Comparing the performance of these baseline methods to that of
RHC and CPS answers our research question of whether reusing
historical interaction data can compensate for this noise.
      </p>
      <p>Performance for our method RHC confirms our hypothesis.
Under perfect user feedback, the method’s performance is equivalent
to that of the baseline methods that use live data only. However, its
relative performance improves with increased noise. Under the
informational click model, the method performs significantly better than
the best baseline method on five of the nine data sets. Performance
is still equivalent on two data sets, and decreases on the remaining
two. Best performance under all click models is achieved by our
CPS method. While we expected performance improvements under
noisy click feedback, this method achieves significant improvements
over the baseline methods even when click feedback is perfect. We
attribute this improvement to the more exhaustive local exploration
enabled by this approach. Performance improvements are highest
under noisy feedback. An example is shown in Figure 1. This graph
shows the offline performance in terms of NDCG on the held-out
test folds for the data set NP2003 over the number of iterations
(queries). We see that the baseline methods BI, TD, and PI learn
slowly as the amount of available feedback increases. RHC,
learning is significantly and substantially faster, because complementing
comparisons with historical data makes feedback for learning more
reliable. Finally, CPS is able to compensate for most of the noise in
user feedback, leading to significantly faster learning.</p>
    </sec>
    <sec id="sec-4">
      <title>4. CONCLUSION</title>
      <p>In this paper, we investigated whether and how historical data
can be reused to speed up online learning to rank for IR. We
proposed the first two online learning to rank approaches that can reuse
historical interaction data. RHC uses historical interaction data to
make feedback inferred from user interactions more reliable. CPS
uses this data to preselect candidate rankers so that the quality of
the rankers compared in live interactions is improved.</p>
      <p>We found that both proposed methods can improve the reliability
of online learning to rank for IR under noisy user feedback. Best
performance was observed using the CPS method, which can
outperform all other methods significantly and substantially under all
levels of noise. Performance gains of CPS were particularly high
when click feedback was noisy. This result demonstrates that CPS
is effective in compensating for noise in click feedback.</p>
      <p>This work is the first to show that historical data can be used
to significantly and substantially improve online performance in
online learning to rank for IR. These methods are expected to make
online learning with noisy feedback more reliable and therefore
more widely applicable.</p>
      <p>Acknowledgements. This research was partially supported by the
European Union’s ICT Policy Support Programme as part of the Competitiveness
and Innovation Framework Programme, CIP ICT-PSP under grant agreement
nr 250430, the European Community’s Seventh Framework Programme
(FP7/2007-2013) under grant agreements nr 258191 (PROMISE Network of
Excellence) and 288024 (LiMoSINe project), the Netherlands Organisation
for Scientific Research (NWO) under project nrs 612.061.814, 612.061.815,
640.004.802, 727.011.005, 612.001.116, HOR-11-10, the Center for
Creation, Content and Technology (CCCT), the Hyperlocal Service Platform
project funded by the Service Innovation &amp; ICT program, the WAHSP and
BILAND projects funded by the CLARIN-nl program, the Dutch national
program COMMIT, by the ESF Research Network Program ELIAS, and
the Elite Network Shifts project funded by the Royal Dutch Academy of
Sciences.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Zoeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ramsey</surname>
          </string-name>
          .
          <article-title>An experimental comparison of click position-bias models</article-title>
          .
          <source>In WSDM '08</source>
          , pages
          <fpage>87</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Graybill</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Deal</surname>
          </string-name>
          .
          <article-title>Combining unbiased estimators</article-title>
          .
          <source>Biometrics</source>
          ,
          <volume>15</volume>
          (
          <issue>4</issue>
          ):
          <fpage>543</fpage>
          -
          <lpage>550</lpage>
          ,
          <year>1959</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Guo</surname>
          </string-name>
          , C. Liu, and
          <string-name>
            <given-names>Y. M.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Efficient multiple-click models in web search</article-title>
          .
          <source>In WSDM '09</source>
          , pages
          <fpage>124</fpage>
          -
          <lpage>131</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whiteson</surname>
          </string-name>
          , and
          <string-name>
            <surname>M. de Rijke</surname>
          </string-name>
          .
          <article-title>A probabilistic method for inferring preferences from clicks</article-title>
          .
          <source>In CIKM '11</source>
          , pages
          <fpage>249</fpage>
          -
          <lpage>258</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whiteson</surname>
          </string-name>
          , and M. de Rijke.
          <article-title>Estimating interleaved comparison outcomes from historical click data</article-title>
          .
          <source>In CIKM '12</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whiteson</surname>
          </string-name>
          , and M. de Rijke.
          <article-title>Reusing historical interaction data for faster online learning to rank for IR</article-title>
          .
          <source>In WSDM '13</source>
          , pages
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurup</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>How does clickthrough data reflect retrieval quality?</article-title>
          <source>In CIKM '08</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yue</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Interactively optimizing information retrieval systems as a dueling bandits problem</article-title>
          .
          <source>In ICML'09</source>
          , pages
          <fpage>1201</fpage>
          -
          <lpage>1208</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>