<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BM25 Pseudo Relevance Feedback Using Anserini at Waseda University</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhaohao Zeng</string-name>
          <email>zhaohao@fuji.waseda.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetsuya Sakai</string-name>
          <email>tetsuyasakai@acm.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Waseda University</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>62</fpage>
      <lpage>63</lpage>
      <abstract>
        <p>We built a Docker image for the BM25PRF (BM25 with Pseudo Relevance Feedback) retrieval model with Anserini. Grid search for parameter tuning is also provided in the Docker image. Experimental results suggest that BM25PRF with default parameters outperforms vanilla BM25 on robust04, but tuning parameters on 49 topics of robust04 did not further improve its effectiveness. Image Source: github.com/osirrc/anserini-bm25prf-docker Docker Hub: hub.docker.com/r/osirrc2019/anserini-bm25prf</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>OVERVIEW</title>
      <p>
        BM25 has been widely used as a baseline model for text retrieval
tasks. However, some studies implement only the vanilla form
of BM25, without query expansion or parameter tuning. As a
result, the performance of BM25 may be underestimated [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In our
Docker image, we implemented BM25PRF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which utilises Pseudo
Relevance Feedback (PRF) to expand queries for BM25. We also
implemented parameter tuning in the Docker image, because we
believe that how optimised parameters are obtained is also an important
part of reproducible research. We built BM25PRF and parameter
tuning with Anserini [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a toolkit built on top of Lucene for replicable
IR research.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RETRIEVAL MODELS</title>
      <p>
        Given a query q, BM25PRF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] first ranks the collection with the classic
BM25, and then extracts m terms that have high Offer Weights
(OW) from the top R ranked documents to expand the query. To
calculate the Offer Weight of a term $t_i$, its Relevance Weight (RW)
needs to be calculated first, as follows:
      </p>
      <p>$$RW(t_i) = \log \frac{(r + 0.5)(N - n - R + r + 0.5)}{(n - r + 0.5)(R - r + 0.5)} \quad (1)$$</p>
      <p>where $r$ is the Document Frequency (DF) of term $t_i$ in the top $R$
documents, $n$ is the DF of $t_i$ in the whole collection, and $N$ is the
number of documents in the collection. Then, the Offer Weight is</p>
      <p>$$OW(t_i) = RW(t_i) \cdot r \quad (2)$$</p>
      <p>However, in practice a term with a high $r$ tends to have a high
OW, so some common words (e.g., be, been) may have high OW
and be selected as expansion terms. Since such common words are
not informative, they may not be helpful for ranking. Thus, in our
Docker image, a logarithm is applied to $r$ in the OW calculation to
alleviate this problem, following Sakai and Robertson [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:</p>
      <p>$$OW(t_i) = RW(t_i) \cdot \log(r) \quad (3)$$</p>
      <p>After query expansion, the expanded terms are used for a
second search with a BM25 variant: for a term $t_i$ and a document
$d_j$, the score $s(t_i, d_j)$ is calculated as follows:</p>
      <p>$$s(t_i, d_j) = \begin{cases} s'(t_i, d_j) &amp; \text{if } t_i \in q \\ w \cdot s'(t_i, d_j) &amp; \text{otherwise} \end{cases} \quad (4)$$</p>
      <p>$$s'(t_i, d_j) = \frac{RW(t_i) \cdot TF(t_i, d_j) \cdot (K_1 + 1)}{K_1 \cdot ((1 - b) + b \cdot NDL(d_j)) + TF(t_i, d_j)} \quad (5)$$</p>
      <p>where $TF(t_i, d_j)$ is the term frequency of term $t_i$ in $d_j$,
$NDL(d_j)$ is the normalised document length of $d_j$:
$NDL(d_j) = \frac{N \cdot |d_j|}{\sum_{k=1}^{N} |d_k|}$,
$w$ is the weight of new terms, and $K_1$ and $b$ are the hyper-parameters
of BM25. All the tunable hyper-parameters are shown in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Tunable hyper-parameters of BM25PRF.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Parameter</th><th>Search space</th><th>Default</th><th>Note</th></tr>
          </thead>
          <tbody>
            <tr><td>K1</td><td>0.1-0.9, step=0.1</td><td>0.9</td><td>K1 of the first search</td></tr>
            <tr><td>b</td><td>0.1-0.9, step=0.1</td><td>0.4</td><td>b of the first search</td></tr>
            <tr><td>K1prf</td><td>0.1-0.9, step=0.1</td><td>0.9</td><td>K1 of the second search</td></tr>
            <tr><td>bprf</td><td>0.1-0.9, step=0.1</td><td>0.4</td><td>b of the second search</td></tr>
            <tr><td>R</td><td>{5, 10, 20}</td><td>10</td><td>num of relevant docs</td></tr>
            <tr><td>w</td><td>{0.1, 0.2, 0.5, 1}</td><td>0.2</td><td>weight of new terms</td></tr>
            <tr><td>m</td><td>{0, 5, 10, 20, 40}</td><td>20</td><td>num of new terms</td></tr>
          </tbody>
        </table>
      </table-wrap>
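      <p>To make the model concrete, the following is a minimal Python
sketch of Eqs. (1)-(5). It is illustrative only: all function and variable
names here are our own, and the actual implementation in the Docker
image is the Java code described in Section 3.</p>
      <preformat>
import math
from collections import Counter

def relevance_weight(r, n, R, N):
    """RW(t_i), Eq. (1): r = DF of the term in the top-R documents,
    n = its DF in the whole collection, N = collection size."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def offer_weight(r, n, R, N):
    """OW(t_i) with the log(r) damping of Eq. (3), which keeps very
    common terms (high r) from dominating term selection."""
    return relevance_weight(r, n, R, N) * math.log(r)

def select_expansion_terms(top_docs, df, N, R=10, m=20):
    """Pick the m terms with the highest Offer Weight from the top-R
    documents (each document is given as a list of its terms)."""
    r_counts = Counter()
    for doc_terms in top_docs[:R]:
        r_counts.update(set(doc_terms))      # r: top-R docs containing t
    ow = {t: offer_weight(r, df[t], R, N) for t, r in r_counts.items()}
    return sorted(ow, key=ow.get, reverse=True)[:m]

def bm25prf_score(query_terms, new_terms, doc_terms, rw, ndl,
                  k1prf=0.9, bprf=0.4, w=0.2):
    """Second-pass score of one document, Eqs. (4)-(5): expansion terms
    are down-weighted by w. rw maps each term to its RW; ndl is the
    normalised length NDL(d_j) of this document."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in set(query_terms) | set(new_terms):
        if tf[t] == 0:
            continue
        s = (rw[t] * tf[t] * (k1prf + 1)) / (
            k1prf * ((1 - bprf) + bprf * ndl) + tf[t])
        score += s if t in query_terms else w * s
    return score
      </preformat>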
    </sec>
    <sec id="sec-3">
      <title>TECHNICAL DESIGN</title>
      <p>Supported Collections: robust04. Supported Hooks: init, index,
search, train.</p>
      <p>Since the BM25PRF retrieval model is not included in the original
Anserini library, we forked its repository and added two Java
classes, BM25PRFSimilarity and BM25PRFReRanker, by extending
the Similarity class and the ReRanker class, respectively. Thus, the
implemented BM25PRF can be used on any collection supported
by Anserini, though we only tested it on the robust04 collection
in this paper. Python scripts are used as hooks to run the necessary
commands (e.g., index and search) via jig (https://github.com/osirrc/jig),
a tool provided by the OSIRRC organisers to operate Docker images
that follow the OSIRRC specification.</p>
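      <p>For illustration, a search hook can be as simple as a script that
shells out to Anserini. The sketch below is hypothetical: the exact
arguments jig passes and the flag enabling BM25PRF are assumptions,
not the literal contents of our image.</p>
      <preformat>
#!/usr/bin/env python3
"""Hypothetical sketch of a 'search' hook script run inside the container."""
import argparse
import subprocess

parser = argparse.ArgumentParser()
# jig passes run-time options to the hook; here we assume a single
# JSON string argument (illustrative, not the exact jig contract).
parser.add_argument("--json", help="run parameters from jig")
args = parser.parse_args()

# Call the forked Anserini's SearchCollection; "-bm25prf" stands in for
# whatever option selects our BM25PRFSimilarity / BM25PRFReRanker.
subprocess.run([
    "target/appassembler/bin/SearchCollection",
    "-index", "lucene-index.robust04",
    "-topics", "topics.robust04.txt",
    "-output", "run.robust04.bm25prf.txt",
    "-bm25prf",
], check=True)
      </preformat>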
        <p>Grid search is also provided in the Docker image for parameter
tuning, and can be executed using the train hook of jig. It performs
search and evaluation for every combination of the parameters
specified. To reduce the search space of grid search, our tuning process
consists of two steps (see the sketch below). First, it performs search on a
validation set using the original BM25 to find its optimal parameters (i.e.,
K1 and b) based on Mean P@20. K1 and b are the parameters
for the initial iteration of BM25PRF, so precision may be important
for extracting effective expansion terms. Then, the tuned K1 and b
are frozen, and the other parameters of BM25PRF (i.e., K1prf, bprf,
R, w, and m) are tuned on the validation set based on MAP.</p>
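        <p>The following Python sketch shows the shape of this two-step
procedure. The grid values follow Table 1; evaluate_p20 and
evaluate_map are stand-ins for running a search with the given
parameters and scoring the run on the validation topics.</p>
        <preformat>
import itertools

def evaluate_p20(params):
    """Stand-in: run first-pass BM25 with params and return Mean P@20
    on the validation topics (in the image: a search plus trec_eval)."""
    return 0.0

def evaluate_map(params):
    """Stand-in: run full BM25PRF with params and return MAP."""
    return 0.0

def grid_search(grid, evaluate):
    """Try every combination in the grid and keep the best-scoring one."""
    keys = list(grid)
    best, best_score = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best

frange = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1-0.9, step 0.1

# Step 1: tune the first-pass BM25 (K1, b) on Mean P@20.
step1 = grid_search({"K1": frange, "b": frange}, evaluate_p20)

# Step 2: freeze K1 and b; tune the PRF parameters on MAP.
step2 = grid_search(
    {"K1prf": frange, "bprf": frange, "R": [5, 10, 20],
     "w": [0.1, 0.2, 0.5, 1], "m": [0, 5, 10, 20, 40]},
    lambda prf: evaluate_map({**step1, **prf}))
        </preformat>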
    </sec>
    <sec id="sec-4">
      <title>RESULTS</title>
      <p>The parameter tuning was performed on 49 topics of robust04
(the topic ids of the validation set are provided by the OSIRRC organisers
in jig: https://github.com/osirrc/jig/tree/master/sample_training_validation_query_ids),
and the tuned parameters are shown in Table 2. As shown in Table 3,
BM25PRF outperforms the vanilla BM25, but the tuned
hyper-parameters do not improve BM25PRF’s performance on robust04.
This may be because the validation set used for tuning is too small,
and the parameters have been overfitted. Since the goal of this
study is about using Docker for reproducible IR research rather than
demonstrating the effectiveness of BM25PRF and grid search, we
do not discuss the performance further in this paper.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Tuned parameters.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Parameter</th><th>Tuned value</th></tr>
          </thead>
          <tbody>
            <tr><td>b</td><td>0.2</td></tr>
            <tr><td>K1prf</td><td>0.9</td></tr>
            <tr><td>bprf</td><td>0.6</td></tr>
            <tr><td>R</td><td>10</td></tr>
            <tr><td>w</td><td>0.1</td></tr>
            <tr><td>m</td><td>5</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>OSIRRC EXPERIENCE</title>
      <p>Docker has been widely used in industry for delivering software,
but we found that using Docker to manage an experimental
environment has advantages for research as well. First, it is
easier to configure environments with Docker than on a
bare-metal server, especially for deep learning scenarios where many
packages (e.g., GPU driver, CUDA and cuDNN) need to be installed.
Moreover, Docker makes experiments more trackable. Research
code is usually messy, lacks documentation, and may need a lot
of changes during an experiment, so even the author may have
difficulty remembering the whole change log. Since each Docker
tag is an executable archive, it provides a kind of version control
over executables. Moreover, if Docker images follow a
common specification like the one we built for OSIRRC, running the
research code we wrote several months ago is no longer a
nightmare.</p>
      <p>However, the biggest obstacle we faced during the development
for OSIRRC is that debugging is more difficult with Docker and jig.
For example, there is no simple way to attach a debugger to the
Docker container when it is launched by jig. Furthermore, the
current jig assumes that the index files are built inside the Docker
container, and commits the container with the built index as a new image,
which means that the index needs to be rebuilt after any modification
to the source code. While this is not a serious problem for small
collections like robust04, it may take too much time for large collections.
To solve this problem, we think jig should allow users to mount an
external index when launching the search hook. Although
mounting external data into a Docker container is a standard action
when using Docker’s command-line tools directly (e.g., via
docker run -v), OSIRRC expects Docker images to be operated through
jig, which currently does not provide such a feature.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Clancy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>osirrc/anserini-docker: OSIRRC @ SIGIR 2019 Docker Image for Anserini</article-title>
          . https://doi.org/10.5281/zenodo.3246820
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The Neural Hype and Comparisons Against Weak Baselines</article-title>
          .
          <source>ACM SIGIR Forum</source>
          , Vol.
          <volume>52</volume>
          . ACM,
          <fpage>40</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Karen</given-names>
            <surname>Spärck Jones</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Simple, proven approaches to text retrieval</article-title>
          .
          <source>Technical Report 356</source>
          . Computer Laboratory, University of Cambridge.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Tetsuya</given-names>
            <surname>Sakai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen E</given-names>
            <surname>Robertson</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Relative and absolute term selection criteria: a comparative study for English and Japanese IR</article-title>
          .
          <source>SIGIR 2002</source>
          .
          <fpage>411</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Anserini: Enabling the Use of Lucene for Information Retrieval Research</article-title>
          .
          <source>SIGIR 2017</source>
          .
          <fpage>1253</fpage>
          -
          <lpage>1256</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>