<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Dockerising Terrier for The Open-Source IR Replicability Challenge (OSIRRC 2019)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arthur Câmara</string-name>
          <email>A.BarbosaCamara@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Craig Macdonald</string-name>
          <email>craig.macdonald@glasgow.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Delft University of Technology</institution>
          ,
          <addr-line>Delft</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Glasgow</institution>
          ,
          <addr-line>Glasgow</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>25</volume>
      <issue>2019</issue>
      <fpage>26</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Reproducibility and replicability are key concepts in science, and it is therefore important for information retrieval (IR) platforms to aid in reproducing and replicating experiments. In this paper, we describe the creation of a Docker container for Terrier within the framework of the OSIRRC 2019 challenge, which allows typical runs to be reproduced on TREC test collections such as Robust04, GOV2 and Core2018. In doing so, it is hoped that the produced Docker image can aid others in (re)producing baseline experiments on these test collections. Initiatives like OSIRRC are instrumental in advancing reproducibility and replicability in IR: by making available not only the source code but also the exact same environment, and by standardising inputs and outputs, approaches can easily be compared, thereby improving the quality of IR research.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>OVERVIEW</title>
      <p>
        Terrier (Terabyte Retriever) is an information retrieval (IR) toolkit,
initiated by the University of Glasgow, which has been developed
since 2001 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It implements a number of retrieval and indexing
methods, ready to be used in both research and production.
      </p>
      <p>Given its open source nature, Terrier has been used in a number
of papers in the field of IR and others over the years, particularly
using standard IR test collections such as those from the Text
REtrieval Conference (TREC). For this reason, we agreed to join the
OSIRRC 2019 challenge, to create a Docker container image to
allow standard baseline results to be obtained using Terrier in a
manner that can be easily cross-compared with other platforms
implementing the OSIRRC design. Moreover, it is hoped that the
produced Docker image can be of aid to others in (re)producing
baseline experiments on these test collections.</p>
      <p>This paper describes the implementation of the Terrier Docker
image. In particular: Section 2 describes the various scripts
implemented, as well as the obtained retrieval performances; Section 3
describes the lessons learned in this implementation; Concluding
remarks and outlook follow in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>TECHNICAL DESIGN</title>
      <p>The nature of the OSIRRC challenge is that implementing systems
should provide a Dockerfile that can be used to create a Docker
container image that can be run on any Docker installation. The
container image is required to implement a number of “hooks”:
simply put, an executable at a known location in the image
filesystem. These hooks are then called by the OSIRRC jig, which also
mounts any additional files required into the container image (for
instance, the corpus files for indexing, or the topic files for retrieval).
In the following, we describe the Dockerfile used to build the
container image, and the hooks that we implemented for Terrier within
the image.</p>
    </sec>
    <sec id="sec-3">
      <title>Dockerfile</title>
      <p>The Dockerfile builds a container image with the necessary
prerequisites for Terrier. As Terrier is developed in Java, we base the
Terrier container image on the OpenJDK standard image for Java
8. A number of other libraries are installed, including Python (for
interacting with the OSIRRC jig); gcompat (standard C libraries for
trec_eval); and the Jupyter pre-requisites. We chose not to install
Terrier using the Dockerfile, to maintain a lightweight container
image.</p>
      <p>2.2 Standard Hooks.
2.2.1 init. This hook is used to prepare the container. We use this
hook to download and extract Terrier. We provide example code both
for downloading a pre-built “tarball” of Terrier from the Terrier.org
website and for checking out a version from the Github repository. This
hook is configured to use Terrier's latest stable version, 5.2. It can,
however, easily be configured to fetch other Terrier versions
by changing the variable version in the init script.</p>
      <p>The hook is also compatible with git, making it possible to fetch
bleeding-edge versions of Terrier directly from the terrier-core
Github repository1. This can be controlled by the variable github
in the same init script. Note that when using the Github version
the init hook may take longer to run, since the code will be compiled
manually for your system. In our experiments, this could take up
to 3 extra minutes.
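The version selection described above can be sketched as follows; the variables version and github are named in the init script, but the fetch logic shown here is illustrative, not the script's actual code:

```shell
# Sketch of the init hook's version selection; 'version' and 'github' are the
# variable names from the text, the rest is an illustrative reconstruction.
version="5.2"    # change to fetch a different Terrier release
github="false"   # "true" clones and compiles bleeding-edge terrier-core
if [ "$github" = "true" ]; then
  src="git clone of https://github.com/terrier-org/terrier-core (adds compile time)"
else
  src="terrier-project-${version} tarball from terrier.org"
fi
echo "init will fetch: $src"
```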
2.2.2 index. This hook is used to index the corpus that has been
mounted by the jig. The jig provides the name of the corpus (see
Table 1 for supported corpora), as well as the format (trectext,
trecweb or json). The index hook uses the corpus name to configure
the indexing process. For example, for the robust04 corpus, which
uses TREC Disks 4 &amp; 5, we remove the Congressional Record from
the indexing manifest (the collection.spec file that lists the files
Terrier should index), as well as READMEs and other unnecessary files,
and configure an additional decompression plugin. Similarly, for the
core18 corpus, we configure Terrier to download2 an additional
indexing plugin to support parsing of the TREC Washington Post
corpus.</p>
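The manifest filtering for robust04 described above can be sketched as follows (the file paths are illustrative, not the real layout of Disks 4 &amp; 5):

```shell
# Sketch: drop the Congressional Record and READMEs from the indexing manifest
# (collection.spec) before indexing robust04. Paths below are made up.
printf '%s\n' \
  "disk4/fr94/01/fr940104.0z" \
  "disk4/cr/h01/cr93e1.0z" \
  "disk5/fbis/fb396001.2z" \
  "disk4/README" > collection.spec
# Keep everything except Congressional Record files and READMEs.
grep -v -e '/cr/' -e 'README' collection.spec > collection.spec.filtered
cat collection.spec.filtered
```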
      <sec id="sec-3-1">
        <title>1https://github.com/terrier-org/terrier-core</title>
        <p>2Indeed, inspired by Apache Spark, since version 5.0, Terrier supports downloading
additional plugins from MavenCentral, based on the value of the terrier.mvn.coords
configuration property.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Table 1: Supported corpora</title>
        <p>robust04: TREC Disks 4 &amp; 5, minus CR;
gov2: TREC GOV2;
core18: TREC Washington Post;
cw09b: TREC ClueWeb09 part B;
cw12b: TREC ClueWeb12 part B.</p>
        <p>Finally, we note that Terrier supports a variety of indexing
configurations; we document here our choices and the alternatives
available:
• Positions: Terrier, by default, does not record positions (also
called blocks) in the direct or inverted index. Passing the
optional argument block.indexing to the jig will result
in positions being saved. This allows proximity weighting
models to be run.
• Indexer: Terrier’s “classical” indexer creates a direct index
(also called a forward index) before inverting the direct
index to create the inverted index. Indeed, the direct index
allows pseudo-relevance feedback query expansion.
However, the classical indexer is known to be slower than the
alternative “single-pass” indexer that Terrier also provides.
If the direct index is not required, the single-pass indexer
can be used by passing a -j flag to Terrier during indexing.
• Fields: Including fields in the index allows the frequency
of query terms within different parts of the document (e.g.
TITLE tags) to be recorded separately. This allows the use of
field-based weighting models such as BM25F or PL2F, or the use
of fields for features within a learning-to-rank setting.
• Stemming &amp; Stopwords: We retain Terrier’s default
setting of Porter stemming and standard stopword removal.</p>
        <p>
          Terrier’s stopword list has 733 words.
2.2.3 search. This hook makes use of Terrier’s batchretrieve
command to execute a ‘run’, i.e. to extract 1000 results for each
of a batch of information needs (topics/queries). The jig mounts
the topics into the container image. In addition to learning-to-rank
(discussed further below), we provide off-the-shelf support for 12
retrieval configurations. These are broken down by three
orthogonal components: weighting model (BM25 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], as well as PL2 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and
DPH [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] from the Divergence from Randomness framework);
proximity (pBiL Divergence from Randomness model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]); and query
expansion (Bo1 Divergence from Randomness model [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]).
        </p>
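For reference, the BM25 weighting model scores a document d for a query q as follows (textbook notation, not reproduced from this paper; k_1 and b are the usual free parameters, and avgdl is the average document length):

```latex
\mathrm{score}(d,q) \;=\; \sum_{t \in q} \mathrm{IDF}(t)\,
\frac{tf_{t,d}\,(k_1 + 1)}{tf_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
```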
        <p>The combination of these three components yields the following
possible configurations, to be passed to the hook using the --opts
config=&lt;setting&gt; parameter:
• BM25
– bm25: Vanilla BM25
– bm25_qe: BM25 with query expansion
– bm25_prox: BM25 with proximity
– bm25_prox_qe: BM25 with proximity and query
expansion
• PL2
– pl2: Vanilla PL2
– pl2_qe: PL2 with query expansion
– pl2_prox: PL2 with proximity
– pl2_prox_qe: PL2 with proximity and query expansion
• DPH
– dph: Vanilla DPH
– dph_qe: DPH with query expansion
– dph_prox: DPH with proximity
– dph_prox_qe: DPH with proximity and query expansion
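The twelve configuration names above are exactly the product of the three orthogonal components, which can be enumerated as:

```shell
# Enumerate the 12 retrieval configurations: 3 weighting models x
# {vanilla, query expansion, proximity, proximity + query expansion}.
for wm in bm25 pl2 dph; do
  for opt in "" "_qe" "_prox" "_prox_qe"; do
    echo "${wm}${opt}"
  done
done > configs.txt
cat configs.txt
```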
The expected results for each of these runs can be found in
Table 2, together with the relative improvements of each configuration
over the vanilla version. On analysing the results in the table, it
is of interest to note that query expansion always improves the
final result considerably, yielding improvements of up to 27.90%
over the vanilla versions. However, the same cannot be said of
the proximity option. While yielding improvements of up to 3.61%
(bm25_prox), it can sometimes decrease the performance of the
vanilla version by up to 1.70% (pl2_prox). These results are
interesting, since they show that different methods can result in diverse
observations across the multiple corpora and query sets.</p>
        <p>When combining both proximity and query expansion, the
results, overall, show improvements over both the vanilla and qe
or proximity alone, with improvements of up to 27.26% over the
vanilla versions. In the cases where proximity decreased the
original results, combining query expansion and proximity search does
not improve the results, as expected. However, the results from
scenarios like the DPH model on Core18 and GOV2 (topics
701-750) show that, even if both query expansion and proximity search
are combined, the overall result may not improve over only the
stronger of the two methods (usually, query expansion).</p>
        <p>This again reinforces the knowledge that each dataset is
different, and that a practitioner should be aware not to simply stack
methods that provide marginal gains, but instead test multiple
combinations, and understand how each method may behave when
used in combination with others.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Learning To Rank Hooks</title>
      <p>
        Terrier provides support for learning-to-rank in several manners:
the ability to integrate additional features during ranking, including
additional query-dependent features, without having to re-traverse
the inverted or direct index [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], as well as providing integration of
the Jforests 3 implementation of LambdaMART [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Learning to Rank integration is demonstrated through two hooks,
train and search.
2.3.1 train. This hook extracts features for the training and
validation topics, before calling Jforests to build the learned model.
To aid implementation, train calls the search hook internally to
obtain results for the training and validation sets, specifying the
bm25_ltr_features retrieval configuration. The retrieval features
to use are configurable by specifying the features argument to
the jig.
2.3.2 search. Search also supports generation of the final
learning-to-rank run, using the bm25_ltr_jforest retrieval configuration.
This configuration assumes that train has already been called and
hence a Jforests learned model file already exists.</p>
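The ordering constraint between the two hooks can be sketched as a simple precondition check (the model filename used here is a hypothetical placeholder, not the actual file Jforests writes):

```shell
# Sketch of the precondition assumed by the bm25_ltr_jforest configuration:
# a learned model from the train hook must already exist.
# 'jforests-ensemble.txt' is a made-up name for illustration.
model="jforests-ensemble.txt"
if [ -f "$model" ]; then
  echo "model found: re-ranking with the learned model"
else
  echo "no $model: run the train hook first"
fi
```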
      <sec id="sec-4-1">
        <title>3https://github.com/yasserg/jforests/</title>
        <p>
          2.4 Interact Hook. In the interact hook, we provide three HTTP-accessible methods
that allow a researcher to interact with the Terrier instance. Two
of these provide access to the results of the search engine, while
the third allows the user to conduct further experiments within a
Jupyter notebook environment, making use of Terrier-Spark [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
Each HTTP server is made available on a separate port, as detailed
below4.
2.4.1 Port 1980: Simple search interface. This provides a simple,
user-friendly web presentation of the search results, allowing the user to
enter queries, and receive ranked search results.
2.4.2 Port 1981: REST API. This provides a REST endpoint for
Terrier to provide search results from. This can be used directly, or
can be used by another instance of Terrier to query the index in
a running container (i.e. Terrier can be both a server or a client).
Figure 1 shows an example of using Terrier from the command
line of another machine to access an index hosted within a Docker
container.
2.4.3 Port 1982: Terrier-Spark Jupyter Notebook. Finally, port 1982
starts a Jupyter notebook with Apache Toree’s Scala kernel installed.
This allows use of Terrier-Spark - a Scala interface built on top of
Apache Spark that allows Terrier retrieval experiments to be
conducted, including in a Jupyter notebook [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ]. An example
notebook is provided that allows the user to run more experiments on
the available indices. Functionalities include querying and
evaluating outcomes (as shown in Figure 2), as well as combining Terrier’s
learning-to-rank feature support with Apache Spark’s machine
learning capabilities.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>LESSONS LEARNED</title>
      <p>While developing this work, a number of roadblocks appeared,
prompting new insights and workarounds that ended up improving
the overall reproducibility of the work. Some of these roadblocks,
formulated as questions, are described in this section.
4Note that the ports on the host machine may differ, due to the way that Docker
assigns ports. It is foreseen that this will be resolved in future versions of the OSIRRC
jig - see https://github.com/osirrc/jig/issues/112.</p>
      <p>Figure 1: Accessing an index hosted in a Docker container from another machine.
#this starts the REST endpoint on port 1981
[dockerhost]$ cd jig
[dockerhost]$ python run.py interact --repo terrier --tag latest
#this demonstrates access to that index from another machine
[anotherhost]$ cd terrier
[anotherhost]$ bin/terrier interactive -I http://dockerhost:1981/
terrier query&gt; information retrieval end:5</p>
      <p>Displaying 1-6 results
0 FBIS4-20699 10.268754805435458
1 FBIS4-20702 9.768490153503198
2 FR941027-2-00046 9.491347902606723
3 FBIS4-20701 9.456022500508775
4 FBIS3-24510 9.31403481019499
5 FBIS4-20700 8.792342494849281</p>
    </sec>
    <sec id="sec-6">
      <title>Do you really have the original version of the corpus?</title>
      <p>We discovered, like several other research labs involved in the
OSIRRC challenge, that TREC Disks 4 &amp; 5 had originally been
compressed using the archaic Unix compress utility, resulting in .z,
.1z and .2z filename suffixes. Our own copies in Glasgow and Delft
had at some time been recompressed using the more contemporary Gzip
compression (with a resulting .gz filename suffix).</p>
      <p>We made some minor adjustments in Terrier version 5.2 that
allow .z files to be decompressed on-the-fly, using an Apache
Commons package integrated into Terrier.</p>
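A quick way to check which compression a collection file really uses, regardless of its suffix, is to inspect its magic bytes: Unix compress output starts with the bytes 1f 9d, Gzip with 1f 8b. A minimal sketch:

```shell
# Create a file whose .z suffix suggests Unix compress but whose content is
# actually gzip, then read its first two bytes.
printf 'sample text' | gzip -c > doc.z
# Prints the magic bytes: 1f 8b identifies gzip, 1f 9d would be Unix compress.
head -c 2 doc.z | od -An -tx1
```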
    </sec>
    <sec id="sec-7">
      <title>How much memory is in this container?</title>
      <p>Like any Java process, Terrier is limited by the amount of memory
available to the Java Virtual Machine (JVM). We worked hard to
ensure that the JVM is allowed to use as much memory as possible once a
container is running. This offers Terrier potential speed improvements
for both indexing and retrieval.</p>
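As a sketch of one way a JVM can be told to use most of a container's memory (an assumption for illustration, not necessarily how the image is configured), the standard JAVA_TOOL_OPTIONS variable, read by any JVM started in the container, can carry the container-awareness flags available since OpenJDK 8u191:

```shell
# Illustrative only: let the JVM size its heap from the container's cgroup
# memory limit rather than the host's physical RAM. The 80% figure is a
# made-up example, not the image's actual setting.
export JAVA_TOOL_OPTIONS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=80.0"
echo "$JAVA_TOOL_OPTIONS"
```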
    </sec>
    <sec id="sec-8">
      <title>Can the classical indexer be more aggressive in using the available memory?</title>
      <p>In OSIRRC, we elected to default to Terrier’s classical indexer, as
this allows more flexibility in the index due to the creation of a
direct index compared to the faster single-pass indexer. However, it
was recognised that the classical indexer had seen less attention in
recent years, and hence could be further optimised. In particular, in
Terrier 5.2, we made changes to the classical indexer to recognise
the available memory, and be more aggressive in its use of that RAM.
In particular, we have observed significant efficiency improvements
when building a block index for GOV2 (an 11% reduction in indexing
time for a Docker host machine with many CPU cores, with larger
benefit observed for less powerful hosts).</p>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSIONS &amp; OUTLOOK</title>
      <p>This paper has described the implementation of the Terrier-Docker
container image within the OSIRRC replicability challenge. This
has been a worthwhile effort that has allowed many IR platforms
and toolkits to be made available within a standardised Docker
environment. We have aimed to provide a range of standard
retrieval configurations that Terrier can provide for the relevant test
collections. Meanwhile, participation in the challenge has allowed
some improvements to the Terrier platform, that will be released
in version 5.2.</p>
      <p>On the other hand, while the Docker image is a step in the
direction of allowing replication of IR experiments, we believe that it
should be combined with notebook-like environments that
facilitate the scripting of advanced experiments. We have provided one
example Terrier-Spark notebook, which demonstrates the possible
functionality of conducting an IR experiment within a notebook.
However, we acknowledge the overheads of operating in a Spark
environment (both in efficiency and in code complexity). In the future,
we seek better integration of Terrier into a Python environment, to
allow easier scripting of complex retrieval experiments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Giambattista</given-names>
            <surname>Amati</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Probabilistic Models for Information Retrieval based on Divergence from Randomness</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . Department of Computing Science, University of Glasgow.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Giambattista</given-names>
            <surname>Amati</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Frequentist and Bayesian Approach to Information Retrieval</article-title>
          .
          <source>In ECIR (Lecture Notes in Computer Science)</source>
          , Vol.
          <volume>3936</volume>
          . Springer,
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Combining Terrier with Apache Spark to create Agile Experimental Information Retrieval Pipelines</article-title>
          .
          <source>In SIGIR. ACM</source>
          ,
          <fpage>1309</fpage>
          -
          <lpage>1312</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard</given-names>
            <surname>McCreadie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Agile Information Retrieval Experimentation with Terrier Notebooks</article-title>
          .
          <source>In DESIRES (CEUR Workshop Proceedings)</source>
          , Vol.
          <volume>2167</volume>
          . CEUR-WS.org,
          <fpage>54</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <surname>Richard</surname>
            <given-names>McCreadie</given-names>
          </string-name>
          , Rodrygo L. T. Santos, and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>From puppy to maturity: Experiences in developing Terrier</article-title>
          .
          <source>Proc. of OSIR at SIGIR</source>
          (
          <year>2012</year>
          ),
          <fpage>60</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , Rodrygo L. T. Santos, Iadh Ounis, and
          <string-name>
            <given-names>Ben</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>About learning models with multiple query-dependent features</article-title>
          .
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>31</volume>
          ,
          <issue>3</issue>
          (
          <year>2013</year>
          ),
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jie</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ben</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vassilis</given-names>
            <surname>Plachouras</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Incorporating term dependency in the dfr framework</article-title>
          .
          <source>In SIGIR. ACM</source>
          ,
          <fpage>843</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The Probabilistic Relevance Framework: BM25 and Beyond</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>3</volume>
          ,
          <issue>4</issue>
          (
          <year>2009</year>
          ),
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chris J. C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Krysta M.</given-names>
            <surname>Svore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jianfeng</given-names>
            <surname>Gao</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Ranking, Boosting, and Model Adaptation</article-title>
          .
          <source>Technical Report MSR-TR-2008-109</source>
          . Microsoft.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>