Dockerising Terrier for The Open-Source IR Replicability Challenge (OSIRRC 2019)

Arthur Câmara
Delft University of Technology, Delft, the Netherlands
A.BarbosaCamara@tudelft.nl

Craig Macdonald
University of Glasgow, Glasgow, UK
craig.macdonald@glasgow.ac.uk

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). OSIRRC 2019 co-located with SIGIR 2019, 25 July 2019, Paris, France.

ABSTRACT
Reproducibility and replicability are key concepts in science, and it is therefore important for information retrieval (IR) platforms to aid in reproducing and replicating experiments. In this paper, we describe the creation of a Docker container for Terrier within the framework of the OSIRRC 2019 challenge, which allows typical runs to be reproduced on TREC test collections such as Robust04, GOV2 and Core2018. In doing so, it is hoped that the produced Docker image can be of aid to others in (re)producing baseline experiments on these test collections. Initiatives like OSIRRC are key in advancing reproducibility and replicability in the IR field. By making available not only the source code but also the exact same environment, and by standardising inputs and outputs, it becomes possible to easily compare approaches and thereby improve the quality of IR research.

1 OVERVIEW
Terrier (Terabyte Retriever) is an information retrieval (IR) toolkit, initiated by the University of Glasgow, which has been developed since 2001 [5]. It implements a number of retrieval and indexing methods, ready to be used in both research and production.

Given its open-source nature, Terrier has been used over the years in a number of papers in the field of IR and beyond, particularly with standard IR test collections such as those from the Text REtrieval Conference (TREC). For this reason, we agreed to join the OSIRRC 2019 challenge, to create a Docker container image that allows standard baseline results to be obtained using Terrier in a manner that can be easily cross-compared with other platforms implementing the OSIRRC design. Moreover, it is hoped that the produced Docker image can be of aid to others in (re)producing baseline experiments on these test collections.

This paper describes the implementation of the Terrier Docker image. In particular: Section 2 describes the various scripts implemented, as well as the obtained retrieval performances; Section 3 describes the lessons learned in this implementation; concluding remarks and outlook follow in Section 4.

2 TECHNICAL DESIGN
The nature of the OSIRRC challenge is that participating systems should provide a Dockerfile that can be used to create a Docker container image that can be run on any Docker installation. The container image is required to implement a number of "hooks" - simply put, executables at known locations in the image filesystem. These hooks are then called by the OSIRRC jig, which also mounts any additional files required into the container image (for instance, the corpus files for indexing, or the topic files for retrieval). In the following, we describe the Dockerfile used to build the container image, and the hooks that we implemented for Terrier within the image.

2.1 Dockerfile
The Dockerfile builds a container image with the necessary pre-requisites for Terrier. As Terrier is developed in Java, we base the Terrier container image on the OpenJDK standard image for Java 8. A number of other libraries are installed, including Python (for interacting with the OSIRRC jig); gcompat (standard C libraries for trec_eval); and the Jupyter pre-requisites. We chose not to install Terrier itself from the Dockerfile, to maintain a lightweight container image.

2.2 Standard Hooks
2.2.1 init. This hook is used to prepare the container. We use this hook to download and extract Terrier. We provide example code both for downloading a pre-built "tarball" of Terrier from the Terrier.org website, and for checking out a version from the Github repository. The hook is configured to use Terrier's latest stable version, 5.2. It can, however, be easily configured to fetch other Terrier versions, by changing the variable version in the init script.

The hook is also compatible with git, making it possible to fetch bleeding-edge versions of Terrier directly from the terrier-core Github repository (https://github.com/terrier-org/terrier-core). This is controlled by the variable github in the same init script. Note that when using the Github version the init hook may take longer to run, since the code will be compiled for your system; in our experiments, this added up to 3 minutes.

2.2.2 index. This hook is used to index the corpus that has been mounted by the jig. The jig provides the name of the corpus (see Table 1 for supported corpora), as well as its format (trectext, trecweb or json). The hook uses the corpus name to configure the indexing process. For example, for the robust04 corpus, which uses TREC Disks 4 & 5, we remove the Congressional Record from the indexing manifest (the collection.spec file that lists the files Terrier should index), as well as READMEs and other unnecessary files, and configure an additional decompression plugin. Similarly, for the core18 corpus, we configure Terrier to download an additional indexing plugin to support parsing of the TREC Washington Post corpus. (Indeed, inspired by Apache Spark, since version 5.0 Terrier supports downloading additional plugins from MavenCentral, based on the value of the terrier.mvn.coords configuration property.)
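The selection logic of the init hook described in Section 2.2.1 can be sketched as below. This is a simplified illustration, not the script shipped in the image: the variable names version and github follow the text, but the tarball filename and the printed action strings are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the init hook's source-selection logic: fetch Terrier
# either as a pre-built tarball or as source from Github.
# Illustrative only -- filenames and actions are assumptions.
version="5.2"   # Terrier version to fetch (see Section 2.2.1)
github="false"  # set to "true" to build the bleeding-edge version

# terrier_source prints the action that would be taken, so the
# selection logic can be inspected without any network access.
terrier_source() {
  if [ "$github" = "true" ]; then
    echo "git clone https://github.com/terrier-org/terrier-core.git"
  else
    echo "download terrier-project-${version}.tar.gz"
  fi
}

terrier_source
```

Changing version or github before the hook runs redirects the fetch, matching the configuration options described above.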
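The corpus-specific filtering that the index hook (Section 2.2.2) applies for robust04 - dropping the Congressional Record and README files from collection.spec - can be illustrated with a short sketch. The directory layout below is invented for the example; only the idea of filtering the indexing manifest is taken from the text.

```shell
#!/bin/sh
# Sketch: build a collection.spec for robust04 that omits the
# Congressional Record (here a "cr" directory) and README files.
# The directory and file names are invented for illustration.
corpus=$(mktemp -d)
mkdir -p "$corpus/disk4/fr94" "$corpus/disk4/cr" "$corpus/disk5/fbis"
touch "$corpus/disk4/fr94/fr940104.0z" \
      "$corpus/disk4/cr/cr93e1.0z" \
      "$corpus/disk5/fbis/fb396001.1z" \
      "$corpus/disk4/readme.txt"

# List every corpus file, then drop CR and README entries, as the
# index hook does before handing the manifest to Terrier.
find "$corpus" -type f | grep -iv -e '/cr/' -e 'readme' \
  > "$corpus/collection.spec"

cat "$corpus/collection.spec"
```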






Table 1: Supported corpora.

  Name       Description
  robust04   TREC Disks 4 & 5, minus CR
  gov2       TREC GOV2
  core18     TREC Washington Post
  cw09b      TREC ClueWeb09 part B
  cw12b      TREC ClueWeb12 part B

Finally, we note that Terrier supports a variety of indexing configurations; we document here our choices and the alternatives available:

  • Positions: Terrier, by default, does not record positions (also called blocks) in the direct or inverted index. Passing the optional argument block.indexing to the jig will result in positions being saved. This allows proximity weighting models to be run.
  • Indexer: Terrier's "classical" indexer creates a direct index (also called a forward index) before inverting it to create the inverted index. Indeed, the direct index enables pseudo-relevance feedback query expansion. However, the classical indexer is known to be slower than the alternative "single-pass" indexer that Terrier also provides. If the direct index is not required, the single-pass indexer can be used by passing a -j flag to Terrier during indexing.
  • Fields: Including fields in the index allows the frequency of query terms within different parts of the document (e.g. TITLE tags) to be recorded separately. This permits the use of field-based weighting models such as BM25F or PL2F, or the use of fields as features within a learning-to-rank setting.
  • Stemming & Stopwords: We retain Terrier's default setting of Porter stemming and standard stopword removal. Terrier's stopword list has 733 words.

2.2.3 search. This hook makes use of Terrier's batchretrieve command to execute a 'run', i.e. to retrieve 1000 results for each of a batch of information needs (topics/queries). The jig mounts the topics into the container image. In addition to learning-to-rank (discussed further below), we provide off-the-shelf support for 12 retrieval configurations. These are broken down by three orthogonal components: weighting model (BM25 [8], as well as PL2 [1] and DPH [2] from the Divergence from Randomness framework); proximity (the pBiL Divergence from Randomness model [7]); and query expansion (the Bo1 Divergence from Randomness model [1]).

The combination of these three components yields the following possible configurations, to be passed to the hook using the --opts config= parameter:

  • BM25
    – bm25: Vanilla BM25
    – bm25_qe: BM25 with query expansion
    – bm25_prox: BM25 with proximity
    – bm25_prox_qe: BM25 with proximity and query expansion
  • PL2
    – pl2: Vanilla PL2
    – pl2_qe: PL2 with query expansion
    – pl2_prox: PL2 with proximity
    – pl2_prox_qe: PL2 with proximity and query expansion
  • DPH
    – dph: Vanilla DPH
    – dph_qe: DPH with query expansion
    – dph_prox: DPH with proximity
    – dph_prox_qe: DPH with proximity and query expansion

The expected results for each of these runs can be found in Table 2, together with the relative improvement of each configuration over the vanilla version. On analysing the results in the table, it is of interest to note that query expansion always improves the final result considerably, with improvements of up to 27.90% over the vanilla versions. However, the same cannot be said about the proximity option. While yielding improvements of up to 3.61% (bm25_prox), it can sometimes decrease the performance of the vanilla version, by up to 1.70% (pl2_prox). These results are interesting, since they show that different methods can lead to diverse observations across the multiple corpora and query sets.

When combining both proximity and query expansion, the results overall show improvements over both the vanilla version and query expansion or proximity alone, with improvements of up to 27.26% over the vanilla versions. In the cases where proximity decreased the original results, combining query expansion and proximity search does not improve the results, as expected. However, the results for scenarios like the DPH model on Core18 and GOV2 (topics 701-750) show that, even when query expansion and proximity search are combined, the overall result may not improve over the stronger of the two methods alone (usually, query expansion).

This again reinforces the knowledge that each dataset is different, and that a practitioner should be careful not to simply stack methods that provide marginal gains, but should instead test multiple combinations, and understand how each method may behave when used in combination with others.

2.3 Learning To Rank Hooks
Terrier provides support for learning-to-rank in several manners: the ability to integrate additional features during ranking, including additional query-dependent features, without having to re-traverse the inverted or direct index [6], as well as integration of the Jforests (https://github.com/yasserg/jforests/) implementation of LambdaMART [9].

Learning-to-rank integration is demonstrated through two hooks, train and search.

2.3.1 train. This hook extracts features for the training and validation topics, before calling Jforests to build the learned model. To aid implementation, train calls the search hook internally to obtain results for the training and validation sets, specifying the bm25_ltr_features retrieval configuration. The retrieval features to use are configurable by specifying the features argument to the jig.

2.3.2 search. Search also supports generation of the final learning-to-rank run, using the bm25_ltr_jforest retrieval configuration. This configuration assumes that train has already been called and hence a Jforests learned model file already exists.
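The twelve configuration names accepted by the search hook (Section 2.2.3) follow a simple naming scheme: a weighting model, an optional _prox component, and an optional _qe component. The sketch below decomposes such a name; the printed descriptions are illustrative only, and are not necessarily the exact Terrier properties set inside the image.

```shell
#!/bin/sh
# Sketch: decompose a config name such as bm25_prox_qe into its
# weighting model and its optional proximity / query-expansion
# components. The output strings are illustrative only.
parse_config() {
  config="$1"
  model=$(echo "$config" | cut -d_ -f1)   # leading token = model
  opts=""
  case "$config" in *_prox*) opts="$opts proximity";; esac
  case "$config" in *_qe)    opts="$opts query-expansion";; esac
  echo "model=$model opts=${opts# }"
}

parse_config bm25_prox_qe
```

The same decomposition applies to any of the names listed above, e.g. pl2 yields no optional components, while dph_qe enables query expansion only.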






Table 2: Expected performance per method and corpus. The best result for each corpus and query set is marked with *.

  Model  Method           Robust04           Core18             GOV2 701-750       GOV2 751-800       GOV2 801-850
  BM25   Vanilla          0.2363             0.2326             0.2461             0.3081             0.2629
  BM25   +QE              0.2762 (+16.89%)   0.2975 (+27.90%)   0.2621 (+6.50%)    0.3506 (+13.79%)   0.3118 (+18.60%)
  BM25   +Proximity       0.2404 (+1.74%)    0.2369 (+1.85%)    0.2537 (+3.09%)    0.3126 (+1.46%)    0.2724 (+3.61%)
  BM25   +QE +Proximity   0.2781 (+17.69%)   0.2960 (+27.26%)   0.2715 (+10.32%)   0.3507 (+13.83%)   0.3085 (+17.34%)
  PL2    Vanilla          0.2241             0.2225             0.2334             0.2884             0.2363
  PL2    +QE              0.2538 (+13.25%)   0.2787 (+25.26%)   0.2478 (+6.17%)    0.3160 (+9.57%)    0.2739 (+15.91%)
  PL2    +Proximity       0.2283 (+1.87%)    0.2248 (+1.03%)    0.2347 (+0.56%)    0.2835 (-1.70%)    0.2361 (-0.08%)
  PL2    +QE +Proximity   0.2575 (+14.90%)   0.2821 (+26.79%)   0.2455 (+5.18%)    0.3095 (+7.32%)    0.2628 (+11.21%)
  DPH    Vanilla          0.2479             0.2427             0.2804             0.3311             0.2917
  DPH    +QE              0.2821 (+13.80%)   0.3055* (+25.88%)  0.3120* (+11.27%)  0.3754* (+13.38%)  0.3439* (+17.90%)
  DPH    +Proximity       0.2501 (+0.89%)    0.2428 (+0.04%)    0.2834 (+1.07%)    0.3255 (-1.69%)    0.2904 (-0.45%)
  DPH    +QE +Proximity   0.2869* (+15.73%)  0.3035 (+25.05%)   0.3064 (+9.27%)    0.3095 (-6.52%)    0.3288 (+12.72%)
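The relative improvements reported in Table 2 are plain percentage differences over the corresponding vanilla run. The small helper below reproduces the calculation; for instance, BM25 +QE on Robust04 (0.2762 against a vanilla 0.2363) gives the +16.89% shown in the table.

```shell
#!/bin/sh
# Compute the percentage improvement of a run over its vanilla
# baseline, rounded to two decimals as in Table 2.
improvement() {
  awk -v run="$1" -v base="$2" \
    'BEGIN { printf "%+.2f%%\n", (run - base) / base * 100 }'
}

improvement 0.2762 0.2363   # BM25 +QE on Robust04
```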


2.4 Interaction
In the interact hook, we provide three HTTP-accessible services that allow a researcher to interact with the Terrier instance. Two of these provide access to the results of the search engine, while the third allows the user to conduct further experiments within a Jupyter notebook environment, making use of Terrier-Spark [3]. Each HTTP server is made available on a separate port, as detailed below. (Note that the ports on the host machine may differ, due to the way that Docker assigns ports. It is foreseen that this will be resolved in future versions of the OSIRRC jig - see https://github.com/osirrc/jig/issues/112.)

2.4.1 Port 1980: Simple search interface. This provides a user-friendly web presentation of the search results, allowing the user to enter queries and receive ranked search results.

2.4.2 Port 1981: REST API. This provides a REST endpoint from which Terrier serves search results. It can be used directly, or by another instance of Terrier to query the index in a running container (i.e. Terrier can be both a server and a client). Figure 1 shows an example of using Terrier from the command line of another machine to access an index hosted within a Docker container.

  # this starts the REST endpoint on port 1981
  [dockerhost]$ cd jig
  [dockerhost]$ python run.py interact --repo terrier --tag latest
  ------
  # this demonstrates access to that index from another machine
  [anotherhost]$ cd terrier
  [anotherhost]$ bin/terrier interactive -I http://dockerhost:1981/
  terrier query> information retrieval end:5
    Displaying 1-6 results
  0 FBIS4-20699 10.268754805435458
  1 FBIS4-20702 9.768490153503198
  2 FR941027-2-00046 9.491347902606723
  3 FBIS4-20701 9.456022500508775
  4 FBIS3-24510 9.31403481019499
  5 FBIS4-20700 8.792342494849281

Figure 1: Accessing an index hosted on the Terrier Docker container via the Terrier REST API.

2.4.3 Port 1982: Terrier-Spark Jupyter Notebook. Finally, port 1982 starts a Jupyter notebook with Apache Toree's Scala kernel installed. This allows use of Terrier-Spark - a Scala interface built on top of Apache Spark that allows Terrier retrieval experiments to be conducted, including in a Jupyter notebook [3, 4]. An example notebook is provided that allows the user to run further experiments on the available indices. Functionalities include querying and evaluating the outcomes (as shown in Figure 2), as well as combining Terrier's learning-to-rank feature support with Apache Spark's machine learning capabilities.

3 LESSONS LEARNED
While developing this work, a number of roadblocks appeared, prompting new insights and workarounds that ended up improving the overall reproducibility of the work. Some of these roadblocks, formulated as questions, are described in this section.

3.1 Do you really have the original version of the corpus?
We discovered, like several other research labs involved in the OSIRRC challenge, that TREC Disks 4 & 5 had originally been compressed using the archaic Unix compress utility, resulting in .z, .1z and .2z filename suffixes. Our own copies in Glasgow and Delft had at some time been recompressed using the more contemporary Gzip compression (with a resulting .gz filename suffix).

We made some minor adjustments in Terrier version 5.2 that allow .z files to be decompressed on-the-fly, using an Apache Commons package integrated into Terrier.

3.2 How much memory is in this container?
Like any Java process, Terrier is limited by the amount of memory available to the Java Virtual Machine (JVM). We worked to ensure that the JVM is allowed to use as much memory as possible once a container is running. This gives Terrier potential speed improvements for both indexing and retrieval.
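The distinction drawn in Section 3.1 between the old compress format and Gzip can be made from the first two bytes of a file: compress data begins with the magic bytes 0x1f 0x9d, while Gzip data begins with 0x1f 0x8b. A small sketch (the sample files here are fabricated carrying only the magic bytes):

```shell
#!/bin/sh
# Identify Unix compress (.z) versus Gzip (.gz) data by magic bytes:
# compress begins 1f 9d, gzip begins 1f 8b.
compression_of() {
  magic=$(od -An -tx1 -N2 "$1" | tr -d ' ')
  case "$magic" in
    1f9d) echo "compress" ;;
    1f8b) echo "gzip" ;;
    *)    echo "unknown" ;;
  esac
}

# Fabricate two tiny sample files carrying just the magic bytes.
printf '\037\235' > /tmp/sample.z
printf '\037\213' > /tmp/sample.gz
compression_of /tmp/sample.z
compression_of /tmp/sample.gz
```

Such a check quickly reveals whether a local copy of TREC Disks 4 & 5 has been recompressed, regardless of its filename suffix.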
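For the memory question in Section 3.2, one common approach - an assumption here for illustration, not necessarily what the image does - is to derive the JVM heap limit from the memory available to the container, granting the heap a fixed fraction of it:

```shell
#!/bin/sh
# Sketch: size the JVM heap as a fraction of the container's memory.
# heap_flag prints an -Xmx option granting ~80% of the given MB;
# the 80% fraction is an assumption for this example. (Recent JVMs
# can also do this natively via -XX:MaxRAMPercentage.)
heap_flag() {
  mem_mb="$1"
  echo "-Xmx$(( mem_mb * 80 / 100 ))m"
}

heap_flag 4096    # e.g. a container granted 4 GB
```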








                                            Figure 2: An example of evaluating a run from Terrier-Spark


3.3 Can the classical indexer be more aggressive in using the available memory?
In OSIRRC, we elected to default to Terrier's classical indexer, as the direct index it creates allows more flexibility than the faster single-pass indexer. However, it was recognised that the classical indexer had seen less attention in recent years, and hence could be further optimised. In particular, in Terrier 5.2, we made changes to the classical indexer to recognise the available memory, and to be more aggressive in its use of that RAM. We have observed significant efficiency improvements when building a block index for GOV2 (an 11% reduction in indexing time for a Docker host machine with many CPU cores, with larger benefits observed for less powerful hosts).

4 CONCLUSIONS & OUTLOOK
This paper has described the implementation of the Terrier-Docker container image within the OSIRRC replicability challenge. This has been a worthwhile effort that has allowed many IR platforms and toolkits to be made available within a standardised Docker environment. We have aimed to provide a range of standard retrieval configurations that Terrier can offer for the relevant test collections. Meanwhile, participation in the challenge has enabled some improvements to the Terrier platform, which will be released in version 5.2.

On the other hand, while the Docker image is a step in the direction of allowing replication of IR experiments, we believe that it should be combined with notebook-like environments that facilitate the scripting of advanced experiments. We have provided one example Terrier-Spark notebook, which demonstrates the possibility of conducting an IR experiment within a notebook. However, we acknowledge the overheads of operating in a Spark environment (both in efficiency and in code complexity). In the future, we seek better integration of Terrier into a Python environment, to allow easier scripting of complex retrieval experiments.

REFERENCES
[1] Giambattista Amati. 2003. Probabilistic Models for Information Retrieval based on Divergence from Randomness. Ph.D. Dissertation. Department of Computing Science, University of Glasgow.






[2] Giambattista Amati. 2006. Frequentist and Bayesian Approach to Information Retrieval. In ECIR (Lecture Notes in Computer Science), Vol. 3936. Springer, 13–24.
[3] Craig Macdonald. 2018. Combining Terrier with Apache Spark to create Agile Experimental Information Retrieval Pipelines. In SIGIR. ACM, 1309–1312.
[4] Craig Macdonald, Richard McCreadie, and Iadh Ounis. 2018. Agile Information Retrieval Experimentation with Terrier Notebooks. In DESIRES (CEUR Workshop Proceedings), Vol. 2167. CEUR-WS.org, 54–61.
[5] Craig Macdonald, Richard McCreadie, Rodrygo L. T. Santos, and Iadh Ounis. 2012. From puppy to maturity: Experiences in developing Terrier. Proc. of OSIR at SIGIR (2012), 60–63.
[6] Craig Macdonald, Rodrygo L. T. Santos, Iadh Ounis, and Ben He. 2013. About learning models with multiple query-dependent features. ACM Trans. Inf. Syst. 31, 3 (2013), 11.
[7] Jie Peng, Craig Macdonald, Ben He, Vassilis Plachouras, and Iadh Ounis. 2007. Incorporating term dependency in the DFR framework. In SIGIR. ACM, 843–844.
[8] Stephen E. Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3, 4 (2009), 333–389.
[9] Qiang Wu, Chris J. C. Burges, Krysta M. Svore, and Jianfeng Gao. 2008. Ranking, Boosting, and Model Adaptation. Technical Report MSR-TR-2008-109. Microsoft.



