University of Waterloo Docker Images for OSIRRC at SIGIR 2019
Ryan Clancy, Zeynep Akkalyoncu Yilmaz, Ze Zhong Wu, and Jimmy Lin
David R. Cheriton School of Computer Science
University of Waterloo

1    OVERVIEW
The University of Waterloo team submitted a total of four Docker images to the
Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. This short overview
outlines the functionality of each image. As the READMEs in all our source
repositories provide details on the technical design of our images and the retrieval
models used in our runs, we intentionally do not duplicate this information here.
   Our primary submission is a packaging of Anserini [11, 12], an open-source
information retrieval toolkit built around Lucene to facilitate replicable research.
This anserini-docker image resides at the following URL:
        https://github.com/osirrc/anserini-docker
The Anserini project grew out of the Open-Source IR Reproducibility Challenge from
2015 [5] and reflects growing community interest in using Lucene for academic IR
research [1, 2]. As Lucene was not originally designed as a research toolkit,
Anserini aims to fill in the “missing parts” that allow researchers to run standard
ad hoc retrieval experiments “right out of the box”, including competitive baselines
and integration hooks for neural ranking models. Given Lucene’s tremendous production
deployment base (typically via Solr or Elasticsearch), better alignment between
research in [...] engines promises a smoother transition path from the lab to the
“real world” for research innovations.
   Solrini and Elastirini capabilities are exposed via the interact hook in the
OSIRRC jig. Since both Solr and Elasticsearch are designed as web apps, the user can
trigger the hook and then directly navigate to a URL to access system capabilities.
The batch runs provided by the solrini and elastirini images are exactly the same as
those of the anserini image.
   The final image submitted by our group packages Birch, our newest open-source
search engine² that takes advantage of BERT [4] for ad hoc document retrieval:
        https://github.com/osirrc/birch-docker
BERT can be characterized as one instance of a family of deep neural models that make
heavy use of pretraining [8, 9]. Application to many natural language processing
tasks, ranging from sentence classification to sequence labeling, has led to
impressive gains on standard benchmark datasets. The model has been adapted to
passage ranking [7] and question answering [13], and Birch can be viewed as a
continuation of this thread of research, alongside other recent models such as
CEDR [6]. The central insight that Birch explores, as detailed in Yang et al. [14],
is to aggregate sentence-level scores to rank documents. This image allows other
researchers to replicate the results of our paper with the search hook.

¹ https://projectblacklight.org/
² https://github.com/castorini/birch
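
The idea of aggregating sentence-level scores to rank documents can be illustrated with a minimal sketch. This overview does not specify the aggregation, so everything concrete here is an assumption made for illustration: the linear interpolation between a document-level score and a weighted sum of the top-scoring sentences, the function names aggregate_score and rank_documents, and the default values of weights and alpha. It is not the Birch implementation.

```python
from typing import Dict, List, Sequence, Tuple


def aggregate_score(
    doc_score: float,
    sentence_scores: Sequence[float],
    weights: Sequence[float] = (1.0, 0.5, 0.25),  # hypothetical weights
    alpha: float = 0.5,                           # hypothetical interpolation
) -> float:
    """Combine a document-level score with its top sentence-level scores.

    Assumed form: alpha * doc_score + (1 - alpha) * sum_i(w_i * s_i),
    where s_i are the highest sentence scores in descending order.
    """
    # Keep only as many top sentences as there are weights.
    top = sorted(sentence_scores, reverse=True)[: len(weights)]
    sentence_part = sum(w * s for w, s in zip(weights, top))
    return alpha * doc_score + (1.0 - alpha) * sentence_part


def rank_documents(
    candidates: Dict[str, Tuple[float, List[float]]],
    weights: Sequence[float] = (1.0, 0.5, 0.25),
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank documents by aggregated score, best first.

    `candidates` maps a document id to (doc_score, sentence_scores).
    """
    scored = [
        (doc_id, aggregate_score(doc_score, sents, weights, alpha))
        for doc_id, (doc_score, sents) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In such a setup, doc_score would typically come from a first-stage retrieval run and sentence_scores from a sentence-level relevance model applied to each candidate document; both inputs are assumed to be precomputed here.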
