<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SIGIR Forum</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>University of Waterloo Docker Images for OSIRRC at SIGIR 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ryan Clancy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zeynep Akkalyoncu Yilmaz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ze Zhong Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jimmy Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>David R. Cheriton School of Computer Science University of Waterloo</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>50</volume>
      <issue>2</issue>
      <fpage>1429</fpage>
      <lpage>1430</lpage>
      <abstract>
        <p>The University of Waterloo team submitted a total of four Docker images to the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. This short overview outlines the functionality of each image. As the READMEs in our source repositories provide details on the technical design of our images and the retrieval models used in our runs, we intentionally do not duplicate that information here. Our primary submission is a packaging of Anserini [11, 12], an open-source information retrieval toolkit built around Lucene to facilitate replicable research. The anserini-docker image resides at the following URL: https://github.com/osirrc/anserini-docker. Solrini and Elastirini capabilities are exposed via the interact hook in the OSIRRC jig. Since both Solr and Elasticsearch are designed as web apps, the user can trigger the hook and then navigate directly to a URL to access system capabilities. The batch runs provided by the solrini and elastirini images are exactly the same as those of the anserini image. The final image submitted by our group packages Birch, our newest open-source search engine, which takes advantage of BERT [4] for ad hoc document retrieval.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>OVERVIEW</title>
      <p>
The Anserini project grew out of the Open-Source IR
Reproducibility Challenge from 2015 [5] and reflects growing community
interest in using Lucene for academic IR research [1, 2]. As Lucene
was not originally designed as a research toolkit, Anserini aims to
fill in the “missing parts” that allow researchers to run standard
ad hoc retrieval experiments “right out of the box”, including
competitive baselines and integration hooks for neural ranking models.
Given Lucene’s tremendous production deployment base (typically
via Solr or Elasticsearch), better alignment between research in
information retrieval and the practice of building real-world search
engines promises a smoother transition path from the lab to the
“real world” for research innovations.</p>
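      <p>As a concrete illustration of the kind of “out of the box” experiment Anserini supports, the following commands sketch indexing a TREC collection and running a BM25 baseline. This is only a sketch: the input path, index name, and run file name are placeholders, and the exact flags vary across Anserini versions.</p>
      <preformat>
```shell
# Index a TREC collection (input path and index name are placeholders).
sh target/appassembler/bin/IndexCollection \
  -collection TrecCollection -generator JsoupGenerator \
  -input /path/to/disk45 -index indexes/robust04 \
  -threads 16 -storePositions -storeDocvectors -storeRawDocs

# Run BM25 retrieval over the Robust04 topics, producing a TREC run file.
sh target/appassembler/bin/SearchCollection \
  -index indexes/robust04 -topicreader Trec \
  -topics src/main/resources/topics-and-qrels/topics.robust04.txt \
  -output run.robust04.bm25.txt -bm25
```
      </preformat>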
      <p>In addition to our main Anserini image, we built two ancillary
images for the OSIRRC exercise:
https://github.com/osirrc/solrini-docker
https://github.com/osirrc/elastirini-docker
In production environments, Lucene is most often used as a core
search library that powers two widely-deployed “full stack” search
applications: Solr and Elasticsearch. With “Solrini” and “Elastirini”,
we have integrated Anserini with Solr and Elasticsearch,
respectively. The integration is such that we can use Anserini as a common
frontend to index into a backend Solr or Elasticsearch instance. This
allows unification of the document processing pipeline
(tokenization, stemming, etc.) to support standard TREC ad hoc experiments,
while allowing users to take advantage of the wealth of capabilities
provided by Solr and Elasticsearch. In the case of Solr, users can
interact with sophisticated searching and faceted browsing interfaces
such as Project Blacklight [10], as described in Clancy et al. [3]. In
the case of Elasticsearch, we can gain access to the so-called ELK
stack (Elasticsearch, Logstash, Kibana) to provide a complete data
analytics environment, including slick visualization interfaces.</p>
      <p>BERT can be characterized as one instance of a family of deep
neural models that make heavy use of pretraining [8, 9]. Application
to many natural language processing tasks, ranging from sentence
classification to sequence labeling, has led to impressive gains on
standard benchmark datasets. The model has been adapted to
passage ranking [7] and question answering [13], and Birch can be
viewed as a continuation of this thread of research, alongside other
recent models such as CEDR [6]. The central insight that Birch
explores, as detailed in Yang et al. [14], is to aggregate sentence-level
scores to rank documents. This image allows other researchers to
replicate the results of our paper with the search hook.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>