ielab at the Open-Source IR Replicability Challenge 2019
                                Harrisen Scells                                                                         Guido Zuccon
                       The University of Queensland                                                           The University of Queensland
                            Brisbane, Australia                                                                    Brisbane, Australia
                            h.scells@uq.net.au                                                                    g.zuccon@uq.edu.au

ABSTRACT                                                                                    formats such as XML, JSON, WARC or plain text – or even purpose-
We present our participating Docker image in the SIGIR Open-                                based formats, e.g. the TREC document format. Due to this, the
Source IR Replicability Challenge 2019. Our participation includes                          second contribution of our participation in OSIRRC 2019 is a tool
an Elasticsearch-based Docker image conforming to the specifi-                              for processing IR test collection file formats into Elasticsearch bulk
cations laid out by the challenge, a collection parser, and a topic                         indexing operations. Finally, because Elasticsearch is a commercial,
parser. Our Docker image is capable of indexing and searching sev-                          off-the-shelf search engine, not built for research purposes, it does
eral commonly used Information Retrieval test collections including                         not directly support issuing TREC topics as queries. Due to this, the
robust04 (TREC 2004 Robust Track Topics – disks 4/5), core17 (TREC                          third contribution of our participation in OSIRRC 2019 is a tool for
New York Times Annotated Corpus), and core18 (TREC Washington                               directly issuing TREC topic files to Elasticsearch. These two tools
Post Corpus). The Docker image and associated tools we present                              are described in more detail in the following sections.
in this paper provide a solid foundation for replicable Information                            The OSIRRC 2019 uses a jig to issue commands to hooks within
Retrieval experiments using Elasticsearch.                                                  participating Docker images. We implement the main hooks of the
                                                                                            jig, described in Section 2. We also report the arguments we pass
KEYWORDS                                                                                    to the jig in order for it to use the implemented hooks in Section 3.
                                                                                            Finally, we report baseline results from indexing and searching the
Systematic Reviews, Query Formulation, Boolean Queries
                                                                                            three collections compatible with our Docker image in Section 4.
                                                                                               Like other participants in the challenge, our Dockerfile, associated
                                                                                            tools, and documentation are made available on GitHub at
1    INTRODUCTION                                                                           https://github.com/osirrc/ielab-docker.
The main contribution of our participation in the Open-Source IR                            2     HOOK IMPLEMENTATIONS
Replicability Challenge 2019 (OSIRRC 2019) is an Elasticsearch-
based Docker image. Our Docker image provides an environment                                The OSIRRC 2019 uses a jig3 to communicate with the Docker image
for performing replicable Information Retrieval (IR) experiments                            from participating teams. The jig supports five different hooks that
using Elasticsearch. However, Elasticsearch is not designed to be                           a participating Docker image can implement. Our Docker image
used in academic or research settings, e.g., by default, it cannot                          supports the init, index, and search hooks. Our implementation
output TREC run files, or issue TREC topics as queries; it is difficult                     of these hooks is described in the following subsections. We also
to configure fine-grain aspects of retrieval models, and it is                              examine the unimplemented hooks and how these would work if
difficult to extend in terms of additional retrieval models. These                          implemented in future work.
are all challenges which have been identified in initiatives such as
the Elastic4ir1 presentation as part of the SIGIR LIARR 2017 Work-                          2.1      init
shop [2] and the IR In Practice session at AFIRM 20192 . Despite                            The init hook is used to prepare a participating Docker image prior
these limitations, we were able to package Elasticsearch conforming                         to indexing. We found, however, that using this hook significantly
to the majority of the specifications in this challenge.                                    slowed development time, and instead decided to include all of
   The contributed Docker image is capable of indexing and searching                        our setup steps in our Dockerfile. When using a Dockerfile, listed
three test collections: robust04 (TREC 2004 Robust Track Topics,                            commands can be cached between image builds. When compiling
disks 4/5), core17 (TREC New York Times Annotated Corpus), and                              code, downloading external artefacts, and copying large files, these
core18 (TREC Washington Post Corpus). We also make two                                      processes take time away from development when in the init
other contributions which support indexing and searching using                              hook.
Elasticsearch. One of the main selling points of Elasticsearch is                              Our Dockerfile uses a multi-stage build to reduce the total size
the ability to be schema-less (i.e., automatic mapping of document                          of the image, counteracting what would be a downside to using it.
fields and data types to the underlying Lucene index). As a conse-                          In the first stage, development libraries are installed, Elasticsearch
quence of this, documents in Elasticsearch are represented as (and                          is downloaded and decompressed (our participating Docker image
indexed as) JSON objects. This presents a challenge for indexing                            uses Elasticsearch 7.0.0, which was the latest version at the time de-
standard IR test collections, as they commonly use a variety of file                        velopment of the image began), and our command-line applications
                                                                                            for parsing and searching are compiled. In the following stage, only
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons        the necessary runtime applications are installed (e.g., Python), and
License Attribution 4.0 International (CC BY 4.0). OSIRRC 2019 co-located with SIGIR
2019, 25 July 2019, Paris, France.                                                          the artefacts from the previous stage are copied.
1 https://github.com/ielab/elastic4IR
2 http://ielab.io/tutorials/ir-in-practice.html                                             3 https://github.com/osirrc/jig


OSIRRC 2019, July 25, 2019, Paris, France                                                                                           Harrisen Scells and Guido Zuccon


2.2     index                                                                      <DOC>
                                                                                   <DOCNO> FBIS3-61525 </DOCNO>
The index hook is used to index one or more collections. There                     <HT>      "jpust004___94105" </HT>
are several steps before a collection can be indexed in this phase.
Firstly, depending on the collection, files must be decompressed or                <HEADER>
removed (e.g., removing README files, changelogs, tools). Next,                    <AU>   JPRS-UST-94-004 </AU>
Elasticsearch must be started. Because Elasticsearch is designed to                Document Type:JPRS
                                                                                   Document Title:Science &amp; Technology
be run as a service, it must be started in the background. Our index
hook then waits until the Elasticsearch web service is reachable (by               </HEADER>
repeatedly making HTTP requests), and then waits until the cluster                 <ABS> Central Eurasia </ABS>
health level of Elasticsearch is at least yellow (i.e., all primary shards have    <DATE1>   26 January 1994 </DATE1>
been allocated). API requests to Elasticsearch will fail if the web                <F P=100> LIFE SCIENCES </F>
                                                                                   <F P=101> MEDICINE AND PUBLIC HEALTH </F>
service is available but the cluster health is red (i.e., there are shards         <H3> <TI>   Environmental Monitoring System of Natural Water
still waiting to be allocated or there is a problem with Elasticsearch).               Conditions </TI></H3>
This was a major pain point during development of our system and                   <F P=102> 947C0156B Kiev GIDROBIOLOGICHESKIY ZHURNAL in Russian
                                                                                   Vol 29 No 3, May-Jun 93 pp 88-95 </F>
led to many unexpected issues.
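The two-step wait described above (service reachable, then health at least yellow) can be sketched in Python as follows. This is an illustration only, not the code used in the image: the function names, the default address http://localhost:9200, and the retry limits are assumptions. The only Elasticsearch feature relied on is the cluster health endpoint, which reports a status of red, yellow, or green.

```python
import json
import time
import urllib.error
import urllib.request

# Health levels ordered from worst to best, as reported by Elasticsearch.
HEALTH_ORDER = {"red": 0, "yellow": 1, "green": 2}


def fetch_status(base_url="http://localhost:9200"):
    """Return the cluster status string, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as resp:
            return json.load(resp).get("status")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: not ready yet.
        return None


def wait_until_ready(fetch=fetch_status, minimum="yellow", attempts=60, delay=1.0):
    """Poll until the cluster reports at least `minimum` health; False on timeout."""
    for _ in range(attempts):
        status = fetch()
        if status in HEALTH_ORDER and HEALTH_ORDER[status] >= HEALTH_ORDER[minimum]:
            return True
        time.sleep(delay)
    return False
```

Passing the status lookup in as the `fetch` argument keeps the polling logic testable without a running cluster.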
    At this stage, Elasticsearch is available for indexing documents.              <F P=103> 947C0156B </F>
                                                                                   <F P=104> Kiev GIDROBIOLOGICHESKIY ZHURNAL </F>
Documents in the collection path (provided by the jig) are trans-
formed by our TREC collection parser 4 and indexed in bulk. When
indexing different collections, we do not attempt any document                     <TEXT>
                                                                                   Language: <F P=105> Russian </F>
cleaning or pre-processing (e.g., stemming). We generally index                    Article Type:CSO
titles into one field, then the content of the document from all other
fields (e.g., paragraphs, sub-headings) under the content part of                  <F P=106> [Article by S.V. Kreneva, Azov Scientific
                                                                                       Research Institute </F>
the document (e.g., TEXT/body/contents, etc.) into a single text                   of Fishery Industry, Rostov na Donu; UDC [593.17:574.63](08)(28)]
field. Examples of how collection documents are transformed into a                 [Abstract] The widening gap between the rates at which new
                                                                                   contaminants appear the maximum...
representation suitable to be indexed in Elasticsearch are described
in detail in the following sections.                                               </TEXT>
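To make the bulk indexing step concrete, the sketch below serialises already-transformed documents into the newline-delimited body expected by the Elasticsearch bulk API. It is a hedged illustration rather than the collection parser itself: the function name is hypothetical, and using the DocNo field as the Elasticsearch document id is an assumption.

```python
import json


def to_bulk_body(docs, index_name):
    """Serialise documents into the newline-delimited body of a bulk request."""
    lines = []
    for doc in docs:
        # Action line: index this document, using DocNo as the id (assumption).
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["DocNo"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```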

2.2.1 robust04. Documents in this collection are in an XML-like                    </DOC>
format (in fact we parse them using a generic XML parser). One
aspect of this file format that caused the XML parser to fail is the                         (a) Contents of robust04 document FBIS3-61525.
unquoted attributes on elements (e.g., <F P=100>, the 100 should
be wrapped in quotation marks to be valid XML). To overcome this,                  {
                                                                                        "DocNo": "FBIS3-61525",
we simply removed all attributes from every element. When trans-                        "Text": "Language: Russian Article Type:CSO [Article
forming the files in the robust04 collection, we index everything in                             by S.V. Kreneva, Azov Scientific Research
the <TEXT> tag as a single string, ignoring structure. An example                                Institute of Fishery Industry, Rostov na Donu;
                                                                                                 UDC [593.17:574.63](08)(28)]   [Abstract] The
of this transformation is presented in Figure 1.                                                 widening gap between the rates at which new
                                                                                                 contaminants appear the maximum..."
2.2.2 core17. The documents for core17 are in a standardised XML                   }
format. We take a similar approach to indexing these documents to
the robust04 documents. Unlike the robust04 documents, there is a                 (b) Elasticsearch representation of robust04 document FBIS3-61525.
clear title in the header of the core17 documents which we extract.
We then proceed to index everything between the <body> tags as                    Figure 1: Example transformation of a document in robust04
a single string, ignoring structure. An example core17 document                   into a representation suitable to be indexed in Elasticsearch.
and what we transform it into as a JSON representation suitable
for indexing in Elasticsearch is presented in Figure 2.                           this issue, we simply cast all values to a string for this field before
                                                                                  indexing.
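The string cast for core18 can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the array holding the individual objects is assumed to be stored under a top-level key named contents. Only the content field name and the two example values come from Figure 3.

```python
def stringify_content(doc, field="content"):
    """Return a copy of a core18 document with every `content` value cast to str."""
    blocks = []
    for block in doc.get("contents", []):
        # Entries that are not objects, or that lack the field, are kept as-is.
        if isinstance(block, dict) and block.get(field) is not None:
            block = {**block, field: str(block[field])}
        blocks.append(block)
    return {**doc, "contents": blocks}
```

Because every value is a string after this step, the mapping inferred from the first indexed document also holds for all later ones.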
2.2.3 core18. The documents in the core18 collection are already
in a format suitable for Elasticsearch to index (i.e., they come as a
                                                                                  2.3       search
JSON object). One challenge we faced here, however, is that the main
content of these documents is stored as an array of individual JSON               The search hook is used to perform ad-hoc searches on a previously
objects, all with the same field name (content), but with different               indexed collection. This hook uses TREC topics as queries, and as
data-types for the values. This issue is illustrated in Figure 3. This is         such, these topic files for a collection must be parsed. Our topic
a problem because Elasticsearch infers the types of values to create              parsing and searching tool 5 reads a topic file, issues the title of
a mapping based on the first document indexed. When Elasticsearch                 the topic as an Elasticsearch “query string” query, and parses the
encounters a value that does not conform to the data-type specified               Elasticsearch API response into a TREC run file to be evaluated
by the mapping, it will fail to index the document. To overcome                   automatically by the jig.
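The search step can be illustrated with the following Python sketch, which builds a "query string" request body from a topic title and turns the hits of a response into TREC run lines. It is not the topic searching tool itself: the function names and the run tag are hypothetical, and the sketch assumes that the Elasticsearch document id equals the collection's document number. Topic titles containing characters reserved by the query string syntax would need escaping, which is omitted here.

```python
def build_query(title, size=1000):
    """Request body for an Elasticsearch "query string" query over a topic title."""
    return {"size": size, "query": {"query_string": {"query": title}}}


def to_trec_run(topic_id, response, tag="ielab"):
    """Format the hits of one search response as TREC run lines.

    Each line has the form: topic Q0 docno rank score tag
    """
    lines = []
    for rank, hit in enumerate(response["hits"]["hits"], start=1):
        lines.append(f"{topic_id} Q0 {hit['_id']} {rank} {hit['_score']} {tag}")
    return lines
```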

4 https://github.com/osirrc/ielab-docker/tree/master/cparser                      5 https://github.com/osirrc/ielab-docker/tree/master/tsearcher
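The robust04 workaround from Section 2.2.1 (removing all attributes so that a generic XML parser accepts the markup) can be sketched in Python as below. The regular expression and function names are illustrative assumptions, not the collection parser's code, and real documents may contain other constructs (for example, undeclared entities) that this sketch does not handle.

```python
import re
import xml.etree.ElementTree as ET

# Matches an opening tag that carries attributes, e.g. <F P=100>.
TAG_WITH_ATTRS = re.compile(r"<(\w+)\s[^<>]*>")


def strip_attributes(markup):
    """Drop every attribute so that unquoted values no longer break the parser."""
    return TAG_WITH_ATTRS.sub(r"<\1>", markup)


def robust04_to_json(markup):
    """Parse one <DOC> element and flatten <TEXT> into a single string."""
    root = ET.fromstring(strip_attributes(markup))
    text = root.find("TEXT")
    return {
        "DocNo": root.findtext("DOCNO").strip(),
        # Join all nested text and collapse runs of whitespace, ignoring structure.
        "Text": " ".join("".join(text.itertext()).split()),
    }
```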




 <?xml version="1.0" encoding="UTF-8"?>                                      {
 <!DOCTYPE nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.3...">                    "content": "Homicides remain steady in the Washington region",
 <nitf change.date="June 10, 2005" change.time="19:30">                           "mime": "text/plain",
 <head>                                                                           "type": "title"
 <title>ART: DAVID SALLE'S WORKS SHOWN AT THE WHITNEY</title>                }
 <meta content="23" name="publication_day_of_month"/>
 <meta content="1" name="publication_month"/>
 <meta content="1987" name="publication_year"/>
                                                                                             (a) Data-type of content is a string.
 <meta content="Friday" name="publication_day_of_week"/>
 <meta content="Weekend Desk" name="dsk"/>                                   {
 <meta content="20" name="print_page_number"/>                                    "content": 1483230909000,
 <meta content="C" name="print_section"/>                                         "mime": "text/plain",
 <meta content="3" name="print_column"/>                                          "type": "date"
 <meta content="Arts" name="online_sections"/>                               }
 <docdata>
 <doc-id id-string="6135"/>
 <doc.copyright holder="The New York Times" year="1987"/>                                   (b) Data-type of content is an integer.
 <identified-content>
 <classifier class="indexing_service">ART</classifier>
 <classifier class="indexing_service">REVIEWS</classifier>                  Figure 3: Example of JSON objects in the content section in a
 <classifier class="indexing_service">ART SHOWS</classifier>                core18 document which contain different data-types for the
 <org class="indexing_service">WHITNEY MUSEUM OF AMERICAN...</org>
 <person class="indexing_service">SMITH, ROBERTA</person>
                                                                            same field (content).
 <person class="indexing_service">SALLE, DAVID</person>
 <classifier class="online_producer">Review</classifier>                     python3 run.py prepare \
 <classifier class="online_producer">Top/Features/Arts</classifier>            --repo osirrc2019/ielab \
 </identified-content>                                                         --collections robust04=/path/to/disk45=trectext
 </docdata>
 </head>
 <body>                                                                                                  (a) robust04
 <body.head>
 <hedline>                                                                   python3 run.py prepare \
 <hl1>ART: DAVID SALLE'S WORKS SHOWN AT THE WHITNEY</hl1>                      --repo osirrc2019/ielab \
 </hedline>                                                                    --collections core17=/path/to/NYTcorpus=nyt
 <byline class="print_byline">By ROBERTA SMITH</byline>
 <byline class="normalized_byline">SMITH, ROBERTA</byline>                                                (b) core17
 </body.head>
 <body.content>                                                              python3 run.py prepare \
 <block class="lead_paragraph">                                                --repo osirrc2019/ielab \
 <p>LEAD: THE exhibition of David Salle's work that has just ...</p>           --collections core18=/path/to/WashingtonPost.v2=wp
 </block>
 <block class="full_text">                                                                                (c) core18
 <p>For American art of the 1980's, Salle's painting stands...</p>
 </block>
 </body.content>                                                            Figure 4: Prepare commands issued to the jig used to index
 </body>
 </nitf>
                                                                            each collection in our Elasticsearch Docker image.

                 (a) Contents of core17 document 6135.
 {
      "DocNo": "6135",                                                         We are also aware of optional parameters in the jig that can be
      "Title": "ART: DAVID SALLE'S WORKS SHOWN AT THE WHITNEY",             used to pass configuration items into the Docker image. If we were
      "Text": "ART: DAVID SALLE'S WORKS SHOWN AT THE WHITNEY By
              ROBERTA SMITH SMITH, ROBERTA LEAD: THE exhibition
                                                                            to exploit these optional parameters in future work, we would use
              of David Salle's work that has just...For American            them to specify aspects of Elasticsearch such as sharding settings
              art of the 1980's, Salle's painting stands..."                and document pre-processing using the index hook, or the fields
 }
                                                                            to consider when scoring documents or the retrieval model using
      (b) Elasticsearch representation of core17 document 6135.
                                                                            the search hook.

Figure 2: Example transformation of a document in core17                    3     JIG ARGUMENTS
into a representation suitable to be indexed in Elasticsearch.              Here we provide the arguments passed to the jig in order to index
                                                                            and search the three collections compatible with our Docker image.
2.4     Unimplemented Hooks                                                 The following two sections describe the two jig commands and the
                                                                            arguments we pass to them to obtain our baseline results.
The two unimplemented hooks are train and interact. As future
work, if the train hook were to be implemented, we would use it
to optimise retrieval function parameters (e.g., the K1 and B param-        3.1     prepare
eters of BM25, similar to work by Jimmy et al. [4]). Furthermore,           The prepare jig command prepares a Docker image for performing
if the interact hook were to be implemented, we would use it                experiments. This command calls the init and index hooks of the
to expose the REST API of Elasticsearch. This would allow one to            Docker image. Figure 4 presents the arguments passed to the jig in
connect other tools such as Kibana, and explore the data further.           order to prepare our Elasticsearch Docker image to run experiments.
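As an illustration of what a train hook might iterate over, the sketch below generates Elasticsearch index settings that set the default similarity to BM25 for each combination of k1 and b. This is hypothetical future work, not something the image does: the function names and the candidate values in the usage note are assumptions, and selecting the best combination would additionally require re-indexing or updating the settings and evaluating each run.

```python
from itertools import product


def bm25_settings(k1, b):
    """Index settings that make BM25 with the given parameters the default similarity."""
    return {
        "settings": {
            "index": {
                "similarity": {"default": {"type": "BM25", "k1": k1, "b": b}}
            }
        }
    }


def parameter_grid(k1_values, b_values):
    """Yield one settings body per (k1, b) combination."""
    for k1, b in product(k1_values, b_values):
        yield bm25_settings(k1, b)
```

For example, `list(parameter_grid([0.9, 1.2], [0.4, 0.75]))` produces four candidate settings bodies.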




 python run.py search \                                                          The jig of the OSIRRC 2019 introduced a standardised framework
   --repo osirrc2019/ielab \
   --output out/ielab \
                                                                              for anyone to implement hooks to a search system, to allow for
   --qrels qrels/qrels.robust04.txt \                                         replicability. However, due to the lack of standardisation of the
   --topic topics/topics.robust04.txt \                                       fine-grain parametrisation of the systems, it becomes difficult to
   --collection robust04
                                                                              control systems comparisons. What the jig introduced is a way to
                                  (a) robust04
                                                                              perform experiments; however, it lacked a standard way to configure
                                                                              experiments. Many participants have exploited the optional
 python run.py search \
   --repo osirrc2019/ielab \                                                  arguments for indexing and searching; however, the problem now is
   --output out/ielab \                                                       that each participant uses different syntax and naming conventions
   --qrels qrels/qrels.core17.txt \                                           for standard parameters (e.g., which retrieval model to use, whether
   --topic topics/topics.core17.txt \
   --collection core17                                                        to perform stemming or not, etc.). We suggest that if this challenge
                                                                              were to be run again, another hook be added which defines the
                                   (b) core17                                 capabilities of a Docker image, which can include official, standard
 python run.py search \                                                       capabilities, such as the name and parameters of the retrieval model,
   --repo osirrc2019/ielab \                                                  what document pre-processing steps to take, or what fields to index
   --output out/ielab \
   --qrels qrels/qrels.core18.txt \
                                                                              (which generally apply to all search systems), as well as self-defined
   --topic topics/topics.core18.txt \                                         capabilities such as the compression algorithm to use (which may
   --collection core18                                                        only apply to a subset of systems). Currently, while it is limiting to
                                                                              directly compare the evaluation results of our image to the images
                                   (c) core18                                 of other participants due to the reasons listed above, it becomes
                                                                              possible for others working with Elasticsearch to easily compare
Figure 5: Search commands issued to the jig used to search                    their system to this baseline. The current Docker images developed
each collection in our Elasticsearch Docker image.                            as part of this initiative only allow runs to be replicable and not
                                                                              comparable. By listing the capabilities of systems, it would allow
      Run                                    MAP     P@30     nDCG@20         one to fairly compare between two or more systems that share
                                                                              capabilities. Although system comparison was not the goal of this
      robust04 (BM25, k=1000)               0.1826   0.3236     0.3477
                                                                              challenge, it could easily be facilitated by adopting solutions similar
      core17 (BM25, k=1000)                 0.0831   0.2333     0.1779
                                                                              to that described above.
      core18 (BM25, k=1000)                 0.1899   0.2760     0.3081
                                                                                 The OSIRRC 2019 initiative was valuable from both the per-
Table 1: Results of runs for the robust04, core17, and core18                 spective of implementing the jig hooks, and from comparing how
collections using our Elasticsearch Docker image. Each run                    others chose to implement them in their hooks. Despite the fact
uses the default Elasticsearch BM25 scorer for ranking, and                   that Elasticsearch is primarily a commercial/production product,
each run retrieves a maximum of 1,000 documents (k).                          and not a research oriented tool, there are several published aca-
3.2     search                                                                demic research papers that use it, e.g., those from our ielab research
                                                                              team6 [4–6], and other groups [1, 3, 7]. This may be in part due
The search command issues queries from TREC topic files and                   to the ease with which one can index and search documents in
outputs a TREC run file which is evaluated by the jig. This command           Elasticsearch (very little to no configuration is necessary, provided
calls the search hook of the Docker image. Figure 5 presents the              an adequate parser is available). We hope that our contribution
arguments passed to the jig in order to run the search experiments            to and participation in this challenge leads to further replicabil-
on the collections compatible with our Elasticsearch Docker image.            ity and reproducibility of Elasticsearch and other search systems,
                                                                              and that others can more easily compare their own Elasticsearch
4     RESULTS                                                                 experiments to this common baseline.
Table 1 presents the evaluation results for each collection indexed
using our Elasticsearch Docker image. The evaluation measures                 REFERENCES
reported in this table are chosen based on the measures reported              [1] Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, and Claudio
by the jig. Although all participants report the same evaluation                  Gennaro. 2018. Large-Scale Image Retrieval with Elasticsearch. In The 41st
measures, it is important to note that one cannot fairly compare                  International ACM SIGIR Conference on Research & Development in Information
                                                                                  Retrieval (SIGIR ’18). ACM, New York, NY, USA, 925–928.
between systems and these results can only be used to demonstrate             [2] Leif Azzopardi, Matt Crane, Hui Fang, Grant Ingersoll, Jimmy Lin, Yashar Mosh-
the replicability of our system. We elaborate on this detail in the               feghi, Harrisen Scells, Peilin Yang, and Guido Zuccon. 2017. The Lucene for
                                                                                  information access and retrieval research (LIARR) workshop at SIGIR 2017. (2017).
following section, and present a possible solution.                           [3] Rafael Glater, Rodrygo L.T. Santos, and Nivio Ziviani. 2017. Intent-Aware Semantic
                                                                                  Query Annotation. In Proceedings of the 40th International ACM SIGIR Conference
5     DISCUSSION & CONCLUSIONS                                                    on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York,
                                                                                  NY, USA, 485–494.
Our contribution at the OSIRRC 2019 is an Elasticsearch Docker                [4] Jimmy, Guido Zuccon, and Bevan Koopman. 2016. Boosting titles does not gener-
image capable of indexing and searching the robust04, core17, and                 ally improve retrieval effectiveness. In Proceedings of the 21st Australasian document
                                                                                  computing symposium. 25–32.
core18 collections. While we had many difficulties getting Elastic-
search to work in this setup, and correctly parsing and indexing
documents, we were able to implement the main hooks for the jig.              6 http://ielab.io/




[5] Daniel Locke, Guido Zuccon, and Harrisen Scells. 2017. Automatic Query Gen-                       26th ACM International Conference on Information and Knowledge Management.
    eration from Legal Texts for Case Law Retrieval. In Asia Information Retrieval                    2291–2294.
    Symposium. Springer, 181–193.                                                                 [7] Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, and Ophir Frieder.
[6] Harrisen Scells, Guido Zuccon, Bevan Koopman, Anthony Deacon, Leif Azzopardi,                     2015. Retrieving medical literature for clinical decision support. In European
    and Shlomo Geva. 2017. Integrating the framing of clinical questions via PICO                     conference on information retrieval. Springer, 538–549.
    into the retrieval of medical literature for systematic reviews. In Proceedings of the

