<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>ielab at the Open-Source IR Replicability Challenge 2019</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harrisen Scells</string-name>
          <email>h.scells@uq.net.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Zuccon</string-name>
          <email>g.zuccon@uq.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Queensland</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>25</volume>
      <issue>2019</issue>
      <fpage>57</fpage>
      <lpage>61</lpage>
      <abstract>
<p>We present our participating Docker image in the SIGIR Open-Source IR Replicability Challenge 2019. Our participation includes an Elasticsearch-based Docker image conforming to the specifications laid out by the challenge, a collection parser, and a topic parser. Our Docker image is capable of indexing and searching several commonly used Information Retrieval test collections, including robust04 (TREC 2004 Robust Track topics, disks 4/5), core17 (TREC New York Times Annotated Corpus), and core18 (TREC Washington Post Corpus). The Docker image and associated tools we present in this paper provide a solid foundation for replicable Information Retrieval experiments using Elasticsearch.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The main contribution of our participation in the Open-Source IR
Replicability Challenge 2019 (OSIRRC 2019) is an
Elasticsearch-based Docker image. Our Docker image provides an environment
for performing replicable Information Retrieval (IR) experiments
using Elasticsearch. However, Elasticsearch is not designed to be
used in academic or research settings: for example, by default it cannot
output TREC run files or issue TREC topics as queries; it is
difficult to configure fine-grained aspects of retrieval models; and it is
difficult to extend with additional retrieval models. These
are all challenges which have been identified in initiatives such as
the Elastic4ir presentation as part of the SIGIR LIARR 2017
Workshop [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the IR In Practice session at AFIRM 2019. Despite
these limitations, we were able to package Elasticsearch conforming
to the majority of the specifications in this challenge.
      </p>
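<p>To make the first limitation concrete, producing TREC run files from Elasticsearch requires glue code along the following lines. This is only a sketch: the "DocNo" field name, the run tag, and the hand-built response (used here instead of a live cluster) are illustrative assumptions, not our actual tool.</p>

```python
def to_trec_run(topic_id, es_hits, run_tag="ielab"):
    """Convert Elasticsearch hits into TREC run-file lines:
    qid Q0 docno rank score tag."""
    lines = []
    for rank, hit in enumerate(es_hits, start=1):
        docno = hit["_source"]["DocNo"]  # field name is an assumption
        lines.append(f'{topic_id} Q0 {docno} {rank} {hit["_score"]} {run_tag}')
    return lines

# Hand-built hits in the shape Elasticsearch returns under hits.hits:
hits = [
    {"_score": 12.3, "_source": {"DocNo": "FBIS3-61525"}},
    {"_score": 11.7, "_source": {"DocNo": "LA010189-0001"}},
]
```

<p>A real run would iterate over all topics and write the lines of each to a single file, which trec_eval can then score against a qrels file.</p>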
      <p>The contributed Docker image is capable of indexing and
searching three test collections: robust04 (TREC 2004 Robust Track topics,
disks 4/5), core17 (TREC New York Times Annotated Corpus),
and core18 (TREC Washington Post Corpus). We also make two
other contributions which support indexing and searching using
Elasticsearch. One of the main selling points of Elasticsearch is
its ability to be schema-less (i.e., automatic mapping of document
fields and data types to the underlying Lucene index). As a
consequence of this, documents in Elasticsearch are represented (and
indexed) as JSON objects. This presents a challenge for indexing
standard IR test collections, as they commonly use a variety of file
formats such as XML, JSON, WARC or plain text, or even
purpose-based formats, e.g. the TREC document format. Due to this, the
second contribution of our participation in OSIRRC 2019 is a tool
for processing IR test collection file formats into Elasticsearch bulk
indexing operations. Finally, because Elasticsearch is a commercial,
off-the-shelf search engine, not built for research purposes, it does
not directly support issuing TREC topics as queries. Due to this, the
third contribution of our participation in OSIRRC 2019 is a tool for
directly issuing TREC topic files to Elasticsearch. These two tools
are described in more detail in the following sections.</p>
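<p>As an illustration of what issuing TREC topics involves, the classic topic format uses unclosed fields (&lt;num&gt;, &lt;title&gt;, &lt;desc&gt;) inside &lt;top&gt; blocks, so a generic XML parser does not apply. A simplified parser sketch (not our actual tool, which may handle more fields and formats) might look like:</p>

```python
import re

def parse_topics(text):
    """Parse classic TREC-style topics into {topic number: title query}.
    Fields such as <num> and <title> are not closed, so we cut on the
    next field marker (or end of block) rather than on closing tags."""
    topics = {}
    for block in re.findall(r"<top>(.*?)</top>", text, re.S):
        num = re.search(r"<num>\s*Number:\s*(\d+)", block).group(1)
        title = re.search(r"<title>\s*(.*?)\s*(?:<desc>|$)", block, re.S).group(1)
        topics[num] = " ".join(title.split())  # collapse internal whitespace
    return topics

sample = """<top>
<num> Number: 301
<title> International Organized Crime
<desc> Description:
...
</top>"""
```

<p>Each extracted title can then be sent to Elasticsearch as a query string, with the topic number carried through into the run file.</p>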
      <p>The OSIRRC 2019 uses a jig to issue commands to hooks within
participating Docker images. We implement the main hooks of the
jig, described in Section 2. We also report the arguments we pass
to the jig in order for it to use the implemented hooks in Section 3.
Finally, we report baseline results from indexing and searching the
three collections compatible with our Docker image in Section 4.</p>
      <p>Like other participants in the challenge, our Dockerfile,
associated tools, and documentation are made available on GitHub at
https://github.com/osirrc/ielab-docker.</p>
    </sec>
    <sec id="sec-2">
      <title>HOOK IMPLEMENTATIONS</title>
      <p>The OSIRRC 2019 uses a jig (https://github.com/osirrc/jig) to
communicate with the Docker image
from participating teams. The jig supports five different hooks that
a participating Docker image can implement. Our Docker image
supports the init, index, and search hooks. Our implementation
of these hooks is described in the following subsections. We also
examine the unimplemented hooks and how they would work if
implemented in future work.</p>
      <p>2.1 init. The init hook is used to prepare a participating Docker image prior
to indexing. We found, however, that using this hook significantly
slowed development time, and instead decided to include all of
our setup steps in our Dockerfile. When using a Dockerfile, listed
commands can be cached between image builds; compiling code,
downloading external artefacts, and copying large files in the init
hook, by contrast, take time away from development.</p>
      <p>Our Dockerfile uses a multi-stage build to reduce the total size
of the image, counteracting what would otherwise be a downside of using it.
In the first stage, development libraries are installed, Elasticsearch
is downloaded and decompressed (our participating Docker image
uses Elasticsearch 7.0.0, which was the latest version at the time
development of the image began), and our command-line applications
for parsing and searching are compiled. In the following stage, only
the necessary runtime applications are installed (e.g., Python), and
the artefacts from the previous stage are copied.</p>
      <p>2.2 index. The index hook is used to index one or more collections. There
are several steps before a collection can be indexed in this phase.
Firstly, depending on the collection, files must be decompressed or
removed (e.g., removing README files, changelogs, and tools). Next,
Elasticsearch must be started. Because Elasticsearch is designed to
be run as a service, it must be started in the background. Our index
hook then waits until the Elasticsearch web service is reachable (by
repeatedly making HTTP requests), and then waits until the cluster
health level of Elasticsearch is at least yellow (i.e., all shards have
been allocated). API requests to Elasticsearch will fail if the web
service is available but the cluster health is red (i.e., there are shards
still waiting to be allocated, or there is a problem with Elasticsearch).
This was a major pain point during development of our system and
led to many unexpected issues.</p>
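<p>The waiting logic described above can be sketched as follows, using only the Python standard library and Elasticsearch's real /_cluster/health endpoint; the base URL and timeout values are illustrative, and this is not necessarily the exact code of our hook.</p>

```python
import json
import time
import urllib.error
import urllib.request

def wait_for_elasticsearch(base_url="http://localhost:9200", timeout=120):
    """Block until the Elasticsearch web service answers HTTP requests
    and the cluster health is at least yellow (all primary shards
    allocated). Raises TimeoutError if the cluster never becomes ready."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
                health = json.load(resp)
            if health.get("status") in ("yellow", "green"):
                return health
        except (urllib.error.URLError, ConnectionError):
            pass  # service not reachable yet; keep polling
        time.sleep(1)
    raise TimeoutError("Elasticsearch did not become ready in time")
```

<p>Elasticsearch can also do part of this waiting server-side via the wait_for_status query parameter on the same endpoint, but the service must already be answering HTTP requests for that to help.</p>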
      <p>At this stage, Elasticsearch is available for indexing documents.
Documents in the collection path (provided by the jig) are
transformed by our TREC collection parser and indexed in bulk. When
indexing different collections, we do not attempt any document
cleaning or pre-processing (e.g., stemming). We generally index
titles into one field, and the content of the document from all other
fields (e.g., paragraphs, sub-headings) under the content part of
the document (e.g., TEXT/body/contents, etc.) into a single text
field. Examples of how collection documents are transformed into a
representation suitable to be indexed in Elasticsearch are described
in detail in the following sections.</p>
      <p>2.2.1 robust04. Documents in this collection are in an XML-like
format (in fact, we parse them using a generic XML parser). One
aspect of this file format that caused the XML parser to fail is the
unquoted attributes on elements (e.g., in &lt;F P=100&gt;, the 100 should
be wrapped in quotation marks to be valid XML). To overcome this,
we simply removed all attributes from every element. When
transforming the files in the robust04 collection, we index everything in
the &lt;TEXT&gt; tag as a single string, ignoring structure. An example
of this transformation is presented in Figure 1, which shows (a) the
contents of robust04 document FBIS3-61525 and (b) its Elasticsearch
representation: a JSON object with a "DocNo" field holding the
document number and a "Text" field holding the flattened contents
of the &lt;TEXT&gt; tag.</p>
      <p>2.2.2 core17. The documents for core17 are in a standardised XML
format. We take a similar approach to indexing these documents as
for the robust04 documents. Unlike the robust04 documents, there is a
clear title in the header of the core17 documents, which we extract.
We then proceed to index everything between the &lt;body&gt; tags as
a single string, ignoring structure. An example core17 document and
the JSON representation we transform it into, suitable for indexing
in Elasticsearch, is presented in Figure 2, which shows (a) the
contents of core17 document 6135 and (b) its Elasticsearch
representation: a JSON object with "DocNo", "Title", and "Text" fields.</p>
      <p>2.2.3 core18. The documents in the core18 collection are already
in a format suitable for Elasticsearch to index (i.e., they come as
JSON objects). One challenge we faced here, however, is that the main
content of these documents is stored as an array of individual JSON
objects, all with the same field name (content), but with different
data-types for the values. This issue is illustrated in Figure 3. This is
a problem because Elasticsearch infers the types of values to create
a mapping based on the first document indexed; when Elasticsearch
encounters a value that does not conform to the data-type specified
by the mapping, it will fail to index the document.</p>
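<p>A condensed sketch of the robust04 transformation from Figure 1 is shown below. It uses regular expressions instead of our actual XML-based parser (which is why it sidesteps the unquoted-attribute problem rather than stripping attributes), so it is illustrative only.</p>

```python
import re

def transform_robust04(raw):
    """Flatten a TREC-format document into the JSON object we index:
    DOCNO becomes "DocNo", and the contents of <TEXT>, with inner
    tags removed, become a single "Text" string."""
    docno = re.search(r"<DOCNO>\s*(.*?)\s*</DOCNO>", raw, re.S).group(1)
    text = re.search(r"<TEXT>(.*?)</TEXT>", raw, re.S).group(1)
    text = re.sub(r"<[^>]+>", " ", text)  # drop inner tags such as <F ...>
    return {"DocNo": docno, "Text": " ".join(text.split())}

doc = """<DOC>
<DOCNO> FBIS3-61525 </DOCNO>
<TEXT>
Language: <F P=105> Russian </F>
Article Type:CSO
</TEXT>
</DOC>"""
```

<p>The resulting dictionaries are what gets serialised into Elasticsearch bulk indexing operations.</p>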
    </sec>
    <sec id="sec-3">
      <title>Unimplemented Hooks</title>
      <p>
        The two unimplemented hooks are train and interact. As future
work, if the train hook were to be implemented, we would use it
to optimise retrieval function parameters (e.g., the k1 and b
parameters of BM25, similar to work by Jimmy et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Furthermore,
if the interact hook were to be implemented, we would use it
to expose the REST API of Elasticsearch. This would allow one to
connect other tools, such as Kibana, and explore the data further.
      
      </p>
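<p>One way to avoid the core18 mapping conflict discussed in Section 2.2.3 (a sketch of a plausible fix, not necessarily how our parser resolves it) is to coerce every content value to a string before indexing:</p>

```python
def flatten_core18_contents(contents):
    """Join the "content" values of a core18 contents array into one
    text string, coercing non-string values (e.g., epoch timestamps)
    to strings so that Elasticsearch's inferred mapping stays stable."""
    parts = []
    for obj in contents:
        value = obj.get("content")
        if value is None:
            continue
        parts.append(value if isinstance(value, str) else str(value))
    return " ".join(parts)

# The two conflicting objects from Figure 3:
contents = [
    {"content": "Homicides remain steady in the Washington region",
     "mime": "text/plain", "type": "title"},
    {"content": 1483230909000, "mime": "text/plain", "type": "date"},
]
```
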
      <p>Figure 3 shows two JSON objects from a core18 document which
contain different data-types for the same field (content): in (a) the
data-type of content is a string ("Homicides remain steady in the
Washington region", with "type": "title"), while in (b) it is an
integer (1483230909000, with "type": "date").</p>
    </sec>
    <sec id="sec-3a">
      <title>JIG ARGUMENTS</title>
      <p>To index each of the three collections, we pass the following
arguments to the jig:</p>
      <p>python3 run.py prepare \
--repo osirrc2019/ielab \
--collections robust04=/path/to/disk45=trectext</p>
      <p>python3 run.py prepare \
--repo osirrc2019/ielab \
--collections core17=/path/to/NYTcorpus=nyt</p>
      <p>python3 run.py prepare \
--repo osirrc2019/ielab \
--collections core18=/path/to/WashingtonPost.v2=wp</p>
      <p>To search each collection, we pass the following arguments:</p>
      <p>python run.py search \
--repo osirrc2019/ielab \
--output out/ielab \
--qrels qrels/qrels.robust04.txt \
--topic topics/topics.robust04.txt \
--collection robust04</p>
      <p>python run.py search \
--repo osirrc2019/ielab \
--output out/ielab \
--qrels qrels/qrels.core17.txt \
--topic topics/topics.core17.txt \
--collection core17</p>
      <p>python run.py search \
--repo osirrc2019/ielab \
--output out/ielab \
--qrels qrels/qrels.core18.txt \
--topic topics/topics.core18.txt \
--collection core18</p>
    </sec>
    <sec id="sec-3b">
      <title>RESULTS</title>
      <p>Table 1 presents the evaluation results for each collection indexed
using our Elasticsearch Docker image (0.3477, 0.1779, and 0.3081 for
robust04, core17, and core18, respectively). The evaluation measures
reported in this table are chosen based on the measures reported
by the jig. Although all participants report the same evaluation
measures, it is important to note that one cannot fairly compare
between systems; these results can only be used to demonstrate
the replicability of our system. We elaborate on this detail in the
following section, and present a possible solution.</p>
    </sec>
    <sec id="sec-4">
      <title>DISCUSSION &amp; CONCLUSIONS</title>
      <p>Our contribution to the OSIRRC 2019 is an Elasticsearch Docker
image capable of indexing and searching the robust04, core17, and
core18 collections. While we had many difficulties getting
Elasticsearch to work in this setup, and correctly parsing and indexing
documents, we were able to implement the main hooks for the jig.</p>
      <p>The jig of the OSIRRC 2019 introduced a standardised framework
for anyone to implement hooks to a search system, to allow for
replicability. However, due to the lack of standardisation of the
fine-grained parametrisation of the systems, it becomes difficult to
control comparisons between systems. What the jig introduced is a way to
perform experiments; however, it lacked a standard way to
configure experiments. Many participants have exploited the optional
arguments for indexing and searching, but the problem now is
that each participant uses a different syntax and naming conventions
for standard parameters (e.g., which retrieval model to use, whether
to perform stemming or not, etc.). We suggest that, if this challenge
were to be run again, another hook be added which defines the
capabilities of a Docker image. This could include official, standard
capabilities, such as the name and parameters of the retrieval model,
what document pre-processing steps to take, or what fields to index
(which generally apply to all search systems), as well as self-defined
capabilities, such as the compression algorithm to use (which may
only apply to a subset of systems). Currently, while it is limiting to
directly compare the evaluation results of our image to the images
of other participants due to the reasons listed above, it becomes
possible for others working with Elasticsearch to easily compare
their system to this baseline. The current Docker images developed
as part of this initiative only allow runs to be replicable, not
comparable. By listing the capabilities of systems, it would become
possible to fairly compare two or more systems that share
capabilities. Although system comparison was not the goal of this
challenge, it could easily be facilitated by adopting solutions similar
to that described above.</p>
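<p>To make the retrieval-model capability concrete, it could map onto Elasticsearch's per-index similarity settings, where BM25 parameters are configured explicitly (the k1 and b keys and the index.similarity settings path are Elasticsearch's real setting names; the values shown are only an example):</p>

```python
import json

# Index settings body for a create-index request: a BM25 similarity
# with explicit parameters, installed as the default similarity so it
# applies to every text field in the index.
settings = {
    "settings": {
        "index": {
            "similarity": {
                "default": {"type": "BM25", "k1": 0.9, "b": 0.4}
            }
        }
    }
}
print(json.dumps(settings, indent=2))
```

<p>A capabilities hook could expose exactly these names and values, so that two images claiming "BM25, k1=0.9, b=0.4" could be compared on equal footing.</p>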
      <p>
        The OSIRRC 2019 initiative was valuable both from the
perspective of implementing the jig hooks, and from comparing how
others chose to implement them in their images. Despite the fact
that Elasticsearch is primarily a commercial/production product,
and not a research-oriented tool, there are several published
academic research papers that use it, e.g., those from our ielab research
team [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4–6</xref>
        ], and other groups [
        <xref ref-type="bibr" rid="ref1 ref3 ref7">1, 3, 7</xref>
        ]. This may be in part due
to the ease with which one can index and search documents in
Elasticsearch (very little to no configuration is necessary, provided
an adequate parser is available). We hope that our contribution
to and participation in this challenge leads to further
replicability and reproducibility of Elasticsearch and other search systems,
and that others can more easily compare their own Elasticsearch
experiments to this common baseline.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Amato</surname>
          </string-name>
          , Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gennaro</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Large-Scale Image Retrieval with Elasticsearch</article-title>
          .
          <source>In The 41st International ACM SIGIR Conference on Research &amp;#38; Development in Information Retrieval (SIGIR '18)</source>
          . ACM, New York, NY, USA,
          <fpage>925</fpage>
          -
          <lpage>928</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Leif</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          , Matt Crane, Hui Fang, Grant Ingersoll, Jimmy Lin, Yashar Moshfeghi, Harrisen Scells,
          <string-name>
            <given-names>Peilin</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The Lucene for information access and retrieval research (LIARR) workshop at SIGIR 2017</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Rafael</given-names>
            <surname>Glater</surname>
          </string-name>
          , Rodrygo L.T. Santos, and
          <string-name>
            <given-names>Nivio</given-names>
            <surname>Ziviani</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Intent-Aware Semantic Query Annotation</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)</source>
          . ACM, New York, NY, USA,
          <fpage>485</fpage>
          -
          <lpage>494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bevan</given-names>
            <surname>Koopman</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Boosting titles does not generally improve retrieval effectiveness</article-title>
          .
          <source>In Proceedings of the 21st Australasian document computing symposium</source>
          .
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Locke</surname>
          </string-name>
          , Guido Zuccon, and
          <string-name>
            <given-names>Harrisen</given-names>
            <surname>Scells</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Automatic Query Generation from Legal Texts for Case Law Retrieval</article-title>
          .
          <source>In Asia Information Retrieval Symposium</source>
          . Springer,
          <fpage>181</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Harrisen</given-names>
            <surname>Scells</surname>
          </string-name>
          , Guido Zuccon, Bevan Koopman, Anthony Deacon, Leif Azzopardi, and
          <string-name>
            <given-names>Shlomo</given-names>
            <surname>Geva</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Integrating the framing of clinical questions via PICO into the retrieval of medical literature for systematic reviews</article-title>
          .
          <source>In Proceedings of the 26th ACM International Conference on Information and Knowledge Management</source>
          .
          <fpage>2291</fpage>
          -
          <lpage>2294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Luca</given-names>
            <surname>Soldaini</surname>
          </string-name>
          , Arman Cohan, Andrew Yates, Nazli Goharian, and
          <string-name>
            <given-names>Ophir</given-names>
            <surname>Frieder</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Retrieving medical literature for clinical decision support</article-title>
          .
          <source>In European conference on information retrieval</source>
          . Springer,
          <fpage>538</fpage>
          -
          <lpage>549</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>