<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Argument Retrieval for Controversial Questions Using Retrieve and Re-rank Pipelines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raunak Agarwal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrei Koniaev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robin Schaefer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Linguistics, University of Potsdam</institution>
          ,
          <addr-line>14476 Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This notebook documents Team Macbeth's contribution to the CLEF 2021 shared task Touché: Argument Retrieval for Controversial Questions. Our approach consists of different configurations of a two-step retrieve and re-rank pipeline. We experimented with sparse and dense approaches for argument retrieval and trained query-document cross-encoders for argument re-ranking. Our findings suggest that a sparse retriever combined with a custom re-ranker performed the best out of all our approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>Argument Retrieval</kwd>
        <kwd>Sentence Embeddings</kwd>
        <kwd>Semantic Search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        While standard information retrieval systems have focused largely on sparse
bag-of-words-based approaches such as BM25, recent trends in IR point to the strong performance of a two-step
retrieval and re-ranking pipeline, where a sizeable number of candidate documents are first
retrieved using the aforementioned sparse representations, and then re-ranked using (trainable)
neural models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Attempts are also being made to get rid of sparse representations altogether through the use
of dense retrieval systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A standard dense retrieval architecture comprises a
transformer-based encoder that is fine-tuned on a training corpus of queries and relevant
documents. The encoded documents are then usually stored in an index that supports approximate
nearest neighbour search. There is also work showing that combining sparse and dense
representations can further enhance the performance of these IR systems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Our submissions for the Touché shared task center around the above methods.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>All our experiments were run on a machine with an Intel Xeon E5-2650 CPU (24
cores, 256 GB RAM) and two NVIDIA GTX 1080 Ti GPUs (24 GB VRAM). We also used Weights
and Biases<sup>3</sup> to track our experiments.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Pre-training</title>
        <p>
          We continued pre-training RoBERTa-base on the entire args.me corpus [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] using the Masked Language Modeling (MLM) task
introduced by BERT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and later modified by Liu et al. in RoBERTa [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. RoBERTa improved on BERT’s performance through changes to the pre-training procedure,
such as dynamic masking and dropping the next-sentence-prediction objective; hence,
we chose to follow their approach.
        </p>
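        <p>The masking scheme can be illustrated with a minimal Python sketch. It follows the recipe described for BERT: 15% of the tokens are selected for prediction, and of these, 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged. It also re-samples the masked positions every time a sequence is seen, as in RoBERTa's dynamic masking. The function name, its arguments, and the -100 ignore label (PyTorch's default ignore index for cross-entropy) are illustrative assumptions. This is not the code used for our runs.</p>
        <preformat>
```python
import random

def dynamic_mask(token_ids, mask_id, vocab_size, special_ids=(), mask_prob=0.15, rng=None):
    """Return (inputs, labels) for one MLM training step.

    Called anew every time a sequence is fed to the model, so the masked
    positions change from epoch to epoch (RoBERTa-style dynamic masking).
    Labels are -100 at positions the loss should ignore.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll >= 0.2:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif roll >= 0.1:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```
        </preformat>
        <p>Because the function is called every time a batch is built, each of the 10 epochs sees a different masking of the same argument.</p>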
        <p>
          Our motivation for pre-training was to make sure that our model first learns from the
domain-invariant representations present in RoBERTa-base, and then enhances these representations
through continued pre-training on our custom domain. This kind of domain-adaptive
pre-training has been shown to offer gains in task performance [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>We used the RoBERTa-base hyper-parameters and trained the model for 10
epochs<sup>4</sup>, obtaining a domain-adapted RoBERTa-base model with a perplexity of about 4.1.</p>
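        <p>The reported perplexity can be related to the training loss. Suppose, as is common for MLM training scripts, that perplexity is the exponential of the mean cross-entropy (in nats) over the masked tokens. Then a perplexity of about 4.1 corresponds to a mean loss of ln 4.1 ≈ 1.41. The sketch below uses a hypothetical function name:</p>
        <preformat>
```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood) over the masked tokens."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# A mean MLM cross-entropy of ln(4.1), about 1.41 nats, gives a perplexity of 4.1.
print(round(perplexity([math.log(4.1)] * 8), 6))
```
        </preformat>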
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Re-annotation</title>
        <p>
          The organisers of Touché 2021 provide participants with 2298 relevance judgements for
training and evaluating their systems. These relevance judgements are the result of crowd-sourcing
efforts by Amazon Mechanical Turk<sup>5</sup> workers, a practice that has been criticised for its questionable
data quality [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], leaving aside major ethical considerations concerning labour exploitation [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Our initial plan was to use these annotations to train a sentence-pair classifier. On closer
inspection, however, we found that these annotations were riddled with errors and therefore not
suitable as a training set.</p>
        <p>Instead of discarding them altogether, we decided to re-annotate all of the 2298 relevance
judgements.<sup>6</sup> We carried out two rounds of annotation for each query-document pair and
obtained an inter-annotator agreement of Krippendorff’s alpha = 0.39.</p>
        <p><sup>3</sup> https://wandb.ai/</p>
        <p><sup>4</sup> https://wandb.ai/ragabet/’roberta-base’</p>
        <p><sup>5</sup> https://www.mturk.com/</p>
        <p><sup>6</sup> The annotations are available on our git repository.</p>
        <p>Due to time constraints, the runs we submitted were trained only on the first round of
annotations. The relatively low inter-annotator agreement suggests that our runs would have turned out
slightly different had we trained our models on an average of the two annotation rounds.</p>
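        <p>For reference, below is a minimal pure-Python sketch of Krippendorff's alpha computed from a coincidence matrix. The level of measurement behind the reported value is not stated here, so the sketch supports both a nominal and an interval distance. The function name and the toy labels are hypothetical.</p>
        <preformat>
```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha(units, metric="nominal"):
    """Krippendorff's alpha from a list of units (here: query-document pairs).

    Each unit is the list of labels it received, one per annotation round;
    missing labels are simply left out of the list.
    """
    if metric == "nominal":
        delta = lambda a, b: 0.0 if a == b else 1.0
    elif metric == "interval":
        delta = lambda a, b: float(a - b) ** 2
    else:
        raise ValueError(f"unsupported metric: {metric}")

    # Coincidence matrix: every ordered pair of labels within a unit,
    # weighted by 1 / (m_u - 1), where m_u is the number of labels in the unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if 2 > m:
            continue  # a single label cannot be paired, so the unit is skipped
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the grand total n of pairable values.
    marginals = Counter()
    for (a, _), weight in coincidences.items():
        marginals[a] += weight
    n = sum(marginals.values())

    observed = sum(w * delta(a, b) for (a, b), w in coincidences.items())
    expected = sum(marginals[a] * marginals[b] * delta(a, b)
                   for a in marginals for b in marginals)
    if expected == 0:
        raise ValueError("alpha is undefined when all labels are identical")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1))
    return 1.0 - (n - 1) * observed / expected
```
        </preformat>
        <p>For example, three query-document pairs labelled [0, 0], [0, 1] and [1, 1] in the two rounds yield alpha = 4/9 ≈ 0.44.</p>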
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Sentence Embeddings</title>
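        <p>When it was first introduced, BERT set new state-of-the-art results on various NLP tasks,
including question answering, sentence classification, and sentence-pair regression. A major
drawback of BERT’s network structure, however, is that it does not produce independent, semantically
meaningful embeddings for single input sequences. For sentence-pair tasks, both sentences must be
fed into the network together, which makes large-scale similarity search computationally expensive.</p>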
        <p>
          To overcome the above issue, we used UKP Lab’s Sentence-BERT (or SBERT) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], a modification of the standard BERT architecture. SBERT adds a mean pooling operation on top
of the contextualized word vectors generated by BERT/RoBERTa, which yields
semantically meaningful sentence/document embeddings that can be used for downstream
tasks. We used the regression objective function described in their paper: a pair-wise
regressor was trained on the cosine similarity between the two embeddings <italic>u</italic> and <italic>v</italic> (where <italic>u</italic> is
the query embedding and <italic>v</italic> is the document embedding), and the objective function was optimized
using a mean-squared-error loss.
        </p>
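        <p>The pooling step and the regression objective can be sketched in plain Python, with lists standing in for tensors. The function names are hypothetical, and this is not the sentence-transformers implementation. The sketch assumes that the gold relevance labels have been scaled to [0, 1] so that they are comparable with cosine similarity.</p>
        <preformat>
```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the contextual token vectors, ignoring padding positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[j] for vec in kept) / len(kept) for j in range(dim)]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def regression_loss(pairs):
    """Mean squared error between cos(u, v) and the gold relevance score.

    `pairs` holds (u, v, gold) triples, where u is a query embedding,
    v a document embedding and gold a relevance label scaled to [0, 1].
    """
    return sum((cosine(u, v) - gold) ** 2 for u, v, gold in pairs) / len(pairs)
```
        </preformat>
        <p>In a real bi-encoder, the gradient of this loss flows back through the pooling layer into the shared transformer, which encodes queries and documents with the same weights.</p>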
        <p>
          The terminology used in SBERT is further refined by Humeau et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], who define two approaches to pair-wise sentence scoring: bi-encoders and cross-encoders (see
Figure 1).
        </p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Bi-Encoder</title>
          <p>The architecture introduced in SBERT is what is now known as a bi-encoder. A bi-encoder
encodes each sentence into an independent sentence embedding. These vector representations
enable efficient document retrieval through standard similarity
measures (such as Euclidean distance or cosine similarity) in the embedding space.</p>
          <p>After the pre-training step (3.2), we trained a bi-encoder on the query-document
annotations described in 3.3. We used this bi-encoder to generate document embeddings for the
entire corpus, giving us an embedding space of size <italic>d</italic> × <italic>N</italic>, where <italic>d</italic> is the embedding size and <italic>N</italic>
is the total number of documents. This embedding space was then indexed by a dense retriever,
as described in 3.5.2.</p>
          <p>Note: Each document in the corpus consists of premises and a conclusion. To generate
document embeddings, we ignore the conclusion and use only the premises.</p>
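        <p>Below is a minimal sketch of how premise-only document embeddings could be assembled into the embedding space. Two details are assumptions for illustration: the record layout (an args.me-style dictionary with id, conclusion and premises fields) and the encode callable, which stands in for the trained bi-encoder.</p>
        <preformat>
```python
def premise_text(argument):
    """Concatenate the premise texts of one argument; the conclusion is ignored.

    Assumes an args.me-style record such as
    {"id": ..., "conclusion": "...", "premises": [{"text": "..."}, ...]}.
    """
    return " ".join(p["text"] for p in argument["premises"])

def build_embedding_matrix(arguments, encode):
    """Return (ids, matrix) with one row of d floats per document.

    `encode` is a hypothetical stand-in for the trained bi-encoder: any
    callable that maps a string to a list of d floats.
    """
    ids = [arg["id"] for arg in arguments]
    matrix = [encode(premise_text(arg)) for arg in arguments]
    return ids, matrix
```
        </preformat>
        <p>Keeping the ids in a list parallel to the matrix rows lets a nearest-neighbour hit (a row index) be mapped back to an args.me document id.</p>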
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Cross-Encoder</title>
          <p>A cross-encoder follows the standard BERT design, in which the two inputs are concatenated and full self-attention is applied across
all tokens of the pair. While a bi-encoder takes two inputs and returns two
representations (or embeddings), a cross-encoder takes two inputs and directly returns a single score.
Cross-encoders typically outperform bi-encoders on pair-wise sentence scoring tasks, at the cost of speed.</p>
          <p>
            Since cross-encoders are slow and do not produce independent embeddings, they are impractical for
first-stage retrieval over a large corpus: every query-document pair would have to pass through the network.
We therefore used them in the second step of our pipeline to re-rank documents. Our cross-encoder was
trained on top of our MLM pre-trained model (3.2) using the annotations described
in 3.3. As a baseline, we also used a cross-encoder pre-trained on the MS MARCO dataset
[
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Retrieval Models</title>
        <sec id="sec-3-5-1">
          <title>3.5.1. Sparse: BM25 (Elasticsearch)</title>
          <p>BM25 is a traditional bag-of-words retrieval function that scores the relevance of
documents for a given query based on the terms shared by the query and the
document. As a variant of TF-IDF weighting, it takes into account term frequencies, their
inverse document frequencies, and document length.</p>
          <p>Due to its simplicity, computational efficiency, and performance, BM25 is a critical
component of large-scale search applications and the de facto industry standard in IR.
To index our id-document pairs, we used the implementation available in Elasticsearch<sup>7</sup>
with the default settings.</p>
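        <p>To make the scoring concrete, here is a minimal sketch of the classic BM25 formula over pre-tokenised documents. It uses Lucene's IDF variant and the Elasticsearch default parameters k1 = 1.2 and b = 0.75. It illustrates the formula and is not Elasticsearch's exact implementation; the function name and toy documents are hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every tokenised document in `docs` against `query_terms` with BM25.

    Uses the Lucene-style IDF log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Saturating term frequency, normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```
        </preformat>
        <p>For instance, take the hypothetical documents ["nuclear", "energy", "is", "safe"], ["ban", "nuclear", "weapons"] and ["school", "uniforms"] with the query ["nuclear", "energy"]. The first document scores highest and the third scores 0: it shares no term with the query, which is exactly the lexical gap discussed below.</p>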
        </sec>
        <sec id="sec-3-5-2">
          <title>3.5.2. Dense: Approximate Nearest Neighbours (hnswlib)</title>
          <p>
            Despite its robustness, BM25 has several shortcomings. It suffers from the lexical gap problem
[
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], a common issue in systems built on sparse representations: relevant documents that express
the query's meaning in different words (e.g., synonyms) receive little or no credit. Empirical results have
also shown that it overly penalizes very long documents [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].
          </p>
          <p>
            To overcome these problems, we deployed the sparse BM25 retriever alongside a dense
retriever. Karpukhin et al. report that dense representations from a fine-tuned BERT encoder
outperform BM25 on retrieval for open-domain question answering [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ].
          </p>
          <p>Constructing a dense retriever was a two-step process. First, we encoded the entire corpus
into a dense vector space using the bi-encoder described in 3.4.1. Second, we indexed the representations
created by the bi-encoder using hnswlib<sup>8</sup>, a library that implements approximate nearest
neighbours search.</p>
          <p><sup>7</sup> https://lucene.apache.org/core/7_0_1/core/org/apache/lucene/search/similarities/BM25Similarity.html</p>
          <p>
            Approximate nearest neighbour search is an important step in efficiently finding the document vectors
most similar to a given query vector. The alternative, brute-force search, computes the cosine similarity between
the query vector and every single document vector, so its cost grows linearly with the size of the corpus.
We chose hnswlib because systems based on hierarchical navigable small world graphs (HNSW) [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] represent the current
state of the art in approximate nearest neighbour search.<sup>9</sup>
          </p>
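        <p>For comparison, the brute-force alternative can be sketched as an exact top-k cosine search in plain Python. The function names are hypothetical. hnswlib replaces this linear scan with a search over a graph index, trading exactness for speed.</p>
        <preformat>
```python
import heapq
import math

def normalise(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

def brute_force_search(query_vec, doc_matrix, k):
    """Exact top-k search by cosine similarity over every document vector.

    The cost grows linearly with the number of documents. In practice the
    document vectors would be normalised once, up front.
    """
    q = normalise(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalise(doc))), idx)
        for idx, doc in enumerate(doc_matrix)
    ]
    return heapq.nlargest(k, scored)  # list of (similarity, document index)
```
        </preformat>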
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Data Augmentation</title>
        <p>
          For our data augmentation approach, we followed the methodology described in the Augmented
SBERT paper [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], in which a pre-trained cross-encoder is used to weakly label a sample of
unlabeled query-document pairs. We sampled query-document pairs using BM25 and fed them
into a cross-encoder trained on MS MARCO to generate silver labels, which were then appended
to the gold training set to train an augmented bi-encoder (Figure 2).
        </p>
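        <p>The augmentation loop can be summarised in a short sketch. Here retrieve and cross_encoder_score are hypothetical stand-ins for the BM25 sampler and the MS MARCO cross-encoder. Skipping pairs that already carry a gold label is an assumption of this sketch.</p>
        <preformat>
```python
def augment_training_set(gold_pairs, queries, retrieve, cross_encoder_score, top_k=10):
    """Augmented SBERT-style data augmentation (a sketch).

    gold_pairs: list of (query, document, label) triples from the annotations.
    retrieve(query, k): stand-in for BM25; returns k candidate documents.
    cross_encoder_score(query, doc): stand-in for the MS MARCO cross-encoder.
    Returns the gold pairs followed by the weakly ("silver") labelled pairs.
    """
    seen = {(q, d) for q, d, _ in gold_pairs}
    silver = []
    for query in queries:
        for doc in retrieve(query, top_k):
            if (query, doc) in seen:
                continue  # keep the human label for pairs that already have one
            silver.append((query, doc, cross_encoder_score(query, doc)))
            seen.add((query, doc))
    return list(gold_pairs) + silver
```
        </preformat>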
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Retrieve and Re-rank</title>
        <p>The two-step retrieve-and-re-rank pipeline is known to work well on IR tasks. Given a
search query, the first step retrieves a large list of candidate documents that are potentially
relevant to the query. For the retrieval stage, we experimented with a sparse retriever (3.5.1), a
dense retriever (3.5.2), and a combination of both (by simply appending the outputs of the two
retrievers).</p>
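        <p>In the second step, we used a re-ranker based on a cross-encoder (3.4.2) that scores the
relevance of all the retrieved candidates (Figure 3). We experimented with a custom
cross-encoder trained on our annotations and a pre-trained cross-encoder trained on the MS MARCO
dataset. For each query, 100 candidate documents were retrieved and sent to the cross-encoder
for re-ranking. After re-ranking, only the top 50 documents were included in
the final submission file.</p>
        <p><sup>8</sup> https://github.com/nmslib/hnswlib</p>
        <p><sup>9</sup> http://ann-benchmarks.com/</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Configurations of our submitted runs. Relevance and Quality are nDCG@5 scores.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Run</th>
                <th>Retriever</th>
                <th>Augmenter</th>
                <th>Reranker</th>
                <th>Relevance</th>
                <th>Quality</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>1</td>
                <td>Sparse</td>
                <td>No</td>
                <td>Pretrained Cross Encoder</td>
                <td></td>
                <td></td>
              </tr>
              <tr>
                <td>2</td>
                <td>Sparse</td>
                <td>No</td>
                <td>Custom Cross Encoder</td>
                <td></td>
                <td></td>
              </tr>
              <tr>
                <td>3</td>
                <td>Sparse + Dense</td>
                <td>No</td>
                <td>Custom Cross Encoder</td>
                <td></td>
                <td></td>
              </tr>
              <tr>
                <td>4</td>
                <td>Dense</td>
                <td>No</td>
                <td>Custom Cross Encoder</td>
                <td></td>
                <td></td>
              </tr>
              <tr>
                <td>5</td>
                <td>Sparse + Dense</td>
                <td>Yes</td>
                <td>Custom Cross Encoder</td>
                <td></td>
                <td></td>
              </tr>
            </tbody>
          </table>
        </table-wrap>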
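        <p>The complete pipeline can be sketched as follows. The retrievers and the re-ranking function are hypothetical callables. Two details are assumptions of this sketch: the 100 candidates are drawn per retriever, and a document returned by both retrievers is kept once.</p>
        <preformat>
```python
def retrieve_and_rerank(query, retrievers, rerank_score, n_candidates=100, n_final=50):
    """Pool candidates from one or more retrievers, re-rank them, keep the best.

    retrievers: callables (query, k) -> list of document ids, e.g. a sparse
    and a dense retriever; their outputs are appended in order.
    rerank_score: stand-in for the cross-encoder, (query, doc_id) -> float.
    """
    candidates, seen = [], set()
    for retrieve in retrievers:
        for doc_id in retrieve(query, n_candidates):
            if doc_id in seen:
                continue  # a document returned by both retrievers is scored once
            seen.add(doc_id)
            candidates.append(doc_id)
    # Score every candidate with the cross-encoder stand-in, best first.
    ranked = sorted(candidates, key=lambda doc_id: rerank_score(query, doc_id), reverse=True)
    return ranked[:n_final]
```
        </preformat>
        <p>Passing a single retriever reproduces the sparse-only or dense-only runs; passing both reproduces the Sparse + Dense configuration.</p>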
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>We evaluated our runs using the relevance judgements<sup>10</sup> and quality judgements<sup>11</sup> provided
by the organisers of the shared task. The metric used was nDCG@5. The results are in Table 1.</p>
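      <p>Below is a minimal sketch of nDCG@5 for a single query, using linear gains and a logarithmic rank discount. The official Touché evaluation may differ in details such as the gain function or the treatment of unjudged and negatively judged documents. Counting unjudged documents as 0 and clipping negative grades to 0 are assumptions here, and the function name is hypothetical.</p>
      <preformat>
```python
import math

def ndcg_at_k(ranked_ids, qrels, k=5):
    """nDCG@k for one query with linear gains.

    ranked_ids: document ids in the order the run returned them.
    qrels: dict mapping document id -> graded judgement; unjudged documents
    count as 0 and negative grades are clipped to 0 (an assumption).
    """
    def dcg(gains):
        # Rank r (0-based) is discounted by log2(r + 2), so the top rank is undiscounted.
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

    gains = [max(qrels.get(doc, 0), 0) for doc in ranked_ids]
    ideal = sorted((max(g, 0) for g in qrels.values()), reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```
      </preformat>
      <p>Per-query scores would then be averaged over all topics to obtain a run's nDCG@5.</p>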
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we outlined Team Macbeth’s contribution to the CLEF lab Touché. Our central
approach consisted of using tried-and-tested methods for information retrieval and re-ranking.
We continued pre-training RoBERTa-base on the args.me corpus with a masked language modeling objective,
re-annotated the relevance judgements from Touché 2020, and explored neural methods for both retrieval and
re-ranking. The combination of a sparse retriever and a custom neural re-ranker stands out as
the best method in terms of both argument relevance and argument quality.</p>
      <p><sup>10</sup> https://webis.de/events/touche-21/touche-task1-51-100-relevance.qrels</p>
      <p><sup>11</sup> https://webis.de/events/touche-21/touche-task1-51-100-quality.qrels</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bondarenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gienapp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beloucif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ajjour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <article-title>Overview of Touché 2021: Argument retrieval</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ajjour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Data acquisition for argument search: The args.me corpus</article-title>
          ,
          <source>in: KI</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Passage re-ranking with BERT</article-title>
          ,
          <year>2020</year>
          . arXiv:1901.04085.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oğuz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <article-title>Dense passage retrieval for open-domain question answering</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <article-title>Sparse, dense, and attentional representations for text retrieval</article-title>
          ,
          <year>2021</year>
          . arXiv:2005.00181.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <year>2019</year>
          . arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururangan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marasović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Don't stop pretraining: Adapt language models to domains and tasks</article-title>
          ,
          <year>2020</year>
          . arXiv:2004.10964.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hauser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paolacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Chandler</surname>
          </string-name>
          ,
          <article-title>Common concerns with MTurk as a participant pool: Evidence and solutions</article-title>
          ,
          <year>2018</year>
          . URL: psyarxiv.com/uq45c. doi:10.31234/osf.io/uq45c.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Schlagwein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cecez-Kecmanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hanckel</surname>
          </string-name>
          ,
          <article-title>Ethical norms and issues in crowdsourcing practices: A habermasian analysis</article-title>
          ,
          <source>Information Systems Journal</source>
          (
          <year>2018</year>
          ). doi:10.1111/isj.12227.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-transformers documentation</article-title>
          , https://www.sbert.net/,
          <year>2019</year>
          . (Accessed on 05/28/2021).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using siamese BERT-networks</article-title>
          ,
          <year>2019</year>
          . arXiv:1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Humeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <article-title>Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bajaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>MS MARCO: A human generated machine reading comprehension dataset</article-title>
          ,
          <year>2018</year>
          . arXiv:1611.09268.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>Bridging the lexical chasm: statistical approaches to answer-finding</article-title>
          ,
          <source>in: SIGIR '00</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>When documents are very long, BM25 fails!</article-title>
          ,
          <source>Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y. A.</given-names>
            <surname>Malkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Yashunin</surname>
          </string-name>
          ,
          <article-title>Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs</article-title>
          ,
          <year>2018</year>
          . arXiv:1603.09320.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Daxenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks</article-title>
          ,
          <year>2021</year>
          . arXiv:2010.08240.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>