<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>G.: Visualizing data using t-sne. Journal of machine learning
research 9(Nov)</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0004-3702</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring Argument Retrieval with Transformers</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Leipzig University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>41</volume>
      <fpage>297</fpage>
      <lpage>304</lpage>
      <abstract>
        <p>We report on our recent efforts to employ transformer-based models as part of an information retrieval pipeline, using argument retrieval as a benchmark. Transformer models, both causal and bidirectional, are independently used to expand queries using generative approaches as well as to densely embed and retrieve arguments. In particular, we investigate three approaches: (1) query expansion using GPT-2, (2) query expansion using BERT, and orthogonal to these approaches, (3) embedding of documents using Google's BERT-like universal sentence encoder (USE) combined with a subsequent retrieval step based on a nearest-neighbor search in the embedding space. A comparative evaluation of our approaches at the Touché lab on argument retrieval places our query expansion based on GPT-2 first on the leaderboard with a retrieval performance of 0.808 nDCG@5, improving over the task baseline by 6.878%.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Search has become the sine qua non tool of information access and the gateway to the
World Wide Web. Users of web search engines now expect the results to be highly
relevant to the queries they submit: if relevant
documents exist for a given query, they are usually found, and the most relevant ones
can be expected to be ranked highest. However, search engines are optimized for ad
hoc retrieval tasks, and a key assumption is that a single document suffices to satisfy
the information need underlying a query. That assumption falls apart when the topic of
interest is inherently subjective and nuanced, as is the case for contentious issues.
Any one document will most likely argue one way or the other, so a user may need to
peruse many documents to satisfy a deliberative information need. The current
search landscape is, however, not especially attuned to such nuance, usually preferring
to let “the stakeholders compete for what opinion ranks higher” [23]. Handling
arguments specifically, rather than the documents that might contain them, is an
attempt to address that problem using computational argumentation analysis [23, 31].
Such an approach forms the basis of the args.me search engine, which relies on a corpus
of more than 300,000 arguments mined from online debate portals [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This corpus
formed the basis of the Touché shared task on argument retrieval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        We leverage transformer models to see how they might enrich argument retrieval
at various stages of the information retrieval (IR) pipeline. Attention-based transformer
models [30] have recently surged in popularity: their amenability to massive
parallelization and the increasing availability of computational power on GPUs have led such
models to claim the state of the art on many natural language processing and
understanding tasks [
        <xref ref-type="bibr" rid="ref7">7, 25</xref>
        ]. We show that competitive performance can be achieved on
this task without preprocessing the documents or fine-tuning the models on the
task. In what follows, Section 2 presents related work, Section 3 describes our approach,
Section 4 presents the results of our participation in the shared task, and Section 5 concludes.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Although the subject of arguments in a retrieval context is by no means a new
development (see, e.g., [26] and [28]), the field itself is still nascent. Lawrence and Reed
[18], Potthast et al. [23], and Wachsmuth et al. [31] provide a comprehensive overview
of the field, with the last two further putting forth the theoretical considerations that
underlie the development of a tool where arguments form the unit of retrieval, using the
args.me dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to ground the theory in an applied retrieval study. Another salient
research direction is spearheaded by IBM’s Project Debater and corresponding studies
that leverage its numerous datasets (see [29] for an overview of the project).
      </p>
      <p>
        The growing interest of natural language processing (NLP) researchers in
information retrieval and argument mining [27], together with the impressive performance and ease
of use of transformer-based models [
        <xref ref-type="bibr" rid="ref18">32</xref>
        ] and their sensitivity to semantic nuance, makes
a convergence of these research strands all but inevitable. Indeed, researchers working on IBM’s Project Debater
have recently published two papers [
        <xref ref-type="bibr" rid="ref14 ref9">9, 14</xref>
        ] that both use BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for
argument retrieval and argument quality assessment. Like other argument
mining efforts [
        <xref ref-type="bibr" rid="ref11 ref6">6, 11</xref>
        ], these methods, though impressive in their results, do not
readily transfer to domains other than argument retrieval because they are
usually fine-tuned on a specific document corpus.
      </p>
      <p>
        The application of transformers in general, and of BERT in particular, to document
retrieval was until very recently limited to frameworks that delegate initial retrieval
to the Lucene-based Anserini retrieval toolkit [
        <xref ref-type="bibr" rid="ref19 ref20">34, 33</xref>
        ], which, while promising, did not attempt to instrumentalize
transformers at other stages of the IR pipeline. Another approach similarly leverages
BERT and Anserini for ad hoc document retrieval, while also offering a Python interface
to simplify research based on neural networks [
        <xref ref-type="bibr" rid="ref21">35</xref>
        ].
      </p>
      <p>
        The semantic prowess of transformers makes them prime candidates for enriching
IR pipelines that rely on query expansion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For deeper coverage of query expansion,
we refer the reader to Azad and Deepak’s [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] thorough survey on the topic. In brief,
query expansion consists of augmenting a user query to increase its effectiveness and
reduce ambiguity. This is achieved by reformulating the query with additional
data, and the sub-approaches of this sub-field of IR differ in the source of that data.
Azad and Deepak [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] differentiate between two main directions: global
analysis and local analysis. The latter relies on user feedback, both direct and implicit,
whereas the former consists of approaches that rely on knowledge that can be gleaned
and derived from the query itself or its context. These include linguistic, corpus-based,
log-based and web-based approaches. Global analysis has proven to be of particular
interest for transformer-focused research [
        <xref ref-type="bibr" rid="ref8">8, 22, 19, 21</xref>
        ]. The work of Dibia [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in
particular aligns closely with our approach in Section 3.2, although the two differ
in the strategies used to determine where and how to inject context. To our knowledge,
the query expansion approach we develop in Section 3.1 is the first use of a transformer
decoder (GPT-2 in this instance) to generate documents that read as though they might
have originated from a corpus of interest, at least plausibly enough for a retrieval
system, and thereby narrow down the scope of the search.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Using Transformers for Document Retrieval</title>
      <p>
        We set out to harness the ability of different transformers to encode knowledge
for the retrieval of documents. The original transformer introduced by Vaswani
et al. [30] uses a standard seq2seq encoder-decoder architecture, whereby the encoder
learns a task-specific embedding of the input, and the decoder learns a language model conditioned on the encoder’s output. Subsequent
transformer-based models do not necessarily follow that convention. BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] only
uses an encoder, and GPT-2 [25] only a decoder. It is therefore useful to qualify the
rather general “transformer” nomenclature with either “encoder” or “decoder”.
      </p>
      <p>
        We use transformer decoders (i.e., GPT-2-like models) for query expansion via
text hallucination (Section 3.1), i.e., the generation of text that reads as if it might
have come from the corpus, and transformer encoders (i.e., BERT-like models) for
keyword-based query expansion (Section 3.2). Moreover, we consider transformer
encoders for document embedding (Section 3.3). Both query expansion approaches
use an Elasticsearch index of the args.me corpus whose retrieval model is a
language model with Dirichlet priors (DirichletLM) [
        <xref ref-type="bibr" rid="ref22">36</xref>
        ], which has been shown
to be superior to other retrieval models for retrieving arguments from the args.me
corpus [23]. The embedding approach, on the other hand, uses a vector-based similarity
index for retrieval.
      </p>
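      <p>DirichletLM scores a document by the likelihood that its smoothed language model generates the query. The following is a minimal, self-contained Python sketch of Dirichlet-prior smoothing in its textbook query-likelihood form, not the Elasticsearch implementation we used; the smoothing parameter mu = 2000 is an assumed, commonly used default, and the three-document corpus is made up for illustration.</p>
      <preformat>
```python
import math
from collections import Counter
from typing import Dict, List


def dirichlet_lm_score(query: List[str], doc: List[str], collection_tf: Dict[str, int],
                       collection_len: int, mu: float = 2000.0) -> float:
    """Query likelihood with Dirichlet-prior smoothing:
    sum over query terms w of log((tf(w, d) + mu * p(w | C)) / (|d| + mu))."""
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        p_collection = collection_tf.get(term, 0) / collection_len
        if p_collection == 0.0:
            continue  # term unseen in the collection: skip it rather than take log(0)
        score += math.log((doc_tf[term] + mu * p_collection) / (len(doc) + mu))
    return score


# Made-up toy corpus; the collection statistics are computed from it.
docs = {
    "d1": "nuclear energy is clean".split(),
    "d2": "coal is cheap".split(),
    "d3": "solar energy replaces coal".split(),
}
collection_tf = Counter(token for doc in docs.values() for token in doc)
collection_len = sum(collection_tf.values())
query = "solar energy".split()
ranking = sorted(docs, key=lambda d: dirichlet_lm_score(query, docs[d], collection_tf, collection_len),
                 reverse=True)
print(ranking)  # ['d3', 'd1', 'd2']
```
      </preformat>
      <p>Smoothing with the collection model means that a document lacking a query term is penalized rather than scored as impossible, so d2 still receives a finite score. Elasticsearch ships an LMDirichlet similarity that implements a closely related variant of this formula.</p>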
      <p>Self-supervised pre-training on massive amounts of diverse data is
what enables transformer models to encode the knowledge that lets them
perform as well as they do on natural language processing (NLP) and natural language
understanding (NLU) tasks. Therein also lies their promise for retrieval tasks. We
deliberately rely only on the knowledge that pre-training encodes into the models;
that is, we do not perform any argumentation-specific fine-tuning. This allows us
to gauge the performance of transformers for retrieval tasks in general. Our
investigation aims at a modular proof of concept; it seeks neither to show the superiority
of one method over others nor to optimize the resulting pipeline for
accuracy on the relevance judgments.
      </p>
      <sec id="sec-3-1">
        <title>3.1 Query Expansion with a Transformer Decoder</title>
        <p>Much of the recent media hype around transformer networks centers on their
ability to generate coherent text. Transformer decoders can be trained as causal
language models (CLM), since the representation of a given token can only depend
on the tokens preceding it. Because transformer decoders like GPT-2 act as
language models, sentences can be generated iteratively by sampling from the
output distribution at a given time step and feeding that output back into the network at
the following time step. By choosing the opening of a text (called a prompt), one can
thus steer GPT-2 and exert some influence over the text it generates. Our
approach to query expansion uses this capability to generate argumentative
expansions of a given query with a positive, a negative, and a neutral stance by
prompting GPT-2 in turn with the six prompts shown in Table 1. The prompts are purposefully
constructed to simulate an argumentative dialog that GPT-2 is to complete.</p>
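        <p>The generate-and-feed-back loop can be illustrated without downloading GPT-2. In the hypothetical sketch below, a hand-written bigram table stands in for the language model (GPT-2 conditions on the entire prefix, not only on the last token), and a made-up prompt ending in a dialog dash plays the role of a prompt. Each chosen token is appended to the sequence and becomes context for the next step; the 100-token default mirrors our generation limit.</p>
        <preformat>
```python
from typing import Callable, Dict, List

Dist = Dict[str, float]

# Hypothetical toy "language model": p(next token | previous token).
# GPT-2 instead conditions on the whole prefix; "EOS" marks the end of a text.
TOY_BIGRAMS: Dict[str, Dist] = {
    "-": {"yes": 0.5, "no": 0.3, "not": 0.2},
    "yes": {",": 0.9, "EOS": 0.1},
    "no": {",": 0.8, "EOS": 0.2},
    ",": {"because": 0.8, "EOS": 0.2},
    "because": {"it": 0.7, "EOS": 0.3},
    "it": {"is": 0.6, "EOS": 0.4},
    "is": {"renewable": 0.5, "cheap": 0.3, "EOS": 0.2},
}


def next_token_distribution(context: List[str]) -> Dist:
    """Return the next-token distribution; the toy model only looks at the last token."""
    return TOY_BIGRAMS.get(context[-1], {"EOS": 1.0})


def greedy(dist: Dist) -> str:
    """Pick the single most likely token."""
    return max(dist, key=dist.get)


def generate(prompt: List[str], choose: Callable[[Dist], str], max_new_tokens: int = 100) -> List[str]:
    """Causal generation: choose a token, append it, and feed the longer sequence back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = choose(next_token_distribution(tokens))
        if token == "EOS":
            break
        tokens.append(token)
    return tokens[len(prompt):]  # return only the continuation, not the prompt


prompt = "what do you think ? can alternative energy replace fossil fuels ? -".split()
print(" ".join(generate(prompt, greedy)))  # yes , because it is renewable
```
        </preformat>
        <p>Appending a stance opener such as "no" to the prompt fixes the first word of the continuation, which is the sense in which a prompt steers the generated text. Replacing greedy with a stochastic chooser yields a different continuation on each run.</p>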
        <p>
          The quality of the generated sequences, and thus also of our retrieval results, is highly
contingent upon the way in which tokens are sampled from the network. Neural language
models easily degenerate into incoherent, non-human-sounding text when decoding
naively maximizes likelihood [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Many decoding methods try to make the generated text less likely to be
incoherent, repetitive, or overly generic [
          <xref ref-type="bibr" rid="ref12 ref13 ref16">12, 13, 16</xref>
          ]; the most salient ones, ordered from basic to advanced, are the following:
– Greedy decoding is the most naive and often the worst-performing method. It
consists of choosing the most likely token at every time step, which usually leads to
low-quality, repetitive text.
– Beam search is a heuristic that keeps a fixed number (the beam width) of the most
likely candidate sequences at every time step until they reach the desired length, and
then returns the most likely candidate. Since searching for the output sequence that
actually maximizes likelihood is intractable, beam search provides a reasonable
alternative in practice.
– Temperature scaling reshapes the output distribution by skewing it either toward
high-probability tokens or toward low-probability tokens: the former improves generation
quality but hurts diversity [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], while the latter regularizes generation by making
the model less certain of its top choices.
– Top-k sampling is the sampling method that was used for GPT-2 [25]. It truncates
the language model’s output distribution to the k most likely tokens and samples
according to their relative probabilities within this truncated set; all other tokens
are excluded from generation.
– Nucleus sampling is a stochastic decoding method proposed by Holtzman et al.
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Like top-k sampling, it truncates the output distribution, but instead of keeping a
fixed number of tokens, it keeps the smallest set of most likely tokens whose cumulative
probability reaches a threshold p and discards the rest.
        </p>
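        <p>The decoding strategies above can be compared on a single next-token distribution. The following minimal Python sketch implements greedy choice, temperature scaling, top-k truncation, and nucleus (top-p) truncation over a made-up four-token vocabulary. Applying temperature first, then top-k, then top-p is our assumption about how the settings of footnote 1 compose; it is an illustration, not the library code used in our runs.</p>
        <preformat>
```python
import math
import random
from typing import Dict, List

Dist = Dict[str, float]


def softmax_with_temperature(logits: Dist, temperature: float = 1.0) -> Dist:
    """Temperature scaling: T above 1 flattens the distribution, T below 1 sharpens it."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    top = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(value - top) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def _renormalize(dist: Dist, keep: List[str]) -> Dist:
    total = sum(dist[tok] for tok in keep)
    return {tok: dist[tok] / total for tok in keep}


def top_k_filter(dist: Dist, k: int) -> Dist:
    """Top-k: keep the k most likely tokens and renormalize."""
    return _renormalize(dist, sorted(dist, key=dist.get, reverse=True)[:k])


def nucleus_filter(dist: Dist, p: float) -> Dist:
    """Nucleus (top-p): keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize."""
    keep: List[str] = []
    cumulative = 0.0
    for tok in sorted(dist, key=dist.get, reverse=True):
        keep.append(tok)
        cumulative += dist[tok]
        if cumulative >= p:
            break
    return _renormalize(dist, keep)


def greedy_choice(dist: Dist) -> str:
    """Greedy decoding: always take the most likely token."""
    return max(dist, key=dist.get)


def sample_from(dist: Dist, rng: random.Random) -> str:
    """Draw one token with probability proportional to its (renormalized) mass."""
    threshold, cumulative = rng.random(), 0.0
    for tok, prob in dist.items():
        cumulative += prob
        if cumulative > threshold:
            return tok
    return list(dist)[-1]  # guard against floating-point round-off


# Made-up next-token logits; the settings mirror footnote 1 (T = 1.6, p = 0.4),
# with k shrunk from 100 to 3 because the toy vocabulary has only four tokens.
logits = {"energy": 2.0, "coal": 1.5, "solar": 1.0, "pregnancy": -1.0}
dist = softmax_with_temperature(logits, temperature=1.6)
dist = top_k_filter(dist, k=3)       # drops "pregnancy"
dist = nucleus_filter(dist, p=0.4)   # "energy" alone already holds about 0.44 of the mass
print(dist)                          # {'energy': 1.0}
print(sample_from(dist, random.Random(0)))  # energy
```
        </preformat>
        <p>In this toy distribution, a nucleus threshold of 0.4 keeps only the single most likely token once top-k has removed the tail, so sampling degenerates to greedy choice; the effect of each threshold therefore depends strongly on how peaked the model's distribution is.</p>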
        <p>For each of the six prompts in Table 1, we use GPT-2 to generate four
possible continuations of up to 100 tokens. To ensure that the four continuations differ
from each other, we chose different combinations of the aforementioned
sampling strategies.1 Because the prompts are framed as conversational (by virtue of the dashes
and formatting), the generated text often reads like an argument.
Having generated 24 such texts,2 we discard the original query and use these hallucinations
as 24 queries against the DirichletLM index, whose scores we combine additively to obtain
the final rankings. We only consider documents returned by at least
twelve of the 24 queries.3</p>
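        <p>Footnote 3 maps this fusion onto an Elasticsearch bool query. A minimal sketch of such a request body follows, together with a small local emulation of its semantics: the scores of matching clauses are summed, and documents matched by fewer than twelve clauses are dropped. The field name "text" is a hypothetical placeholder for the field holding the argument text, and the emulation assumes that each run lists every document its query matches.</p>
        <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_expansion_query(generated_texts: List[str], field: str = "text", min_match: int = 12) -> dict:
    """Elasticsearch bool query with one "should" match clause per generated text.
    Scores of matching clauses are summed; documents must match at least min_match clauses."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: text}} for text in generated_texts],
                "minimum_should_match": min_match,
            }
        }
    }


def fuse_additively(runs: List[Dict[str, float]], min_match: int = 12) -> List[Tuple[str, float]]:
    """Local emulation of the same semantics over per-query results (doc id mapped to score)."""
    totals: Dict[str, float] = defaultdict(float)
    hits: Dict[str, int] = defaultdict(int)
    for run in runs:
        for doc_id, score in run.items():
            totals[doc_id] += score
            hits[doc_id] += 1
    kept = [(doc_id, total) for doc_id, total in totals.items() if hits[doc_id] >= min_match]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


texts = [f"generated text {i}" for i in range(24)]  # placeholders for the 24 hallucinations
body = build_expansion_query(texts)
# Made-up scores: "c" is matched by only 11 of the 24 queries and is therefore dropped.
runs = [{"a": 1.0, "b": 0.5}] * 12 + [{"b": 2.0, "c": 3.0}] * 11 + [{"b": 2.0}]
print(fuse_additively(runs))  # [('b', 30.0), ('a', 12.0)]
```
        </preformat>
        <p>The minimum-match threshold acts as a consensus filter: a document that only a few hallucinated texts happen to hit is discarded, however high its individual scores.</p>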
        <p>For illustration, consider the query “Can alternative energy effectively
replace fossil fuels?” Three of the 24 texts that our approach generates for it are:
– Yes, because it has proven to be a significant and lasting improvement in fuel
efficiency, carbon neutrality. The only other thing that could possibly help this energy
is the need for nuclear reactors at low cost which would require more than 20
percent of current generation electricity by 2030 (currently under construction),
plus less renewable resources like wind or solar power as well but with sufficient
amounts of coal/solar panels if there are enough [sic]
– No, because there is no evidence for that. There are a few possible alternative
energy options available to people who would like to cut down fossil fuels, and I
believe those include wind or solar power.“The main thing we want in our future
climate policies has to be better use of resources instead on these things than if they
weren’t used at the moment,” said Kieferlein. “We need clean air [sic]
– Not sure. However, many scientists have made the point that alternative sources
of power are already producing more carbon emissions than they would otherwise
(and it seems like such a small number in our country). There has been some debate
about whether this was actually true or if there simply wasn’t much coal available
at all to replace fossil fuels and other forms thereof as an environmentally
sustainable form. . . In fact, recent studies suggest we [sic]
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Query Expansion with a Transformer Encoder</title>
        <p>
          Unlike transformer decoders, transformer encoders like BERT [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] cannot be used to
auto-regressively generate sentences. Their attention-based architecture is such that any
token can see (“attend to”) any other token of the sentence. Being able to look ahead
into the future of a sequence breaks the causality required by a CLM. As such, these
models have to rely on other pre-training tasks to gain linguistic coherence. Global
linguistic coherence is achieved through next sentence prediction, where BERT has
to predict whether two sentences follow each other in a corpus. Local coherence is
achieved through another pre-training task, namely masked language modeling (MLM).
During training, tokens in the input are masked at random, and the network is tasked with
guessing the original words. Learning to fill in the blank has been shown to improve
the quality of text generation [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <table-wrap>
          <label>Table 2</label>
          <caption>
            <p>Seed texts for query expansion with BERT; &lt;query&gt; is replaced by the original query, and BERT fills in every [MASK] token.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Opening line</th><th>Reply with [MASK] tokens</th></tr>
            </thead>
            <tbody>
              <tr><td>- What do you think? &lt;query&gt;</td><td>- No, because of [MASK] and the risk of [MASK] [MASK].</td></tr>
              <tr><td>- What do you think? &lt;query&gt;</td><td>- Absolutely not, I think [MASK] is bad!</td></tr>
              <tr><td>- What do you think? &lt;query&gt;</td><td>- No, [MASK] is associated with [MASK] during [MASK].</td></tr>
              <tr><td>- What do you think? &lt;query&gt;</td><td>- What about [MASK] or [MASK]?</td></tr>
              <tr><td>- What do you think? &lt;query&gt;</td><td>- Don’t forget about [MASK]!</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>1 One using beam search with 10 beams, and three using a temperature of 1.6, a top-k
threshold of 100 tokens, and a nucleus sampling probability threshold of 0.4.</p>
        <p>2 Six prompts with four continuations per prompt.</p>
        <p>3 This corresponds to an Elasticsearch bool query whose clauses are of type “should”, with the
minimum_should_match parameter set to 12.</p>
        <p>We leverage this ability of the model to “fill in the blank” to enrich the original
query with a set of words that are contextually relevant to the topic at hand. To achieve
that, we again augment the original query using the same strategy as in the
previous section; this time, however, we leave blanks for BERT to fill in, in the form of
the [MASK] token (see Table 2). For every [MASK] in every augmented seed text, we
ask BERT for the five most likely words, filtering out stop words, punctuation, and
sub-words. This yields an average of 340 thematic keywords per query (min=206, max=473).
All keywords are then joined into a space-separated list, which we use to query the
DirichletLM index, discarding the original query.
For illustration, consider again the query “Can alternative energy effectively replace fossil
fuels?”; the keywords obtained by expanding it with BERT include:
diesel, cost, nuclear, consumption, hydrogen, technologies, energy, future,
electricity, pregnancy, coal, alternative, migration, emissions, efficiency,
economics, technology, growth, wartime, earthquakes, green, environmental,
accidents, costs, renewable, winter, development, pollution, new, stress, water, oil,
accident, death, health, warming, sustainability, accidental, fires, competition
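        </p>
        <p>Once BERT has produced its ranked predictions, the keyword expansion reduces to filtering and joining them. In the hypothetical sketch below, hard-coded candidate lists stand in for BERT's predictions for each [MASK]; the stop-word list is an abbreviated placeholder, sub-words are detected by the "##" prefix of BERT's WordPiece vocabulary, and removing duplicate keywords is our own assumption.</p>
        <preformat>
```python
import string
from typing import Iterable, List

# Hypothetical, heavily abbreviated stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "it", "is", "this", "that"}


def keep_candidate(token: str) -> bool:
    """Drop WordPiece sub-words ("##" prefix), pure punctuation, and stop words."""
    if token.startswith("##"):
        return False
    if all(ch in string.punctuation for ch in token):
        return False
    return token.lower() not in STOP_WORDS


def keyword_query(predictions_per_mask: Iterable[List[str]], top_n: int = 5) -> str:
    """Take the top_n predictions for every [MASK], filter them, drop duplicates,
    and join the survivors into one space-separated keyword query."""
    keywords: List[str] = []
    for candidates in predictions_per_mask:
        for token in candidates[:top_n]:
            if keep_candidate(token) and token not in keywords:
                keywords.append(token)
    return " ".join(keywords)


# Hypothetical ranked predictions for two [MASK] positions.
predictions = [["nuclear", "the", "##s", "coal", ",", "oil"], ["cost", "risk", "coal"]]
print(keyword_query(predictions))  # nuclear coal cost risk
```
        </preformat>
        <p>Note that "oil" is lost because it is only the sixth candidate: whether the five words are taken before or after filtering changes how many keywords survive per [MASK].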
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Document Embedding with a Transformer Encoder</title>
        <p>
          Our final approach also employs a transformer encoder in the form of the large variant
of Google’s universal sentence encoder (USE) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Though architecturally similar to
BERT, the USE model is trained on very different pre-training tasks, which were
specifically chosen to perform well on downstream NLU tasks, the goal being
a general-purpose sentence embedder. A further difference from BERT is USE’s unbounded input length,
which suits the args.me corpus well. USE is trained on the following tasks:
– Skip-thought is a self-supervised pre-training task, originally devised to train LSTMs
to produce high-quality sentence vectors from a large amount of
contiguous text [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
– Natural language response suggestion imparts conversational awareness to the
sentence encoder, which suits the task at hand well. The goal of this supervised
task is to predict the best short reply among millions of options in response to an
email [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
– Natural language inference on the Stanford natural language inference (SNLI) corpus [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], a labeled dataset of 570,000 sentence
pairs. This can be seen as a supervised variant of BERT’s next sentence prediction
pre-training task. In this instance, however, entailment, contradiction, and
neutrality are explicitly labeled in the data itself, rather than implied by the relative
position of two sentences in an unlabeled corpus of contiguous text.
        </p>
        <p>These pre-training tasks make USE a promising candidate for argument retrieval. Using USE,
we embed each document in the args.me corpus into a 512-dimensional space. To
retrieve arguments for a query, we embed the query into the same space with the same
model and perform an exhaustive nearest-neighbor search using FAISS,4 considering both
L2 distance and inner-product (IP) similarity for retrieval.</p>
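        <p>Exhaustive (brute-force) nearest-neighbor search compares the query embedding with every document embedding. The sketch below uses plain Python lists and made-up two-dimensional vectors in place of USE's 512-dimensional embeddings; at scale, FAISS's flat indexes (IndexFlatL2 and IndexFlatIP) perform the same exhaustive computation. The example also shows why the two metrics can rank differently: the inner product rewards vectors with a large norm, whereas L2 distance does not.</p>
        <preformat>
```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def exhaustive_search(query: Vector, docs: List[Vector], k: int,
                      metric: str = "l2") -> List[Tuple[int, float]]:
    """Score the query against every document and return the k best (index, score) pairs.
    For "l2", smaller distances rank higher; for "ip", larger inner products rank higher."""
    if metric == "l2":
        scored = [(i, l2_distance(query, d)) for i, d in enumerate(docs)]
        scored.sort(key=lambda pair: pair[1])
    elif metric == "ip":
        scored = [(i, inner_product(query, d)) for i, d in enumerate(docs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return scored[:k]


# Made-up document embeddings; the third one points the same way as the first but is longer.
docs = [[1.0, 0.0], [0.6, 0.8], [3.0, 0.0]]
query = [1.0, 0.0]
print([i for i, _ in exhaustive_search(query, docs, k=3, metric="l2")])  # [0, 1, 2]
print([i for i, _ in exhaustive_search(query, docs, k=3, metric="ip")])  # [2, 0, 1]
```
        </preformat>
        <p>For unit-length embeddings the two metrics induce the same ranking, since the squared L2 distance then equals 2 minus twice the inner product.</p>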
        <p>We carried out a small pilot experiment to make sure USE projects the args.me
corpus in a semantically meaningful way by running k-means on the embedded corpus
with the number of clusters set to 100. The resulting clusters are surprisingly
meaningful, being coherent syntactically, semantically, or both. Some clusters are thematically
coherent, encompassing topics such as religion, politics, and economics. Others are
both syntactically and semantically coherent: all their premises are of the form “X is
better than Y”, covering themes such as video game consoles, superheroes, and
consumer electronics. Further clusters are only syntactically coherent; for instance,
all their arguments consist of YouTube links or of repeated short idiosyncratic phrases one
tends to find on online debate websites (e.g., “I agree.”).
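        </p>
        <p>The pilot relies on k-means clustering. A minimal sketch of Lloyd's algorithm follows; it uses made-up two-dimensional points and k = 2 rather than the 512-dimensional USE embeddings and k = 100, and initializing from randomly chosen points with a fixed seed is an illustrative choice, not necessarily what the pilot used.</p>
        <preformat>
```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: List[Point], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: assign each point to its nearest centroid, then move each
    centroid to the mean of its points; return the cluster index of every point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct points as initial centroids
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = centroid(members)
    return assignment


# Two made-up, well-separated groups of points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans(points, k=2)
print(labels)  # cluster ids; which group gets id 0 depends on the seed
```
        </preformat>
        <p>Inspecting which documents share a cluster, as in the pilot, is a quick qualitative check that nearby embeddings correspond to related texts.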
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Evaluation</title>
      <p>We evaluated our approaches to argument retrieval as part of
the Touché shared task. In what follows, we briefly recap the experimental setup and
give an overview of the performance achieved.
      </p>
      <sec id="sec-4-1">
        <title>4.1 Experimental Setup</title>
        <p>
          The Touché shared task on argument retrieval [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] uses the TIRA evaluation
platform [24] to judge entries to the competition. On TIRA, every task participant is
assigned their own virtual machine, and submitting a retrieval model to the shared task
corresponds to submitting software to be run on that virtual machine with the relevant
inputs provided by TIRA at run time. Those inputs include the args.me corpus and a list
of 50 topics (queries) on which the retrieval model is judged using crowdsourced
relevance judgments. Though participant entries are ranked by nDCG@5 on the
leaderboard,5 TIRA also returns nDCG@10, nDCG, and QrelCoverage@10, which we
include in Table 3.
        </p>
        <p>[Table 3: nDCG@5, nDCG@10, nDCG, and QrelCoverage@10 for, in order: GPT-2; Baseline (DirichletLM); BERT; Team Aragorn; USE (L2); Team Zorro; USE (IP).]</p>
        <p>4 https://github.com/facebookresearch/faiss</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Results</title>
        <p>Table 3 includes the results of all our runs. Our GPT-2 query expansion approach
improves considerably over the official baseline (0.808 vs. 0.756 nDCG@5) and comes out
on top of the leaderboard. Our BERT query expansion model essentially ties with the
baseline, scoring an nDCG@5 of 0.755.
We speculate that the performance of the two query expansion approaches might be partly
due to the fact that both the args.me corpus and the datasets on which BERT and GPT-2 are
trained consist of user-generated internet data. Both embedding-based runs perform worse
than the query expansion approaches, which is to be expected: the only information signal
available to those runs originates in the query itself, whereas the other approaches benefit
from considerable added context through query expansion. Still, the embedding runs
constitute a useful baseline on their own. A promising direction would be to combine these two
orthogonal approaches.
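        </p>
        <p>nDCG@5, the leaderboard measure, divides the discounted cumulative gain of the top five results by that of an ideal ranking. The sketch below assumes linear gains and a log2(rank + 1) discount, a common convention; the relevance grades in the example are made up. It also reproduces the relative improvement over the baseline reported in the abstract, (0.808 - 0.756) / 0.756, or about 6.878%.</p>
        <preformat>
```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top k results: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: Sequence[float], judged_relevances: Sequence[float],
              k: int = 5) -> float:
    """Normalize by the DCG of an ideal ordering of all judged documents for the topic."""
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def relative_improvement(score: float, baseline: float) -> float:
    """Improvement over a baseline, in percent."""
    return (score - baseline) / baseline * 100


print(round(relative_improvement(0.808, 0.756), 3))  # 6.878
print(round(ndcg_at_k([0, 1], [1, 0], k=2), 4))      # 0.6309
```
        </preformat>
        <p>Because the ideal ranking is built from all judged documents, a run is penalized both for ranking relevant arguments low and for missing them entirely.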
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>This work showcases three possible uses of transformer models for the retrieval of
relevant arguments from the args.me corpus in particular, and for document retrieval from a
corpus in general. Competitive results were achieved without hyperparameter tuning or
optimization of any sort. A promising continuation of this work would be an
ablation study that measures the effect of the hyperparameters on retrieval-optimized
text generation; such a study is needed to judge the quantitative merits
of each approach.</p>
      <p>It is also important to note that BERT and GPT-2 are merely representatives
of the transformer family of models. A myriad of models6 build on
the foundations laid by the works that introduced them, iterating, improving,
and filling gaps those original models did not take into account. It would therefore be
important to experiment with other models, and perhaps to devise new pre-training
tasks that would make query expansion even more effective.</p>
      <p>Furthermore, our approach involves no natural language preprocessing of
the corpus and no fine-tuning of any of the models used. That some approaches perform as
well as they do is a testament to the amount of linguistic and world knowledge encoded
in the weights of pre-trained transformers. A future research direction might leverage
the args.me and Project Debater corpora to add more argumentative awareness to the
transformers and thereby, we expect, improve retrieval results.</p>
      <p>Finally, we believe it crucial to weigh any research direction against the
ethical considerations such projects raise. It is unclear whether to include user
feedback, as including such signals would incur the risk of calcifying existing biases. While
it might be useful to think of a search engine as an educational tool, it might prove
dangerous to assume it is the prerogative of an information retrieval technology to
monopolize the task of teaching its users how to think by conditioning them to blindly rely
on it to reinforce their existing biases.</p>
      <p>5 https://events.webis.de/touche-20/shared-task-1.html</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ajjour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Data Acquisition for Argument Search: The args.me corpus</article-title>
          . In: Benzmüller, C., Stuckenschmidt, H. (eds.)
          <source>42nd German Conference on Artificial Intelligence (KI 2019)</source>
          , pp.
          <fpage>48</fpage>
          –
          <lpage>59</lpage>
          , Springer, Berlin Heidelberg New York (Sep
          <year>2019</year>
          ), https://doi.org/10.1007/978-3-030-30179-8_4
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Azad</surname>
            ,
            <given-names>H.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deepak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Query expansion techniques for information retrieval: A survey</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>56</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1698</fpage>
          –
          <lpage>1735</lpage>
          (Sep
          <year>2019</year>
          ), ISSN 0306-4573, https://doi.org/10.1016/j.ipm.2019.05.009
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bondarenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fröbe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beloucif</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gienapp</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ajjour</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of Touché 2020: Argument Retrieval</article-title>
          . In:
          <source>Working Notes Papers of the CLEF 2020 Evaluation Labs</source>
          (Sep
          <year>2020</year>
          ), ISSN 1613-0073
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angeli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          . In:
          <string-name>
            <surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callison-Burch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pighin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marton</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015</source>
          , Lisbon, Portugal, September 17-21,
          <year>2015</year>
          , pp.
          <fpage>632</fpage>
          -
          <lpage>642</lpage>
          , The Association for Computational Linguistics (
          <year>2015</year>
          ), https://doi.org/10.18653/v1/d15-1075, URL https://doi.org/10.18653/v1/d15-1075
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hua</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Limtiaco</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>John</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constant</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guajardo-Cespedes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strope</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurzweil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Universal sentence encoder</article-title>
          . CoRR abs/1803.11175 (
          <year>2018</year>
          ), URL http://arxiv.org/abs/1803.11175
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Chakrabarty</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hidey</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muresan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKeown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>AMPERSAND: argument mining for persuasive online discussions</article-title>
          . In:
          <string-name>
            <surname>Inui</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019</source>
          , Hong Kong, China, November 3-7,
          <year>2019</year>
          , pp.
          <fpage>2933</fpage>
          -
          <lpage>2943</lpage>
          , Association for Computational Linguistics (
          <year>2019</year>
          ), https://doi.org/10.18653/v1/D19-1291, URL https://doi.org/10.18653/v1/D19-1291
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          . CoRR abs/1810.04805 (
          <year>2018</year>
          ), URL http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dibia</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>NeuralQA: A usable library for question answering (contextual query expansion + BERT) on large datasets</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Ein-Dor</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shnarch</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dankin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halfon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sznajder</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alzate</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gleize</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choshen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aharonov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slonim</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Corpus wide argument mining - A working solution</article-title>
          . In:
          <source>The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020</source>
          , New York, NY, USA, February 7-12,
          <year>2020</year>
          , pp.
          <fpage>7683</fpage>
          -
          <lpage>7691</lpage>
          , AAAI Press (
          <year>2020</year>
          ), URL https://aaai.org/ojs/index.php/AAAI/article/view/6270
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Fedus</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          :
          <article-title>MaskGAN: Better text generation via filling in the _______</article-title>
          . In:
          <source>6th International Conference on Learning Representations, ICLR 2018</source>
          , Vancouver, BC, Canada, April 30 - May 3,
          <year>2018</year>
          , Conference Track Proceedings, OpenReview.net (
          <year>2018</year>
          ), URL https://openreview.net/forum?id=ByOExmWAb
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Fromm</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faerman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seidl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>TACAM: topic and context aware argument mining</article-title>
          . In:
          <string-name>
            <surname>Barnaghi</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottlob</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manolopoulos</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzouramanis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vakali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>2019 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2019</source>
          , Thessaloniki, Greece, October 14-17,
          <year>2019</year>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>106</lpage>
          , ACM (
          <year>2019</year>
          ), https://doi.org/10.1145/3350546.3352506, URL https://doi.org/10.1145/3350546.3352506
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Géron</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems</article-title>
          . O'Reilly Media (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Deep learning</article-title>
          , vol. 1. MIT Press, Cambridge (
          <year>2016</year>
          ),
          ISBN 978-0-262-03561-3
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Gretz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen-Karlik</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toledo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lahav</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aharonov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slonim</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>A large-scale dataset for argument quality ranking: Construction and analysis</article-title>
          . In:
          <source>The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020</source>
          , New York, NY, USA, February 7-12,
          <year>2020</year>
          , pp.
          <fpage>7805</fpage>
          -
          <lpage>7813</lpage>
          , AAAI Press (
          <year>2020</year>
          ), URL https://aaai.org/ojs/index.php/AAAI/article/view/6285
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Henderson</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Rfou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strope</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukács</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miklos</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurzweil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Efficient natural language response suggestion for smart reply</article-title>
          . CoRR abs/1705.00652 (
          <year>2017</year>
          ), URL http://arxiv.org/abs/1705.00652
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Holtzman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buys</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forbes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>The curious case of neural text degeneration</article-title>
          . In:
          <source>8th International Conference on Learning Representations, ICLR 2020</source>
          , Addis Ababa, Ethiopia, April 26-30,
          <year>2020</year>
          , OpenReview.net (
          <year>2020</year>
          ), URL https://openreview.net/forum?id=rygGQyrFvH
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Kiros</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zemel</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urtasun</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fidler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Skip-thought vectors</article-title>
          . In:
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015</source>
          , December 7-12,
          <year>2015</year>
          , Montreal, Quebec, Canada, pp.
          <fpage>3294</fpage>
          -
          <lpage>3302</lpage>
          (
          <year>2015</year>
          ), URL http://papers.nips.cc/paper/5950-skip-thought-vectors
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delangue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cistac</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rault</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Funtowicz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brew</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>HuggingFace's Transformers: State-of-the-art natural language processing</article-title>
          . CoRR abs/1910.03771 (
          <year>2019</year>
          ), URL http://arxiv.org/abs/1910.03771
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>End-to-end open-domain question answering with BERTserini</article-title>
          . In:
          <string-name>
            <surname>Ammar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mostafazadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019</source>
          , Minneapolis, MN, USA, June 2-7,
          <year>2019</year>
          , Demonstrations, pp.
          <fpage>72</fpage>
          -
          <lpage>77</lpage>
          , Association for Computational Linguistics (
          <year>2019</year>
          ), https://doi.org/10.18653/v1/n19-4013, URL https://doi.org/10.18653/v1/n19-4013
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Simple applications of BERT for ad hoc document retrieval</article-title>
          . CoRR abs/1903.10972 (
          <year>2019</year>
          ), URL http://arxiv.org/abs/1903.10972
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Yilmaz</surname>
            ,
            <given-names>Z.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Applying BERT to document retrieval with Birch</article-title>
          . In:
          <string-name>
            <surname>Padó</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019</source>
          , Hong Kong, China, November 3-7,
          <year>2019</year>
          , System Demonstrations, pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          , Association for Computational Linguistics (
          <year>2019</year>
          ), https://doi.org/10.18653/v1/D19-3004, URL https://doi.org/10.18653/v1/D19-3004
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A study of smoothing methods for language models applied to ad hoc information retrieval</article-title>
          . In:
          <source>Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pp.
          <fpage>334</fpage>
          -
          <lpage>342</lpage>
          , SIGIR '01, Association for Computing Machinery, New York, NY, USA (
          <year>2001</year>
          ), ISBN 1581133316, https://doi.org/10.1145/383952.384019, URL https://doi.org/10.1145/383952.384019
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>