<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Modular Framework for Conversational Search Reproducible Experimentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Alessio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>31</volume>
      <fpage>02</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>The Conversational Search (CS) paradigm facilitates users' interaction with IR systems through natural language sentences, and it is increasingly being used in various scenarios. However, the proliferation of custom conversational search systems and components makes it challenging to compare and design new CS agents. To tackle this issue, we propose DECAF: a modular and extensible conversational search framework that enables rapid development of conversational agents. DECAF integrates all the necessary components of a modern conversational search system into a uniform interface, and it allows for experiments that exhibit a high degree of reproducibility, addressing the reproducibility crisis in the field. The DECAF framework includes several state-of-the-art components, such as query rewriting, search functions under the BoW and dense paradigms, and re-ranking functions. We evaluate DECAF on two well-known conversational collections, CAsT '19 and CAsT '20, and provide the results as baselines for future practitioners.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Conversational Search (CS) [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] is an emerging paradigm that is profoundly changing
Information Retrieval (IR) by allowing users to issue their queries to the system in the form
of a conversation. This paradigm allows for a very intuitive and seamless interaction between
the human and the system, since it is based on natural language sentences. Nevertheless, it
also presents important challenges due to the presence of complex speech structures in the
utterances, such as anaphora, ellipsis, and coreference. Therefore, the system needs to address
these natural language phenomena by keeping track of the conversation state.
      </p>
      <p>
        The advent of Neural Information Retrieval (NIR) models, fueled by Large Language
Models (LLMs), has helped to overcome some of these challenges and to foster the widespread
adoption of CS in many different scenarios. Such techniques, given their complexity, were
originally used mostly for re-ranking [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. With their improvement in terms of both
effectiveness and efficiency, we have also witnessed their adoption as first-stage retrieval systems [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ].
The plethora of CS systems that rely on these building blocks allows for addressing the user’s
information need in a number of different scenarios. Nevertheless, such variety has made the CS
landscape fragmented from the systems’ design perspective, with hundreds of ad-hoc custom
implementations and variants of IR models and other components, implemented using a wide
range of different languages and frameworks, often difficult, if not impossible, to
integrate. This causes a number of challenges both in the development of CS systems and in
their experimentation. Firstly, it is difficult to combine various state-of-the-art components,
since they often come from different and/or incompatible libraries and packages; in
turn, this hampers the development of new and competitive approaches. Secondly, alternative
implementations of a component often lead to different performance or behavior; in turn, this
hampers the comparability of different end-to-end solutions integrating alternative versions of
the same components. Thirdly, it is extremely difficult, if not impossible, to conduct systematic
experiments where different components are combined in all possible ways, in a grid-like
fashion [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], in order to break down the contribution of different components and analyse their
interactions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Finally, it hinders the reproducibility of experiments, exacerbating the already
prevalent reproducibility crisis in current research [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12, 13, 14, 15</xref>
        ].
      </p>
      <p>
        To address these issues, we propose DECAF – moDular and Extensible Conversational seArch
Framework – which is explicitly designed to allow for fast prototyping and development of CS
systems and to enable their systematic evaluation under the traditional Cranfield paradigm.
The framework is designed to allow the integration of all the components that characterize a
modern CS system, including query rewriting, search – both under the Bag-of-Words (BoW)
and dense paradigms – and re-ranking. While written primarily in Java to ensure compatibility
with traditional IR libraries, such as Lucene, Terrier [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and Anserini [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], it also supports
modern Artificial Intelligence (AI), Machine Learning (ML), and LLM-based techniques thanks
to the seamless integration of Python scripts. The framework already includes multiple
state-of-the-art (sota) components for query rewriting, traditional BoW similarity functions, NIR
approaches – both sparse and dense – and re-ranking functions. It also allows for the evaluation
on reference collections in the area, such as TREC CAsT 2019 and TREC CAsT 2020. The
components implemented in DECAF act as state-of-the-art baselines for future experiments.
They also provide a template for integrating and designing new components to extend DECAF
itself. To show the capabilities of this framework, we demonstrate its application to the TREC
CAsT 2019 and TREC CAsT 2020 collections, reporting the results that different pipelines can
achieve. Our main contributions are the following:
• Design and develop the DECAF modular and easily extensible framework, which allows
us to seamlessly integrate CS components and instantiate CS pipelines.
• Provide a series of state-of-the-art components that can be used to implement a CS
pipeline.
• Evaluate several CS pipelines, created using DECAF, on two well-known and widely
adopted conversational collections, namely TREC CAsT 2019 and TREC CAsT 2020.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. DECAF Architecture</title>
      <p>In this section, we provide an overview of DECAF, focusing on the principal modules of its
architecture, the components implemented, and the software requirements.</p>
      <p>Figure 1: Overview of the DECAF architecture. The index pipeline consists of the Corpus Parser and the Indexer; the search pipeline processes the utterances of each conversation through the Rewriter, the Searcher, and the Reranker – each backed by external resources and models, and the latter two supported by a Query Generator – producing a ranked list of document IDs that the Run Writer serializes as a TREC run.</p>
      <sec id="sec-2-1">
        <title>2.1. Main Modules</title>
        <p>DECAF relies on a modular architecture for index and search pipelines. In practice, to foster
extensibility, as well as standardization, each module of DECAF is defined as a Java interface. As
illustrated in Figure 1, the index pipeline revolves around two main modules: the Corpus Parser
and the Indexer. The former processes the corpus into a stream of documents, which is consumed
by the latter to index them. For the search pipeline, we adopt a multi-stage architecture which
employs the modules that are the most common for CS systems.</p>
        <p>Figure 1 shows the structure of the search pipeline:
• The Topics Parser reads an input file – e.g., written in TREC format – and provides parsed
conversations and utterances to be processed by the rest of the framework.
• The Rewriter modifies the text of the utterances by performing pronoun disambiguation
and adding contextual information extracted from previous utterances in the conversation.
• The Searcher takes the (possibly rewritten) utterance text as input, generates the query
and retrieves a set of candidate documents to answer the provided question.
• The ranked list of documents generated as output by the Searcher is consumed by the
Reranker. This module is designed to apply complex and resource-consuming re-ranking
operations upon the Searcher output, boosting the performance of the CS pipeline.</p>
        <p>Furthermore, we exploit two additional utility models: the Query Generator and the Run
Writer. Both the searcher and the re-ranker modules exploit the Query Generator to obtain a
representation of the user utterance — possibly by combining it with previous ones — that is
directly used at retrieval time. Finally, the Run Writer is a utility module meant to write the run
on a file, so that it can be further used or evaluated.</p>
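<p>The modular contract between pipeline stages can be sketched as plain Java interfaces. This is an illustrative sketch only: the interface and method names below are hypothetical, not DECAF's actual API.</p>

```java
import java.util.List;

// Illustrative sketch of DECAF-style module contracts. Each pipeline stage
// is defined as a Java interface; names here are hypothetical.
public class PipelineSketch {

    /** Rewrites one utterance using the conversation context. */
    interface Rewriter {
        String rewrite(List<String> conversation, int utteranceIndex);
    }

    /** Retrieves a ranked list of document IDs for a query. */
    interface Searcher {
        List<String> search(String query, int topK);
    }

    /** Re-scores the candidate list produced by the Searcher. */
    interface Reranker {
        List<String> rerank(String query, List<String> candidates);
    }

    // An "identity" rewriter, mirroring the no-op component described later,
    // which returns the original utterance text unchanged.
    static final Rewriter IDENTITY = (conversation, i) -> conversation.get(i);

    public static void main(String[] args) {
        List<String> conv = List.of("Who wrote Hamlet?", "When was it written?");
        System.out.println(IDENTITY.rewrite(conv, 1));
    }
}
```

Because every module is an interface, alternative implementations (e.g., a T5-based vs. an identity rewriter) can be swapped without touching the rest of the pipeline.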
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Components Implemented</title>
        <p>Requirements The core of DECAF has been developed in Java, with the integration of Python
for machine-learning-oriented functionalities. It requires Java Development Kit (JDK) 11 and
Python 3.8 for execution. The framework is built upon Lucene 8.8.1.</p>
        <p>Every component is expected to implement the Java interface of the specific module, which
defines one or more methods specific to the job performed. The components implemented within
DECAF and described in the remainder of this section can be used both as baselines and as
templates to guide practitioners in extending DECAF. Every component of
the framework has some configurable parameters, which must be passed through constructor
arguments. We also implemented a configuration system, based on .properties files, that allows
the user to specify them in a user-friendly manner.</p>
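<p>The .properties-based configuration can be sketched as follows; the keys shown are invented for illustration and do not correspond to DECAF's actual parameter names.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch of a .properties-driven configuration used to pass constructor
// arguments to components. The keys below are hypothetical.
public class ConfigSketch {

    /** Loads key=value pairs from .properties-formatted text. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse("searcher=bm25\nsearcher.k=1000\nrewriter=t5\n");
        // Each component reads its own parameters from the loaded properties.
        int topK = Integer.parseInt(props.getProperty("searcher.k", "100"));
        System.out.println(props.getProperty("searcher") + " " + topK); // bm25 1000
    }
}
```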
        <p>Corpus Parser We implement several components that perform corpus parsing. The first
processes the passages contained within the MS-MARCO version 1 dataset2. Another parser
addresses the paragraph corpus of TREC CAR v2.03. The third parser processes any corpus
based on tab-separated files using the “ID &lt;tab&gt; Text &lt;newline&gt;” format. Furthermore,
it might be necessary to index documents from multiple sources with different formats at once.
The last corpus parser component eases this by allowing the user to instantiate multiple parsers
that are used to parse different corpora.</p>
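<p>The tab-separated format above can be parsed with a few lines of Java; this is a minimal stand-in for the actual component (class and method names are illustrative):</p>

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of a parser for the "ID <tab> Text <newline>" corpus format.
public class TsvCorpusParser {

    /** Parses each non-blank line into a (docId, text) pair. */
    static List<Map.Entry<String, String>> parse(String corpus) {
        return corpus.lines()
                .filter(line -> !line.isBlank())
                .map(line -> {
                    int tab = line.indexOf('\t');
                    return Map.entry(line.substring(0, tab),
                                     line.substring(tab + 1));
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String corpus = "D1\tConversational search basics\nD2\tDense retrieval\n";
        for (Map.Entry<String, String> doc : parse(corpus)) {
            System.out.println(doc.getKey() + " -> " + doc.getValue());
        }
    }
}
```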
        <p>
          Indexer The framework comes with three distinct indexer components: a BoW indexer, a
SPLADE indexer, and a Dense indexer. The BoW indexer is a wrapper around Lucene indexing
operations. The BoW indexer component has been extended into the SPLADE indexer, which
is specific to the homonymous neural retrieval model: it replaces the standard tokenization and
analysis pipeline performed by Lucene with SPLADE model inference. Finally, the Dense
indexer component is specific to dense retrieval models. It is built on top of two well-known
libraries: Transformers [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and Facebook AI Similarity Search (FAISS) [19].
Conversation and Utterance Components To provide a unified interface to the data, we
define two components, called Utterance and Conversation. The former is a data structure that
provides unified access to the utterance and to the subsequent transformations operated by different
components. The latter provides utilities to access the data and groups together utterances
belonging to the same conversation. At runtime, each module extracts the needed data (e.g.,
the textual content of the utterance, its rewritten version, or the associated ranked list) from
the Utterance component, passing through the Conversation interface. Upon completion of the
required operations, each module saves the computed results for a specific utterance (e.g., a
new rewriting of the utterance, the re-ranked run) within the Utterance component, so that
the next module can access it.
        </p>
        <p>The vast majority of implemented components take only two parameters: the Conversation
object containing the data regarding the specific conversation at hand, and the ID of the current
utterance on which to operate (e.g., the one to be rewritten, or the one for which documents
are being retrieved). This approach ensures great flexibility in the design of the components, since
it is possible to access the entire data structure for the whole conversation – in particular,
previously issued utterances and retrieved responses. Furthermore, it allows for easily expanding
the framework, since each component can behave as a black-box building block operating only
on the Conversation and Utterance objects.
2https://msmarco.blob.core.windows.net/msmarcoranking/collection.tar.gz
3https://trec-car.cs.unh.edu/datareleases/v2.0/paragraphCorpus.v2.0.tar.xz</p>
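<p>This interaction pattern can be sketched as follows, with Utterance acting as a shared store of per-utterance results that modules read and write through their Conversation. Field names and methods here are hypothetical, not DECAF's actual classes.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the Utterance/Conversation pattern: each module saves its
// results into the Utterance so the next module can read them.
public class ConversationSketch {

    /** Holds the original text plus results saved by each module. */
    static class Utterance {
        final Map<String, Object> fields = new HashMap<>();
        Utterance(String text) { fields.put("original", text); }
        void set(String key, Object value) { fields.put(key, value); }
        Object get(String key) { return fields.get(key); }
    }

    /** Groups the utterances of one conversation, with access by ID. */
    static class Conversation {
        final List<Utterance> utterances = new ArrayList<>();
        Utterance get(int id) { return utterances.get(id); }
    }

    public static void main(String[] args) {
        Conversation conv = new Conversation();
        conv.utterances.add(new Utterance("What is it about?"));
        // A rewriter module stores its output for downstream modules.
        conv.get(0).set("rewritten", "What is conversational search about?");
        System.out.println(conv.get(0).get("rewritten"));
    }
}
```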
        <p>Topics Parser DECAF provides five topic parsers designed explicitly to handle the TREC CAsT
2019 and TREC CAsT 2020 evaluation topics. In more detail, for each collection, DECAF has a
parser for each type of utterance – i.e., either manual or automatic utterances. This component
takes the topics file as input and outputs a stream of Conversation objects, each of
which is further split into the individual Utterances that compose it.</p>
        <p>Rewriter The Rewriter module expands the original text of the utterance into the rewritten
text and stores it within the specific Utterance object. In DECAF, we provide two sota rewriting
approaches using off-the-shelf resources, either employing coreference resolution libraries
or pre-trained T5 models. In implementation terms, two library-based components carry
out coreference resolution: one based on the AllenNLP
framework [20] and one using Fastcoref, a coreference resolution utility based on
the LingMess [21] architecture. In Table 1, we dub this approach CR. For the second approach,
we employ a T5 model [22]. The particular instance of T5 used in the experimental part is
publicly available and pre-trained specifically for conversational search question rewriting4. To
maintain the modularity of the framework, we also implement a rewriter that corresponds to
the “identity” operation and returns the original text unchanged. It should be used whenever
the utterances have been rewritten externally to the framework, or if the practitioner
does not wish to carry out any form of rewriting.</p>
        <p>Query Generator Components implemented within this module take the whole conversation
as input and produce as output a representation of the utterance that embeds
the context. Within DECAF, we provide three different query generator components. The FLC
query generator takes as input the utterances of a given conversation and outputs a weighted
sum of the rewritten text of the First, Last, and Current (FLC) utterances. The rationale is
that the first utterance often gives the general topic of the conversation, while the previous
one is the most likely to be referenced again by the current utterance. The output of this
component is meant to synergize with the rewriter’s effort to bring contextual information into
the current query, which is especially useful when the quality of the rewriter’s output is less than
ideal. A second query generator provided within DECAF considers the concatenation of the
rewritten text of all previous utterances of the current conversation. The final query generator
implemented (dubbed C in Table 1) considers only the – possibly rewritten – text of the current
utterance, without taking into account any of the previous ones. To operate, the query generator
accesses the Utterance object referring to the current utterance, and possibly the Utterance
objects defined for previous utterances.</p>
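<p>The FLC combination can be sketched as a weighted bag of query terms; the 1.0/0.5 weights below are illustrative stand-ins for DECAF's configurable parameters.</p>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the FLC idea: combine the first, previous (last), and current
// utterances into one weighted bag of query terms.
public class FlcQueryGenerator {

    /** Adds each whitespace-separated term of `text` with weight `w`. */
    static void accumulate(Map<String, Double> query, String text, double w) {
        for (String term : text.toLowerCase().split("\\s+")) {
            query.merge(term, w, Double::sum);
        }
    }

    static Map<String, Double> generate(List<String> rewrittenUtterances) {
        int current = rewrittenUtterances.size() - 1;
        Map<String, Double> query = new HashMap<>();
        accumulate(query, rewrittenUtterances.get(current), 1.0);        // Current
        if (current > 0) {
            accumulate(query, rewrittenUtterances.get(0), 0.5);          // First
            accumulate(query, rewrittenUtterances.get(current - 1), 0.5); // Last
        }
        return query;
    }

    public static void main(String[] args) {
        Map<String, Double> q = generate(List.of(
            "tell me about neural retrieval",
            "what about sparse models",
            "how are they trained"));
        // "about" gets 0.5 (first) + 0.5 (previous) = 1.0
        System.out.println(q.get("about"));
    }
}
```

Terms repeated across the first, previous, and current utterances accumulate weight, so the general topic stays represented even when the rewriter's output for the current utterance is poor.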
        <p>Searcher The searchers mirror the indexers used in the indexing phase, with three
different searcher components: one for BoW Lucene-based similarity functions, one for SPLADE,
and one for dense models.
4https://huggingface.co/castorini/t5-base-canard</p>
        <p>
          The first Searcher component, the BoW one, being based on Lucene, can be instantiated
with any of the classical BoW models already implemented in it, such as BM25 [23], the Language
Model (LM) [24], or the Vector Space Model [25]. For example, in the experimental benchmarking
reported in Section 3, we exploit BM25 [23], a BoW IR model that ranks documents based
on the occurrence and frequency of terms. The second similarity function that we take into
consideration is SPLADE [
          <xref ref-type="bibr" rid="ref9">9, 26</xref>
          ], a sparse NIR model that learns sparse representations
for both queries and documents via the BERT MLM head and sparse regularization. We separate
it from the previous component, even though they are both BoW models, because SPLADE requires
computing the BoW sparse representation of the query. In our experimental analysis, we use a
publicly available set of weights5. Finally, we implement a component for dense retrieval that
can be used with FAISS indexes. In particular, we instantiate it with BERT, exploiting a publicly
available BERT instance fine-tuned specifically for IR6. Components implementing this module
invoke the Query Generator sub-component, which provides them with a searcher-specific query
representation directly used for retrieval. Upon retrieval completion, components within the
Searcher module must save the top-k retrieved document IDs within the Utterance component.
        </p>
        <p>Reranker This module must set the final ranked list of documents in the Utterance object.
This final ranking is the output of the whole search pipeline for the current utterance.</p>
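<p>For reference, the BM25 function used by the BoW searcher can be sketched as a standalone toy scorer; DECAF itself delegates this computation to Lucene.</p>

```java
// Toy BM25 scorer illustrating the model used by the BoW searcher.
public class Bm25Sketch {

    /** BM25 score of one query term: tf in the document, df across N docs. */
    static double score(int tf, int df, int numDocs,
                        int docLen, double avgDocLen) {
        double k1 = 1.2, b = 0.75;                 // common default parameters
        double idf = Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
        double norm = tf + k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / norm;
    }

    public static void main(String[] args) {
        // A rare term (df=10) outscores a common one (df=900) at equal tf.
        System.out.printf("%.3f vs %.3f%n",
            score(3, 10, 1000, 120, 100.0),
            score(3, 900, 1000, 120, 100.0));
    }
}
```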
        <p>
          Components implemented within the Reranker module consider the documents included in
the ranked list produced by the Searcher and generate a new relevance score for each of them.
At the current time, DECAF comes with one reranker component: Transformers. The Transformers
reranker employs the Transformers (Hugging Face) [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] library to apply machine-learning
models, such as BERT, to the text of both the rewritten query and the documents retrieved by
the Searcher, and then evaluates the similarity between them. Notice that the specific transformers
model used can be chosen at runtime, by specifying it in the properties file. We experiment with
BERT, since several works have already observed the effectiveness of BERT for the re-ranking task
in the CS and IR domains [27, 28, 29]. Finally, since a user might not be interested
in re-ranking documents, we also include the “identity” reranker, which returns the
ranked list of documents generated by the Searcher unchanged.
        </p>
        <p>Run Writer This final module can be instantiated in two modalities. The first component –
dubbed Trec Eval – produces a run using the standard TREC eval format. In particular, it saves
inside the runs sub-folder of the framework a tab-separated file with six columns: the query id,
the Q0 placeholder, the document id, its rank, the retrieval score, and the user-specified id of
the run. Secondly, to ease debugging, the “debug” component saves to file both the ranked lists
produced by the Searcher and the Reranker, together with a file containing the top documents
retrieved by the system, to allow for manual inspection and precise failure analysis. DECAF is
available as open source under the Creative Commons Attribution-ShareAlike 4.0 International
License at the following address: https://github.com/alemarco96/DECAF.
5https://huggingface.co/naver/eficient-splade-V-large-query
6https://huggingface.co/sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>The framework has been tested on the TREC CAsT 2019 and TREC CAsT 2020 collections7. We report
four measures: Recall (R)@100, Mean Reciprocal Rank (MRR), normalized Discounted
Cumulative Gain (nDCG)@3, and nDCG@10, computed using the trec_eval8 tool. Following
the official evaluation settings of the TREC CAsT 2020 dataset, we consider documents as relevant
if their relevance score is ≥ 2 [30].</p>
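<p>As a reference for the reported measures, nDCG@k can be sketched as follows, using one common formulation with exponential gain; trec_eval's exact variant may differ.</p>

```java
import java.util.List;

// Sketch of nDCG@k over graded relevance judgments (computed in practice
// with trec_eval, not by DECAF itself).
public class NdcgSketch {

    /** DCG over the first k relevance grades of a ranked list. */
    static double dcg(List<Integer> rels, int k) {
        double dcg = 0.0;
        for (int i = 0; i < Math.min(k, rels.size()); i++) {
            // exponential gain, log2 position discount
            dcg += (Math.pow(2, rels.get(i)) - 1)
                 / (Math.log(i + 2) / Math.log(2));
        }
        return dcg;
    }

    /** nDCG@k: DCG of the ranking divided by DCG of the ideal ranking. */
    static double ndcg(List<Integer> rels, List<Integer> idealRels, int k) {
        double ideal = dcg(idealRels, k);
        return ideal == 0.0 ? 0.0 : dcg(rels, k) / ideal;
    }

    public static void main(String[] args) {
        // Relevance grades of the top results vs. the ideal ordering.
        List<Integer> ranking = List.of(2, 0, 3);
        List<Integer> ideal = List.of(3, 2, 0);
        System.out.printf("nDCG@3 = %.4f%n", ndcg(ranking, ideal, 3));
    }
}
```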
      <p>
        Table 1 shows the experimental results on TREC CAsT 2019. The initial experiment (#1)
evaluates the performance of the BM25 model on automatic utterances: the system performs
poorly due to the lack of contextual information. Results are in line with those observed
by [31] with respect to the baseline used for TREC CAsT 2019 (LMs). Experiments #2 and
#3 add the Rewriter component to the pipeline, which makes implicit references explicit
in the text of the utterance, resulting in improved performance. Run #3 employs T5 and
achieves double the performance w.r.t. the automatic baseline across all measures. The FLC
variant (used in Experiments #4 and #5) combines the text of the first, previous, and current
utterances. This simple heuristic is beneficial, particularly in cases where the rewriter fails to
correctly expand and disambiguate pronouns with the corresponding entity. Again, results are
consistent with the observations made in the TREC CAsT 2019 overview [31].
7Due to space reasons, we report results only for TREC CAsT 2019. See [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for TREC CAsT 2020 results.
8https://github.com/usnistgov/trec_eval
Experiments #6 to #9 demonstrate that
performing an additional re-ranking step using a BERT model can further improve performance
for all experimental settings. A comparison between experiment #1 and #6 shows that, despite
the absence of any form of rewriting, the performance is significantly improved by using BERT
as reranker, thus alleviating the issues related to the “context representation” introduced by the
CS setting. The performance of our implementation of the BERT-based re-ranking strategy is
similar to the average performance of systems submitted at TREC CAsT 2019 [31]. If the same
BERT model is used for first-stage dense retrieval (Experiment #10), it achieves performance
similar to a standard lexical model such as BM25 (experiment #3). Experiments #11 and #12
evaluate SPLADE as a first-stage retrieval method. The particular SPLADE instance employed
has been fine-tuned for passage retrieval, utilizing two distinct models for documents and
queries. It performs document expansion by adding a large number of terms not found in the
original text while producing sparser representations for queries. Test #12 shows that SPLADE
achieves the best results, outperforming BM25 with BERT (experiment #9) by 14.4% in terms
of recall, 4.0% for nDCG@3, and 8.9% for nDCG@10 when the T5-based rewriter is used. It
also surpasses all original TREC CAsT 2019 automatic submissions, obtaining an improvement
against the Best Automatic (BA) of respectively 25.0%, 12.4%, 23.3%, and 23.4% for the four
measures considered. We now focus on the second part of Table 1, where the manually rewritten
utterances are used. Notice that the performance observed on this kind of utterance represents
an upper bound on the performance that can be achieved by retrieval systems: in a real-case
scenario, such utterances would not be available. The performance differences between
automatic and manual utterances are mostly independent of the retrieval model utilized, with
average differences of 8.6% for recall, 4.5% for MRR, 8.0% for nDCG@3 and 6.9% for nDCG@10.
These experiments show that T5-based query rewriting methods are very effective on the TREC
CAsT 2019 dataset. The best performance is achieved again using the SPLADE model, similar to
the results observed for the automatic runs. When comparing our best manual run to the Best
Manual (BM) run among the original submissions, we observe slightly lower scores, especially
for the nDCG@10 measure, with a difference of 13.3%.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In this work, we have presented DECAF, a novel resource for conducting experiments within the
Conversational Information Seeking (CIS) scenario. This work is motivated by the constantly
growing plethora of heterogeneous CS systems that have been recently devised thanks to
the advent of LLMs. DECAF has been designed to favour comparability between systems,
fast prototyping and reproducibility, and in turn, alleviate the current reproducibility crisis.
Therefore, DECAF has been designed around three key features: modularity, expandability and
reproducibility. DECAF comes with a set of state-of-the-art components already implemented,
including query rewriting, searching, and re-ranking. The framework is also flexible enough to
integrate additional components without much effort. We have evaluated several CS pipelines
instantiated through DECAF on two well-known collections, TREC CAsT 2019 and TREC CAsT
2020. Future work will concern the extension of DECAF with new components, as well as
support for the mixed-initiative task.
[19] J. Johnson, M. Douze, H. Jégou, Billion-scale similarity search with GPUs, IEEE Trans. Big
Data 7 (2021) 535–547. URL: https://doi.org/10.1109/TBDATA.2019.2921572. doi:10.1109/
TBDATA.2019.2921572.
[20] M. Gardner, J. Grus, M. Neumann, O. Tafjord, P. Dasigi, N. F. Liu, M. E. Peters, M. Schmitz,
L. Zettlemoyer, AllenNLP: A deep semantic natural language processing platform, CoRR
abs/1803.07640 (2018). URL: http://arxiv.org/abs/1803.07640. arXiv:1803.07640.
[21] S. Otmazgin, A. Cattan, Y. Goldberg, LingMess: Linguistically informed multi-expert
scorers for coreference resolution, CoRR abs/2205.12644 (2022). URL: https://doi.org/10.
48550/arXiv.2205.12644. doi:10.48550/arXiv.2205.12644. arXiv:2205.12644.
[22] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu,
Exploring the limits of transfer learning with a unified text-to-text transformer, J. Mach.</p>
      <p>Learn. Res. 21 (2020) 140:1–140:67. URL: http://jmlr.org/papers/v21/20-074.html.
[23] S. E. Robertson, H. Zaragoza, The probabilistic relevance framework: BM25 and beyond,
Found. Trends Inf. Retr. 3 (2009) 333–389. URL: https://doi.org/10.1561/1500000019. doi:10.
1561/1500000019.
[24] C. X. Zhai, Statistical language models for information retrieval: A critical review, Found.</p>
      <p>Trends Inf. Retr. 2 (2008) 137–213. URL: https://doi.org/10.1561/1500000008. doi:10.1561/
1500000008.
[25] G. Salton, C. Buckley, Term-weighting approaches in automatic text retrieval, Inf. Process.</p>
      <p>Manag. 24 (1988) 513–523.
[26] T. Formal, C. Lassance, B. Piwowarski, S. Clinchant, SPLADE v2: Sparse lexical and
expansion model for information retrieval, CoRR abs/2109.10086 (2021). URL: https:
//arxiv.org/abs/2109.10086. arXiv:2109.10086.
[27] I. Mele, C. I. Muntean, F. M. Nardini, R. Perego, N. Tonellotto, O. Frieder, Topic propagation
in conversational search, in: J. X. Huang, Y. Chang, X. Cheng, J. Kamps, V. Murdock,
J. Wen, Y. Liu (Eds.), Proceedings of the 43rd International ACM SIGIR conference on
research and development in Information Retrieval, SIGIR 2020, Virtual Event, China,
July 25-30, 2020, ACM, 2020, pp. 2057–2060. URL: https://doi.org/10.1145/3397271.3401268.
doi:10.1145/3397271.3401268.
[28] R. F. Nogueira, K. Cho, Passage re-ranking with BERT, CoRR abs/1901.04085 (2019). URL:
http://arxiv.org/abs/1901.04085. arXiv:1901.04085.
[29] S. MacAvaney, A. Yates, A. Cohan, N. Goharian, CEDR: contextualized embeddings
for document ranking, in: B. Piwowarski, M. Chevalier, É. Gaussier, Y. Maarek, J. Nie,
F. Scholer (Eds.), Proceedings of the 42nd International ACM SIGIR Conference on Research
and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019,
ACM, 2019, pp. 1101–1104. URL: https://doi.org/10.1145/3331184.3331317. doi:10.1145/
3331184.3331317.
[30] J. Dalton, C. Xiong, J. Callan, CAsT 2020: The conversational assistance track overview, in:
E. M. Voorhees, A. Ellis (Eds.), Proceedings of the Twenty-Ninth Text REtrieval Conference,
TREC 2020, Virtual Event [Gaithersburg, Maryland, USA], November 16-20, 2020, volume
1266 of NIST Special Publication, National Institute of Standards and Technology (NIST),
2020. URL: https://trec.nist.gov/pubs/trec29/papers/OVERVIEW.C.pdf.
[31] J. Dalton, C. Xiong, J. Callan, TREC CAsT 2019: The conversational assistance track overview,
CoRR abs/2003.13624 (2020). URL: https://arxiv.org/abs/2003.13624. arXiv:2003.13624.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alessio</surname>
          </string-name>
          , G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>DECAF: a Modular and Extensible Conversational Search Framework</article-title>
          , in
          <source>: SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Taipei, Taiwan,
          <source>July 23-27</source>
          ,
          <year>2023</year>
          , ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zamani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Trippas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Radlinski</surname>
          </string-name>
          , Conversational Information Seeking.
          <article-title>An Introduction to Conversational Search, Recommendation, and Question Answering, arXiv.org, Information Retrieval (cs</article-title>
          .IR) arXiv:2201.08808 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cavedon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <source>Conversational search - A report from dagstuhl seminar 19461</source>
          , CoRR abs/
          <year>2005</year>
          .08658 (
          <year>2020</year>
          ). URL: https://arxiv. org/abs/
          <year>2005</year>
          .08658. arXiv:
          <year>2005</year>
          .08658.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
          , G. de Melo,
          <article-title>PACRR: A position-aware neural IR model for relevance matching</article-title>
          , in: M.
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hwa</surname>
          </string-name>
          , S. Riedel (Eds.),
          <source>Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP</source>
          <year>2017</year>
          , Copenhagen, Denmark, September 9-
          <issue>11</issue>
          ,
          <year>2017</year>
          , Association for Computational Linguistics,
          <year>2017</year>
          , pp.
          <fpage>1049</fpage>
          -
          <lpage>1058</lpage>
          . URL: https://doi.org/10.18653/v1/d17-
          <fpage>1110</fpage>
          . doi:
          <volume>10</volume>
          .18653/v1/d17-
          <fpage>1110</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Power</surname>
          </string-name>
          ,
          <article-title>End-to-end neural ad-hoc ranking with kernel pooling</article-title>
          , in: N.
          <string-name>
            <surname>Kando</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Sakai</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          de Vries, R. W. White (Eds.),
          <source>Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Shinjuku, Tokyo, Japan,
          <source>August</source>
          <volume>7</volume>
          -
          <issue>11</issue>
          ,
          <year>2017</year>
          , ACM,
          <year>2017</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          . URL: https://doi.org/10.1145/3077136.3080809. doi:
          <volume>10</volume>
          .1145/3077136.3080809.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , X. Cheng,
          <article-title>A deep top-k relevance matching model for ad-hoc retrieval</article-title>
          , in: S. Zhang, T. Liu,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          Li (Eds.),
          <source>Information Retrieval - 24th China Conference, CCIR</source>
          <year>2018</year>
          , Guilin, China,
          <source>September 27-29</source>
          ,
          <year>2018</year>
          , Proceedings, volume
          <volume>11168</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2018</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>27</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -01012-
          <issue>6</issue>
          _2. doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -01012-6\ _2.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tang</surname>
          </string-name>
          , J. Liu,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Overwijk</surname>
          </string-name>
          ,
          <article-title>Approximate nearest neighbor negative contrastive learning for dense text retrieval</article-title>
          ,
          <source>in: 9th International Conference on Learning Representations, ICLR</source>
          <year>2021</year>
          ,
          <string-name>
            <given-names>Virtual</given-names>
            <surname>Event</surname>
          </string-name>
          , Austria, May 3-
          <issue>7</issue>
          ,
          <year>2021</year>
          , OpenReview.net,
          <year>2021</year>
          . URL: https://openreview.net/forum?id=zeFrfgyZln.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ma,
          <article-title>Optimizing dense retrieval model training with hard negatives</article-title>
          , in: F. Diaz,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          , T. Sakai (Eds.),
          <source>SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Virtual Event, Canada,
          <source>July 11-15</source>
          ,
          <year>2021</year>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>1503</fpage>
          -
          <lpage>1512</lpage>
          . URL: https://doi.org/10.1145/3404835.3462880. doi:
          <volume>10</volume>
          .1145/3404835.3462880.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          , S. Clinchant,
          <article-title>SPLADE: sparse lexical and expansion model for ifrst stage ranking</article-title>
          ,
          <source>CoRR abs/2107</source>
          .05720 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2107.05720. arXiv:
          <volume>2107</volume>
          .
          <fpage>05720</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2009</year>
          :
          <article-title>Grid@clef pilot track overview</article-title>
          , in: C. Peters,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Kurimo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Mostefa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Peñas</surname>
          </string-name>
          , G. Roda (Eds.),
          <source>Multilingual Information Access Evaluation I. Text Retrieval Experiments, 10th Workshop of the Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2009</year>
          , Corfu, Greece,
          <source>September 30 - October 2</source>
          ,
          <year>2009</year>
          , Revised Selected Papers, volume
          <volume>6241</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2009</year>
          , pp.
          <fpage>552</fpage>
          -
          <lpage>565</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -15754-7_
          <fpage>68</fpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -15754-7\_
          <fpage>68</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <article-title>A general linear mixed models approach to study system component efects</article-title>
          , in: R.
          <string-name>
            <surname>Perego</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          <string-name>
            <surname>Aslam</surname>
            ,
            <given-names>I. Ruthven</given-names>
          </string-name>
          , J. Zobel (Eds.),
          <source>Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          <year>2016</year>
          , Pisa, Italy,
          <source>July 17-21</source>
          ,
          <year>2016</year>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          . URL: https: //doi.org/10.1145/2911451.2911530. doi:
          <volume>10</volume>
          .1145/2911451.2911530.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Critically examining the "neural hype": Weak baselines and the additivity of efectiveness gains from neural ranking models</article-title>
          , in: B.
          <string-name>
            <surname>Piwowarski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Chevalier</surname>
            , É. Gaussier,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Maarek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Scholer</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <string-name>
            <surname>SIGIR</surname>
          </string-name>
          <year>2019</year>
          , Paris, France,
          <source>July 21-25</source>
          ,
          <year>2019</year>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>1129</fpage>
          -
          <lpage>1132</lpage>
          . URL: https://doi.org/10.1145/3331184.3331340. doi:
          <volume>10</volume>
          .1145/3331184.3331340.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Dacrema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <article-title>Are we really making much progress? A worrying analysis of recent neural recommendation approaches</article-title>
          , in: T.
          <string-name>
            <surname>Bogers</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Said</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Brusilovsky</surname>
          </string-name>
          , D. Tikk (Eds.),
          <source>Proceedings of the 13th ACM Conference on Recommender Systems, RecSys</source>
          <year>2019</year>
          , Copenhagen, Denmark,
          <source>September 16-20</source>
          ,
          <year>2019</year>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>109</lpage>
          . URL: https://doi.org/10.1145/3298689.3347058. doi:
          <volume>10</volume>
          .1145/3298689.3347058.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>Reproducibility challenges in information retrieval evaluation</article-title>
          ,
          <source>ACM J. Data Inf. Qual</source>
          .
          <volume>8</volume>
          (
          <issue>2017</issue>
          )
          <article-title>8:1-8:4</article-title>
          . URL: https://doi.org/10.1145/3020206. doi:
          <volume>10</volume>
          .1145/3020206.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kharazmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scholer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vallet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <article-title>Examining additivity and weak baselines</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>34</volume>
          (
          <year>2016</year>
          )
          <volume>23</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          :
          <fpage>18</fpage>
          . URL: https://doi.org/10.1145/2882782. doi:
          <volume>10</volume>
          . 1145/2882782.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , D. Johnson,
          <article-title>Terrier information retrieval platform</article-title>
          , in: D. E. Losada,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Fernández-Luna</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval, 27th European Conference on IR Research</source>
          , ECIR
          <year>2005</year>
          , Santiago de Compostela, Spain, March
          <volume>21</volume>
          -23,
          <year>2005</year>
          , Proceedings, volume
          <volume>3408</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2005</year>
          , pp.
          <fpage>517</fpage>
          -
          <lpage>519</lpage>
          . URL: https://doi.org/10.1007/978-3-
          <fpage>540</fpage>
          -31865-1_
          <fpage>37</fpage>
          . doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>540</fpage>
          -31865-1\_
          <fpage>37</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trotman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Chattopadhyaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Foley</surname>
          </string-name>
          , G. Ingersoll, C. MacDonald, S. Vigna,
          <article-title>Toward reproducible baselines: The open-source IR reproducibility challenge</article-title>
          , in: N.
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Hauf</surname>
          </string-name>
          , G. Silvello (Eds.),
          <source>Advances in Information Retrieval - 38th European Conference on IR Research</source>
          , ECIR
          <year>2016</year>
          , Padua, Italy, March
          <volume>20</volume>
          -23,
          <year>2016</year>
          . Proceedings, volume
          <volume>9626</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2016</year>
          , pp.
          <fpage>408</fpage>
          -
          <lpage>420</lpage>
          . URL: https: //doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -30671-1_
          <fpage>30</fpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -30671-1\_
          <fpage>30</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          , P. von Platen, C. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
          , Transformers:
          <article-title>State-of-the-art natural language processing</article-title>
          , in: Q. Liu, D. Schlangen (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2020 - Demos, Online, November 16-20</source>
          ,
          <year>2020</year>
          , Association for Computational Linguistics,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          . URL: https://doi.org/10.18653/v1/
          <year>2020</year>
          .emnlp-demos.6. doi:
          <volume>10</volume>
          .18653/ v1/
          <year>2020</year>
          .emnlp-demos.
          <volume>6</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>