<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Intra-document Block Pre-ranking for BERT-based Long Document Information Retrieval - Abstract</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Minghan Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Gaussier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Univ. Grenoble Alpes</institution>
          ,
          <addr-line>CNRS, LIG, Grenoble</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Information retrieval using transformer architectures, especially pretrained models such as BERT, has seen great improvements. However, because of the quadratic complexity of the self-attention mechanism, applying such models directly to long documents is unsatisfactory. Truncating long documents is a widely adopted approach. Other researchers propose splitting a long document into passages, each of which can be processed by a standard BERT model. A third solution modifies the self-attention mechanism to make it sparser. However, these approaches either lose information or have high computational complexity and memory requirements. We propose a slightly different approach that first pre-ranks passages within a long document according to the query, then combines the top-ranking passages for a later ranking step that yields the document relevance score. Experiments on IR collections demonstrate the state-of-the-art (SOTA) level effectiveness of the proposed approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Neural IR</kwd>
        <kwd>Document Representation for IR</kwd>
        <kwd>BERT-based Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Document information retrieval (IR) is used in many everyday applications, such as
web search. Benefiting from deep neural networks, Neural Information Retrieval (Neural
IR) has led to the development of numerous IR models. The transformer model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
which is based on the multi-head attention mechanism, has been shown to be more parallelizable
and to yield higher quality than recurrent neural network models. Based on the transformer encoder,
Devlin et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose Bidirectional Encoder Representations from Transformers (BERT), which is
pre-trained on a large-scale corpus using self-supervised learning. Fine-tuning BERT-like
models yields state-of-the-art results on a variety of tasks, including information
retrieval [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3, 4, 5, 6</xref>
        ]. Despite its effectiveness and intuitive characteristics, BERT restricts the number of input
tokens to 512 because of the quadratic complexity of the self-attention mechanism,
which is fewer than the number of tokens in a long document.
      </p>
      <p>
        To tackle this issue in IR, three kinds of techniques have been proposed. The first is truncation
[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ], which simply keeps the first tokens of a long document. The second segments
long documents into shorter passages, over which a hierarchical architecture can be
used. The third modifies self-attention to use a sparser attention mechanism.
      </p>
      <sec id="sec-1-2">
        <p>[Figure 1. Surviving diagram labels: query; Deep Neural IR Network; query-document relevance scores 1 to n (one per document); labels.]</p>
        <p>
          However, all three kinds of techniques have drawbacks, such as information loss, high computational
cost, high memory requirements, or the need to modify CUDA kernels [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ].
        </p>
        <p>
          From a different perspective, we propose a new framework [
          <xref ref-type="bibr" rid="ref7 ref9">7, 9</xref>
          ] for long document
information retrieval. Similar to the human judgement process [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], this framework first searches for
relevant blocks within a long document according to the query. The top-ranking blocks are then
combined into a query-directed summary that fits within BERT's input capacity. Finally, the relevance
score of the long document is obtained by applying a BERT model to the query and the summary; this score can
be regarded as an aggregation of local relevance information.
        </p>
        <p>This extended abstract describes this framework. The remainder of the paper is organized
as follows: Section 2 describes the proposed framework with a classical IR approach for block
ranking, Section 3 describes two semantic block matching approaches, and Section
4 presents the experimental results.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. The proposed architecture with classical IR functions</title>
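      <p>
        To begin, we introduce the proposed framework, which uses classical IR functions such
as BM25 for intra-document block ranking. It is illustrated in Figure 1. The query-block scoring part
identifies relevant blocks across the whole document and can be
regarded as a pre-ranking strategy. The Deep Neural IR Network is a neural IR network that generates
relevance scores for a learning-to-rank (LTR) loss. In this study, we
focus on two state-of-the-art neural IR models, namely Vanilla BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and PARADE [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
A long document is first segmented into blocks; the query-block scoring step then picks
the relevant blocks according to their retrieval status value (RSV) with respect to the query, computed with e.g. BM25:
        <disp-formula><tex-math>\mathrm{RSV}(q, b)_{\mathrm{BM25}} = \sum_{w \in q \cap b} \mathrm{IDF}(w) \cdot \frac{tf_w^b}{k_1 \cdot \left(1 - \beta + \beta \cdot \frac{l_b}{l_{avg}}\right) + tf_w^b}</tex-math></disp-formula>
where <inline-formula><tex-math>tf_w^b</tex-math></inline-formula> is the number of occurrences of word <inline-formula><tex-math>w</tex-math></inline-formula> in block <inline-formula><tex-math>b</tex-math></inline-formula>, <inline-formula><tex-math>l_b</tex-math></inline-formula> is the length of block <inline-formula><tex-math>b</tex-math></inline-formula>,
<inline-formula><tex-math>l_{avg}</tex-math></inline-formula> the average length of the blocks in document <inline-formula><tex-math>d</tex-math></inline-formula>, and <inline-formula><tex-math>k_1</tex-math></inline-formula> and <inline-formula><tex-math>\beta</tex-math></inline-formula> are two hyperparameters.
      </p>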
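      <p>As an illustration of the query-block scoring step, the following Python sketch computes a BM25-style RSV for each block of one document and ranks the blocks. It is only a sketch under stated assumptions: the whitespace tokenizer, the smoothed IDF variant, the toy document frequencies and the default values of the two hyperparameters (named k1 and beta in the code) are hypothetical choices made for this example, not settings reported in this paper.</p>
      <preformat>
```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for a real analyzer.
    return text.lower().split()


def idf(word, doc_freq, num_docs):
    # Assumption: a smoothed IDF variant, computed over documents (not blocks).
    df = doc_freq.get(word, 0)
    return math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))


def bm25_block_scores(query, blocks, doc_freq, num_docs, k1=1.2, beta=0.75):
    """Return one BM25-style RSV per block of a single document."""
    block_tokens = [tokenize(blk) for blk in blocks]
    avg_len = sum(len(t) for t in block_tokens) / len(block_tokens)
    query_words = set(tokenize(query))
    scores = []
    for tokens in block_tokens:
        tf = Counter(tokens)
        # Length normalization relative to the average block length.
        norm = k1 * (1.0 - beta + beta * len(tokens) / avg_len)
        score = sum(
            idf(w, doc_freq, num_docs) * tf[w] / (norm + tf[w])
            for w in query_words
            if w in tf  # only words shared by the query and the block
        )
        scores.append(score)
    return scores


blocks = [
    "bert handles short passages well",
    "the weather was pleasant that day",
    "long documents exceed the bert input limit",
]
# Toy document frequencies over a hypothetical collection of 1000 documents.
doc_freq = {"bert": 10, "long": 50, "documents": 200, "limit": 30}
scores = bm25_block_scores("bert long documents", blocks, doc_freq, num_docs=1000)
ranking = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
print(ranking)  # block indices from most to least relevant
```
      </preformat>
      <p>In this toy example the third block shares three words with the query and is ranked first, while the second block shares none and receives a score of zero.</p>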
      <p>The IDF is computed over documents rather than blocks, because block-level statistics
might introduce bias: important words in a document are likely to appear in many of its blocks.</p>
      <p>[Figure 2. Surviving diagram labels: One Long Document; block1 to blockn; query; [CLS]; [SEP]; block scoring; selecting top blocks; BERT (eval model) and Linear Layer (eval model), only for block scoring, no gradient; BERT (train model) and Linear Layer (train model); CLS Embedding; Query-doc Score 1 to n (another doc); LTR Loss; Label 1 to n; same models in different time slices.]</p>
      <p>KeyB(vBERT). We call the model KeyB(vBERT) when the deep neural IR network is a BERT
model. The most relevant blocks are concatenated in their order of appearance in
the document and then joined with the query to form the input of BERT. The number of selected blocks
depends on the capacity of BERT (512 tokens).</p>
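      <p>The block selection step can be sketched as follows: blocks are taken in decreasing score order until the input budget is exhausted, then restored to their order of appearance. This is a minimal sketch under assumptions: whitespace tokens stand in for WordPiece tokens, three positions are reserved for special tokens, and the greedy stopping rule and the function name select_key_blocks are hypothetical.</p>
      <preformat>
```python
def select_key_blocks(blocks, scores, query_len, max_tokens=512, num_special=3):
    """Keep the best-scoring blocks that fit, then restore document order."""
    # Room left for block tokens once the query and special tokens are counted.
    budget = max_tokens - query_len - num_special
    by_score = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in by_score:
        n_tokens = len(blocks[i].split())  # assumption: whitespace token count
        if n_tokens > budget:
            break  # stop at the first block that no longer fits
        chosen.append(i)
        budget -= n_tokens
    chosen.sort()  # order of appearance in the document
    return chosen, " ".join(blocks[i] for i in chosen)


blocks = ["a b c", "d e", "f g h i", "j"]
scores = [0.2, 0.9, 0.5, 0.7]
# A tiny budget (12 tokens) is used here only to make the cut-off visible.
chosen, summary = select_key_blocks(blocks, scores, query_len=2, max_tokens=12)
print(chosen, summary)
```
      </preformat>
      <p>With a budget of 7 block tokens, the three best-scoring blocks fit and the lowest-scoring one is dropped; the selected blocks are returned in document order.</p>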
      <p>
        KeyB(PARADE). PARADE [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a state-of-the-art model that produces a query-document
representation from query-passage representations. Denoting by <inline-formula><tex-math>P_i</tex-math></inline-formula> the <inline-formula><tex-math>i^{th}</tex-math></inline-formula> passage and by <inline-formula><tex-math>p_i^{cls}</tex-math></inline-formula> the
corresponding query-passage representation, one has: <inline-formula><tex-math>p_i^{cls} = \mathrm{BERT}(q, P_i)</tex-math></inline-formula>. The query-passage
representations are then aggregated to obtain the query-document representation. We propose
here to select a fixed, small number of passages, to address the high complexity of the PARADE model
for large numbers of passages and the potential issue of including noisy signals. As
shown in Figure 1, the key block selection stage improves both efficiency and effectiveness.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Learning to select blocks</title>
      <p>
        KeyB(vBERT). We propose here a strategy that exploits BERT itself to compute the
relevance score of a block. The overall architecture of the proposed model is depicted in Figure 2,
in which the same BERT model and linear layer are used at different time slices: first to compute
a query-block representation (the [CLS] embedding from BERT), from which the relevance score of
the block is derived, and then to compute the query-document representation ([CLS] embedding)
from the top-ranked blocks, which finally yields the score of the document. This second part is
identical to the KeyB(vBERT) model of Section 2, the only difference lying in the way the blocks are selected.
In the first part, both the BERT model and the linear layer are used only for scoring and are
not trained (hence the phrase "eval model" in Figure 2), which reduces the complexity.
      </p>
      <p>
        Extension to late interaction (ICLI). The previous approaches are interaction-based
methods for long documents, which are computationally expensive. We therefore seek a
late interaction [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] based approach that can handle long documents. A late interaction
method pre-stores the learned contextualized document tokens, which then interact with the query tokens.
Specifically, each query token is compared with the document tokens to obtain its maximum
similarity, and the maximum similarities of all query tokens are summed to give the final query-document
relevance score. Despite its efficiency, ColBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] cannot handle long documents.
      </p>
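      <p>The late interaction score described above (maximum similarity per query token, summed over query tokens) can be sketched in a few lines of Python. The use of a plain dot product as the similarity, the two-dimensional toy vectors and the function names are assumptions made for this illustration; a real system would operate on learned token embeddings.</p>
      <preformat>
```python
def dot(u, v):
    # Assumption: dot product as the token-level similarity.
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the maximum similarity with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy embeddings: two query tokens and three pre-stored document tokens.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]
score = late_interaction_score(query_vecs, doc_vecs)
print(score)  # 1.0 (first query token) + 0.5 (second query token) = 1.5
```
      </preformat>
      <p>Because the document token embeddings do not depend on the query, they can be computed once and stored, which is the source of the efficiency of late interaction.</p>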
      <p>We address this problem with a BERT-based dense intra-ranking and contextualized
late interaction (ICLI) model trained with multi-task learning; the architecture is shown in Figure 3.</p>
      <p>[Figure 3. Surviving diagram labels: query; passage1 to passageN from a long document; BERT; [CLS]; FFN1; FFN2; intra passage ranking; pre-stored; select first passage and other top ranking passages for late interaction score; fine-grained aggregation; interaction; W; doc score.]</p>
      <p>
        First, a long document is again segmented into passages. The contextualized embedding of each token in the passages
is then obtained with the BERT model. The token embeddings are passed through a one-layer feedforward neural
network FFN1 to obtain compressed embeddings (dimension 128). The query tokens
and passage tokens then interact as in ColBERT. A long document may contain many passages,
many of which may not be relevant. As in the methods above, we propose to select
relevant passages before late interaction. This can be done with BM25, and we call this variant
ICLI-BM25. Because BM25 relies on exact matching, we also want to include semantic
matching in this approach. Inspired by efficient dense retrieval, we take advantage
of the [CLS] embeddings and use the dot product. To do so, the [CLS] embedding is also passed through
a one-layer feedforward neural network FFN2 to obtain an embedding (dimension 128)
for dense passage ranking. The first passage is always selected so that the [CLS] dot product
of the first passage can be compared with the document-level label to obtain the loss ℒ<sub>1</sub> and
train the model to generate good [CLS] embeddings. The document score is the aggregation
of passage scores through a weighted sum, from which a second loss ℒ<sub>2</sub> is obtained for training good
document embeddings. We use multi-task learning to train the model. As the two losses may
have different scales, we combine them into the final loss with learned weights:
        <disp-formula><tex-math>\mathcal{L} = \frac{1}{2\sigma_1^2}\mathcal{L}_1 + \frac{1}{2\sigma_2^2}\mathcal{L}_2 + \log(1 + \sigma_1^2) + \log(1 + \sigma_2^2)</tex-math></disp-formula>
where <inline-formula><tex-math>\sigma_1</tex-math></inline-formula> and <inline-formula><tex-math>\sigma_2</tex-math></inline-formula> are two learnable parameters
for multi-task learning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and the logarithmic terms enforce positive regularization. During deployment, each
document’s contextualized tokens are pre-computed and stored for efficient late interaction.
      </p>
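      <p>Two steps of this section lend themselves to a short sketch: dense passage selection with [CLS] dot products (always keeping the first passage) and the combination of the two losses with learnable scales. The code below is a plain-Python illustration under assumptions: the function names, the convention that k counts the first passage, the toy vectors and the fixed scale values (written sigma1 and sigma2) are hypothetical, and in actual training the scales would be parameters updated by gradient descent.</p>
      <preformat>
```python
import math


def select_passages(query_cls, passage_cls, k):
    """Keep passage 0 plus the k - 1 other passages with the highest dot product."""
    sims = [sum(a * b for a, b in zip(query_cls, p)) for p in passage_cls]
    others = sorted(range(1, len(passage_cls)), key=lambda i: sims[i], reverse=True)
    return sorted([0] + others[: k - 1])


def combined_loss(loss1, loss2, sigma1, sigma2):
    """Weight two task losses by 1 / (2 sigma^2) and add log(1 + sigma^2) terms."""
    weighted = loss1 / (2.0 * sigma1 ** 2) + loss2 / (2.0 * sigma2 ** 2)
    regularizer = math.log(1.0 + sigma1 ** 2) + math.log(1.0 + sigma2 ** 2)
    return weighted + regularizer


# Toy [CLS] embeddings for one query and four passages of a document.
query_cls = [1.0, 0.0]
passage_cls = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.3], [0.7, 0.0]]
selected = select_passages(query_cls, passage_cls, k=3)
print(selected)  # passage 0 is kept even though its similarity is the lowest

total = combined_loss(loss1=0.8, loss2=0.2, sigma1=1.0, sigma2=1.0)
print(total)
```
      </preformat>
      <p>A larger scale reduces the weight of the corresponding loss, while the logarithmic term, which is never negative, grows with the scale and so discourages the trivial solution of inflating both scales.</p>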
    </sec>
    <sec id="sec-4">
      <title>4. Results and conclusion</title>
      <p>
        We report here the results on the document reranking task of TREC 2019 Deep Learning Track
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        We take the other baseline results from QDS-Transformer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and implement
PARADE-Transformer [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], denoted PARADE, as well as the proposed KeyB methods, with a pairwise hinge
loss. Each model is trained with the Adam optimizer (the transformer layers are trained with a learning rate
of 2e-5 and the linear layer with a learning rate of 1e-3), and each batch contains 2 positive-negative
document pairs. Following [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], 16 passages are obtained for the original PARADE, each with 225
tokens, using a stride of 200. For the variant of PARADE we propose, we use BM25
to select the top 5 passages, balancing effectiveness and efficiency. ColBERT is also implemented.
For the proposed ICLI methods, a pairwise RankNet loss is used with a learning rate of 1e-5,
and each batch contains 8 positive-negative pairs. The “BERT-Base, Uncased, L=12, H=768”
pre-trained language model is used in all BERT-based neural IR models.
      </p>
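      <p>To make two of these settings concrete, the sketch below splits a token sequence into overlapping passages (225 tokens, stride 200, at most 16 passages) and computes a pairwise hinge loss for one positive-negative pair. It is an illustration under assumptions: keeping the first 16 windows, the margin of 1.0 and the function names are hypothetical choices, not details confirmed by this paper.</p>
      <preformat>
```python
def split_into_passages(tokens, window=225, stride=200, max_passages=16):
    """Slide a fixed-size window over the tokens and keep the first passages."""
    passages = []
    for start in range(0, len(tokens), stride):
        passages.append(tokens[start:start + window])
        # Stop once the cap is reached or the window has covered the last token.
        if len(passages) == max_passages or start + window >= len(tokens):
            break
    return passages


def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Zero when the positive document beats the negative one by the margin."""
    return max(0.0, margin - pos_score + neg_score)


tokens = list(range(1000))  # stand-in for the token ids of one document
passages = split_into_passages(tokens)
print(len(passages), [len(p) for p in passages])

print(pairwise_hinge_loss(pos_score=2.0, neg_score=0.5))  # margin satisfied
print(pairwise_hinge_loss(pos_score=0.3, neg_score=0.6))  # margin violated
```
      </preformat>
      <p>With a window of 225 and a stride of 200, consecutive passages overlap by 25 tokens, so text near a passage boundary appears in both neighbouring passages.</p>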
      <p>Results. Experimental results are displayed in Table 1. The overall results of the
KeyB(vBERT) models, particularly the KeyB(vBERT) variant that employs BERT itself to choose
blocks, exceed the baseline models by a large margin and achieve SOTA level effectiveness,
with an NDCG@10 of 0.707. As with the KeyB(vBERT) models, the experimental results
show that the proposed KeyB(PARADE5)<sub>BM25</sub> is also effective: in terms of NDCG@10, the proposed
approach with five passages outperforms the original PARADE with 16 passages. For the
proposed late interaction approach ICLI, the results show that the proposed models
outperform the baseline models when extended to long document retrieval, with the exception
of PARADE in terms of MAP. ICLI surpasses the original ColBERT approach, with ICLI-dot
achieving an NDCG@10 of 0.705, which is 8.46 percent higher than the original ColBERT
method.</p>
      <p>Compared with sparse attention based methods, the proposed block selection
models obtain comparable or better results. They all outperform QDS-Transformer in terms
of NDCG@10. KeyB(vBERT) and KeyB(PARADE5)<sub>BM25</sub> obtain better results in terms of
MAP, while the others are slightly lower. It is worth mentioning that, unlike QDS-Transformer, our
methods do not require altering CUDA kernels.</p>
      <p>In conclusion, the proposed KeyB models are interaction-based models, while the ICLI models
are efficient late-interaction models for long document retrieval. These results show that
the proposed pre-ranking framework for IR is effective, and that using learned or semantic
matching for block selection has further potential. In the future, we will also seek such a solution
for the PARADE model.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by MIAI@Grenoble Alpes (ANR-19-P3IA-0003) and the
Chinese Scholarship Council (CSC) grant No.201906960018.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
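          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>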
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>6000</fpage>
          -
          <lpage>6010</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , CoRR abs/1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>Passage re-ranking with BERT</article-title>
          , arXiv preprint arXiv:1901.04085 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>MacAvaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goharian</surname>
          </string-name>
          ,
          <article-title>CEDR: Contextualized embeddings for document ranking</article-title>
          ,
          <source>in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1101</fpage>
          -
          <lpage>1104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          ,
          <article-title>Deeper text understanding for IR with contextual neural language modeling</article-title>
          ,
          <source>in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>985</fpage>
          -
          <lpage>988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>MacAvaney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>PARADE: Passage representation aggregation for document reranking</article-title>
          , arXiv preprint arXiv:2008.09093 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gaussier</surname>
          </string-name>
          ,
          <article-title>KeyBLD: Selecting key blocks with local pre-ranking for long document information retrieval</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>2207</fpage>
          -
          <lpage>2211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Long document ranking with query-directed sparse transformer</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4594</fpage>
          -
          <lpage>4605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Popa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chagnon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. G.</given-names>
            <surname>Cinar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            <surname>Gaussier</surname>
          </string-name>
          ,
          <article-title>The power of selecting key blocks with local pre-ranking for long document information retrieval</article-title>
          ,
          <source>arXiv preprint arXiv:2111.09852</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Luk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-F.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kwok</surname>
          </string-name>
          ,
          <article-title>A retrospective study of a hybrid document-context based retrieval model</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>43</volume>
          (
          <year>2007</year>
          )
          <fpage>1308</fpage>
          -
          <lpage>1331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>ColBERT: Efficient and effective passage search via contextualized late interaction over BERT</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Körner</surname>
          </string-name>
          ,
          <article-title>Auxiliary tasks in multi-task learning</article-title>
          , arXiv preprint arXiv:1805.06334 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2019 deep learning track</article-title>
          , arXiv preprint arXiv:2003.07820 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
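          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          ,
          <article-title>PACRR: A position-aware neural IR model for relevance matching</article-title>
          ,
          <source>in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1049</fpage>
          -
          <lpage>1058</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Ott</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Du</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Joshi</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Levy</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lewis</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Zettlemoyer</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Stoyanov</surname></string-name>,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>I.</given-names> <surname>Beltagy</surname></string-name>,
          <string-name><given-names>M. E.</given-names> <surname>Peters</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Cohan</surname></string-name>,
          <article-title>Longformer: The long-document transformer</article-title>
          , arXiv preprint arXiv:2004.05150 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>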
    </ref-list>
  </back>
</article>