<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PSU at CLEF-2020 ARQMath Track: Unsupervised Re-ranking using Pretraining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shaurya Rohatgi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jian Wu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C. Lee Giles</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pennsylvania State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper elaborates on our submission to the ARQMath track at CLEF 2020. Our primary run for the main Task-1 (Question Answering) uses a two-stage retrieval technique in which the first stage is a fusion of traditional BM25 scoring and tf-idf with cosine-similarity-based retrieval, while the second stage is a finer re-ranking technique using contextualized embeddings. For the re-ranking we use a pre-trained roberta-base model (110 million parameters) to make the language model more math-aware. Our approach achieves a higher NDCG′ score than the baseline, while our MAP and P@10 scores are competitive, performing better than the best submission (MathDowsers) for text and text+formula dependent topics.</p>
      </abstract>
      <kwd-group>
        <kwd>Math Information Retrieval</kwd>
        <kwd>Math-aware search</kwd>
        <kwd>Math formula search</kwd>
        <kwd>Math Embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our goals in this work were to:</p>
      <p>- Achieve a competitive score that beats the best baseline in terms of the
NDCG′ score.
- Achieve better results than the best submission for Text and Text+Math
dependent posts.
- Propose a Masked Language Model (https://huggingface.co/shauryr/arqmath-roberta-base-1.5M) used for re-ranking candidate answers
containing both text and math formulas.</p>
      <p>Our paper is organized as follows. Section 2 discusses our two-stage
retrieval approach, which includes indexing and re-ranking phases. Section 3
describes our experimental setup, system configuration, and a detailed comparison
of our results with the best submission. Conclusions are in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Our Approach</title>
      <p>
        Here we describe our two-stage cascade candidate retrieval and ranking
approach. tf-idf with cosine similarity and an off-the-shelf BM25-based search
platform [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are used for the first stage. This stage is computationally cheaper
than the second, more expensive re-ranking stage. For the second stage, we use a
pre-trained language model to obtain semantically rich embeddings for the
candidates obtained from the first stage. We then use these embeddings to rank the
candidates by their cosine distance to the topic/query embedding, which ensures
that the ranking captures the semantics shared by the query and the documents. We
rely only on the content of a post, which contains raw text and math
formulas, and do not use external information such as votes/scores, user id, user score, post
tags, or linked duplicate posts.
      </p>
      <p>First-Phase: Retrieval
The aim of this phase is to retrieve a sufficient number of relevant candidates to be
re-ranked in the next stage. Past work has used BM25 scoring for this, but we
add cosine similarity ranking as well.</p>
      <p>Indexing: We convert the MSE dataset from XML to JSON format so that it
can be ingested by the search platforms we use. Because indexing only the answers would
lose the information in the questions, we generate a document by
concatenating each question (Q) post's text (title and body) with its corresponding
answer (A) post's text (body). As such, the question body and title concatenated
with the answer body become a document, one unit of retrieval. This allows us
to remove questions without corresponding answers and to represent the relevant
posts as one large document, since the information the user is seeking
could be in either the answer or the question post. This results in a total of
1,435,643 Q+A posts to be indexed.</p>
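      <p>A minimal sketch of this document construction step (the field names here are hypothetical stand-ins, not the actual MSE schema):</p>

```python
import json

def make_document(question, answer):
    """Concatenate a question's title and body with an answer body
    to form one unit of retrieval (hypothetical field names)."""
    return {
        "id": answer["id"],
        "text": " ".join([question["title"], question["body"], answer["body"]]),
    }

# Toy Q and A posts standing in for parsed MSE records.
question = {"id": "q1", "title": "Sum of geometric series",
            "body": "How do I evaluate $\\sum_i r^i$?"}
answer = {"id": "a7", "parent": "q1", "body": "Use $\\frac{1}{1-r}$ for $|r|<1$."}

doc = make_document(question, answer)
json_line = json.dumps(doc)  # one JSON document per Q+A pair, ready to index
```

Questions with several answers simply yield several such documents, one per answer.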
      <p>We index the data using two off-the-shelf libraries: Elasticsearch (ES) and
Anserini. We use two different libraries because each has its strengths and
weaknesses. Anserini seems to perform better than ES in terms of relevance ranking
and recall, which we observed when searching and evaluating the three training
queries provided by the organizers for the Question Answering Task.
Elasticsearch has a more scalable implementation of tf-idf with cosine similarity, which
is slow on a large dataset when using Anserini. For the formulas, we use the raw
LaTeX strings. We re-rank answers based on contextualized text and formula
embeddings using RoBERTa.</p>
      <p>
        Retrieval: Once all posts have been indexed in Elasticsearch and Anserini, we
query the Task-1 topics/queries to get the top-1000 candidates. For each
task, each index is queried independently and the results are later fused using Reciprocal Rank
Fusion (RRF) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which then ranks the documents using a simple scoring formula.
Given a set of posts P to be ranked and a set of rankings R from different scoring
schemes, for each rank position in 1 … |P| we compute
      </p>
      <p>RRF_score(p ∈ P) = Σ_{r ∈ R} 1 / (k + r(p)),   (1)</p>
      <p>
        where the original work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] suggests k = 60, a hyper-parameter which we keep
constant. The fusion of results from the two different search platforms above
significantly improves the performance numbers, as demonstrated in Section 3.
      </p>
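      <p>Equation (1) takes only a few lines to implement; the following is an illustrative sketch rather than the code used in our runs:</p>

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion (Eq. 1).

    rankings: list of ranked lists of post ids, best first.
    k: the constant from Cormack et al. (k = 60).
    Returns post ids sorted by descending RRF score.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, post_id in enumerate(ranked, start=1):
            scores[post_id] += 1.0 / (k + rank)  # 1 / (k + r(p))
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a hypothetical Anserini (BM25) run with an ES (tf-idf cosine) run.
fused = rrf_fuse([["a", "b", "c"], ["a", "c", "d"]])
```

A post ranked highly by both systems ("a" above) accumulates two large reciprocal-rank terms and rises to the top of the fused list.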
      <p>Second-Phase: Re-Ranking
Here we describe how we pretrain our language model and re-rank the candidates
obtained in the first phase.</p>
      <p>
        BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is a self-supervised approach for fine-tuning a deep transformer
encoder [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Given a sequence, BERT learns a contextualized vector representation
for each token. The input representations are fed into a stack of multi-layer
bidirectional transformer blocks, which uses self-attention to compute semantically
rich text representations by considering the whole input sequence.
      </p>
      <p>
        BERT-based systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] have shown significant performance in the
recent TREC-2019 deep learning track tasks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with the Microsoft MARCO
dataset [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Participants for these tasks used query-document pairs from the
training data to train transformer-based models to predict relevance. However,
such models rely on a massive amount of data, which is not available in our task.
Therefore, instead of training for relevance, we leveraged an unsupervised model
by calculating the semantic similarities of queries and documents and later using
them for re-ranking.
      </p>
      <p>
        We choose the RoBERTa model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] over BERT because in our experiments
Vanilla RoBERTa achieves better NDCG′ scores than Vanilla BERT for the
three preliminary training posts provided by the task organizers. Also, RoBERTa
converges in fewer training steps than BERT as it gets rid of the computationally
expensive next sentence prediction task. RoBERTa attains better performance
on various NLP tasks than BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We start with the initial weights of the
roberta-base model and further pretrain the model using MSE data. The language
model is trained on a masked-token prediction task.
      </p>
      <p>[Figure 1: Runtime in seconds for ES and Anserini.]</p>
      <p>Once we are done training, we use
the Masked Language Model (MLM) to get contextualized token embeddings,
averaged over the sequence length, to represent each candidate from Phase-1 as
a 768-dimension vector. Similarly, we obtain topic/query embeddings and rank
the candidates using their cosine distance.</p>
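      <p>The pooling and ranking step can be sketched with plain arrays; in practice the per-token vectors come from our pretrained RoBERTa MLM, while here small stand-in arrays (dimension 4 instead of 768) are used for illustration:</p>

```python
import numpy as np

def mean_pool(token_embeddings):
    """Average the per-token vectors over the sequence length
    to get one fixed-size vector per post."""
    return token_embeddings.mean(axis=0)

def rank_by_cosine(query_vec, candidate_vecs):
    """Return candidate indices sorted by descending cosine similarity
    to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))

# Stand-in embeddings: one topic/query and three Phase-1 candidates.
query = mean_pool(np.array([[1.0, 0.0, 0.0, 0.0], [1.0, 0.2, 0.0, 0.0]]))
cands = np.array([[1.0, 0.1, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.5, 0.1, 0.5, 0.0]])
order = rank_by_cosine(query, cands)
```

Candidates pointing in roughly the same direction as the query embedding are ranked first, regardless of vector magnitude.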
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>Our experiments were conducted on a 24-core machine with an Intel(R) Xeon(R)
Silver 4116 CPU @ 2.10GHz, 256GB of RAM, and 4 RTX 2080 Ti GPUs.
Default configurations were adopted in ES (tf-idf with cosine similarity) and Anserini
(BM25). Anserini runs on a single thread, while ES uses a multi-threaded
implementation. In Figure 1 we compare the runtimes of BM25 ranking using Anserini and tf-idf-based
cosine similarity ranking using ES. The size of the ES Q+A index is 3.6GB,
whereas the Anserini index is 2.2GB.</p>
      <p>Pretraining RoBERTa: We use the implementation of
RoBERTa in the transformers library (https://github.com/huggingface/transformers) to train the MLM. We start with the original weights released by
the authors for roberta-base and then further pretrain the model on the MSE
dataset. Fortunately, the vocabulary used in the original model was able to cover the
whole MSE dataset, so we did not have to train from scratch. Further, we reduce the batch
size to 4 per GPU and increase the number of accumulation steps to facilitate gradient accumulation.
We kept the maximum sequence length at 512 to accommodate longer posts.
Q+A posts are usually longer than 512 tokens, so we had to break the posts into
chunks before feeding them to our system.</p>
      <p>
        Once we are done with pretraining our MLM, we use it to extract the
embeddings for each token in the candidate posts from the first phase. For longer
Q+A posts, we had to find a way of truncating them so that the sequence fits
in the 512-token window. We experimented with the head+tail approach [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
which gave a lower NDCG′ score for the three preliminary
training topics. Therefore, we use the head approach, in which we keep the first 510
tokens from the Q+A post and leave two token positions for [CLS] and [SEP].
This gives better results on the training topics. This is likely because our Q+A
posts include the question text: if we can find a question similar to a topic, then
the answer to that similar question has a higher chance of being an answer to the
topic.
      </p>
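      <p>A sketch of the head truncation; the toy whitespace tokenizer below is a stand-in for a real subword tokenizer, and the cls/sep ids mimic the [CLS]/[SEP]-style special tokens:</p>

```python
class WhitespaceTokenizer:
    """Toy stand-in for a real subword tokenizer (illustration only)."""
    cls_token_id = 0
    sep_token_id = 2

    def __init__(self):
        self.vocab = {}

    def encode(self, text, add_special_tokens=False):
        # Assign ids on the fly, leaving 0-2 free for special tokens.
        return [self.vocab.setdefault(tok, len(self.vocab) + 3)
                for tok in text.split()]

def head_truncate(tokenizer, text, max_len=512):
    """'Head' approach: keep only the first max_len - 2 content tokens,
    reserving two positions for the [CLS]/[SEP]-style special tokens."""
    ids = tokenizer.encode(text, add_special_tokens=False)[: max_len - 2]
    return [tokenizer.cls_token_id] + ids + [tokenizer.sep_token_id]

long_post = " ".join(["token"] * 1000)  # a Q+A post longer than 512 tokens
ids = head_truncate(WhitespaceTokenizer(), long_post)
```

The resulting sequence always fits the 512-token window, discarding everything past the first 510 content tokens.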
      <p>
        We show in Table 1 how our system compares with other submissions and
the baselines. We include only the best submissions for the baselines and other
systems. Our system BM25+tf+tf.BERT clearly beats the best baseline in terms
of NDCG′. Before the challenge, we had only three training topics provided by the
organizers on which we could test our approach. For these three topics, tf-idf with
cosine similarity performed better than BM25 scoring, so we did not include
BM25 in our final submission. Later, we added BM25 scoring, which yielded a
large increase in the number of relevant retrieved documents. Note that tf.BERT,
which selects candidates using tf-idf similarity and re-ranks them
with our pretrained RoBERTa model, does not achieve the best performance. But
when the rankings of tf.BERT and tf-idf are fused, the NDCG′ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] score is
substantially boosted. This is because the BERT ranking relies solely on semantic
similarity, whereas tf-idf relies only on word frequency. Fusing these two
runs is reasonable because posts that rank highly in both runs
are boosted even higher in the final ranking list. Note that BM25+tf+tf.BERT
retrieves the maximum number of relevant posts.
      </p>
      <p>It is important to
note that the tf runs are not as consistent, because ES shards the index and
round-robins searches across different shards. Thus, we performed multiple runs to achieve
the above scores.</p>
      <p>
        Comparison with other submissions: Our best unsubmitted run,
BM25+tf+tf.BERT, was compared with the other submissions, among which
MathDowsers achieved some of the best results in Task 1.
Submissions by Approach0 and MathDowsers leveraged Tangent-S [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and Tangent-L
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which are Symbol Layout Tree and Operator Tree-based systems that give more
attention to the math formulas in the text. Our post-hoc system does not use a
different representation for formulas and text, and therefore seems to suffer on
formula-dependent topics. We believe representing math content as trees,
separately from the text content, could have benefited our scores.
      </p>
      <sec id="sec-3-1">
        <title>Text / Formula / Text + Formula Dependence</title>
        <p>(a) Number of topics for which each
system had a higher NDCG′ than the other
system.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Text / Formula / Text + Formula Dependence</title>
        <p>(b) Average NDCG′ over all topics,
depending on the type of topic.</p>
        <p>In Figure 2 we see that our system is better for the topics that depend on
text and on text+formula. For text-dependent topics, MathDowsers had 4 topics
for which their NDCG′ score was higher than our best run, while our system
performed better on 8 topics. For the 31 text+formula-dependent topics, our
system had a better NDCG′ score on 17 topics. MathDowsers achieves a higher
(~96.4% higher) NDCG′ score for the topics which are only formula-dependent.
In contrast, our system is better (~22.42%) at ranking posts containing both
formulas and text. We attribute this to the contextualized embeddings which our
pretrained MLM can produce: it models equations together with their surrounding text and
hence performs better on text+formula topics. The difference (~77.2%)
is even larger when we compare text-dependent topics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>Overall, our participation in the ARQMath Track helped our understanding
of how to improve multi-modal search (text+formula) by exploiting
state-of-the-art text embedding and information retrieval models. In terms of
effectiveness, our most effective post-hoc run, BM25+tf+tf.BERT, was able to outperform
the baselines and all submissions except MathDowsers in terms of NDCG′. Our
post-hoc system achieved the best results for queries that depend on text and on
text+formula.</p>
      <p>Future work would include the use of the MSE dataset to get question-answer
post pairs to train a better ranking model. It would also be helpful to investigate
how important a formula is in a question post to retrieve relevant answers.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We would like to thank the members of the ARQMath lab at the Department of
Computer Science at the Rochester Institute of Technology for organizing this track.
Special thanks to Behrooz Mansouri for providing the dataset, an initial analysis
of the topics, and starter code to all participants of the task; it made it easier
for us to pre-process the data and move directly to the experiments
presented in this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bajaj</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craswell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>Ms marco: A human generated machine reading comprehension dataset</article-title>
          .
          <source>arXiv preprint arXiv:1611.09268</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cormack</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buettcher</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Reciprocal rank fusion outperforms condorcet and individual rank learning methods</article-title>
          .
          <source>In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <volume>758</volume>
          –
          <issue>759</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Craswell</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yilmaz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.:</given-names>
          </string-name>
          <article-title>Overview of the trec 2019 deep learning track</article-title>
          . arXiv preprint arXiv:
          <year>2003</year>
          .
          <volume>07820</volume>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Callan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deeper text understanding for ir with contextual neural language modeling</article-title>
          .
          <source>In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <volume>985</volume>
          –
          <issue>988</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:
          <year>1810</year>
          .
          <volume>04805</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tompa</surname>
            ,
            <given-names>F.W.</given-names>
          </string-name>
          :
          <article-title>Choosing math features for BM25 ranking with tangent-l</article-title>
          .
          <source>In: Proceedings of the ACM Symposium on Document Engineering</source>
          <year>2018</year>
          . pp.
          <volume>1</volume>
          –
          <issue>10</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gururangan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marasovic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swayamdipta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Don't stop pretraining: Adapt language models to domains and tasks</article-title>
          . arXiv preprint arXiv:
          <year>2004</year>
          .
          <volume>10964</volume>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          . arXiv preprint arXiv:
          <year>1907</year>
          .
          <volume>11692</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mansouri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
          </string-name>
          , R.:
          <article-title>Finding old answers to new math questions: The arqmath lab at clef 2020</article-title>
          . In: Jose,
          <string-name>
            <given-names>J.M.</given-names>
            ,
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            , Magalha~es, J.,
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            ,
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <surname>F</surname>
          </string-name>
          . (eds.) Advances in Information Retrieval. pp.
          <volume>564</volume>
          –
          <fpage>571</fpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Nogueira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Passage re-ranking with bert</article-title>
          .
          <source>arXiv preprint arXiv:1901</source>
          .
          <volume>04085</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nogueira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Multi-stage document ranking with bert</article-title>
          .
          <source>arXiv preprint arXiv:1910</source>
          .
          <volume>14424</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sakai</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kando</surname>
          </string-name>
          , N.:
          <article-title>On information retrieval metrics designed for evaluation with incomplete relevance assessments</article-title>
          .
          <source>Information Retrieval</source>
          <volume>11</volume>
          (
          <issue>5</issue>
          ),
          <volume>447</volume>
          –
          <fpage>470</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>How to fine-tune bert for text classification?</article-title>
          <source>In: China National Conference on Chinese Computational Linguistics</source>
          . pp.
          <volume>194</volume>
          –
          <fpage>206</fpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Anserini: Enabling the use of Lucene for information retrieval research</article-title>
          .
          <source>In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>1253</fpage>
          -
          <lpage>1256</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Simple applications of BERT for ad hoc document retrieval</article-title>
          . arXiv preprint arXiv:1903.10972
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Yilmaz</surname>
            ,
            <given-names>Z.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Applying BERT to document retrieval with Birch</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations</source>
          . pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohatgi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Accelerating substructure similarity search for formula retrieval</article-title>
          . In:
          <string-name>
            <surname>Jose</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yilmaz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magalhães</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castells</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Information Retrieval</source>
          . pp.
          <fpage>714</fpage>
          -
          <lpage>727</lpage>
          . Springer International Publishing, Cham (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>