<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BIT.UA at BioASQ 8: Lightweight neural document ranking with zero-shot snippet retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <email>tiagomeloalmeida@ua.pt</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Aveiro, DETI/IEETA</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>This paper presents the participation of the University of Aveiro Biomedical Informatics and Technologies (BIT) group in the eighth edition of the BioASQ challenge for the document and snippet retrieval tasks. Our system follows a two-stage retrieval pipeline, where a group of candidate documents is retrieved based on BM25 and reranked by a lightweight interaction-based model that uses the context of exact matches to refine the ranking. Additionally, we also show a zero-shot setup for snippet retrieval based on the architecture of our interaction-based model. Our system achieved competitive results, scoring at the top or close to the top for all the batches, with MAP values ranging from 33.98% to 48.42% in the document retrieval task, although being less effective on snippet retrieval.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Last year (2019), PubMed indexed almost one and a half million articles, which
is equivalent to almost three new articles indexed every minute. As a
consequence, it is increasingly time-consuming for a biomedical expert to
successfully search this unprecedented amount of available information. So, given
the current artificial intelligence (AI) revolution, it is clear that such systems
can be exploited to aid with this searching task and ultimately help researchers
to rapidly find consistent information about their research topic.</p>
      <p>The BioASQ [25] challenge provides annual competitions on document
classification, retrieval and question answering applied to the biomedical domain.
These competitions are notable for continuously pushing the development of
intelligent systems capable of tackling the previously described problem.</p>
      <p>This paper describes the participation of the Biomedical Informatics and
Technologies (BIT) group in the eighth edition of the BioASQ challenge,
specifically in the document and snippet retrieval tasks of BioASQ 8b Phase A. More
precisely, the objective is to retrieve, from the PubMed/MEDLINE collection,
the most relevant articles and document snippets for a given biomedical
question written in English.</p>
      <p>
        Our approach is an evolution of a previous work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that developed and
applied a two-stage retrieval system to the biomedical searching problem. More
concretely, it uses the Elasticsearch engine with the BM25 weighting scheme to
reduce the search space and then applies a neural ranking model in this smaller
space to produce a final ranking order. In this work, we focus on improving the
neural ranking model by simplifying the previous architecture and by adopting
some modifications based on new assumptions. Furthermore, one of these
enhancements enables us to directly extract the importance that the model assigns to
each document passage without the need of training the model on this specific
task, which makes it a zero-shot learner. In other words, the neural ranking
model is only trained to predict the relevance of an entire document for a given
question.
      </p>
      <p>The final neural ranking model, presented here, has only 620 trainable
parameters, making it an extremely lightweight approach when compared to
transformer-based models, which are the current state-of-the-art for NLP-related tasks.</p>
      <p>Our submissions achieved the top and close-to-the-top positions for
every document retrieval batch and also showed interesting results for all of the
snippet retrieval batches. These are insightful results that show the
potential of our lightweight neural ranking model and demonstrate a potential
zero-shot learning setup that can be easily extended to a snippet retrieval task.
The full network configuration is publicly available at https://github.com/
bioinformatics-ua/BioASQ_CLEF, together with code for replicating the
results presented in this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        In classical IR methods, a ranking function is parameterized by a set of
handcrafted features to score the relevance of a query-document pair. Nowadays,
recent works on the application of deep learning methods to IR, and
question answering in particular, have shown very good results. In this new perspective,
commonly referred to as neural IR, the ranking function is approximated by a
neural network that learns its parameters from large data collections. In the
literature, neural models are usually subdivided into two categories based on their
architecture. In one category, the models learn a semantic representation of the
texts and use a similarity measure to score each query-document pair. Examples
in this representation-based category include the Deep Structured
Semantic Model (DSSM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the Convolutional Latent Semantic Model (CLSM)
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. On the other hand, in interaction-based approaches, query and document
matching signals are captured and then fed to a neural network that produces a
ranking score based on the extracted matching patterns over these signals.
Examples include the Deep Relevance Matching Model (DRMM) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and DeepRank
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Since 2018, transformer-based architectures, like GPT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
have been revolutionizing the NLP field, showing outstanding performance in
the majority of tasks. These are large models that explore transfer learning
techniques by leveraging the knowledge learned on enormous text collections.
Following this trend, some promising works show positive results when applying
this type of model to the ad-hoc retrieval task [
        <xref ref-type="bibr" rid="ref12 ref3 ref4">3, 4, 12</xref>
        ]. However, despite the
indisputable performance presented by these architectures, it is also undeniable
that the dimension of such models is a major drawback, making it almost
impossible for some institutions to deploy or even use these models given their
demanding computational costs.
      </p>
      <p>
        Endorsed by the annual BioASQ competition, biomedical IR became a
challenge with a wide range of different solutions, either based on traditional IR,
neural IR, or a combination of both. For example, the system proposed by the
USTB PRIR team [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] uses query enrichment strategies, Sequential Dependence
Models (SDM) and pseudo-relevance feedback to obtain a list of relevant
documents. This traditional approach scored in the top positions between the third
and fifth editions, which highlights the early challenges of applying neural models to
this task. The system proposed by the AUEB team [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] was the first to show
some evidence that deep neural models are capable of outscoring the traditional
models, by scoring at the top positions in the sixth and seventh editions. Their
system uses a variation of DRMM [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] or BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to rerank the top 100
documents recovered using the BM25 scheme [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The importance of the reranking
step is evidenced by comparing the results to another work that submitted the
top documents directly retrieved based on BM25 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Base architecture</title>
      <p>The main objective of this phase is to reduce the enormous search space by
selecting only the top-N potentially relevant documents for a given
question. Given the large dimension of the article collection (approximately 30
million scientific articles), it is important to consider an efficient solution capable
of handling this growing collection. With this in mind, we decided to rely on
Elasticsearch (ES) with the BM25 weighting scheme described in Equation 1.
As mentioned before, only the exact matching signals are considered during this
retrieval phase.</p>
      <p>IDF(q_i) = ln(1 + (C - f(q_i) + 0.5) / (f(q_i) + 0.5)); weight(q_i, D) = IDF(q_i) * f(q_i, D) * (k1 + 1) / (f(q_i, D) + k1 * (1 - b + b * |D| / avgdl)). (1)</p>
      <p>Equation 1 presents the weighting scheme of each query term q_i with respect
to a document D, where C corresponds to the total number of documents in the
collection, f(q_i) represents the number of documents that contain the term q_i,
f(q_i, D) represents the frequency of term q_i in document D, |D| corresponds to
the total number of terms in document D, i.e., its length, avgdl represents the
average length of the documents in the collection, and k1, b are hyperparameters
that should be fine-tuned for the collection.</p>
      <p>At last, given the weight of each query term with respect to a document,
weight(q_i, D), the final query-document score is computed by summing the
individual query term weights, as shown in Equation 2.</p>
      <p>score(Q, D) = Σ_{q_i ∈ Q} weight(q_i, D). (2)</p>
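<p>For illustration, the two equations above can be sketched in a few lines of Python; the function name and the toy statistics below are our own, and the k1 and b defaults shown are Elasticsearch's, not necessarily the values we fine-tuned.</p>

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avgdl, k1=1.2, b=0.75):
    """Score one document against a query with BM25 (Equations 1 and 2).

    query_terms: list of query tokens
    doc_terms:   list of document tokens
    doc_freq:    dict term -> number of documents containing it, f(qi)
    num_docs:    total number of documents in the collection, C
    avgdl:       average document length in the collection
    """
    dl = len(doc_terms)
    score = 0.0
    for qi in query_terms:
        f_qi_D = doc_terms.count(qi)          # term frequency in this document
        if f_qi_D == 0:
            continue                          # only exact matches contribute
        idf = math.log(1 + (num_docs - doc_freq.get(qi, 0) + 0.5)
                           / (doc_freq.get(qi, 0) + 0.5))
        tf = f_qi_D * (k1 + 1) / (f_qi_D + k1 * (1 - b + b * dl / avgdl))
        score += idf * tf                     # Equation 2: sum over query terms
    return score
```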
      <sec id="sec-3-1">
        <title>Phase-II</title>
        <p>The second phase has the objective of reranking the previously retrieved top-N
documents by taking into consideration additional matching signals to produce
the final ranking order. The rationale here is that the previous step only considers
the exact matching signals, i.e., only the words that appear both in the query
and in the document are taken into account and weighted to produce the phase-I
ranking. A more powerful neural solution may therefore be able to learn how to better
explore the context where these exact matches occur.</p>
        <p>
          More precisely, our model is inspired by the DeepRank [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] architecture and
represents a direct enhancement of our previous work [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], with the following
major differences:
        </p>
        <p>
          - Passages no longer follow the query-centric assumption and now
correspond directly to entire document sentences;
          - The detection network and the measure network were simplified and now
form the interaction network;
          - The passage position input was dropped;
          - The contributions of each passage to the final document score are now
assumed to be independent, replacing the self-attention proposed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ];
          - The pooling step now receives more operators, namely average and average
over k-max;
          - The calculation of the passage relevance score was simplified.
        </p>
        <p>The intuition behind this model is to make a thorough evaluation of the
document passages where the exact matches occur, by taking into consideration
their context. More precisely, this model explores the interactions present in
the entire passage of each exact match and makes a more refined judgment of
the passage relevance based on that.</p>
        <p>The updated architecture is depicted in Figure 2 and described here in detail
in order to keep this paper self-contained. First, let us define a query as a
sequence of terms q = {u_0, u_1, ..., u_Q}, where u_i is the i-th term of the query and
Q the size of the query; a document passage as p = {v_0, v_1, ..., v_T}, where v_k
is the k-th term of the passage and T the size of the passage; and a document
as a sequence of passages D = {p_0, p_1, ..., p_N}.</p>
        <p>From the architecture presented in Figure 2 it is observable that a document
is first split into individual sentences, i.e., a sequence of passages. In this step,
we rely on the nltk.PunktSentenceTokenizer4, which implements an unsupervised
algorithm for sentence splitting and shows good results on the majority of European
languages. Then, passages are grouped with each query term occurring in the
passage, and the resulting structure is fed to the interaction network together
with the full query to calculate relevance scores for each passage. The final
document score is produced in the aggregation network taking into consideration
each passage score and the relative importance of each query term.</p>
        <p>
          In more detail, the Grouping by q-term block associates each passage with
each query term that appears in the passage. Formally, this step produces a set of
document passages aggregated by each query term as D(u_i) = {p_i0, p_i1, ..., p_iP},
where p_ij corresponds to the j-th passage with respect to the query term u_i.
4 https://kite.com/python/docs/nltk.tokenize.punkt.PunktSentenceTokenizer
This aggregated flow facilitates considering the weight of each query term in
downstream calculations in a straightforward way, as proposed in DRMM [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
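<p>A minimal sketch of this splitting-and-grouping step (names are illustrative, and a simple regular expression stands in for the Punkt sentence splitter used in our pipeline):</p>

```python
import re

def group_by_query_term(query_terms, document_text):
    """Split a document into passages (sentences) and group them by the
    query terms occurring in each passage, producing D(ui).
    A simple regex stands in here for nltk's PunktSentenceTokenizer."""
    parts = re.split(r'[.!?]\s+', document_text)
    passages = [s.strip() for s in parts if s.strip()]
    groups = {ui: [] for ui in query_terms}
    for passage in passages:
        tokens = set(passage.lower().split())
        for ui in query_terms:
            if ui in tokens:                 # exact match of the query term
                groups[ui].append(passage)   # passage joins D(ui)
    return groups
```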
        <p>The Interaction network was designed to independently evaluate each
query-passage interaction, producing a final relevance score per sentence. In
detail, it receives as input the query q and the aggregated set of passages
D(u_i) and creates for each query-passage pair a similarity tensor (interaction
matrix) S ∈ [-1, 1]^(Q×T), where each entry S_ij corresponds to the cosine
similarity between the embeddings of the i-th query term and the j-th passage term,
S_ij = (u_i · v_j) / (‖u_i‖ ‖v_j‖). Next, an x-by-y convolution followed by a concatenation of the
global max, average and average-over-k-max pooling operations is applied to each
similarity tensor, to capture multiple local relevance signals from each feature
map, as described in Equation 3:</p>
        <p>h^m_{i,j} = Σ_{s=0}^{x} Σ_{t=0}^{y} w^m_{s,t} S_{i+s,j+t} + b^m; h^m_max = max(h^m), m = 1, ..., M; h^m_avg = avg(h^m), m = 1, ..., M; h^m_avg-kmax = avg(k-max(h^m)), m = 1, ..., M; h = [h_max; h_avg; h_avg-kmax]. (3)</p>
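<p>As a sketch of the interaction matrix and the pooling side of Equation 3 (a single identity filter in NumPy instead of the M learned convolution filters; names and the value of k are illustrative):</p>

```python
import numpy as np

def interaction_features(query_emb, passage_emb, k=3):
    """Build the cosine-similarity interaction matrix S and extract the
    max, average and average-over-k-max pooling signals of Equation 3,
    shown here for a single identity 'filter' instead of M learned ones."""
    # Normalize embeddings so the dot product becomes cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    S = q @ p.T                                  # S has shape (Q, T), values in [-1, 1]
    flat = S.ravel()
    h_max = flat.max()                           # global max pooling
    h_avg = flat.mean()                          # global average pooling
    top_k = np.sort(flat)[-k:]                   # k largest entries
    h_avg_kmax = top_k.mean()                    # average over k-max
    return np.array([h_max, h_avg, h_avg_kmax])  # concatenated vector h
```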
        <p>Here, w and b are trainable parameters, the symbol ';' represents the
concatenation operator, M corresponds to the total number of filters, and the vector
h, of dimension 3M × 1, encodes the local relevance between each query-passage pair,
extracted by these pooling operations. At this point, the aggregated set of passages D(u_i) is now
represented by their respective vectors h, i.e., D(u_i) = {h_p0, h_p1, ..., h_pP}.</p>
        <p>The final step of the interaction network is to convert these passage
representations h into a final relevance score, for which we employed a fully connected
layer with sigmoid activation, Equation 4,</p>
        <p>r_ui = σ(h_ui · w + b), (4)</p>
        <p>where w has dimension 3M × 1 and b is a scalar bias.</p>
        <p>The aim here is to derive a relevance score, relevant (1) or irrelevant (0),
directly from the information that was extracted by the pooling operators. So,
after this stage the aggregated set of passages D(u_i) is represented by these
relevance scores, i.e., D(u_i) = {r_p0, r_p1, ..., r_pP} = r_ui.</p>
        <p>
          The aggregation network, as already mentioned, takes into consideration
the importance of each query term by using a gating mechanism, similar to
DRMM [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], over the aggregated set of passages, as described in Equation 5. That
is, each passage score is weighted by the importance of its associated query term,
following the intuition that different terms in a query carry different importance
with respect to the final information goal.
        </p>
        <p>Here, w is a trainable parameter and x_ui corresponds to the embedding vector
of the query term u_i. The distribution of the query term importance, a, is then
computed as a softmax and applied to the respective passage scores, r_ui.</p>
        <p>To produce the final document score, a scorable vector s is created by
performing a summation along the query-term dimension of s_ui. Note that in
this step we could have explored other ways to produce this final vector;
however, this approach seems to work empirically. Finally, this scorable vector s is
fed to a Multi-Layer Perceptron (MLP) to produce the final ranking score, as
summarized in Equation 6.</p>
        <p>c_ui = w · x_ui; a_ui = e^(c_ui) / Σ_{u_k ∈ Q} e^(c_uk); s_ui = a_ui · r_ui. (5)</p>
        <p>score = MLP(Σ_{u_i ∈ Q} s_ui). (6)</p>
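<p>The gating and aggregation of Equations 5 and 6 can be sketched as follows (names are illustrative, and the vector that would feed the MLP is simply returned):</p>

```python
import numpy as np

def aggregate(term_embeddings, passage_scores, w):
    """Weight each query term's passage scores by a softmax gate over the
    term embeddings (Equation 5) and sum across terms to build the
    scorable vector fed to the MLP (Equation 6)."""
    c = term_embeddings @ w                    # c_ui = w . x_ui, one per term
    a = np.exp(c) / np.exp(c).sum()            # softmax term importance a_ui
    s = (a[:, None] * passage_scores).sum(0)   # sum of s_ui = a_ui * r_ui
    return s                                   # input vector for the MLP
```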
      </sec>
      <sec id="sec-3-2">
        <title>Snippet Retrieval</title>
        <p>
          As initially stated in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], this architecture has an interesting property that
enables us to directly infer the relevance of each passage from the model's
perspective, i.e., the passage scores that most contribute to the final document
score. We can therefore derive a final score for each passage as the score given
by the interaction network weighted by the query term importance, which was
already computed and corresponds to the vector s_ui.
        </p>
        <p>
          It is important to note that extracting the most relevant passages per
document is not the same as producing a ranked list of passage relevance as intended
in the BioASQ competition, which implies comparing passages between different
documents. In our case, however, passage scores are not directly comparable
since they are obtained with respect to their own document, which involves different
distributions. So, similarly to [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we assume that passages from documents with
higher document scores are more relevant than passages from documents with
a lower score, which seems intuitive. We therefore obtain the list of passages
by collecting, from the top ranked documents, all passages with a score above
a set threshold. However, a better approach could be explored in the future by
producing scores that take into consideration the passage itself and the score of
the respective document.
        </p>
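<p>A sketch of this selection strategy, assuming a ranked document list with per-passage scores already computed (the data layout and threshold are illustrative):</p>

```python
def collect_snippets(ranked_documents, threshold=0.5):
    """Collect snippets from the top-ranked documents: documents are visited
    in ranking order, and every passage whose within-document score is above
    the threshold is emitted, so passages inherit their document's rank."""
    snippets = []
    for doc in ranked_documents:               # already sorted by document score
        for passage, score in doc["passages"]:
            if score > threshold:
                snippets.append((doc["id"], passage, score))
    return snippets
```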
        <p>
          Furthermore, it is noteworthy to reinforce that this strategy works in an
unsupervised manner, in the sense that the model does not take into consideration
the gold-standard of the passage relevance, but instead produces this relevance
based on what is important to increase the final score of a relevant document,
according to the document gold-standard. From another perspective, we can argue
that the model is pretrained on the document gold-standard and then applied
to the snippet retrieval task, making this a zero-shot learning setup since it was
never trained on the passage gold-standard.
Moved by the interesting results, especially in terms of snippet performance,
reported in the previous BioASQ challenge [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], we also tried to implement a joint
training methodology that explores both document and snippet gold-standards,
instead of only training with the document gold-standard. More precisely, we
compute the binary cross-entropy loss over the passage relevance from Equation
4. We then added the average cross-entropy loss of the passages to the document
pairwise loss and trained the model over this combination of the two losses. Note
that the architecture for document scoring and snippet retrieval remained the
same, since our main idea at this point was to exploit the snippet gold-standard
to, through supervision, enforce the model to distinguish relevant from
non-relevant passages. Furthermore, as will be addressed in the following sections
and discussed in Section 5, this idea empirically failed to improve the model
performance.
        </p>
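<p>A sketch of this combined objective, assuming the pairwise cross-entropy document loss and the per-passage binary cross-entropy described above (scalar NumPy stand-ins for the TensorFlow implementation; names are illustrative):</p>

```python
import numpy as np

def joint_loss(pos_doc_score, neg_doc_score, passage_probs, passage_labels):
    """Joint objective: pairwise cross-entropy on document scores plus the
    average binary cross-entropy of the passage relevance predictions."""
    # Pairwise cross-entropy: probability that the positive document
    # outranks the negative one.
    pos = np.exp(pos_doc_score)
    doc_loss = -np.log(pos / (pos + np.exp(neg_doc_score)))
    # Average binary cross-entropy over the passage outputs of Equation 4.
    p = np.clip(passage_probs, 1e-7, 1 - 1e-7)
    snip_loss = -np.mean(passage_labels * np.log(p)
                         + (1 - passage_labels) * np.log(1 - p))
    return doc_loss + snip_loss
```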
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Submission and Results</title>
      <p>In this section, we start by detailing the data collection and some pre-processing
steps that are common to our official submissions for the 5 batches. We then
independently show our results for each batch, since we continuously refined
our base solution by better fine-tuning the hyperparameters and changing small
aspects of the architecture.</p>
      <sec id="sec-4-1">
        <title>Collection and Pre-processing</title>
        <p>
          In this edition of the BioASQ challenge, the document collection was the 2019
PubMed/MEDLINE annual baseline consisting of almost 30 million articles.
However, only roughly 66% of the articles had title and abstract, so following
previous observations [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we decided to discard the remaining 34%, which were
rarely relevant according to the gold-standard. At this point, our collection had
approximately 20 million documents that were indexed (title and abstract) with
Elasticsearch using the english text analyzer, which automatically performs
tokenization, stemming and stopword filtering.
        </p>
        <p>
          We adopted a custom tokenizer that uses simple regular expressions to
exclude non-alphanumeric characters except the hyphen, since many words in the
biomedical domain, like chemical substances, contain a hyphen. This way we keep
these words intact, which enhances the detection of important exact matches.
We also trained 200-dimensional word embeddings using the GenSim [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]
implementation of word2vec [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], with the 20 million documents (title and abstract)
following the described tokenization, which produced a vocabulary of
approximately 4 million tokens. We used the default configuration of the word2vec
algorithm and fixed the embedding matrix during the training of the neural
ranking model.
For training our neural ranking model, we used the gold-standard data from
the first to seventh editions of BioASQ, with the exception of one test batch of the seventh
edition, which we used for validation. Contrary to our previous work [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we
adopted the pairwise cross-entropy loss, as suggested by Hui et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and shown in Equation 7,
        </p>
        <p>L(q, d+, d-) = -log( e^score(q, d+) / (e^score(q, d+) + e^score(q, d-)) ). (7)</p>
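<p>A regex tokenizer in the spirit of the one described above can be sketched as follows (the exact expression used in our code may differ):</p>

```python
import re

# Keep runs of alphanumerics joined by hyphens; drop every other character.
# This preserves terms such as "5-fluorouracil" as single tokens.
TOKEN_RE = re.compile(r"[0-9a-z]+(?:-[0-9a-z]+)*")

def tokenize(text):
    """Lowercase and split text, keeping hyphenated words whole."""
    return TOKEN_RE.findall(text.lower())
```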
        <p>Since the BioASQ data only provides a list of relevant (positive) documents
per query, we sampled the negative documents from the documents that were
retrieved by ES but did not appear in the gold-standard. Another important
note is that only the top 10 documents per submission are analyzed by experts
in terms of relevance, which may produce an incomplete gold-standard, i.e.,
positive documents may not be judged if they were not retrieved by participating
systems, and are hence taken as negative documents during training. To
exacerbate the problem, the gold-standard was built as a concatenation of the judged
relevance of documents from different years, which implies different snapshots
of the document collection. To alleviate this problem, we restricted the ES search
by year so that only the documents available at that time are visible during
model training.</p>
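<p>The negative sampling just described can be sketched as follows (names are illustrative; in practice the candidates come from the year-restricted ES search):</p>

```python
import random

def build_pairs(query, candidates, positives, n_pairs=5, seed=42):
    """Create (query, positive, negative) triples for pairwise training.
    Negatives are candidates retrieved by BM25 that are absent from the
    gold-standard, which may include unjudged relevant documents."""
    rng = random.Random(seed)
    pos = [d for d in candidates if d in positives]
    neg = [d for d in candidates if d not in positives]
    pairs = []
    for _ in range(n_pairs):
        if not pos or not neg:
            break
        pairs.append((query, rng.choice(pos), rng.choice(neg)))
    return pairs
```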
        <p>
          We gave a major emphasis to training/validation in order to gain a better
intuition of the model behavior and of which configuration should be followed in
each batch. The neural ranking model was trained using the Adam [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] optimizer,
alongside modern techniques like the learning rate finder and cyclical learning
rates [24]. The fine-tuning of this model was a rolling process that spanned the
5 batches. More concretely, we searched over the kernel size of the
convolution, the total number of filters, the pooling operations, the activation
functions, and other minor details that ended up not influencing the overall
performance. To summarize, Table 1 shows the model configuration that seems
to be the strongest, producing a model with only 620 trainable parameters.
        </p>
        <p>
          The model was implemented in TensorFlow5 and is available at https:
//github.com/bioinformatics-ua/BioASQ_CLEF. The entire training process
was conducted with the help of an in-house toolbox that implements pairwise
training in TensorFlow6.
5 https://www.tensorflow.org/
6 The toolbox is open-sourced here: https://github.com/T-Almeida/mmnrm
The BioASQ evaluation is divided in two stages. In the first stage, the
submissions are evaluated against a gold-standard annotated by biomedical experts. In
a second stage, the biomedical experts will manually annotate the relevance of
the retrieved documents from each submission. At the time of writing, only the
results for the first stage are available, corresponding to the results presented in
this paper. In terms of numerical evaluation, the organizers automatically
compute five measures (Mean Precision, Recall, F-Measure (F1), MAP and GMAP)
over each submission given the current gold-standard. According to the
challenge evaluation guidelines [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], the overall system rankings are based on the
MAP measure.
        </p>
        <p>Our group submitted five runs for each of the batches, which can be
identified by the prefix "bioinfo" in the official results7. In the following sections we
present a summary table of the results, comparing our five submissions to the
top competitor in each batch, i.e., to the top performing system excluding our
own systems.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Submission and Results for Batch 1</title>
        <p>For the first batch, the BioASQ organizers received a total of 21 submissions
from 8 teams8. For this run, our main idea was to validate the performance of
our phase-I retrieval mechanism and to test whether our phase-II reranking model
was indeed boosting the original ranking order. In summary, we submitted one
run with the results coming from phase-I, i.e., the BM25 ranking order that
was fine-tuned on the validation set, while the remaining runs were produced by
reranking the phase-I results with our neural ranking model:
- bioinfo-0: A fine-tuned BM25 run produced by Elasticsearch;
- bioinfo-1 to 4: Neural reranking of the top-250 documents produced by the
fine-tuned BM25.</p>
        <p>At the time of this submission, our ranking model was still in an initial phase
of development, which means that it did not completely follow the architecture
presented in Section 3.2. More concretely, the model only used the max-pooling
operator, and a simple linear combination was used for producing the final
document score, instead of an MLP.
7 http://participants-area.bioasq.org/results/8b/phaseA/
8 This number is an estimate based on the names of the submissions</p>
        <p>Table 2 reflects the first stage of the BioASQ evaluation for our
submissions. In terms of document retrieval, the "bioinfo-3" submission achieved the
top score in terms of MAP, which means it was the best performing system in
this batch. Additionally, the "bioinfo-2" submission was the second-best
performing system and achieved the best result in terms of recall and GMAP. For
snippet retrieval, our best performing system achieved fifth place and also
showed interesting results in terms of recall and F-measure when compared to
the top-performing system.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Submission and Results for Batch 2</title>
        <p>The second batch received a total of 26 submissions from 9 teams. Our system
was built directly on the validation performed with the gold-standard of the
previous test batch and the validation set. More precisely, we tested the addition
of more pooling operators (average and average over k-max) and the addition
of the MLP for scoring, which empirically proved to be beneficial. Additionally,
we also decided to pursue the joint training approach described in Section 3.4;
the following list presents a summary of each submitted system:
- bioinfo-0: Neural reranking model with joint training (snippets and
documents);
- bioinfo-1, 3 and 4: Neural reranking described in Section 3.2;
- bioinfo-2: Neural reranking using max and average pooling operators.</p>
        <p>Table 3 shows the performance of the submitted systems; overall, only the
system that was trained in a joint fashion achieved a poor performance. For
document retrieval, our top-performing system was "bioinfo-3", which achieved
third place in the overall ranking and also the best score in terms of GMAP.
For snippet retrieval, our best performing system achieved fourth place
in the overall ranking and, similarly to the previous batch, showed interesting
results in terms of recall and F-measure when compared to other systems.
Contrary to the previous sections, we now present the results for the third,
fourth and fifth batches in the same section, since the submissions for the
different batches all follow the same description:
- bioinfo-0: Ensemble of multiple neural reranking models;
- bioinfo-1 to 4: Neural reranking described in Section 3.2.</p>
        <p>The organizers received a total of 28 submissions from 9 teams for the third
batch, 26 submissions from 11 teams for the fourth batch, and 25 submissions
from 9 teams for the last batch. Given that the proposed joint training seemed to
deteriorate the overall performance, we decided to keep the focus on the current
solution and leave a reformulation of the joint training idea as future work. So,
we replaced the joint training submission with a submission that used a naive
ensemble of multiple neural reranking models that were trained during
validation. Note that for the ensemble run we did not produce a ranked list of snippets,
since the proposed snippet algorithm does not support multiple relevance values,
from different sources, per passage.</p>
        <p>Table 4 presents a summary of the results obtained for the last three batches.
Focusing on the document retrieval task, "bioinfo-3" was our best
performing system in the third batch, achieving fourth place in the overall ranking;
additionally, "bioinfo-0" was the best system in terms of recall and GMAP.
Similarly, "bioinfo-3" was our best performing system in the fourth batch, with
a fourth place, and "bioinfo-0" achieved the best result in terms of recall. For
the fifth batch, "bioinfo-4" achieved the overall best performance, ranking first
place in both MAP and recall. We also achieved the top score in terms of GMAP
with the "bioinfo-1" submission. In terms of snippet retrieval, the best ranking
was a fifth place on the third and fifth batches.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>In this section we discuss the previously presented results, analyzing first the
overall performance on the document retrieval task, followed by the results on
snippet retrieval. We complement this discussion with our considerations on
what was successful and what failed.</p>
      <p>Addressing the results presented in Tables 2, 3 and 4, we consider that our
system had an extremely competitive performance, being in the top position for
the first and fifth batches and close to the top in the remaining batches.
Additionally, we note that at least one of our submissions achieved the best performance
in at least one metric for every batch. Furthermore, looking at the GMAP
metric, it is observable that our systems achieved the best results in all but the
fourth batch.</p>
      <p>With respect to the neural reranking performance compared to the phase-I
ranking, we can see in Table 2 that every neural submission was able to improve the
original BM25 ranking order, which is in accordance with our speculation and
validation results. These results also seem to support our proposed idea of better
exploring the context where the exact match occurs to produce a more refined
judgment that contributes to the final score.</p>
      <p>As previously said, after the first batch and based on some validation tests,
we decided to change the model by adding more pooling operations and the MLP.
However, at first glance, according to the results on the second, third, and fourth
batches, it seems that these changes were not beneficial, since the system was not
able to achieve the top performance, unlike in the first batch. However, we argue
that this discrepancy can also be a consequence of some improvement of the
competitors' systems after the first batch. Additionally, we experimented with the
updated architecture on the first batch and were able to easily achieve a MAP
score of over 35%, surpassing the previous best.</p>
      <p>
        Finally, the only metric for which our system does not seem to be able to
achieve competitive results is F-measure. However, as noted previously [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], a
system that outputs confidence scores instead of ranking scores seems to be able
to achieve higher performance in terms of this metric. A possible explanation lies
in the BioASQ data, and more specifically in the questions that have only a few
true positive documents (fewer than 10) in the entire collection. In this case, a
system based on confidence scores can easily create a ranked list with fewer than
10 documents (the maximum considered per question), since it selects the relevant
documents based on a threshold value over the confidence scores. So, for this type
of question, a system based on confidence is more likely to achieve higher values
of Precision and Recall (resulting in a higher F1 measure) when compared to a
ranking system, which will obtain a higher Recall but lower Precision since it
always outputs the top 10 documents.
      </p>
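<p>The trade-off described above can be made concrete with a small numeric example. The scores below are invented for illustration only: for a question with two relevant documents, a system that always returns its top 10 pays a precision penalty that a confidence-thresholded system avoids.</p>

```python
# Hedged illustration with made-up scores: fixed top-10 ranking vs. confidence
# thresholding on a question with only two relevant documents.
def f1(precision, recall):
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total

def evaluate(returned, relevant):
    tp = len(set(returned).intersection(relevant))  # true positives
    precision = tp / len(returned)
    recall = tp / len(relevant)
    return precision, recall, f1(precision, recall)

relevant = {"d1", "d2"}  # only two true positives in the whole collection
scores = {"d1": 0.95, "d2": 0.90, **{f"n{i}": 0.1 for i in range(8)}}

ranked = sorted(scores, key=scores.get, reverse=True)
top10 = ranked[:10]                                    # ranking system: always 10 docs
thresholded = [d for d in ranked if scores[d] >= 0.5]  # confidence system: 2 docs

p_rank, r_rank, f_rank = evaluate(top10, relevant)        # precision 0.2, recall 1.0
p_conf, r_conf, f_conf = evaluate(thresholded, relevant)  # precision 1.0, recall 1.0
```

Both systems achieve full recall, but the thresholded one reaches an F1 of 1.0 versus roughly 0.33 for the fixed top-10 list.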
      <p>In terms of snippet retrieval, the submitted system did not present
competitive results when compared to the top submissions, the best performance
being a fourth place in the second batch. However, given that our method does not
use the snippet gold-standard for training and follows a naive ranking approach,
we consider these results encouraging, especially in terms of recall and F-measure,
and with the potential to be better explored in future work.</p>
      <p>
        Concerning the joint training approach, we consider that it has empirically
failed. More precisely, it seems that our intuition to improve the passage relevance
with supervision may be more challenging to achieve. One problem is the notion of
passage relevance, since most of the time a relevant snippet in the gold-standard
encompasses multiple sentences, which the model will see and score as independent.
So, this supervision may be forcing the model to boost the relevance score of
sentences that in isolation carry weak matching signals, ending up hindering the
overall matching signal extraction. Another problem lies in the naive implementation
of the snippet retrieval algorithm. Another idea is to directly use the snippet
gold-standard to produce ranking scores, more similar to the winning approach
in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we propose a two-stage retrieval pipeline to address the
biomedical retrieval problem. Our system first uses BM25 to select a pool of
potentially relevant candidates that are then reranked by a neural ranking model.
Contrary to the NLP trend, we focused on building a lightweight interaction-based
model, which yields a final model with only 620 trainable parameters. The proposed
architecture can also be used to produce relevance scores for each document
passage according to the model's perspective of relevance. This property enables
us to perform passage retrieval in a zero-shot learning setup.</p>
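<p>A minimal sketch of this two-stage structure follows. The real system scores passages with the neural interaction-based model; here a simple exact-match overlap stands in for it, so all names and the scoring function are illustrative assumptions. The point is the shape of the pipeline: the document score is aggregated from passage scores, and those same passage scores give the zero-shot snippet ranking.</p>

```python
# Hedged sketch of the two-stage pipeline: BM25 candidates are rescored per
# passage, the document score aggregates the passage scores, and the passage
# scores are reused, zero-shot, to rank snippets.
def passage_scores(question, passages):
    """Stand-in for the neural model: fraction of question terms in each passage."""
    q_terms = set(question.lower().split())
    return [len(q_terms.intersection(p.lower().split())) / len(q_terms)
            for p in passages]

def rerank(question, candidates):
    """candidates: {doc_id: [passage, ...]} -> (ranked doc ids, snippet rankings)."""
    doc_score, snippets = {}, {}
    for doc_id, passages in candidates.items():
        s = passage_scores(question, passages)
        doc_score[doc_id] = max(s)  # document score: its best-scoring passage
        snippets[doc_id] = sorted(zip(s, passages), reverse=True)  # snippet ranking
    ranked_docs = sorted(doc_score, key=doc_score.get, reverse=True)
    return ranked_docs, snippets

docs = {"A": ["bm25 retrieves candidates", "the model reranks them"],
        "B": ["unrelated passage"]}
ranked_docs, snippets = rerank("model reranks candidates", docs)
```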
      <p>The proposed pipeline was evaluated on the eighth edition of BioASQ, where
it achieved competitive results for the document retrieval task, being at the top or
close to the top in all batches. In the snippet retrieval task, it showed interesting
results, given that they were produced by a naive algorithm in a zero-shot learning
setup.</p>
      <p>As future work, there are minor questions still open, especially on the
aggregation network configuration. Additionally, an interesting route is to compare
the current architecture with a direct, but parameter-greedy, extension that uses a
state-of-the-art transformer-based model, such as BERT, which is well suited to
our objective of better evaluating the passage context. This may be achieved by
replacing the word2vec embeddings with the context-aware embeddings produced
by these models or by completely replacing the interaction network.</p>
      <p>24. Smith, L.N.: Cyclical learning rates for training neural networks (2015)</p>
      <p>25. Tsatsaronis, G., Balikas, G., Malakasiotis, P., Partalas, I., Zschunke, M., Alvers, M., Weißenborn, D., Krithara, A., Petridis, S., Polychronopoulos, D., Almirantis, Y., Pavlopoulos, J., Baskiotis, N., Gallinari, P., Artieres, T., Ngonga Ngomo, A.C., Heino, N., Gaussier, E., Barrio-Alvers, L., Paliouras, G.: An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16, 138 (04 2015). https://doi.org/10.1186/s12859-015-0564-6</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Almeida, T., Matos, S.: Calling attention to passages for biomedical question answering. In: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J., Martins, F. (eds.) Advances in Information Retrieval. pp. 69-77. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-45442-5_9</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Brokos, G.I., Liosis, P., McDonald, R., Pappas, D., Androutsopoulos, I.: AUEB at BioASQ 6: Document and Snippet Retrieval (Sep 2018), http://arxiv.org/abs/1809.06366</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Dai, Z., Callan, J.: Deeper text understanding for IR with contextual neural language modeling. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 985-988. SIGIR '19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3331184.3331303</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Dai, Z., Callan, J.: Context-aware document term weighting for ad-hoc search. In: Proceedings of The Web Conference 2020. pp. 1897-1907. WWW '20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380258</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Guo, J., Fan, Y., Ai, Q., Croft, W.B.: A deep relevance matching model for ad-hoc retrieval. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (Oct 2016). https://doi.org/10.1145/2983323.2983769</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Huang, P.S., He, X., Gao, J., Deng, L., Acero, A., Heck, L.: Learning deep structured semantic models for web search using clickthrough data. In: Proceedings of the 22nd ACM international conference on Conference on information &amp; knowledge management - CIKM '13. pp. 2333-2338. ACM Press, New York, New York, USA (2013). https://doi.org/10.1145/2505515.2505665</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Hui, K., Yates, A., Berberich, K., de Melo, G.: Co-PACRR: A context-aware neural IR model for ad-hoc retrieval. pp. 279-287 (Feb 2018). https://doi.org/10.1145/3159652.3159689</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Jin, Z.X., Zhang, B.W., Fang, F., Zhang, L.L., Yin, X.C.: A multi-strategy query processing approach for biomedical question answering: USTB PRIR at BioASQ 2017 task 5B. In: BioNLP (2017)</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http://arxiv.org/abs/1412.6980</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural networks (Jun 2017)</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: CEDR: Contextualized embeddings for document ranking. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1101-1104. SIGIR '19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3331184.3331317</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Mateus, A., Gonzalez, F., Montes, M.: MindLab neural network approach at BioASQ 6B (Nov 2018). https://doi.org/10.18653/v1/W18-5305</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. McDonald, R., Brokos, G.I., Androutsopoulos, I.: Deep Relevance Ranking Using Enhanced Document-Query Interactions (Sep 2018), http://arxiv.org/abs/1809.01682</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. pp. 3111-3119. NIPS'13, Curran Associates Inc., Red Hook, NY, USA (2013)</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. Misra, D.: Mish: A self regularized non-monotonic neural activation function (2019)</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Pang, L., Lan, Y., Guo, J., Xu, J., Xu, J., Cheng, X.: DeepRank. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (Nov 2017). https://doi.org/10.1145/3132847.3132914</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. Pappas, D., McDonald, R., Brokos, G.I., Androutsopoulos, I.: AUEB at BioASQ 7: Document and snippet retrieval. In: Cellier, P., Driessens, K. (eds.) Machine Learning and Knowledge Discovery in Databases. pp. 607-623. Springer International Publishing, Cham (2020)</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. Malakasiotis, P., Pavlopoulos, I., I.A.d., Nentidis, A.: Evaluation measures for task b, http://participants-area.bioasq.org/Tasks/b/eval_meas_2020/</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. Rehurek, R., Sojka, P.: Software Framework for Topic Modelling with Large Corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. pp. 45-50. ELRA, Valletta, Malta (May 2010), http://is.muni.cz/publication/884893/en</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3(4), 333-389 (Apr 2009). https://doi.org/10.1561/1500000019</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management - CIKM '14. pp. 101-110. ACM Press, New York, New York, USA (2014). https://doi.org/10.1145/2661829.2661935</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>