<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of BioASQ 8a and 8b: Results of the eighth edition of the BioASQ tasks a and b</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anastasios Nentidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasia Krithara</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantinos Bougiatiotis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Paliouras</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aristotle University of Thessaloniki</institution>
          ,
          <addr-line>Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>National Center for Scientific Research "Demokritos"</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National and Kapodistrian University of Athens</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we present an overview of the eighth edition of tasks a and b of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ aims at promoting methodologies and systems for large-scale biomedical semantic indexing and question answering through the organization of yearly challenges since 2012. These shared tasks offer teams around the world the opportunity to develop and compare their methods on the same benchmark datasets, which represent the demanding information needs of biomedical experts. This year, apart from the introduction of a new task on medical semantic indexing in Spanish (MESINESP8), the eighth versions of the two established BioASQ tasks on semantic indexing (8a) and question answering (8b) in English were also offered. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge, with seven of them focusing on task 8a and 23 on task 8b. As in previous versions of the tasks, the evaluation of system responses reveals that some participating systems managed to outperform the strong baselines, indicating that continuous advancements in state-of-the-art systems keep pushing the frontier of research and leading to performance improvements.</p>
      </abstract>
      <kwd-group>
        <kwd>Biomedical knowledge</kwd>
        <kwd>Semantic Indexing</kwd>
<kwd>Question Answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        This paper presents the shared tasks 8a and 8b of the eighth edition of the
BioASQ challenge in 2020, the corresponding datasets, and the approaches and
achieved results of the participating systems. A detailed description of the new
task on medical indexing in Spanish is offered in the MESINESP task overview.
A condensed BioASQ 2020 Lab overview [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is also available, describing the
eighth edition of the BioASQ challenge as a whole, in the context of the
Conference and Labs of the Evaluation Forum (CLEF) 2020. In this direction, in
section 2 we provide an overview of the shared tasks 8a and 8b, which took place
from February to May 2020, as well as the corresponding datasets developed for
training and testing the participating systems. In section 3, we briefly overview
the participating systems and the approaches proposed by the corresponding
teams for these two tasks. Detailed descriptions for some of the systems are
also available in the proceedings of the BioASQ lab. In section 4, we present
the results of the evaluation of the participating systems, based on manual
assessment or state-of-the-art evaluation measures, depending on the nature of the
required system response. Finally, we conclude and discuss the eighth version of
the BioASQ tasks a and b in section 5.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Overview of the Tasks</title>
      <p>
        In the eighth version of the BioASQ challenge, three tasks were offered: (1) a
large-scale biomedical semantic indexing task (task 8a), (2) a biomedical question
answering task (task 8b), both considering documents in English, and (3) a new
task on medical semantic indexing in Spanish (task MESINESP). In this section
we provide a brief description of the two established tasks (8a and 8b), with a focus
on differences from previous versions of the challenge [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. A detailed overview
of the initial versions of the tasks and the general structure of BioASQ is also
already available [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Large-scale semantic indexing - Task 8a</title>
        <p>In Task 8a the aim is to classify articles from the PubMed/MedLine4 digital
library into concepts of the MeSH hierarchy. In particular, new PubMed articles
that are not yet annotated by the indexers in NLM are gathered to form the test
sets for the evaluation of the participating systems. Some basic details about
each test set and batch are provided in Table 1. As in previous versions of
the task, the task is divided into three independent batches of 5 weekly test sets
each, providing an on-line and large-scale scenario, and the test sets consist of
new articles without any restriction on the journal of publication. The performance of
the participating systems is calculated using standard flat information retrieval
measures, as well as hierarchical ones, once the annotations from the NLM
indexers become available. As usual, participants have 21 hours to provide their
answers for each test set. However, as it has been observed that new MeSH
annotations are released in PubMed earlier than in previous years, we shifted
the submission period accordingly, to avoid having some annotations available
from NLM while the task is still running. For training, a dataset of 14,913,939
articles with 12.68 labels per article, on average, was provided to the participants.</p>
        <sec id="sec-2-1-1">
          <title>4 https://pubmed.ncbi.nlm.nih.gov/</title>
          <p>
            Task 8b aims at providing a realistic large-scale question answering challenge,
offering the participating teams the opportunity to develop systems for all the
stages of question answering in the biomedical domain. Four types of questions
are considered in the task: "yes/no", "factoid", "list" and "summary" questions
[
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. A training dataset of 3,243 questions annotated with golden relevant elements
and answers is provided for the participants to develop their systems. Table 2
presents some statistics about the training dataset as well as the five test sets.
          </p>
          <p>As in previous versions of the challenge, the task is structured into two phases
that focus on the retrieval of the required information (phase A) and on
answering the question (phase B). In addition, the task is split into five independent
bi-weekly batches, and the two phases for each batch run on two
consecutive days. In each phase, the participants receive the corresponding test set and
have 24 hours to submit the answers of their systems. In particular, in phase
A, a test set of 100 questions written in English is released, and the
participants are expected to identify and submit relevant elements from designated
resources, including PubMed/MedLine articles, snippets extracted from these
articles, concepts and RDF triples. In phase B, the manually selected relevant
articles and snippets for these 100 questions are also released, and the
participating systems are asked to respond with exact answers, that is, entity names
or short phrases, and ideal answers, that is, natural language summaries of the
requested information.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Overview of participation</title>
      <p>This year, 34 teams from institutes around the world participated in the
three tasks of the challenge with more than 100 distinct systems. Seven of these
teams focused on task 8a and 23 on task 8b. As presented in Fig. 1, the
institutions hosting the teams that participated in tasks 8a and 8b are distributed
around the world, highlighting the international interest in the tasks. Compared
to previous versions of the challenge, we observe a shift towards the more
complex question answering task b, where the number of participating teams and
systems has been increasing over the last years, as shown in Fig. 2.
This year, 7 teams participated in the eighth edition of task a, submitting
predictions from 16 different systems in total. Here, we provide a brief overview of
those systems for which a description was available, stressing their key
characteristics. A summary of the participating systems and corresponding approaches
is presented in Table 3.</p>
      <p>
        This year, the LASIGE team from the University of Lisboa, in its "X-BERT
BioASQ" system, proposes a novel approach for biomedical semantic indexing
combining a solution based on Extreme Multi-Label Classification (XMLC) with
a Named Entity Recognition (NER) tool. In particular, their system is based on
X-BERT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an approach to scale BERT [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to XMLC, combined with the use of
the MER [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] tool to recognize MeSH terms in the abstracts of the articles. The
system is structured into three steps. The first step is the semantic indexing of
the labels into clusters using ELMo [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]; then a second step matches the indices
using a Transformer architecture; and finally, the third step focuses on ranking
the labels retrieved from the previous indices.
      </p>
      <p>
        Other teams improved upon existing systems already participating in
previous versions of the task. Namely, the National Library of Medicine (NLM)
team, in its "NLM CNN" system, enhances the previous version of their "ceb"
systems [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], based on an end-to-end Deep Learning (DL) architecture with
Convolutional Neural Networks (CNN), with SentencePiece tokenization [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The
Fudan University team also builds upon their previous "AttentionXML" [55]
and "DeepMeSH" [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] systems, as well as their new "BERTMeSH" [54] system,
which are based on document-to-vector (d2v) and tf-idf feature embeddings,
learning to rank (LTR), DL-based extreme multi-label text classification,
Attention Mechanisms and Probabilistic Label Trees (PLT) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Finally, this
year's versions of the "Iria" systems [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] are also based on the same techniques
used by the systems in previous versions of the challenge, which are summarized
in Table 3.
      </p>
      <p>
        Similarly to the previous versions of the challenge, two systems developed by
NLM to facilitate the annotation of articles by indexers in MedLine/PubMed
were available as baselines for the semantic indexing task: MTI [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] as enhanced
in [56], and an extension based on features suggested by the winners of the first
version of the task [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Task 8b</title>
        <p>This version of Task b was tackled by 94 different systems in total, developed by
23 teams. In particular, 8 teams participated in the first phase, on the retrieval of
relevant material required for answering the questions, submitting results from
30 systems. In the second phase, on providing the exact and ideal answers for the
questions, 18 teams participated with 72 distinct systems. Three of the teams
participated in both phases. An overview of the approaches, technologies and
datasets used by the teams is provided in Table 4, and a graphical representation
of them as a word cloud, weighted by their frequency on a logarithmic scale, is also
provided in Fig. 3. Only systems for which a description was available are included
in this section. Detailed descriptions for some of the systems are available in the
proceedings of the workshop.</p>
        <p>
          The "ITMO" team participated in both phases of the task, experimenting
in its "pa" systems with differing solutions across the batches. In general, for
document retrieval the systems follow a two-stage approach. First, they
identify initial candidate articles based on BM25, and then they re-rank them
using variations of BERT [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], fine-tuned for the binary classification task with
the BioASQ dataset and pseudo-negative documents. They extract snippets from
the top documents and rerank them using biomedical Word2Vec, based on cosine
similarity with the question. To extract exact answers they use BERT fine-tuned
on the SQuAD [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] and BioASQ datasets, and employ a post-processing step to split
the answer for list questions and additional fine-tuning on PubMedQA [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] for
yes/no questions. Finally, for ideal answers they generate some candidates from
the snippets and their sentences and rerank them using the model used for
phase A. In the last batch, they also experiment with generative summarization,
developing a model based on BioMed-RoBERTa [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] to improve the readability
and consistency of the produced ideal answers.
        </p>
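The two-stage retrieve-then-rerank pattern described above can be sketched as follows. This is a minimal illustration, not the ITMO implementation: the `rerank_score` callback stands in for the fine-tuned BERT classifier, and the BM25 stage is a plain Okapi scorer over whitespace tokens.

```python
from collections import Counter
import math

def bm25_score(query_terms, doc_terms, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        denom = tf[t] + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[t] * (k1 + 1) / denom
    return score

def retrieve_then_rerank(query, docs, rerank_score, k_candidates=100, k_final=10):
    """Stage 1: cheap BM25 candidate selection over the whole collection.
    Stage 2: rescore only the candidates with the expensive neural reranker."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for d in tokenized for t in set(d))
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    q = query.lower().split()
    candidates = sorted(
        range(len(docs)),
        key=lambda i: bm25_score(q, tokenized[i], df, len(docs), avgdl),
        reverse=True)[:k_candidates]
    return sorted(candidates,
                  key=lambda i: rerank_score(query, docs[i]),
                  reverse=True)[:k_final]
```

In practice the reranker sees only `k_candidates` documents per question, which is what makes a cross-encoder affordable at PubMed scale.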
        <p>
          Another team participating in both phases of the task is the "UCSD" team
with its "bio-answerfinder" system. In particular, for phase A they rely on
their previously developed Bio-AnswerFinder system [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], which is also used as a first
step in phase B, for re-ranking the sentences of the snippets provided in the
test set. For identifying the exact answers for factoid and list questions they
experimented with fine-tuning Electra [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and BioBERT [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] on the SQuAD and
BioASQ datasets combined. The answer candidates are then scored considering
the classification probability, the top ranking of corresponding snippets and the number
of occurrences. Finally, a normalization and filtering step is performed and, for
list questions, an enrichment step based on coordinated phrase detection. For
yes/no questions they fine-tune BioBERT on the BioASQ dataset and use
majority voting. For summary questions, they employ hierarchical clustering, based
on weighted relaxed word mover's distance (wRWMD) similarity [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], to group
the top sentences, and select the sentence ranked highest by Bio-AnswerFinder
to be concatenated to form the summary.
        </p>
        <p>
          The "AUEB" team also participated in both phases, focusing on phase A and
briefly experimenting with phase B. Working on extending their previous
top-performing model [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], they experimented with graph-node embeddings
generated from a biomedical entity co-occurrence graph built from publications [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
Moreover, they experimented with new ways to encode and retrieve relevant snippets,
but concluded that conventional BM25 pre-fetching was more efficient. For phase
B, they worked on exact answer extraction. To this end, they experimented
with a SciBERT-based model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] designed for cloze-style biomedical machine
reading comprehension [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] (MRC). However, their initial results indicated that
the MRC task differs greatly from the exact answer extraction task, and they did
not pursue this research direction further.
        </p>
        <p>
          In phase A, the team from the University of Aveiro participated with its
"bioinfo" systems, which consist of a fine-tuned BM25 retrieval model based on
ElasticSearch [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], followed by a neural reranking step. For the latter, they use
an interaction-based model inspired by the DeepRank [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ] architecture, building
upon previous versions of their system [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The focus of the improvements was on
the sentence splitting strategy, on the extraction of multiple relevance signals, and on
the independent contribution of each sentence to the final score. The "Google"
team also participated in phase A, with four distinct systems for document
retrieval based on different approaches. In particular, they used a BM25 retrieval
model, a neural retrieval model, initialized with BioBERT and trained on a
large set of questions developed through Synthetic Query Generation (QGen),
and a hybrid retrieval model5 based on a linear blend of BM25 and the neural
model [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. In addition, they also used a reranking model, rescoring the results
of the hybrid model with a cross-attention BERT rescorer [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ].
        </p>
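A linear blend of lexical and neural scores of the kind mentioned above can be sketched as follows. The min-max normalization and the blend weight `alpha` are illustrative assumptions, not details of the Google system; the point is that the two score distributions live on different scales and must be made comparable before blending.

```python
def minmax(xs):
    """Rescale a list of scores into [0, 1]; constant lists map to 0."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def hybrid_rank(bm25_scores, neural_scores, alpha=0.5, k=10):
    """Rank documents by a convex combination of normalized
    lexical (BM25) and neural retrieval scores."""
    b = minmax(bm25_scores)
    n = minmax(neural_scores)
    blended = [alpha * bi + (1 - alpha) * ni for bi, ni in zip(b, n)]
    return sorted(range(len(blended)), key=blended.__getitem__, reverse=True)[:k]
```

Setting `alpha=1.0` recovers pure BM25 ranking and `alpha=0.0` pure neural ranking, so the blend weight can be tuned on held-out queries.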
        <p>
          In phase B, this year the "KU-DMIS" team participated in both exact and
ideal answers. For exact answers, they build upon their previous
BioBERT-based systems [53] and try to adapt the sequential transfer learning of Natural
Language Inference (NLI) to biomedical question answering. In particular, they
investigate whether learning knowledge of entailment between two sentence pairs
can improve exact answer generation, enhancing their BioBERT-based models
with alternative fine-tuning configurations based on the MultiNLI dataset [
          <xref ref-type="bibr" rid="ref50">50</xref>
          ].
For ideal answer generation, they develop a deep neural abstractive
summarization model based on BART [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and beam search, with a particular focus on
pre-processing and post-processing steps. In particular, alternative systems were
developed, either considering the answers predicted by the exact answer
prediction system in their input or not. In the post-processing step, the generated
candidate ideal answers for each question were scored using the predicted
exact answers and grammar scores provided by the language check tool6. For
factoid and list questions in particular, the BERN [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] tool was also employed
to recognize named entities in the candidate ideal answers for the scoring step.
        </p>
        <p>The "NCU-IISR" team also participated in both parts of phase B,
constructing two BioBERT-based models for extracting the exact answer and ranking the
ideal answers, respectively. The first model is fine-tuned on the BioASQ dataset
formulated as a SQuAD-type QA task that extracts the answer span. For the
second model, they regard the sentences of the provided snippets as candidate
ideal answers and build a ranking model with two parts. First, a BioBERT-based
model takes as input the question and one of the snippet sentences and provides
their representation. Then, a logistic regressor, trained to predict the
similarity between a question and each snippet sentence, takes this representation
and outputs a score, which is used for selecting the final ideal answer.</p>
        <p>The "UoT" team participated with three different DL approaches for
generating exact answers. In their first approach, they fine-tune separately two distinct
BioBERT-based models extended with an additional neural layer depending on
the question type, one for yes/no and one for factoid and list questions
together. In their second system, they use a joint-learning setting, where the same
BioBERT layer is connected with both additional layers and jointly trained
for all types of questions. Finally, in their third system they propose a multi-task
model to learn recognizing biomedical entities and answers to questions
simultaneously, aiming at transferring knowledge from the biomedical entity
recognition task to question answering.</p>
        <sec id="sec-3-1-1">
          <title>5 https://ai.googleblog.com/2020/05/an-nlu-powered-tool-to-explore-covid-19.html</title>
        </sec>
        <sec id="sec-3-1-2">
          <title>6 https://pypi.org/project/language-check/</title>
          <p>
            In particular, they extend their joint BioBERT-based
model with simultaneous training on the BC2GM dataset [
            <xref ref-type="bibr" rid="ref45">45</xref>
            ] for recognizing
gene and protein entities.
          </p>
          <p>
            The "BioNLPer" team also participated in the exact answers part of phase
B, focusing on factoids. They proposed 5 BioBERT-based systems, using
external feature enhancement and auxiliary task methodologies. In particular, in
their "factoid qa model" and "Parameters retrained" systems they consider the
prediction of answer boundaries (start and end positions) as the main task and
the whole answer content prediction as an auxiliary task. In their "Features
Fusion" system they leveraged external features, including NER and part-of-speech
(POS) tags extracted by the NLTK [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ] and ScispaCy [
            <xref ref-type="bibr" rid="ref33">33</xref>
            ] tools, as additional textual
information and fused them with the pre-trained language model representations
to improve answer boundary prediction. Then, in their "BioFusion" system they
combine the two methodologies together. Finally, their "BioLabel" system
employed general and biomedical domain corpus classification as the auxiliary
task to help answer boundary prediction.
          </p>
          <p>
            The "LabZhu" systems participated in phase B as well, with a focus on exact
answers for the factoid and list questions. They treat answer generation as an
extractive machine comprehension task and explore several different pretrained
language models, including BERT, BioBERT, XLNet [51] and SpanBERT [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ].
They also follow a transfer learning approach, training the models on the SQuAD
dataset and then fine-tuning them on the BioASQ datasets. Finally, they also
rely on voting to integrate the results of multiple models. The "umass czi" team
also focused on the exact answer part of phase B, experimenting with
unsupervised representation learning approaches in the context of biomedical QA. In
particular, they considered pretrained representations based on BioBERT,
SciBERT and BioSentVec [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], and experimented with transferring knowledge from the
SQuAD and PubMedQA datasets into the BioASQ 8b QA task. Finally, they
also developed a new pre-training method based on a self-supervised de-noising
approach. In this method, they first generate a QA dataset by randomly replacing
entities automatically recognized by PubTator [
            <xref ref-type="bibr" rid="ref48">48</xref>
            ] in PubMed abstracts. Then,
they train their model on extracting the span of the new entities, given the original
ones as queries.
          </p>
          <p>
            The "MQ" team, as in past years, focused on ideal answers, approaching
the task as query-based summarisation. In some of their systems they retrain
their previous classification and regression approaches [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ] on the new training
dataset. In addition, they also employ reinforcement learning with Proximal
Policy Optimization (PPO) [
            <xref ref-type="bibr" rid="ref44">44</xref>
            ] and two variants to represent the input
features, namely Word2Vec-based and BERT-based embeddings. The "DAIICT"
team also participated in ideal answer generation, using the standard extractive
summarization techniques TextRank [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ] and LexRank [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], as well as sentence
selection techniques based on their similarity with the query. They also modified
these techniques, investigating the effect of query expansion based on UMLS [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]
for sentence selection and summarization.
          </p>
          <p>
            Finally, the "sbert" team also focused on ideal answers. They experimented
with different embedding models and multi-task learning in their systems, using
parts of previous "MQU" systems for the pre-processing of data and the
prediction step based on classification and regression [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ]. In particular, they
used a Universal Sentence Embedding Model [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] (BioBERT-NLI7) based on a
version of BioBERT fine-tuned on the SNLI [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] and MultiNLI datasets, as
in Sentence-BERT [
            <xref ref-type="bibr" rid="ref42">42</xref>
            ]. The features were fed to either a single logistic regression
or a classification model to derive the ideal answers. Additionally, in a multi-task
setting, they trained the model on both the classification and regression tasks,
selecting one of them for the final prediction.
          </p>
          <p>
            In this challenge too, the open source OAQA system proposed by [52] served
as a baseline for phase B exact answers. This system, which achieved among the
highest performances in previous versions of the challenge, remains a strong
baseline for the exact answer generation task. The system is developed on top of
the UIMA framework. ClearNLP is employed for question and snippet parsing.
MetaMap, TmTool [
            <xref ref-type="bibr" rid="ref49">49</xref>
            ], C-Value and LingPipe [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] are used for concept
identification, and UMLS Terminology Services (UTS) for concept retrieval. The final
steps include identification of concept, document and snippet relevance based on
classifier components, and finally scoring and ranking techniques.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>
        7 https://huggingface.co/gsarti/biobert-nli
Standard flat and hierarchical measures are used for measuring the classification performance of the systems. In
particular, the micro F-measure (MiF) and the Lowest Common Ancestor F-measure
(LCA-F) were used to identify the winners for each batch [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. As suggested
by Demsar [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the appropriate way to compare multiple classification systems
over multiple datasets is based on their average rank across all the datasets. In
this task, the system with the best performance on a test set gets rank 1.0 for
this test set, the second best gets rank 2.0, and so on. In case two or more systems tie,
they all receive the average rank. Then, according to the rules of the challenge,
the average rank of each system for a batch is calculated based on the four best
ranks of the system in the five test sets of the batch. The average ranks of each
system, based on both the flat MiF and the hierarchical LCA-F scores, for the
three batches of the task are presented in Table 5.
      </p>
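The ranking scheme described above (rank 1.0 for the best system per test set, shared average ranks for ties, and a batch score averaging each system's four best ranks out of five) can be sketched as:

```python
def ranks(scores):
    """Rank systems on one test set: highest score gets rank 1.0;
    tied systems all receive the average of the tied positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def batch_average_rank(score_table, keep=4):
    """Average each system's `keep` best (lowest) ranks over the
    test sets of a batch; score_table has one score list per test set."""
    per_set = [ranks(col) for col in score_table]
    n_sys = len(score_table[0])
    return [sum(sorted(rs[s] for rs in per_set)[:keep]) / keep
            for s in range(n_sys)]
```

Dropping the worst of the five ranks means a single missed submission does not dominate a system's batch score.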
      <p>The results in Task 8a show that in all test batches and for both flat and
hierarchical measures, the best systems outperform the strong baselines. In
particular, the "dmiip fdu" systems from the Fudan University team achieve the
best performance in all three batches of the task. More detailed results can
be found in the online results page8. Comparing these results with the
corresponding results from previous versions of the task suggests that both the MTI
baseline and the top performing systems keep improving through the years of
the challenge, as shown in Figure 4.</p>
      <sec id="sec-4-1">
        <title>8 http://participants-area.bioasq.org/results/8a/</title>
        <p>Phase A: In the first phase of Task 8b, the systems are ranked according to
the Mean Average Precision (MAP) measure for each of the four types of
annotations, namely documents, snippets, concepts and RDF triples. This year, the
calculation of Average Precision (AP) in MAP for phase A was reconsidered, as
described in the official description of the evaluation measures for Task 8b9. In
brief, since BioASQ3 the participating systems have been allowed to return up to 10
relevant items (e.g. documents), and the calculation of AP was modified to reflect
this change. However, the number of golden relevant items has in recent years
been observed to be lower than 10 in some cases, resulting in relatively small AP
values even for submissions containing all the golden elements. For this reason, this
year we modified the MAP calculation to consider both the limit of 10 elements
and the actual number of golden elements. In Tables 6 and 7 some indicative
preliminary results from batch 2 are presented. The full results are available on
the online results page of Task 8b, phase A10. The results presented here are
preliminary, as the final results for task 8b will be available after the manual
assessment of the system responses by the BioASQ team of biomedical experts.</p>
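The revised AP calculation described above can be sketched as follows. This is a minimal interpretation, under the assumption stated in the text: the denominator becomes min(10, number of golden items), so a submission returning all golden elements can reach an AP of 1 even when fewer than 10 golden items exist.

```python
def average_precision(returned, golden, limit=10):
    """AP over a ranked list truncated at `limit`, normalized by
    min(limit, |golden|) rather than by the fixed limit."""
    golden = set(golden)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(returned[:limit], start=1):
        if item in golden:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(limit, len(golden))

def mean_average_precision(run, qrels, limit=10):
    """MAP over a set of questions; `run` and `qrels` map
    question ids to ranked submissions and golden items."""
    return sum(average_precision(run[q], qrels[q], limit)
               for q in qrels) / len(qrels)
```

With the old fixed denominator of 10, a question with only two golden documents could score at most AP = 0.2 even for a perfect submission; the revised normalization removes that ceiling.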
        <p>
          Phase B: In the second phase of task 8b, the participating systems were
expected to provide both exact and ideal answers. Regarding the ideal answers,
the systems will be ranked according to manual scores assigned to them by
the BioASQ experts during the assessment of system responses [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For the
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>9 http://participants-area.bioasq.org/Tasks/b/eval meas 2020/ 10 http://participants-area.bioasq.org/results/8b/phaseA/</title>
        <p>exact answers, which are required for all questions except the summary ones,
the measure considered for ranking the participating systems depends on the
question type. For the yes/no questions, the systems were ranked according to
the macro-averaged F1-measure on prediction of no and yes answer. For factoid
questions, the ranking was based on mean reciprocal rank (MRR) and for list
questions on mean F1-measure. Some indicative results for exact answers for the
third batch of Task 8b are presented in Table 8. The full results of phase B of
Task 8b are available online11. These results are preliminary, as the nal results
for Task 8b will be available after the manual assessment of the system responses
by the BioASQ team of biomedical experts.</p>
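The three ranking measures above can be sketched as follows. This is an illustrative assumption, not the official evaluation code: macro-averaged F1 over the "yes" and "no" classes, reciprocal rank of the first correct factoid answer, and set-based F1 for list answers.

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_f1_yesno(golds, preds):
    """Average the per-class F1 of the 'yes' and 'no' answers."""
    scores = []
    for cls in ("yes", "no"):
        tp = sum(g == cls and p == cls for g, p in zip(golds, preds))
        fp = sum(g != cls and p == cls for g, p in zip(golds, preds))
        fn = sum(g == cls and p != cls for g, p in zip(golds, preds))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / 2


def reciprocal_rank(ranked_answers, golden):
    """1/rank of the first correct factoid answer, 0 if none is correct;
    MRR averages this over all factoid questions."""
    for rank, ans in enumerate(ranked_answers, start=1):
        if ans in golden:
            return 1.0 / rank
    return 0.0


def list_f1(predicted, golden):
    """F1 between the predicted and golden sets of list-question entities;
    the official ranking uses its mean over all list questions."""
    tp = len(set(predicted) & set(golden))
    return f1(tp, len(set(predicted)) - tp, len(set(golden)) - tp)
```

Macro-averaging matters for yes/no questions because the class distribution is skewed: a system answering "yes" everywhere gets a high "yes" F1 but zero "no" F1, and the macro average penalizes it accordingly.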
<p>Figure 5 presents the performance of the top systems for each question type
in exact answers during the eight years of the BioASQ challenge. The diagram
reveals that this year the performance of systems in the yes/no questions keeps
improving. For instance, in batch 3, presented in Table 8, various systems manage
to far outperform the strong baseline, which is based on a version of the OAQA
system that achieved top performance in previous years. Improvements are also
observed in the preliminary results for list questions, whereas the top system
performance in factoid questions fluctuates in the same range as last year. In
general, Figure 5 suggests that for the latter two question types there is
still more room for improvement.
11 http://participants-area.bioasq.org/results/8b/phaseB/
Fig. 5. The official evaluation scores of the best performing systems in Task B, Phase
B, exact answer generation, across the eight years of the BioASQ challenge. Since
BioASQ6 the official measure for Yes/No questions is the macro-averaged F1 score
(macro F1), but accuracy (Acc) is also presented, as it was the former official measure.
The results for BioASQ8 are preliminary, as the final results for Task 8b will be
available after the manual assessment of the system responses.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
<p>This paper provides an overview of the eighth version of the BioASQ tasks
a and b, on biomedical semantic indexing and question answering in English
respectively. These tasks, already established through the previous seven years
of the challenge, together with the new MESINESP task on semantic indexing
of medical content in Spanish, which ran for the first time, constituted the eighth
edition of the BioASQ challenge.</p>
      <p>
The overall shift of participating systems towards deep neural approaches,
already noticed in previous years, is even more apparent this year.
State-of-the-art methodologies have been successfully adapted to biomedical question
answering and novel ideas have been investigated. In particular, most of the
systems adopted neural embedding approaches, notably based on BERT and
BioBERT models, for both tasks. In the QA task in particular, different teams
attempted to transfer knowledge from general-domain QA datasets, notably
SQuAD, or from other NLP tasks such as NER and NLI, also experimenting with
multi-task learning settings. In addition, recent advancements in NLP, such as
XLNet [51], BART [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and SpanBERT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] have also been tested for the tasks
of the challenge.
      </p>
<p>Overall, as in previous versions of the tasks, the top performing systems
were able to advance the state of the art, outperforming the strong
baselines on the challenging shared tasks offered by the organizers. Therefore, we
consider that the challenge keeps meeting its goal of pushing the research frontier
in biomedical semantic indexing and question answering. The future plans for
the challenge include the extension of the benchmark data through a
community-driven acquisition process.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
<p>Google was a proud sponsor of the BioASQ Challenge in 2019. The eighth edition
of BioASQ is also sponsored by Atypon Systems Inc. BioASQ is grateful to
NLM for providing the baselines for task 8a and to the CMU team for providing
the baselines for task 8b. The MESINESP task is sponsored by the Spanish
Plan for advancement of Language Technologies (Plan TL) and the Secretaría
de Estado para el Avance Digital (SEAD). BioASQ is also grateful to LILACS,
SCIELO, Biblioteca virtual en salud and Instituto de salud Carlos III for
providing data for the BioASQ MESINESP task.
51. Yang, Z., Dai, Z., Yang, Y., Carbonell, J.G., Salakhutdinov, R., Le, Q.V.:
XLNet: Generalized autoregressive pretraining for language understanding. CoRR
abs/1906.08237 (2019), http://arxiv.org/abs/1906.08237
52. Yang, Z., Zhou, Y., Nyberg, E.: Learning to answer biomedical questions: OAQA at
BioASQ 4B. ACL 2016 p. 23 (2016)
53. Yoon, W., Lee, J., Kim, D., Jeong, M., Kang, J.: Pre-trained Language Model for
Biomedical Question Answering. In: Seventh BioASQ Workshop: A challenge on
large-scale biomedical semantic indexing and question answering (2019)
54. You, R., Liu, Y., Mamitsuka, H., Zhu, S.: BERTMeSH: Deep
contextual representation learning for large-scale high-performance MeSH
indexing with full text. bioRxiv (2020). https://doi.org/10.1101/2020.07.04.187674,
https://www.biorxiv.org/content/early/2020/07/06/2020.07.04.187674
55. You, R., Zhang, Z., Wang, Z., Dai, S., Mamitsuka, H., Zhu, S.: AttentionXML: Label
tree-based attention-aware deep model for high-performance extreme multi-label
text classification. arXiv preprint arXiv:1811.01727 (2018)
56. Zavorin, I., Mork, J.G., Demner-Fushman, D.: Using learning-to-rank to enhance
NLM Medical Text Indexer results. ACL 2016 p. 8 (2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Almeida</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Calling attention to passages for biomedical question answering</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <volume>69</volume>
          –
          <fpage>77</fpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Nentidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougiatiotis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez-Penagos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          : Overview of BioASQ
          <year>2020</year>
          :
          <article-title>The eighth BioASQ challenge on large-scale biomedical semantic indexing and question answering</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction Proceedings of the Eleventh International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ), Thessaloniki, Greece,
          <source>September</source>
          <volume>22</volume>
          –
          <fpage>25</fpage>
          ,
          <year>2020</year>
          , Proceedings. vol.
          <volume>12260</volume>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carpenter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          : LingPipe. Available from World Wide Web: http://alias-i.com/lingpipe (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Balikas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Partalas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kosmopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petridis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malakasiotis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baskiotis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artieres</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallinari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Evaluation framework specifications</article-title>
          .
          <source>Project deliverable D4</source>
          .1,
          <string-name>
            <surname>UPMC</surname>
          </string-name>
          (05/
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SciBERT: A pretrained language model for scientific text</article-title>
          . arXiv preprint arXiv:1903.10676
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The unified medical language system (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic acids research 32(suppl 1)</source>
          ,
          <source>D267–D270</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angeli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.:</given-names>
          </string-name>
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          .
          <source>arXiv preprint arXiv:1508.05326</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhillon</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>X-BERT: eXtreme multi-label text classification using bidirectional encoder representations from transformers</article-title>
          . arXiv preprint arXiv:1905.02331
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>BioSentVec: creating sentence embeddings for biomedical texts</article-title>
          .
          <source>In: 2019 IEEE International Conference on Healthcare Informatics (ICHI)</source>
          . pp.
          <volume>1</volume>
          –
          <issue>5</issue>
          . IEEE (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          : ELECTRA:
          <article-title>Pre-training text encoders as discriminators rather than generators</article-title>
          . arXiv preprint arXiv:2003.10555
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiela</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrault</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bordes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          .
          <source>arXiv preprint arXiv:1705.02364</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamurias</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>MER: a shell script and annotation server for minimal named entity recognition and linking</article-title>
          .
          <source>Journal of Cheminformatics</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <volume>58</volume>
          (dec
          <year>2018</year>
          ). https://doi.org/10.1186/s13321-018-0312-9
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Demsar</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Statistical comparisons of classifiers over multiple data sets</article-title>
          .
          <source>Journal of Machine Learning Research 7</source>
          ,
          <issue>1</issue>
          –
          <fpage>30</fpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>NAACL HLT</source>
          2019
          <article-title>- 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies -</article-title>
          <source>Proceedings of the Conference</source>
          <volume>1</volume>
          (
          <issue>Mlm</issue>
          ),
          <volume>4171</volume>
          –4186 (oct
          <year>2018</year>
          ), http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Erkan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          : LexRank:
          <article-title>Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>Journal of artificial intelligence research 22</source>
          ,
          <volume>457</volume>
          –
          <fpage>479</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Gormley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tong</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Elasticsearch: The definitive guide: A distributed real-time search and analytics engine</article-title>
          . O'Reilly Media, Inc.
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Gururangan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marasovic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swayamdipta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Don't stop pretraining: Adapt language models to domains and tasks</article-title>
          . arXiv preprint arXiv:2004.10964
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prabhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varma</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking &amp; Other Missing Label Applications</article-title>
          .
          <source>In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16</source>
          . pp.
          <volume>935</volume>
          –
          <fpage>944</fpage>
          . ACM Press, New York, New York, USA (
          <year>2016</year>
          ). https://doi.org/10.1145/2939672.2939756
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhingra</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>PubMedQA: a dataset for biomedical research question answering</article-title>
          . arXiv preprint arXiv:1909.06146
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weld</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>SpanBERT: Improving pre-training by representing and predicting spans</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>8</volume>
          ,
          <issue>64</issue>
          –
          <fpage>77</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>So</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sung</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A neural named entity recognition and multi-type normalization tool for biomedical text mining</article-title>
          .
          <source>IEEE Access 7</source>
          ,
          <issue>73729</issue>
          –
          <fpage>73740</fpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Kosmopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Partalas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Evaluation measures for hierarchical classification: a unified view and novel approaches</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>29</volume>
          (
          <issue>3</issue>
          ),
          <volume>820</volume>
          –
          <fpage>865</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Kotitsas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pappas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Apidianaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Embedding biomedical ontologies by jointly encoding network structure and textual node descriptors</article-title>
          . arXiv preprint arXiv:1906.05939
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing</article-title>
          .
          <source>In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
. pp.
<fpage>66</fpage>
–
<lpage>71</lpage>
          . Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2018</year>
). https://doi.org/10.18653/v1/D18-2012
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>So</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
<string-name>
<surname>Kang</surname>
,
<given-names>J.</given-names>
</string-name>
:
<article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
. arXiv preprint arXiv:1901.08746
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghazvininejad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohamed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
<article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
. arXiv preprint arXiv:1910.13461
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
<article-title>NLTK: the natural language toolkit</article-title>
          .
          <source>arXiv preprint cs/0205028</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
<string-name>
<surname>Ma</surname>
,
<given-names>J.</given-names>
</string-name>
,
          <string-name>
            <surname>Korotkov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Zero-shot neural retrieval via domain-targeted synthetic query generation</article-title>
. arXiv preprint arXiv:2004.14503
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
:
<article-title>TextRank: Bringing order into text</article-title>
          .
          <source>In: Proceedings of the 2004 conference on empirical methods in natural language processing</source>
. pp.
<fpage>404</fpage>
–
<lpage>411</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Molla</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
<article-title>Classification betters regression in query-based multi-document summarisation techniques for question answering</article-title>
          .
          <source>In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
. pp.
<fpage>624</fpage>
–
<lpage>635</lpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Mork</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
<article-title>Recent enhancements to the NLM Medical Text Indexer</article-title>
          .
          <source>In: Proceedings of Question Answering Lab at CLEF</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Nentidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougiatiotis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
<article-title>Results of the seventh edition of the BioASQ challenge</article-title>
          .
          <source>In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
. pp.
<fpage>553</fpage>
–
<lpage>568</lpage>
          . Springer (
          <year>2019</year>
). https://doi.org/10.1007/978-3-030-43887-6_51
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>King</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ammar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
<article-title>ScispaCy: Fast and robust models for biomedical natural language processing</article-title>
. arXiv preprint arXiv:1902.07669
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Ozyurt</surname>
            ,
            <given-names>I.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandrowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grethe</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          :
<article-title>Bio-AnswerFinder: a system to find answers to questions from biomedical texts</article-title>
          .
          <source>Database</source>
<volume>2020</volume>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
<string-name>
<surname>Xu</surname>
,
<given-names>J.</given-names>
</string-name>
,
<string-name>
<surname>Cheng</surname>
,
<given-names>X.</given-names>
</string-name>
:
<article-title>DeepRank: A new deep architecture for relevance ranking in information retrieval</article-title>
          .
          <source>In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management</source>
. pp.
<fpage>257</fpage>
–
<lpage>266</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Pappas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brokos</surname>
            ,
            <given-names>G.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>AUEB at BioASQ 7: Document and Snippet Retrieval</article-title>
.
<source>In: Seventh BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering</source>
(
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Pappas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stavropoulos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
<article-title>BioMRC: A dataset for biomedical machine reading comprehension</article-title>
          .
          <source>In: Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing</source>
. pp.
<fpage>140</fpage>
–
<lpage>149</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Peng</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>You</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mamitsuka</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
<article-title>DeepMeSH: deep semantic representation for improving large-scale MeSH indexing</article-title>
          .
          <source>Bioinformatics</source>
          <volume>32</volume>
          (
          <issue>12</issue>
          ),
<fpage>i70</fpage>
–
<lpage>i79</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
<source>Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
. pp.
<fpage>31</fpage>
–
<lpage>40</lpage>
(Feb
<year>2018</year>
), http://arxiv.org/abs/1802.05365
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Rae</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mork</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Convolutional Neural Network for Automatic MeSH Indexing</article-title>
.
<source>In: Seventh BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering</source>
(
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Rajpurkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
,
<string-name>
<surname>Zhang</surname>
,
<given-names>J.</given-names>
</string-name>
,
          <string-name>
            <surname>Lopyrev</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
:
<article-title>SQuAD: 100,000+ questions for machine comprehension of text</article-title>
          .
          <source>arXiv preprint arXiv:1606.05250</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
<article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
. arXiv preprint arXiv:1908.10084
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Ribadas</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Campos</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darriba</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          :
<article-title>CoLe and UTAI at BioASQ 2015: Experiments with similarity based descriptor assignment</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <volume>1391</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <surname>Schulman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolski</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhariwal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimov</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Proximal policy optimization algorithms</article-title>
          .
          <source>arXiv preprint arXiv:1707.06347</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanabe</surname>
            ,
            <given-names>L.K.</given-names>
          </string-name>
          ,
<string-name>
<surname>nee Ando</surname>
,
<given-names>R.J.</given-names>
</string-name>
,
          <string-name>
            <surname>Kuo</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>C.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klinger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganchev</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.:
<article-title>Overview of BioCreative II gene mention recognition</article-title>
          .
          <source>Genome biology</source>
          <volume>9</volume>
          (
          <issue>S2</issue>
          ),
<fpage>S2</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
          <string-name>
            <surname>Tsatsaronis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balikas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malakasiotis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Partalas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zschunke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alvers</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weissenborn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petridis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polychronopoulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almirantis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlopoulos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baskiotis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallinari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artieres</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrio-Alvers</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schroeder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
<article-title>An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>16</volume>
          ,
          <issue>138</issue>
          (
          <year>2015</year>
          ). https://doi.org/10.1186/s12859-015-0564-6
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laliotis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Markantonatos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlahavas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Large-Scale Semantic Indexing of Biomedical Publications</article-title>
.
<source>In: 1st BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering</source>
(
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          48.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kao</surname>
            ,
            <given-names>H.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
<article-title>PubTator: a web-based text mining tool for assisting biocuration</article-title>
          .
          <source>Nucleic acids research</source>
          <volume>41</volume>
          (
          <issue>W1</issue>
          ),
<fpage>W518</fpage>
–
<lpage>W522</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          49.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leaman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Beyond accuracy: creating interoperable and scalable text-mining web services</article-title>
          .
          <source>Bioinformatics</source>
          (Oxford, England)
          <volume>32</volume>
          (
          <issue>12</issue>
          ),
<fpage>1907</fpage>
–
<lpage>1910</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1093/bioinformatics/btv760
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          50.
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nangia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowman</surname>
            ,
<given-names>S.R.</given-names>
</string-name>
:
          <article-title>A broad-coverage challenge corpus for sentence understanding through inference</article-title>
          .
          <source>arXiv preprint arXiv:1704.05426</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>