<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of MESINESP8, a Spanish Medical Semantic Indexing Task within BioASQ 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carlos Rodriguez-Penagos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasios Nentidis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aitor Gonzalez-Agirre</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alejandro Asensio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jordi Armengol-Estape</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasia Krithara</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Villegas</string-name>
          <email>marta.villegasg@bsc.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Paliouras</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Krallinger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Barcelona Supercomputing Center</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Center for Scientific Research "Demokritos"</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National and Kapodistrian University of Athens</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present an overview of the novel MESINESP task on medical semantic indexing in Spanish within the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming to promote systems and methodologies for large-scale biomedical semantic indexing and question answering. MESINESP represents the first attempt to generate resources for the development and evaluation of semantic indexing strategies specialized in health-related content in Spanish. We have generated several publicly accessible Gold Standard collections of manually indexed content covering medical literature, clinical trials and health project descriptions, associated with controlled terminologies in the form of the hierarchical DeCS vocabulary. Manual indexing of MESINESP documents was carried out by professional medical literature indexers using an indexing web interface specifically adapted for this task. The results obtained by the participating teams were promising, showing that training data of semantically indexed medical literature can also serve to implement automatic indexing systems that assist the manual indexing of other types of documents, such as clinical trials. MESINESP corpus: https://zenodo.org/record/3746596. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.</p>
      </abstract>
      <kwd-group>
        <kwd>Biomedical knowledge</kwd>
        <kwd>Semantic Indexing</kwd>
        <kwd>Question Answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        There is a pressing need to support more sophisticated search queries to retrieve
relevant health-related content, in particular medical publications. This became
clear during the recent COVID-19 pandemic, when experts needed to find
medical articles describing specific aspects of this novel disease, such as symptoms,
comorbidities or treatment [
        <xref ref-type="bibr" rid="ref3">16, 14, 3</xref>
        ]. Moreover, highly
specialized information needs on complex subjects, for instance selecting the relevant
articles for a systematic review, require complex semantic search capabilities [8].
With the rapid accumulation of biomedical and clinical research publications,
healthcare experts increasingly rely on the results of so-called indexing initiatives
to build more sophisticated semantic search queries that incorporate indexed terms
from structured controlled vocabularies. Figure 1 summarizes the importance of
semantic indexing and retrieval systems for medical literature from the perspective
of various stakeholders and end users.
      </p>
      <p>This paper presents the data, settings and results of the
MESINESP shared task, which was part of the CLEF-BioASQ 2020 challenge.
To this end, we provide an overview of the MESINESP shared task,
the corresponding corpus and the additional data resources prepared for this
track. We briefly describe the systems developed by the participating
teams for the different tasks; detailed descriptions of some of these systems are
available in the proceedings of the lab. We focus on evaluating the performance
of the semantic indexing systems participating in this track using
state-of-the-art evaluation measures. Finally, we summarize the conclusions and the future
outlook of the MESINESP effort.</p>
      <p>This year, the eighth edition of the BioASQ challenge comprised three tasks:
(1) a large-scale biomedical semantic indexing task (task 8a), (2) a biomedical
question answering task (task 8b), both considering documents in English, and
(3) a new task on medical semantic indexing in Spanish (task MESINESP). A
detailed overview of these tasks and the general structure of BioASQ is available
in [19]. In this paper, we describe the new MESINESP task on semantic indexing
of medical content written in Spanish (medical literature abstracts, clinical trial
summaries and health-related project descriptions), which was introduced this
year [11], and provide statistics about the dataset developed for it.</p>
    </sec>
    <sec id="sec-2">
      <title>Data and Resources</title>
      <p>There is a pressing need to improve access to the information contained in health
and biomedicine-related documents, not only for professional medical users but
also for researchers, public healthcare decision makers, the pharmaceutical industry and,
particularly, patients. Currently, most biomedical NLP and IR research
focuses on content in English, despite the fact that a large volume of
medical documents is published in other languages, including Spanish. Key resources
like PubMed focus primarily on data in English, although PubMed also links
to articles originally published in Spanish. For English, the aim of Task 8a was to
classify articles from the PubMed/MedLine4 digital library into concepts of the
MeSH hierarchy. In particular, new PubMed articles that have not yet been annotated
by the NLM indexers are gathered to form the test sets for the evaluation
of the participating systems. The performance of the participating systems was
calculated using standard flat information retrieval measures, as well as
hierarchical ones, once the annotations from the NLM indexers became available.</p>
      <p>For training, Task 8a provided a dataset of 14,913,939 articles with an average of 12.68 labels
per article.</p>
      <p>The main aim of MESINESP is to promote the development of semantic
indexing tools of practical relevance for non-English content, determining the
current state of the art, identifying challenges, and comparing the strategies and
results with those published for English data.
4 https://pubmed.ncbi.nlm.nih.gov/</p>
      <p>MESINESP focuses on healthcare content in Spanish: IBECS5, LILACS6,
REEC 7 and FIS-ISCIII 8. In this task, the participants were asked to
classify new IBECS and LILACS documents in Spanish. The classes come from the
DeCS vocabulary 9, which was originally developed from the MeSH hierarchy. At
present, this annotation is done manually, which is costly and labor-intensive. Manual
semantic indexing of Spanish medical literature would therefore greatly benefit
from a more systematic indexing strategy or from software that assists manual
indexing. Due to the burden of manual indexing, there is also a
considerable delay between the date a record is published and the date it is manually indexed,
especially when compared to the indexing speed of other databases like PubMed. The
MESINESP task was promoted within the efforts of the Spanish Government's
Plan for Promoting Language Technologies (Plan TL), which aims to promote the
development of natural language processing, machine translation and
conversational systems in Spanish and the co-official languages of Spain.</p>
      <p>Description of the datasets for MESINESP and the annotation effort</p>
      <p>First, we crawled https://pesquisa.bvsalud.org/ (IBECS
and LILACS) to obtain 1.1 million articles, extracting the title and the abstract
(not the full text) along with other article metadata such as the journal and the date of
publication.</p>
      <p>A training dataset 10 was released with 369,368 articles manually annotated
with DeCS codes (Descriptores en Ciencias de la Salud, i.e., Health Sciences Descriptors, derived
from and extending MeSH terms)11. Then, 1500 articles published from 2018 onwards were
selected and annotated with DeCS codes by 7 experts in the field of clinical text indexing.
Figure 2 shows a screenshot of the interface that was used for the
first phase of the manual Gold Standard semantic indexing process.</p>
      <p>These articles were distributed so that each article was annotated
by at least two different annotators. The first phase consisted of adding DeCS
codes to each document, and the second phase consisted of validating those DeCS
codes while viewing the codes added by other annotators to the same
document (simple automatic DeCS term gazetteer look-up suggestions were also
shown). This process resulted in the Gold Standard manual corpus comprising the
development and test set records. Figure 3 shows a screenshot of the semantic
indexing validation interface used during the corpus construction phase.</p>
      <p>5 IBECS includes bibliographic references from scientific articles in health sciences
published in Spanish journals. http://ibecs.isciii.es</p>
      <p>6 LILACS is the most important and comprehensive index of scientific and technical
literature of Latin America and the Caribbean. It includes 26 countries, 882 journals
and 878,285 records, 464,451 of which are full texts. https://lilacs.bvsalud.org</p>
      <p>7 Registro Español de Estudios Clínicos (Spanish Registry of Clinical Studies), a database
containing summaries of clinical trials. https://reec.aemps.es/reec/public/web.html</p>
      <p>8 Public healthcare project proposal summaries (Proyectos de
Investigación en Salud, designed by the Instituto de Salud Carlos III, ISCIII).
https://portalfis.isciii.es/es/Paginas/inicio.aspx</p>
      <p>9 http://decs.bvs.br/I/decsweb2019.htm</p>
      <p>10 https://zenodo.org/record/3826492</p>
      <p>11 Of the DeCS codes, 29,716 come directly from MeSH and 4,402 are exclusive to DeCS.</p>
      <p>An additional background dataset was produced from diverse sources,
including machine-translated text. This set had no manual annotations but
was distributed to the teams so that their automatic predictions could be used to generate a Silver Standard corpus.</p>
      <p>On average, the documents in the different collections contained around 10
sentences, 13 DeCS codes and 300 words, of which between 130 and
140 were unique.</p>
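      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Core statistics for the datasets used in MESINESP. Dev.: development set; (i): intersection; (u): union.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Dataset</th>
              <th>Docs</th>
              <th>DeCS codes</th>
              <th>Unique DeCS codes</th>
              <th>Avg. tokens</th>
              <th>Avg. unique tokens</th>
              <th>Max. tokens</th>
              <th>Avg. DeCS per doc</th>
              <th>Avg. unique DeCS per doc</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Dev. (i)</td><td>750</td><td>6540</td><td>2756</td><td>295.2</td><td>141.1</td><td>3026</td><td>8.7</td><td>8.7</td></tr>
            <tr><td>Dev. (u)</td><td>750</td><td>9847</td><td>3600</td><td>295.2</td><td>141.1</td><td>3026</td><td>13.1</td><td>13.1</td></tr>
            <tr><td>Test</td><td>938</td><td>12429</td><td>4061</td><td>273.8</td><td>135.3</td><td>2552</td><td>13.2</td><td>13.2</td></tr>
            <tr><td>Train</td><td>318658</td><td>2588925</td><td>23423</td><td>192.6</td><td>106.5</td><td>1725</td><td>8.1</td><td>7.2</td></tr>
          </tbody>
        </table>
      </table-wrap>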
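      <p>Table 1 reports the development set twice, as an intersection (i) and as a union (u). Since every Gold Standard article was indexed by at least two annotators, one plausible reading, which the table does not spell out and which we therefore state as an assumption, is that (i) keeps only the DeCS codes on which both annotators agreed and (u) keeps every code assigned by either of them. The following Python sketch derives the count-based columns of Table 1 under that reading; it models exactly two annotators per document, and the document IDs and codes (D001 and so on) are hypothetical placeholders rather than real DeCS identifiers.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean

# Hypothetical toy input: per document, the DeCS code sets of two annotators.
# The codes are placeholders, not real DeCS identifiers.
annotations = {
    "doc-1": ({"D001", "D002", "D003"}, {"D002", "D003", "D004"}),
    "doc-2": ({"D002", "D005"}, {"D005"}),
}


def merge(pairs, mode):
    """Combine the two annotators' code sets per document: 'i' = intersection, 'u' = union."""
    combine = set.intersection if mode == "i" else set.union
    return {doc: combine(first, second) for doc, (first, second) in pairs.items()}


def table_row(gold):
    """Count-based Table 1 columns for one merged version of the data set."""
    per_doc = [len(codes) for codes in gold.values()]
    return {
        "docs": len(gold),
        "decs": sum(per_doc),  # total code assignments
        "unique_decs": len(set().union(*gold.values())),  # distinct codes
        "avg_decs_per_doc": round(mean(per_doc), 1),
    }


for mode in ("i", "u"):
    print(mode, table_row(merge(annotations, mode)))
```
]]></preformat>
      <p>Because each merged code collection is a set, a document cannot count the same code twice, so the per-document total and unique counts coincide, as they do in the Dev. rows of Table 1.</p>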
      <p>To explore the diversity of the content in this dataset, we
clustered the titles of semantically similar records from the background
dataset distributed to the participants for automatic annotation, using the
Lingo clustering algorithm from the Carrot2 Workbench project 12. The resulting
26 clusters are shown using the Foam visualization in Figure 4.</p>
    </sec>
    <sec id="sec-3">
      <title>Results and participation</title>
      <p>For the newly introduced MESINESP8 task, six teams from China, India, Portugal
and Spain participated, submitting results from 24 different systems.</p>
      <p>The approaches were mostly the same as those used in the comparable
English task (8a), and included k-NN and Support Vector Machine classifiers, as
well as deep learning frameworks such as X-BERT and multilingual BERT.</p>
      <p>
        The LASIGE team from the University of Lisboa implemented an "X-BERT
BioASQ" system that combines a solution based on Extreme Multi-Label
Classification (XMLC) with a Named-Entity Recognition (NER) tool. In particular,
their system is based on X-BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an approach to scale BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to XMLC,
combined with the MER [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] tool to recognize MeSH terms in the
abstracts of the articles. The system is structured in three steps: the first step
semantically indexes the labels into clusters using ELMo [13]; the second
step matches the indices using a Transformer architecture; and the third
step ranks the labels retrieved from the previous indices.
      </p>
      <p>The Fudan University team built upon their previous "AttentionXML" [20]
and "DeepMeSH" [12] systems as well as their new "BERTMeSH" system, which
are based on document-to-vector (d2v) and tf-idf feature embeddings, learning
to rank (LTR), deep-learning-based extreme multi-label text classification, attention
mechanisms and Probabilistic Label Trees (PLT) [9].</p>
      <p>The "Iria" systems from the Universities of Vigo and Granada [15] implemented a
multi-label k-NN classifier backed by an Apache Lucene index. In the official runs,
only stemming and selected stem bigrams with high correlation were employed for
citation representation and indexing. Finally, the candidate subjects provided by the
k-NN classifier were enriched by adding exact matches of subject labels found in
the abstract text using Apache UIMA ConceptMapper. For the MESINESP8
task runs, the k-NN approach remained the same, and several linguistically
motivated text representations (content word lemmas, syntactic dependency triples and
NP chunks) were tested, using the Spanish models of the spaCy NLP toolkit
to extract them from the abstract texts.</p>
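      <p>The multi-label k-NN strategy described above can be illustrated with a short Python sketch. It is not the Iria implementation: an in-memory bag-of-words cosine similarity replaces the Apache Lucene index with stemming and stem bigrams, the ConceptMapper enrichment step is left out, and the similarity-weighted voting rule, the parameters k and min_score, and the toy training records are hypothetical choices made for illustration. Each of the k most similar training documents votes for its labels with a weight equal to its similarity, and a label is kept when its share of the total vote reaches min_score.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lower-cased whitespace tokens; a stand-in for the stemmed terms a Lucene index would hold."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_labels(query, train, k=3, min_score=0.5):
    """Multi-label k-NN: the k most similar training documents vote for their
    labels, weighted by similarity; labels whose share of the total vote is at
    least min_score are returned."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(text)), labels) for text, labels in train]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return set()
    votes = defaultdict(float)
    for sim, labels in neighbours:
        for label in labels:
            votes[label] += sim
    return {label for label, vote in votes.items() if vote / total >= min_score}


# Hypothetical toy training records (Spanish text, DeCS-like descriptor labels).
train = [
    ("diabetes mellitus tipo 2 insulina", {"Diabetes Mellitus", "Insulina"}),
    ("insulina y control glucemico en diabetes", {"Diabetes Mellitus", "Glucemia"}),
    ("fractura de cadera en ancianos", {"Fracturas de Cadera", "Anciano"}),
]
print(knn_labels("pacientes con diabetes tratados con insulina", train))
```
]]></preformat>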
      <p>
        A simple lookup system was provided as a baseline for the MESINESP task.
This system extracts the terms from an annotated list and then checks whether these
annotations are present in a set of text documents; essentially, it computes the
intersection between the tokens of the annotations and the tokens of the documents.
This simple approach obtains a MiF of 0.2695.
      </p>
      <p>
        Standard flat and hierarchical evaluation measures [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] were used to measure
the classification performance of the systems. In particular, the micro F-measure
(MiF) and the Lowest Common Ancestor F-measure (LCA-F) were used to
identify the winners for each batch [10].
      </p>
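      <p>The following Python sketch illustrates one possible reading of the lookup baseline together with the official micro-averaged metric. The matching rule (a descriptor is predicted only when all of its tokens occur in the document), the tokenizer and the toy codes D1 to D4 are our assumptions rather than the organizers' baseline code, and the hierarchical LCA-F measure is not covered. MiF pools true positives, predictions and gold codes over all documents before computing precision and recall, so documents with many codes weigh more than they would in a per-document average.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re


def tokens(text):
    """Lower-cased word tokens; in Python 3, \w also matches accented letters."""
    return set(re.findall(r"\w+", text.lower()))


def lookup_baseline(document, decs_terms):
    """Predict every code whose descriptor tokens all occur among the document tokens.

    The all-tokens rule is an assumption about how the baseline intersects
    annotation tokens with document tokens.
    """
    doc_tokens = tokens(document)
    return {code for code, term in decs_terms.items() if tokens(term) <= doc_tokens}


def micro_prf(gold, pred):
    """Micro-averaged precision (MiP), recall (MiR) and F1 (MiF).

    True positives, predictions and gold codes are pooled over all documents
    before the ratios are computed.
    """
    tp = sum(len(gold[doc] & pred.get(doc, set())) for doc in gold)
    n_pred = sum(len(pred.get(doc, set())) for doc in gold)
    n_gold = sum(len(codes) for codes in gold.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: codes D1 to D4 are placeholders, not real DeCS codes.
decs_terms = {"D1": "diabetes mellitus", "D2": "insulina", "D3": "hipertensión"}
documents = {
    "doc-1": "Pacientes con diabetes mellitus tratados con insulina.",
    "doc-2": "Control de la hipertensión arterial en ancianos.",
}
gold = {"doc-1": {"D1", "D2"}, "doc-2": {"D3", "D4"}}

pred = {doc: lookup_baseline(text, decs_terms) for doc, text in documents.items()}
mip, mir, mif = micro_prf(gold, pred)
print(f"MiP={mip:.3f} MiR={mir:.3f} MiF={mif:.3f}")
```
]]></preformat>
      <p>On this toy input the lookup recovers three of the four gold codes without false positives, giving MiP = 1.0, MiR = 0.75 and MiF = 6/7 (about 0.857); the missed code D4 has no term in the toy list, which mirrors how a pure lookup cannot assign codes whose terms never appear in the text.</p>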
      <p>The results in Task 8a show that, in all test batches and for both flat and
hierarchical measures, the best systems outperform the strong baselines. In
particular, the "dmiip fdu" systems from the Fudan University team achieve the
best performance in all three batches of the task. More detailed results can
be found on the online results page13. Comparing these results with the
corresponding results from previous versions of the task suggests that both the MTI
baseline and the top-performing systems keep improving through the years of
the challenge.</p>
      <p>
        The MESINESP task appears to have been more difficult than its English
counterpart (Task 8a), but overall we believe the results were good, taking into
account that the available data collection was considerably smaller. One problem
with medical semantic concept indexing in Spanish, at least for diagnosis- or
disease-related terms, is the uneven distribution and high variability [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but the task results show
that this does not prevent good performance by advanced implementations.
Compared to the English setting, the overall training dataset was not only
significantly smaller, but the evaluation test set also contained
clinical trial summaries and healthcare project summaries. Moreover, in the case
of the provided training data, the literature databases used two different indexing settings:
IBECS relies on a more centralized, contracted manual indexing service,
whereas in LILACS a number of records were indexed through a distributed
community effort of human indexers. The training set contained 23,423
unique codes, while the 911 articles in the evaluation set contained almost 4,000
correct DeCS codes. The best predictions, by Fudan University, scored a MiF
(micro F-measure) of 0.4254 using their AttentionXML system with
multilingual BERT, compared to the baseline score of 0.2695. Table 3 shows the results
of the runs for this task. In fact, the five best scores were from Fudan,
which also outperformed all other teams in the comparable Task 8a on indexing
English documents.
      </p>
      <p>Although MiF is the official competition metric, other metrics are
provided for completeness 14.</p>
      <p>13 http://participants-area.bioasq.org/results/8a/</p>
      <p>14 It is noteworthy that another team (Anuj-ml, from India), which was not among the
highest scoring on MiF, nevertheless scored considerably higher than other teams
on precision metrics such as EBP (Example-Based Precision), MaP (Macro
Precision) and MiP (Micro Precision). Unfortunately, at this time we have not received
details on their system implementation.</p>
      <p>Dataset releases and creation of a Silver Standard</p>
      <p>A Silver Standard containing 5,851,870 entries was created from the
submissions, that is, from the indexing results automatically generated by the
participating teams for a collection of 23,873 documents 15. Each entry in the
MESINESP Silver Standard corpus contains:</p>
      <list list-type="bullet">
        <list-item><p>Submission/run name</p></list-item>
        <list-item><p>Document ID</p></list-item>
        <list-item><p>Our own MESINESP ID</p></list-item>
        <list-item><p>Source database</p></list-item>
        <list-item><p>A DeCS code</p></list-item>
        <list-item><p>Spanish term or descriptor</p></list-item>
        <list-item><p>MiF (micro F1) scored by this run</p></list-item>
        <list-item><p>MiR (micro recall) scored by this run</p></list-item>
        <list-item><p>MiP (micro precision) scored by this run</p></list-item>
        <list-item><p>Accuracy scored by this run</p></list-item>
        <list-item><p>A consensus across all runs (e.g., how many runs assigned this DeCS code to this document)</p></list-item>
      </list>
      <p>15 https://zenodo.org/record/3946558</p>
      <p>The last five fields can help assess the reliability of the automatic annotations.
Since some of the teams used various unofficial sources to train their systems,
some predicted DeCS codes were included neither in the distributed mapping file
nor in the training dataset; these codes were removed from the Silver
Standard because no descriptor could be linked to them. In total, 513 DeCS codes were
removed; some appeared only once, but at least four of them appeared hundreds of
times.</p>
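      <p>The aggregation described above can be sketched in Python as follows. This is not the organizers' pipeline: the run names, document IDs, codes (C1 to C3, plus an unmapped X9) and descriptors are hypothetical toy data, and the per-run MiF, MiR, MiP and accuracy fields are omitted because they require the gold annotations. The sketch flattens every run into one entry per run, document and code, drops codes that cannot be linked to a descriptor in the mapping, and records for each document-code pair how many runs predicted it (the consensus field).</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Hypothetical toy submissions: run name -> {document ID -> predicted DeCS codes}.
runs = {
    "run-A": {"doc-1": {"C1", "C2"}, "doc-2": {"C3"}},
    "run-B": {"doc-1": {"C1"}, "doc-2": {"C3", "X9"}},
    "run-C": {"doc-1": {"C1", "C2"}, "doc-2": set()},
}
# Hypothetical mapping file: DeCS code -> Spanish descriptor ("X9" is deliberately missing).
decs_mapping = {"C1": "Diabetes Mellitus", "C2": "Insulina", "C3": "Hipertensión"}


def build_silver_standard(runs, decs_mapping):
    """Return one entry per (run, document, code) plus the set of removed codes.

    Codes absent from the mapping cannot be linked to a descriptor and are dropped;
    'consensus' counts how many runs assigned the code to the document.
    """
    consensus = Counter(
        (doc, code)
        for predictions in runs.values()
        for doc, codes in predictions.items()
        for code in codes
    )
    entries, removed = [], set()
    for run, predictions in runs.items():
        for doc, codes in predictions.items():
            for code in sorted(codes):
                if code not in decs_mapping:
                    removed.add(code)
                    continue
                entries.append({
                    "run": run,
                    "doc_id": doc,
                    "decs_code": code,
                    "descriptor": decs_mapping[code],
                    "consensus": consensus[(doc, code)],
                })
    return entries, removed


entries, removed = build_silver_standard(runs, decs_mapping)
print(len(entries), sorted(removed))
```
]]></preformat>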
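      <p>In addition to the automatically annotated Silver Standard, the full
manually annotated dataset from the 7 human annotators will be released, containing 66,271
data points with:</p>
      <list list-type="bullet">
        <list-item><p>Annotator ID</p></list-item>
        <list-item><p>Document ID</p></list-item>
        <list-item><p>DeCS code</p></list-item>
        <list-item><p>Annotation timestamp</p></list-item>
        <list-item><p>Whether or not the code was validated by another annotator</p></list-item>
        <list-item><p>Spanish descriptor</p></list-item>
        <list-item><p>MESINESP document ID</p></list-item>
        <list-item><p>Document source</p></list-item>
      </list>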
      <p>
        We have also generated additional resources relevant to this task,
including a machine-translated collection of PubMed abstracts 16, generated using
a system adapted for English-Spanish medical text translation [17] that
participated in the medical machine translation track of WMT 2019 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Moreover,
participants had access to medical word embeddings 17 for Spanish [18].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and conclusions</title>
      <p>This paper provides an overview of the MESINESP task within the eighth
BioASQ challenge (CLEF 2020). The new MESINESP task on semantic
indexing of medical content in Spanish ran for the first time and showed strong
results across the board, as well as good international participation. The addition of
this new and challenging task on medical semantic indexing in Spanish revealed that,
beyond the English language, there is even more room for
improvement, highlighting the importance of adequate resources for
developing and evaluating systems that effectively help biomedical experts
deal with non-English resources.
16 https://zenodo.org/record/3826554
17 https://zenodo.org/record/3744326</p>
      <p>The overall shift of participant systems towards deep neural approaches,
already noticed in previous years, is even more apparent this year. Most of
the systems adopted neural embedding approaches, notably based on BERT
and BioBERT models, for all tasks of the challenge.</p>
      <p>Overall, as in previous versions of the challenge, the top-performing systems
were able to advance the state of the art, outperforming the strong baselines
on the challenging shared tasks offered by the organizers. In addition, a very
valuable Silver Standard resource with 5.8 million data points will enhance the semantic
indexing resources for Spanish. Therefore, we consider that the challenge keeps
meeting its goal of pushing the research frontier in biomedical semantic indexing
and question answering. Future plans for the challenge include extending
the benchmark data through a community-driven acquisition process.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The MESINESP task is sponsored by the Spanish Plan for the Advancement of
Language Technologies (Plan TL) and the Secretaría de Estado para el Avance
Digital (SEAD). BioASQ is also grateful to LILACS, SciELO, the Biblioteca
Virtual en Salud and the Instituto de Salud Carlos III for providing data for the
BioASQ MESINESP task.</p>
      <p>8. Giustini, D., Boulos, M.N.K.: Google Scholar is not enough to be used alone for
systematic reviews. Online Journal of Public Health Informatics 5(2), 214 (2013)</p>
      <p>9. Jain, H., Prabhu, Y., Varma, M.: Extreme Multi-label Loss Functions for
Recommendation, Tagging, Ranking &amp; Other Missing Label Applications. In: Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining - KDD '16. pp. 935–944. ACM Press, New York, NY, USA
(2016). https://doi.org/10.1145/2939672.2939756</p>
      <p>10. Kosmopoulos, A., Partalas, I., Gaussier, E., Paliouras, G., Androutsopoulos, I.:
Evaluation measures for hierarchical classification: a unified view and novel
approaches. Data Mining and Knowledge Discovery 29(3), 820–865 (2015)</p>
      <p>11. Krallinger, M., Krithara, A., Nentidis, A., Paliouras, G., Villegas, M.: BioASQ at
CLEF2020: Large-scale biomedical semantic indexing and question answering. In:
European Conference on Information Retrieval. pp. 550–556. Springer (2020)</p>
      <p>12. Peng, S., You, R., Wang, H., Zhai, C., Mamitsuka, H., Zhu, S.: DeepMeSH: deep
semantic representation for improving large-scale MeSH indexing. Bioinformatics
32(12), i70–i79 (2016)</p>
      <p>13. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K.,
Zettlemoyer, L.: Deep contextualized word representations. Proceedings of the
Conference on Empirical Methods in Natural Language Processing pp. 31–40 (Feb 2018),
http://arxiv.org/abs/1802.05365</p>
      <p>14. Rajkumar, R.P.: COVID-19 and mental health: A review of the existing literature.
Asian Journal of Psychiatry p. 102066 (2020)</p>
      <p>15. Ribadas, F.J., De Campos, L.M., Darriba, V.M., Romero, A.E.: CoLe and UTAI
at BioASQ 2015: Experiments with similarity based descriptor assignment. CEUR
Workshop Proceedings 1391 (2015)</p>
      <p>16. Salehi, S., Abedi, A., Balakrishnan, S., Gholamrezanezhad, A.: Coronavirus disease
2019 (COVID-19): a systematic review of imaging findings in 919 patients. American
Journal of Roentgenology pp. 1–7 (2020)</p>
      <p>17. Soares, F., Krallinger, M.: BSC participation in the WMT translation of
biomedical abstracts. In: Proceedings of the Fourth Conference on Machine Translation
(Volume 3: Shared Task Papers, Day 2). pp. 175–178 (2019)</p>
      <p>18. Soares, F., Villegas, M., Gonzalez-Agirre, A., Krallinger, M., Armengol-Estape,
J.: Medical word embeddings for Spanish: Development and evaluation. In:
Proceedings of the 2nd Clinical Natural Language Processing Workshop. pp. 124–133
(2019)</p>
      <p>19. Tsatsaronis, G., Balikas, G., Malakasiotis, P., Partalas, I., Zschunke, M., Alvers,
M.R., Weissenborn, D., Krithara, A., Petridis, S., Polychronopoulos, D.,
Almirantis, Y., Pavlopoulos, J., Baskiotis, N., Gallinari, P., Artieres, T., Ngonga, A.,
Heino, N., Gaussier, E., Barrio-Alvers, L., Schroeder, M., Androutsopoulos, I.,
Paliouras, G.: An overview of the BioASQ large-scale biomedical semantic
indexing and question answering competition. BMC Bioinformatics 16, 138 (2015).
https://doi.org/10.1186/s12859-015-0564-6</p>
      <p>20. You, R., Zhang, Z., Wang, Z., Dai, S., Mamitsuka, H., Zhu, S.: AttentionXML: Label
tree-based attention-aware deep model for high-performance extreme multi-label
text classification. arXiv preprint arXiv:1811.01727 (2018)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Almagro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unanue</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fresno</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montalvo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>ICD-10 coding of Spanish electronic discharge summaries: An extreme classification problem</article-title>
          .
          <source>IEEE Access</source>
          <volume>8</volume>
          ,
          <fpage>100073</fpage>
          –
          <lpage>100083</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Balikas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Partalas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kosmopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petridis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malakasiotis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavlopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Androutsopoulos</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baskiotis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaussier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artieres</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallinari</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Evaluation framework specifications</article-title>
          .
          <source>Project deliverable D4.1</source>
          , UPMC (05/
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bansal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Cardiovascular disease and COVID-19</article-title>
          .
          <source>Diabetes &amp; Metabolic Syndrome: Clinical Research &amp; Reviews</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bawden</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grozea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yepes</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kittner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mah</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.:
          <article-title>Findings of the WMT 2019 biomedical translation shared task: Evaluation for MEDLINE abstracts and biomedical terminologies</article-title>
          . In:
          <source>Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)</source>
          . pp.
          <fpage>29</fpage>
          –
          <lpage>53</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhillon</surname>
            ,
            <given-names>I.:</given-names>
          </string-name>
          <article-title>X-bert: extreme multilabel text classi cation with using bidirectional encoder representations from transformers</article-title>
          . arXiv preprint arXiv:
          <year>1905</year>
          .
          <volume>02331</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamurias</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>MER: a shell script and annotation server for minimal named entity recognition and linking</article-title>
          .
          <source>Journal of Cheminformatics</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>58</elocation-id>
          (Dec
          <year>2018</year>
          ). https://doi.org/10.1186/s13321-018-0312-9
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          . In:
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019)</source>
          , Volume
          <volume>1</volume>
          , pp.
          <fpage>4171</fpage>
          –
          <lpage>4186</lpage>
          (
          <year>2019</year>
          ). http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>