<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Madrid, Spain
* Corresponding author.
✉ andrey.sakhovskiy@gmail.com (A. Sakhovskiy); louk_nat@mail.ru (N. Loukachevitch); tutubalinaev@gmail.com (E. Tutubalina)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of the BioASQ BioNNE-L Task on Biomedical Nested Entity Linking in CLEF 2025</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrey Sakhovskiy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natalia Loukachevitch</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Tutubalina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Articial Intelligence Research Institute</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kazan Federal University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lomonosov Moscow State University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Sber AI</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Skoltech</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The task of biomedical entity linking (EL), which is intended to normalize a free-form textual entity to a concept from a standardized domain-specific vocabulary, is foundational for factuality-sensitive applications. Despite vast research on EL, modern methods ignore the nested structure of longer entities, which may provide vital context for joint normalization of nested entities. This paper presents an official results report for BioNNE-L, a shared task on Biomedical Nested Named Entity Linking conducted within the BioASQ 2025 Workshop on biomedical semantic indexing and question answering. The shared task included three subtasks organized into two evaluation tracks: a monolingual track with (i) English and (ii) Russian subtasks, and (iii) a multilingual track combining the data from the two monolingual subtasks. For evaluation, two novel test sets of annotated entities are released, each containing 154 PubMed abstracts, in English and Russian respectively. The evaluation of system submissions from 7 participating teams has revealed the effectiveness of small domain-specific models for nested entity linking even in the era of large language models.</p>
      </abstract>
      <kwd-group>
        <kwd>BioNLP</kwd>
        <kwd>Biomedical NLP</kwd>
        <kwd>Nested Entity Linking</kwd>
        <kwd>Biomedical Text Mining</kwd>
        <kwd>Domain-specic Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. BioNNE-L Shared Task</title>
      <p>
        In the BioNNE-L Shared Task, we address the medical entity linking task, also known as Medical Concept Normalization (MCN), which is to map given entities to the most relevant vocabulary entries from an external source, e.g., concepts from the UMLS metathesaurus [13] identified with concept unique identifiers (CUIs). Although the task has been widely explored in recent years, existing approaches usually treat each entity individually, even though medical entities often form a nested structure, where an entity can be a subpart of another entity. One of the key features of BioNNE-L is the focus on nested entities that are (i) derived from the MCN annotation of the NEREL-BIO corpus [
        <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
        ] and (ii) supplemented by newly annotated data in both English and Russian. The annotated entity types are disorders (DISO), anatomical structures (ANAT), and chemicals (CHEM). The competition was organized into three subtasks that fell under two evaluation tracks:
1. Monolingual track that treated English and Russian data independently;
2. Bilingual track that required a single bilingual model for the combined Russian and English data.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        Training and validation sets for the BioNNE-L competition are based on the NEREL-BIO dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and additional texts annotated for the BioNNE competition organized in 2024 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. NEREL-BIO is a corpus of PubMed abstracts written in Russian and English. It enhances the NEREL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] dataset, originally designed for the general domain, by incorporating biomedical entity types. Biomedical entity types in NEREL-BIO are annotated according to UMLS definitions of relevant concepts. All the abstracts are annotated in the BRAT format [14].
      </p>
      <p>Figures 1 and 2 present parallel examples of nested named entities in NEREL-BIO for one abstract.
Table 1 provides a comprehensive list of entity types, along with their explanations and examples.</p>
      <p>Compared to the original NEREL-BIO and BioNNE datasets, we selected only the three most common entity types for the BioNNE-L competition: disorders (DISO), anatomical structures (ANAT), and chemicals (CHEM). DISO covers any deviations from the normal state of an organism (diseases, symptoms, abnormalities of organs), excluding dysfunctions, injuries, or poisoning, e.g., lumbar vertebral canal stenosis, exogenous allergic alveolitis, appendicitis, haemorrhoids, magnesium deficiency, arteriovenous angiodysplasia, type 2 diabetes mellitus. CHEM covers chemicals, including legal and illegal drugs and biological molecules, e.g., venlafaxine, resistin, lipoprotein, mydocalm-richter, leptin, melatonin, opioid, iodine, adrenalin, isotonic NaCl solution. ANAT covers organs, body parts, cells and cell components, e.g., epidermal nerve fibers, skin biopsy specimens, tumor tissue, chiasmatic-sellar area, blood, low back, eye, bone, brain, lower limb, oral cavity.</p>
      <p>The resulting dataset comprises 662 annotated PubMed abstracts in Russian and 104 parallel abstracts in Russian and English. The 104 parallel abstracts were randomly split into training and validation sets for each subtask. A novel test set was developed for the shared task, consisting of 154 abstracts in English and Russian. Russian and English texts in the dev and test sets are parallel texts written by the authors. The Russian training set contains Russian variants of the English training texts. When annotating UMLS links, annotators worked with both Russian and English parallel texts and, where possible, labeled the same entities and created the same links.</p>
      <p>Table 2 shows the number of entities represented in each part of the dataset. Observations can be summarized as follows. First, entities labeled as DISO and ANAT are the most frequent across all sets, with DISO being particularly prevalent in both training and test sets. Second, the numbers of entities in the English test set (EN test) and the Russian test set (RU test) are relatively comparable: the overall difference is about 7%. The English test set also shows a slightly higher number of unique CUIs, e.g., for DISO, 879 (En) vs. 770 (Ru). This may indicate that the English test data pose a greater normalization challenge. Third, the size of the normalization dictionary reflects the much greater maturity and coverage of UMLS for English biomedical texts. The size difference in dictionary coverage suggests that English normalization benefits from broader recall, while Russian normalization faces potential coverage gaps and might need domain-specific expansions. The ANAT dictionary is much smaller than DISO or CHEM for both languages. This likely reflects a more finite set of anatomical terms compared to the expansive terminologies for diseases and chemicals.</p>
      <p>Normalization Dictionary. As a normalization dictionary, we collect the English and Russian UMLS concepts filtered by the concept types DISO, CHEM, and ANAT. In UMLS, each concept is identified with a Concept Unique Identifier (CUI) and a set of concept names in different languages, including English and Russian.</p>
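      <p>A minimal sketch of the dictionary layout described above: each CUI maps to its concept names per language. The Python structure, function names, and the sample Russian synonym are our illustrative assumptions, not the released data format.</p>
      <preformat>
```python
# Hypothetical in-memory layout: CUI -> per-language concept names.
normalization_dict = {
    "C0023890": {"en": ["Cirrhosis"], "ru": []},           # no Russian name in UMLS
    "C0042449": {"en": ["Vein", "Venous"], "ru": ["вена"]},
}

def candidate_names(cui_dict, languages=("en", "ru")):
    """Flatten the dictionary into (name, CUI) pairs for retrieval."""
    pairs = []
    for cui, names in cui_dict.items():
        for lang in languages:
            pairs.extend((name, cui) for name in names[lang])
    return pairs
```
      </preformat>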
      <sec id="sec-3-1">
        <title>3.1. BioNNE-L Challenges</title>
        <p>
          The two key challenges of the BioNNE-L data are inherited from NEREL-BIO [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. First, the data exhibit a high level of nestedness, i.e., cases where a shorter entity is a subpart of a longer one. Some illustrative examples of nested entities are presented in Figure 3. The research question of whether nested entities would be linked more effectively when addressed jointly rather than individually remains unexplored. The second challenge is the incompleteness of vocabulary terminology in a target low-resource language, e.g., Russian. NEREL-BIO's data annotation protocol addresses the issue by linking entities absent from the Russian UMLS to a concept that only has an English name. Although well aligned with the real-world terminology incompleteness scenario, cross-lingual annotation causes extensive growth of the normalization dictionary.
        </p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Problems of Annotating Texts with UMLS concepts</title>
          <p>There are some problems in annotating Russian and English texts with UMLS concepts.</p>
          <p>
            1) Lack of Russian translations in UMLS. When linking Russian texts with UMLS, there is a serious problem of the absence of many Russian terms in UMLS concept variants: the Russian part of UMLS includes Russian translations for only 1.96% of the English UMLS concepts [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. In such cases, the annotators tried to identify an appropriate UMLS concept even when a Russian translation was not available, which could require significant effort. For example, Russian translations of well-known single-word terms such as Complication (C0009566), Cirrhosis (C0023890), Bone (C0023890) are currently absent in UMLS.
          </p>
          <p>
            In difficult cases, a direct translation of a Russian medical term does not give a correct English term. To find a correct link to the UMLS concept, annotators should search for an appropriate English translation using various sources of information, including Latin terms for anatomical structures, Wikipedia pages in Russian and English, and even Russian scientific papers with English abstracts and keywords [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ].
          </p>
          <p>2) Diculties with assigning adjectives in UMLS. In domain-specic texts, adjectives can express
some concepts. Therefore, in our detailed, nested annotation, adjectives should also be linked to UMLS
concepts. However, in some cases, there can be two di‌erent concepts for nouns and adjectives with
the same denotation, such as lung (C0024109) and pulmonary (C2709248). For veins, there are three
relevant concepts: noun vein is mentioned in the C0042449 concept, and adjective venous is appeared
in two concepts: C0042449 and C0348013 in UMLS. In some other cases, concept-related adjectives are
absent in UMLS. For example, none of the Russian or English adjectives for the noun nitrogen (nitric,
nitrous) are included in UMLS.</p>
          <p>
            3) Ambiguity of terms in UMLS. Some unambiguous medical terms are assigned to several concepts in UMLS (see also [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]). For example, the term cognitive disorders is mentioned in the C0009241 (Cognition disorders) and C0338656 (Impaired cognition) concepts. The term thrombin is assigned as a synonym to both the Thrombin (C0040018) and Thrombin test (C0863178) concepts, etc.
          </p>
          <p>All the above-mentioned difficulties of manual linking to UMLS concepts also cause problems in automatic linking.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Differences in Annotations of English and Russian Texts</title>
          <p>As we mainly dealt with English and Russian parallel texts, we can describe the sources of different annotations in English and Russian.</p>
          <p>1) The same concept is expressed in one language by a single word and in another language by a phrase. For example, the term blood flow is expressed as a single word in Russian, which means that in English texts two nested entities (blood and blood flow) are annotated, but only a single entity is linked to UMLS in Russian. Similarly, the single-word term brain corresponds to a two-word Russian phrase (головной мозг).</p>
          <p>2) A single word in one language corresponds to a multi-word term in another language. For example, the English term reductase exists only as a root in Russian. This leads to the annotation of three entities and links in English for the term Glutathione Reductase and only a single link for its Russian translation (Глутатиоредуктаза).</p>
          <p>3) Differences in syntactic structure and word order across languages lead to variations in the nestedness of multi-word terms. For example, in the English term Lower limb deep vein the following UMLS concepts were revealed: vein (C0042449), deep vein (C0226514), limb (C0015385), lower limb (C0023216). In the Russian translation, an additional Russian term (глубокие вены нижних конечностей, deep veins of lower limbs) and an additional concept, C0226813 (Structure of vein of lower extremity), can also be identified and linked to the corresponding UMLS concept.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Evaluation metrics</title>
        <p>
          Following prior research on entity linking [
          <xref ref-type="bibr" rid="ref10 ref5 ref6 ref9">10, 6, 9, 5</xref>
          ], we address BioNNE-L as a retrieval task: given a mention, a model must retrieve the top-k concepts from the given UMLS dictionary. We employ two ranking-based evaluation metrics: (i) Accuracy@k and (ii) Mean Reciprocal Rank (MRR). Accuracy@k = 1 if the correct UMLS CUI is retrieved at rank ≤ k, and Accuracy@k = 0 otherwise. MRR = (1/|E|) · Σ_{e ∈ E} 1/rank_e, where E is the set of entities, |E| is the number of entities, and rank_e is the rank of entity e's first correctly retrieved concept among the top k retrieved concepts.
        </p>
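        <p>The two metrics can be computed as follows; the function names and the list-based candidate format are our sketch, not the official evaluation script.</p>
        <preformat>
```python
def accuracy_at_k(gold_cuis, ranked_cuis_per_mention, k):
    """Fraction of mentions whose gold CUI appears among the top-k candidates."""
    hits = sum(
        1 for gold, ranked in zip(gold_cuis, ranked_cuis_per_mention)
        if gold in ranked[:k]
    )
    return hits / len(gold_cuis)

def mrr(gold_cuis, ranked_cuis_per_mention, k):
    """Mean reciprocal rank of the first correct CUI among the top-k candidates."""
    total = 0.0
    for gold, ranked in zip(gold_cuis, ranked_cuis_per_mention):
        if gold in ranked[:k]:
            total += 1.0 / (ranked.index(gold) + 1)  # ranks are 1-based
    return total / len(gold_cuis)

# Toy example with two mentions and their ranked candidate lists.
gold = ["C0023890", "C0042449"]
ranked = [["C0023890", "C0009566"], ["C0226514", "C0042449"]]
print(accuracy_at_k(gold, ranked, 1))  # 0.5
print(mrr(gold, ranked, 5))            # (1/1 + 1/2) / 2 = 0.75
```
        </preformat>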
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baseline</title>
        <p>
          As a baseline, we adopt zero-shot ranking with each entity type processed independently to reduce the memory footprint caused by an extensive dictionary. Both input entities and normalization dictionary concepts are encoded with BERGAMOT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. BERGAMOT combines the power of BERT [15] and graph neural networks to capture both inter-concept and intra-concept interactions from the multilingual UMLS graph. This model utilizes a contrastive loss on textual and graph concept representations from UMLS to make them less sensitive to surface forms and enable intermodal knowledge exchange. For each entity, we rank all dictionary entries based on their dot product with the entity's embedding obtained from the BERGAMOT checkpoint4 with [CLS] pooling. Finally, dictionary entries with the highest scores are retrieved as matching UMLS concepts.
        </p>
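        <p>The zero-shot retrieval step reduces to a dense matrix product. The sketch below uses random vectors in place of BERGAMOT [CLS] embeddings; the array names and sizes are our illustrative assumptions.</p>
        <preformat>
```python
import numpy as np

# Purely illustrative stand-ins: in the baseline these vectors would be
# [CLS] embeddings from the BERGAMOT encoder; here they are random.
rng = np.random.default_rng(0)
dict_embeddings = rng.normal(size=(1000, 768))   # one row per dictionary concept name
mention_embeddings = rng.normal(size=(5, 768))   # one row per input mention

# Score every (mention, concept) pair by dot product, then keep the k
# highest-scoring dictionary entries for each mention.
scores = mention_embeddings @ dict_embeddings.T  # shape (5, 1000)
k = 5
topk = np.argsort(-scores, axis=1)[:, :k]        # indices of the k best concepts per mention
```
        </preformat>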
      </sec>
      <sec id="sec-4-3">
        <title>4.3. O‌icial Results</title>
        <p>
          In total, we received 23 CodaLab registrations for the BioNNE-L task, with 7 teams submitting predictions during the evaluation phase. The systems submitted by the participants are summarized in Table 3. Most of the participants reported systems based on domain-specific biomedical BERT models [15], such as SapBERT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], BERGAMOT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], and BiomedBERT [16].
        </p>
        <p>
          Team verbanexialab [17] leveraged SapBERT [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], pre-trained on UMLS concepts, to obtain entity embeddings, followed by multi-component re-ranking. They combined embedding cosine similarity with Jaccard similarity for lexical overlap recognition and Levenshtein distance for character-level alignment.
        </p>
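        <p>The two lexical components of such a re-ranker can be approximated with standard string similarities. The functions below are our sketch of token-level Jaccard similarity and edit distance, not the team's code.</p>
        <preformat>
```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two mention strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa.intersection(sb)) / len(sa.union(sb))

def levenshtein(a, b):
    """Plain dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```
        </preformat>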
        <p>
          Team LYX_DMIIP_FDU [18] fine-tuned a BERGAMOT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] model for each subtask via contrastive learning, using the train- and dev-set entities to enrich the original vocabularies. The textual context of each entity was used as additional input to enhance the entity representation.
        </p>
        <p>Team BlancaPlanca [19] used BERGAMOT for zero-shot retrieval based on entity-concept cosine similarity. They applied language-specific lemmatization for Russian and sped up inference by chunking the normalization dictionary into type-specific parts of 100k entries each.</p>
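        <p>Chunking the dictionary bounds peak memory at the cost of a running top-k merge across chunks. The sketch below is our own illustration of the idea, with a hypothetical scores_fn standing in for the embedding similarity.</p>
        <preformat>
```python
def chunked_topk(scores_fn, dictionary, mention, chunk_size=100_000, k=5):
    """Score the dictionary in fixed-size chunks to bound peak memory,
    keeping a running top-k of (score, concept) pairs across chunks."""
    best = []
    for start in range(0, len(dictionary), chunk_size):
        chunk = dictionary[start:start + chunk_size]
        scored = [(scores_fn(mention, c), c) for c in chunk]
        best = sorted(best + scored, reverse=True)[:k]
    return best
```
        </preformat>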
        <p>
          Team MSM Lab [20] adopted a two-step retrieval and ranking pipeline. For English, they employ English SapBERT5 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and BiomedBERT6 [16] as retrieval and ranking models, respectively. For the Russian and multilingual subtasks, they use multilingual SapBERT7 [21] for both components.
        </p>
        <p>Team dstepakov performed nearest-neighbor search based on the cosine similarity of RoBERTa embeddings [22], fine-tuned contrastively on anchor-positive-negative term triplets via the InfoNCE objective [23].</p>
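        <p>A sketch of the InfoNCE objective over one anchor with its positive and negative terms, using cosine similarities. The function name and temperature value are our assumptions, not the team's implementation.</p>
        <preformat>
```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one (anchor, positive, negatives) set: cross-entropy
    over temperature-scaled cosine similarities, positive at index 0."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = logits / temperature
    # numerically stable log-sum-exp minus the positive logit
    m = logits.max()
    return float(m + np.log(np.exp(logits - m).sum()) - logits[0])
```
        </preformat>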
        <p>Team ICUE [24] fine-tuned BioSyn [25] using vocabularies reduced to less than 100k entries each. They fine-tuned a separate BERT-based model [26] for the English [27], Russian8, and multilingual [28] subtasks, respectively, and re-ranked the initial retrieval results with DeepSeek-R1-Distill-Llama-8B9.</p>
        <p>Team NLPIMP performed zero-shot ranking using a Russian LaBSE [29] model10 pre-trained contrastively on an in-house Russian medical corpus.
4https://huggingface.co/andorei/BERGAMOT-multilingual-GAT
5https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext
6https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
7https://huggingface.co/cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR-large
8https://huggingface.co/KoichiYasuoka/bert-base-russian-upos
9https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
10https://huggingface.co/sergeyzh/LaBSE-ru-turbo</p>
        <p>The ocial evaluation results, ordered by Accuracy@1 value, for BioNNE-L are summarized in Table 4.
Team LYX_DMIIP_FDU ranked rst in the multilingual track and second in the two monolingual subtasks
by ne-tuning BERGAMOT. Top 1 results for the Russian and English data are achieved by multilingual
BERGAMOT (Team BlancaPlanca) and English SapBERT (Team verbanexialab) models, respectively.
Despite using LLM-based re-ranking, Team ICUE did not surpass BERT-only systems.</p>
        <p>Overall, the results show that performance on the multilingual track is consistently lower than on the monolingual English or Russian tracks. Most teams see a drop of about 5 to 10 percentage points in Accuracy@1 when moving from monolingual to multilingual settings. This may highlight that cross-lingual biomedical entity normalization remains more challenging than working within a single language, likely due to differences in terminology, translation ambiguities, and vocabulary coverage.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presents an overview of the official evaluation results for the BioNNE-L shared task on biomedical nested entity linking. The evaluation was organized into three tracks: English, Russian, and bilingual, and aimed at the normalization of disorder, chemical, and anatomical structure mentions to the UMLS vocabulary. The best results were achieved by BERT-based normalization approaches. Top-performing systems for the bilingual and Russian tracks adopted multilingual BERGAMOT, a BERT model pre-trained on textual and graph data from the UMLS metathesaurus. The best English system re-ranked SapBERT's retrieval results through lexical and character-level similarity scores. In general, the evaluation results have proven the effectiveness of compact domain-specific encoders for nested entity linking.</p>
      <p>Future work should focus on addressing the critical gaps identified in this shared task. This includes expanding cross-lingual terminology for the Russian UMLS via semi-automated pipelines that leverage machine translation of English UMLS entries, validated by human experts or LLMs. Additionally, mining Russian clinical literature and utilizing resources like Wikidata will enhance this process. Furthermore, joint modeling of nested entity hierarchies through graph-based architectures, such as Graph Neural Networks (GNNs), could help propagate contextual constraints between parent and child entities, thereby resolving ambiguities.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been supported by the Russian Science Foundation grant # 23-11-00358. We would like
to thank all the participating teams who contributed to the success of the shared task through their
interesting approaches and experiments.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly in order to check grammar and spelling and to improve writing style. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
      <p>twelfth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering,
in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. Maria Di Nunzio, P. Galuščáková,
A. García Seco de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality,
Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF
Association (CLEF 2024), 2024.
[13] O. Bodenreider, The unified medical language system (UMLS): integrating biomedical terminology,
Nucleic Acids Res. 32 (2004) 267–270. URL: https://doi.org/10.1093/nar/gkh061. doi:10.1093/
NAR/GKH061.
[14] P. Stenetorp, S. Pyysalo, G. Topić, T. Ohta, S. Ananiadou, J. Tsujii, brat: a web-based tool for
NLP-assisted text annotation, in: Proceedings of the Demonstrations Session at EACL 2012,
Association for Computational Linguistics, Avignon, France, 2012.
[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding, NAACL HLT 2019 - 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings
of the Conference 1 (2018) 4171–4186. URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.
[16] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, H. Poon,
Domainspecific language model pretraining for biomedical natural language processing, ACM Transactions
on Computing for Healthcare (HEALTH) 3 (2021) 1–23.
[17] D. Peña Gnecco, J. Serrano, E. Puertas, J. C. Martínez-Santos, Hybrid Re-ranking for Biomedical
Entity Linking using SapBERT Embeddings: A High-Performance System for BioNNE-L 2025-1,
in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[18] Y. Liu, LYX_DMIIP_FDU at BioASQ 2025: Utilizing BERT embeddings for biomedical text mining,
in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[19] A. Burlova, Navigating Partial UMLS Terminology: GAT Embeddings and Confidence Analysis
for Multilingual Concept Linking, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025
Working Notes, 2025.
[20] C. Li, X. Zheng, S. Liu, BIBERT on Biomedical Nested Named Entity Linking at BioASQ 2025, in:</p>
      <p>G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), CLEF 2025 Working Notes, 2025.
[21] F. Liu, I. Vulić, A. Korhonen, N. Collier, Learning domain-specialised representations for
crosslingual biomedical entity linking, in: Proceedings of the 59th Annual Meeting of the Association
for Computational Linguistics and the 11th International Joint Conference on Natural Language
Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online, 2021, pp.
565–574. URL: https://aclanthology.org/2021.acl-short.72/. doi:10.18653/v1/2021.acl-short.
72.
[22] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR
abs/1911.02116 (2019). URL: http://arxiv.org/abs/1911.02116. arXiv:1911.02116.
[23] A. van den Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding,</p>
      <p>ArXiv abs/1807.03748 (2018). URL: https://api.semanticscholar.org/CorpusID:49670925.
[24] A. D. Lain, C. Lee, S. E. Doneva, M. J. Rodríguez-Cubillos, E. Castagnari, T. I. Simpson, , J. M. Posma,
Multilingual and Nested Biomedical Named Entity Normalisation via Candidate Retrieval and
Lightweight Large Language Model Disambiguation, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), CLEF 2025 Working Notes, 2025.
[25] M. Sung, H. Jeon, J. Lee, J. Kang, Biomedical entity representations with synonym marginalization,
in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics,
Association for Computational Linguistics, Online, 2020, pp. 3641–3650. URL: https://aclanthology.
org/2020.acl-main.335/. doi:10.18653/v1/2020.acl-main.335.
[26] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers
for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805.
arXiv:1810.04805.
[27] I. Beltagy, K. Lo, A. Cohan, Scibert: Pretrained language model for scientific text, in: EMNLP, 2019.</p>
      <p>arXiv:arXiv:1903.10676.
[28] S. Tedeschi, V. Maiorca, N. Campolungo, F. Cecconi, R. Navigli, WikiNEuRal: Combined neural
and knowledge-based silver data creation for multilingual NER, in: Findings of the Association for
Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana,
Dominican Republic, 2021, pp. 2521–2533. URL: https://aclanthology.org/2021.findings-emnlp.215/.
doi:10.18653/v1/2021.findings-emnlp.215.
[29] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, W. Wang, Language-agnostic BERT sentence embedding,
in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 878–
891. URL: https://aclanthology.org/2022.acl-long.62/. doi:10.18653/v1/2022.acl-long.62.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ming</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>An</surname>
          </string-name>
          <article-title>Few-shot nested named entity recognition</article-title>
          ,
          <source>arXiv preprint arXiv:2212.00968</source>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2212.00968.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Artemova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zmeev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          , I. Rozhkov,
          <string-name>
            <given-names>T.</given-names>
            <surname>Batura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <article-title>Runne2022 shared task: Recognizing nested named entities</article-title>
          ,
          <source>Komp'juternaja Lingvistika i Intellektual'nye Tehnologii</source>
          <year>2022</year>
          (
          <year>2022</year>
          )
          <fpage>33</fpage>
          -
          <lpage>41</lpage>
          . doi:10.28995/2075-7182-2022-21-33-41.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          , E. Artemova,
          <string-name>
            <given-names>T.</given-names>
            <surname>Batura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Braslavski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manandhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pugachev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Rozhkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          , et al.,
          <article-title>NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and Wikidata entity links</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Manandhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Baral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Rozhkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Braslavski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Batura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <article-title>NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities</article-title>
          ,
          <source>Bioinformatics</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1093/bioinformatics/btad161. doi:10.1093/bioinformatics/btad161,
          <fpage>btad161</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Semenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadurin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <article-title>Biomedical entity representation with graph-augmented multi-objective transformer</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: NAACL 2024</source>
          , Association for Computational Linguistics, Mexico City, Mexico,
          <year>2024</year>
          , pp.
          <fpage>4626</fpage>
          -
          <lpage>4643</lpage>
          . URL: https://aclanthology.org/2024.findings-naacl.288/. doi:10.18653/v1/2024.findings-naacl.288.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shareghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Basaldella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <article-title>Self-alignment pretraining for biomedical entity representations</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>4228</fpage>
          -
          <lpage>4238</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.334. doi:10.18653/v1/2021.naacl-main.334.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alekseev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Miftahutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kokh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nesterov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Avetisian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chertok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nikolenko</surname>
          </string-name>
          ,
          <article-title>Medical crossing: a cross-lingual evaluation of clinical entity linking</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>4212</fpage>
          -
          <lpage>4220</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.447/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nesterov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zubkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Miftahutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kokh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alekseev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Avetisian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chertok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nikolenko</surname>
          </string-name>
          ,
          <article-title>RuCCoN: clinical concept normalization in Russian</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: ACL 2022</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Semenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kadurin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <article-title>Graph-enriched biomedical entity representation transformer</article-title>
          ,
          <source>in: Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <article-title>Biomedical concept normalization over nested entities with partial UMLS terminology in Russian</article-title>
          ,
          <source>in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          , ELRA and ICCL, Torino
          , Italia,
          <year>2024</year>
          , pp.
          <fpage>2383</fpage>
          -
          <lpage>2389</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.213/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Davydova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <source>Overview of BioNNE Task on Biomedical Nested Named Entity Recognition at BioASQ 2024</source>
          , in: CLEF Working Notes,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nentidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Katsimpras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krithara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lima-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré-Maduell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Loukachevitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Davydova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tutubalina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Paliouras</surname>
          </string-name>
          ,
          <source>Overview of BioASQ 2024: The</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>