Nested Named Entity Recognition using Multilayer BERT-based Model
Notebook for the BioASQ Lab at CLEF 2024

Hasin Rehana1,2,†, Benu Bansal2,3,†, Nur Bengisu Çam4, Jie Zheng5, Yongqun He5,6, Arzucan Özgür4 and Junguk Hur2,*

1 School of Electrical Engineering & Computer Science, University of North Dakota, Grand Forks, North Dakota, 58202, USA
2 Department of Biomedical Sciences, School of Medicine and Health Sciences, University of North Dakota, Grand Forks, North Dakota, 58202, USA
3 Department of Biomedical Engineering, University of North Dakota, Grand Forks, North Dakota, 58202, USA
4 Department of Computer Engineering, Bogazici University, Istanbul, 34342, Turkey
5 Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, 48109, USA
6 Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109, USA

Abstract
In natural language processing, named entity recognition (NER) is a crucial task involving finding and categorizing text entities. The biomedical domain presents substantial hurdles due to the complex structure of the language and the existence of nested entities. This paper introduces an innovative method for Nested NER that utilizes a multilayer bidirectional encoder representation transformer (BERT)-based model, notably employing the pretrained PubMedBERT. Our proposed model is designed to manage the complexities of nested entities effectively. We combined the robust contextual embeddings from PubMedBERT with a multilayer tagging process. This approach allowed the model to precisely differentiate between overlapping entities, a frequent occurrence in biomedical literature. To assess the effectiveness of our Multilayer NER Model (MultilayerNERModel), we conducted thorough experiments on the BioNNE English dataset, a dataset for a shared task of the BioASQ competition. The findings suggest that employing a multilayer approach enhances the model's ability to identify nested entities, resulting in more thorough detection of entities in biomedical texts. Our model earned the highest overall performance in the English-oriented track, with an F1 score of 67.30% and a macro F1 score of 56.36%. These results demonstrate the significant impact of utilizing a multilayer approach in Nested NER tasks, especially in the biomedical domain. The use of UMLS dictionaries, along with the MultilayerNERModel, further enhances the model's performance in biomedical entity recognition.

Keywords
Named entity recognition (NER), Nested NER, Bidirectional encoder representation transformer (BERT), Natural language processing (NLP)

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* Corresponding author.
† These authors contributed equally.
Email: hasin.rehana@und.edu (H. Rehana); benu.bansal@und.edu (B. Bansal); bengisu.cam@std.bogazici.edu.tr (N. B. Çam); jiezhen@med.umich.edu (J. Zheng); yongqunh@med.umich.edu (Y. He); arzucan.ozgur@bogazici.edu.tr (A. Özgür); junguk.hur@med.und.edu (J. Hur)
ORCID: 0000-0003-2992-6547 (H. Rehana); 0000-0002-2834-197X (B. Bansal); 0009-0003-7880-8042 (N. B. Çam); 0000-0002-2999-0103 (J. Zheng); 0000-0001-9189-9661 (Y. He); 0000-0001-8376-1056 (A. Özgür); 0000-0002-0736-2149 (J. Hur)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Identifying and classifying entities, including but not limited to medical terms, names of people, organizations, and locations, is a critical undertaking in NLP. Nested NER extends this challenge by requiring the identification of entities embedded within other entities, adding complexity, particularly in specialized domains such as biomedicine [1]. In biomedical text mining, accurate Nested NER systems are essential for extracting meaningful information from scientific literature, which is crucial for advancing research and clinical practice [2]. The biomedical nested named entity recognition (BioNNE) task [3] was introduced to address this need as part of the BioASQ Workshop at CLEF 2024 [4].
This task focuses on developing and evaluating NER systems capable of handling nested entities within biomedical texts. The BioNNE task includes three tracks: English, Russian, and Bilingual. Our participation was primarily in Track 2, the English track, though we also applied our approach to the other tracks. The English track required participants to develop a Nested NER model for English biomedical scientific abstracts. Participants could train any model architecture on any data provided by the organizers to achieve the best performance, fostering innovation and the application of diverse methods.

Nested NER has been thoroughly investigated in NLP, and biomedical Nested NER aims to identify entities such as proteins, genes, diseases, and drugs within biomedical literature [5]. Traditional methods, such as rule-based approaches and early machine learning models, have gradually been replaced by advanced techniques that employ deep learning and pretrained language models. One of the pioneering works in biomedical NER is the introduction of BioBERT [6], a variant of the BERT model specifically pretrained on biomedical literature from PubMed [7, 8]. BioBERT demonstrated significant improvements over previous models in various biomedical NER tasks, highlighting the effectiveness of domain-specific pretraining [9]. PubMedBERT was developed as a domain-specific model following the success of BioBERT [10]. It was trained purely on PubMed abstracts and designed to enhance the precision and effectiveness of biomedical NLP tasks by utilizing a larger and more targeted dataset.

Nested NER addresses the identification of entities embedded within other entities, a frequent occurrence in biomedical texts [11]. Conventional flat NER models lack the necessary capabilities to handle such intricacies. Various methods have been suggested to address the issue of Nested NER, such as layered models, span-based models, and sequence-to-sequence models [12].

Figure 1: A sentence from the BioASQ-BioNNE 2024 dataset containing nested entities: "GCSs caused a decrease in the serum level of soluble interleukin-2 receptor (sCD25) in both groups."

Figure 1 presents a sentence extracted from the BioASQ-BioNNE 2024 dataset, illustrating the concept of nested named entities within a biomedical context. It highlights how chemical entities like "interleukin", anatomical entities like "serum", and finding entities such as "decrease in serum level of soluble interleukin-2 receptor" can be nested within each other, showcasing the complexity of biomedical text that advanced NER systems must handle.
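To make the overlap concrete, the short Python sketch below encodes three of the Figure 1 mentions as character-offset spans; the offsets are computed on the fly, and the span list is purely illustrative rather than the official BioNNE annotation format.

```python
# Illustrative only: three of the nested mentions discussed for Figure 1,
# encoded as (start, end, label) character spans over the example sentence.
sentence = ("GCSs caused a decrease in the serum level of soluble "
            "interleukin-2 receptor (sCD25) in both groups.")

mentions = [
    ("serum", "ANATOMY"),
    ("interleukin", "CHEM"),
    ("decrease in the serum level of soluble interleukin-2 receptor", "FINDING"),
]

spans = []
for text, label in mentions:
    start = sentence.find(text)
    spans.append((start, start + len(text), label, text))

# The FINDING span fully contains the ANATOMY and CHEM spans, which is exactly
# the situation a flat (single-layer) tagger cannot represent.
for start, end, label, text in sorted(spans):
    print(f"{label:8s} [{start:3d}, {end:3d}) {text}")
```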
To participate in this challenge, we utilized PubMedBERT, a language model pretrained on PubMed abstracts, which is particularly suited for biomedical text processing [10]. By leveraging PubMedBERT, we aimed to enhance the performance of our NER system in recognizing and classifying nested entities in the complex domain of biomedical literature.

2. Related Work

MMBERT is a transformer-based model designed to improve the performance of biomedical NER by integrating multiple models [13]. It also uses ERNIE-Health, a Chinese pretrained biomedical language model, and was evaluated on Chinese biomedical NER datasets. Beyond BERT-based encoder models, GPT-based decoder models have also been used in the biomedical domain. BioGPT [14] is a GPT-2-based biomedical language model that was pretrained on a large collection of PubMed articles and has been evaluated on six different tasks. Another study investigated the benefits of fine-tuning GPT-3 for biomedical tasks such as NER, relation extraction, and question answering [15]. Its experiments on BC5CDR [16], CADEC [17], and ADE [18] showed that the fine-tuned GPT-3 models lagged behind the state-of-the-art models.

Some studies have focused on creating datasets designed explicitly for NER tasks involving biomedical entities. NEREL-BIO [19] is a detailed dataset focusing on nested named entities in the biomedical domain. It comprises over 700 Russian and 100 English PubMed abstracts, annotated to capture complex biomedical information through nested entities, and covers 17 specific biomedical entity types. The dataset was created as an extension of the general-domain NEREL dataset [20]; therefore, it is an excellent resource for cross-domain and cross-language benchmarks. Experiments with BERT-based and sequence-based models showed that performance depends on the entity type. Others introduced a novel bi-encoder framework to improve NER through contrastive learning [21]. Rather than treating NER as sequence labeling or span classification, the bi-encoder framework casts the problem as learning vector representations. By mapping candidate text spans and entity types into the same vector space, the model maximizes the similarity between an entity mention and its type while minimizing its similarity to non-entity types.

3. Methodology

3.1. Dataset

To evaluate the efficacy of our proposed method, we performed experiments using the BioASQ-BioNNE dataset, which was released in 2024 for the shared task challenge of BioASQ [22, 23, 24]. This dataset includes 54 training abstracts, 50 development abstracts, and 500 testing abstracts from PubMed for the English track. The challenge organizers provided 716 training abstracts, 50 development abstracts, and 500 testing abstracts for the Russian track. Participants in the Bilingual track use the datasets provided for both English and Russian. The dataset encompasses a total of eight different biomedical named entity classes. For this article, we mainly focused on the English track. Table 1 shows the detailed distribution of entity types across the training and development sets. "DISO" and "ANATOMY" are the most frequent entity classes, indicating a focus on anatomical and disorder-related information, whereas "DEVICE" is the least frequent, suggesting limited data on medical devices. Since the annotation files for the testing data have not been disclosed to participants, we do not have the class-wise distribution for the testing set.

Table 1
Class-wise distribution of the BioASQ-BioNNE 2024 dataset. Refer to Table 3 for details on each entity type.

Entity type          Train    Dev
DISO                 1,200    1,012
ANATOMY                911      897
CHEM                   579      575
FINDING                456      348
PHYS                   397      379
LABPROC                190      154
INJURY_POISONING        90       20
DEVICE                  20       28
Total                3,843    3,413

Each abstract in the original dataset is accompanied by a corresponding annotation file. We processed the dataset by splitting each abstract into sentences and mapping the corresponding annotations to these sentences. We implemented the BIO-tagging scheme, a widely used encoding for named entity recognition: tokens were encoded as "B-TYPE" for the beginning of an entity, "I-TYPE" for subsequent tokens of the same entity, and "O" for tokens that do not belong to any entity class. Given that the provided annotation files indicate up to six nesting levels, we applied six levels of BIO-tagging. After processing, we shuffled the sentences and split the merged dataset into training and validation sets using an 80:20 ratio. After performing hyperparameter tuning, we combined the training and validation sets to utilize the entire dataset for training.
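A minimal sketch of this multi-level encoding is shown below. The helper function and the greedy assignment of entities to levels are illustrative assumptions; in our pipeline the level of each entity follows the released annotation files, and alignment to subword tokens is handled separately.

```python
# Illustrative sketch of six-level BIO tagging for nested entities.
# Token-level spans and the greedy level assignment are simplifying assumptions.
from typing import List, Tuple

def nested_bio_tags(tokens: List[str],
                    spans: List[Tuple[int, int, str]],
                    num_levels: int = 6) -> List[List[str]]:
    """Return one BIO tag sequence per nesting level.

    Each span is (start_token, end_token_exclusive, label); longer spans are
    placed first, and each span goes to the first level where it does not
    overlap an already placed span.
    """
    levels = [["O"] * len(tokens) for _ in range(num_levels)]
    for start, end, label in sorted(spans, key=lambda s: s[0] - s[1]):
        for layer in levels:
            if all(tag == "O" for tag in layer[start:end]):
                layer[start] = f"B-{label}"
                for i in range(start + 1, end):
                    layer[i] = f"I-{label}"
                break
    return levels

tokens = "decrease in the serum level".split()
spans = [(0, 5, "FINDING"), (3, 4, "ANATOMY")]   # nested: ANATOMY inside FINDING
for level, tags in enumerate(nested_bio_tags(tokens, spans), start=1):
    print(f"level {level}:", tags)
```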
3.2. MultilayerModel

The core of the method is the MultilayerNERModel, our deep learning architecture designed specifically for Nested NER tasks in biomedical texts. This model is built on the robust foundation of the pretrained PubMedBERT, a variant of BERT that has been pretrained on a large corpus of biomedical literature, making it highly relevant and effective for domain-specific tasks. The overall architecture of our method is shown in Figure 2. The figure represents a comprehensive workflow for recognizing and tagging named entities in biomedical texts: it integrates data from the BioASQ-BioNNE dataset, applies nested BIO-tagging, utilizes the MultilayerNERModel, performs dictionary-based search using UMLS resources, and concludes with postprocessing and evaluation of the results.

Figure 2: Methodology. BIO-tagging refers to "Beginning", "Inside", and "Outside" tagging.

Base Model: The base of the model is the pretrained PubMedBERT [10], which provides contextualized word embeddings. We also tried the state-of-the-art pretrained BERT and BioBERT as alternatives to PubMedBERT to compare performance. The base can be replaced by any other pretrained model that is compatible with the dataset domain and language.

Classification Layers: A series of six classification layers were added on top of the base model. Each layer was designed to output the NER tags for a specific nesting level, with each linear layer taking the hidden states from PubMedBERT and mapping them to the required number of labels for that layer. Although the original number of classes in the BioASQ-BioNNE dataset is eight, to support our preprocessed BIO-tagged dataset, the total number of output classes for each classification layer is 17: "B-Class" and "I-Class" for each of the eight original classes, as well as an "O" class for tokens that do not belong to any entity.
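A minimal PyTorch sketch of this six-head design is given below. It assumes the Hugging Face transformers library, the PubMedBERT checkpoint name microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract, and a simple sum of per-level cross-entropy losses; it illustrates the architecture described above rather than reproducing our exact training code.

```python
# Sketch of a multilayer (six-head) token classification model on a pretrained
# encoder. Checkpoint name, label padding with -100, and the summed loss are
# assumptions for illustration; 17 = 8 entity classes x {B, I} + "O".
import torch.nn as nn
from transformers import AutoModel

class MultilayerNERModel(nn.Module):
    def __init__(self,
                 encoder_name: str = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract",
                 num_labels: int = 17,
                 num_levels: int = 6):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden_size = self.encoder.config.hidden_size
        # One linear classification head per nesting level.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, num_labels) for _ in range(num_levels)
        )
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask, labels=None):
        # labels: tensor of shape (num_levels, batch, seq_len), or None at inference.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        logits = [head(hidden_states) for head in self.heads]
        if labels is None:
            return logits
        # Sum the token classification losses over all nesting levels.
        loss = sum(
            self.loss_fn(level_logits.reshape(-1, level_logits.size(-1)),
                         level_labels.reshape(-1))
            for level_logits, level_labels in zip(logits, labels)
        )
        return loss, logits
```

The same head stack can be placed on BERT or BioBERT by changing only the encoder checkpoint, which is how the alternative base models were compared.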
To optimize the performance of our method, we conducted hyperparameter tuning and determined the optimal settings. We utilized the Adam optimizer, known for its efficient handling of sparse gradients and its adaptability to different data structures [25]. The hyperparameter settings listed in Table 2 were applied uniformly across all experiments to ensure consistency and facilitate a robust evaluation of our approach.

Table 2
Hyperparameter settings

Hyperparameter              Value
Batch size                  64
Learning rate               1e-4
Epochs                      40
Maximum sequence length     512
Optimizer                   Adam

3.3. UMLS Dictionaries

To further enrich the NER process, we leveraged the Unified Medical Language System (UMLS) Metathesaurus 2024AA for vocabulary expansion [26, 27]. We utilized the MRCONSO.RRF data file within UMLS to extract the relevant concepts and their child concepts based on the UMLS semantic groups listed in Table 3, obtained from the NEREL-BIO GitHub repository (https://github.com/nerel-ds/NEREL-BIO/). This approach allowed us to broaden the model's ability to recognize entities by incorporating synonyms and related terms. By integrating these expanded vocabularies into our Nested NER system, we aimed to enhance the identification and classification of biomedical entities, ultimately improving the robustness and accuracy of our model.

Table 3
List of entity types with the corresponding UMLS semantic groups and concepts, obtained from the NEREL-BIO GitHub repository.

Entity type        UMLS Semantic Group             Concept Name (UMLS)                          Concept ID   Entity Count   Matched Entities (Abstracts)
ANATOMY            A1.2 Anatomical structure       Body structure                               C1268086     586,178        3,063
CHEM               A1.4.1 Chemical                 Chemical Substance (organic or inorganic)    C0220806     977,918        784
DEVICE             A1.3.1 Medical device           Medical devices                              C0025080     94,190         56
DISO               B2.2.1.2 Pathologic Function    Pathology processes                          C0677042     650,482        2,953
FINDING            A2.2 Finding                    Experimental finding                         C2825141     777,463        1,065
INJURY_POISONING   B2.3 Injury and Poisoning       Poisoning/Injury                             C0178314     140,659        145
LABPROC            B1.3.1.1 Laboratory Procedure   Medical screening and diagnostic method      C0679541     85,230         450
PHYS               B2.2.1.1 Physiologic Function   Physiological processes                      C0031845     168,997        2,288

We extracted the UMLS entries that match the concept identifiers of our target entity types; the resulting dictionaries served as comprehensive references for the entities we aimed to identify. Subsequently, we applied these dictionaries to the test data, systematically matching the terms within the text. By capturing the positions of these terms in the test data, we generated an output file listing the identified entities. We merged this output with the results from our MultilayerNERModel based on BERT, BioBERT, and PubMedBERT. This two-step approach, combining rule-based matching with advanced machine learning techniques, provided a robust mechanism for entity recognition, improving the overall quality and reliability of our extracted data.
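The sketch below illustrates this dictionary-based pass under simplifying assumptions: each dictionary is a plain list of surface strings per entity type, matching is case-insensitive with word-boundary checks, and hits are reported as character offsets. The construction of the dictionaries from MRCONSO.RRF and the merge with the MultilayerNERModel predictions are not shown.

```python
# Illustrative dictionary-based matcher; not the exact production code.
# Building the term lists from MRCONSO.RRF and merging with model output are omitted.
import re
from typing import Dict, List, Tuple

def dictionary_matches(text: str,
                       dictionaries: Dict[str, List[str]]
                       ) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, entity_type, matched_text) for every dictionary hit.

    Overlapping hits are kept deliberately, since nested entities may overlap.
    """
    hits = []
    for entity_type, terms in dictionaries.items():
        for term in set(terms):
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), match.end(), entity_type, match.group(0)))
    return sorted(hits)

# Toy dictionaries standing in for the UMLS-derived ones; the input is the
# Figure 1 example sentence.
toy_dictionaries = {"ANATOMY": ["serum"], "CHEM": ["interleukin"]}
sentence = ("GCSs caused a decrease in the serum level of soluble "
            "interleukin-2 receptor (sCD25) in both groups.")
for start, end, entity_type, term in dictionary_matches(sentence, toy_dictionaries):
    print(f"{entity_type:8s} [{start}, {end}) {term}")
```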
3.4. Evaluation Metrics

We evaluated the model's performance using precision, recall, F1 score, and macro F1 score with a predefined evaluation script on the BioASQ competition server. These metrics provide a comprehensive view of the model's ability to accurately identify and classify named entities in biomedical texts. Precision, recall, F1 score, and macro F1 score are defined in Eq. (1), (2), (3), and (4), respectively.

\[ \text{Precision} = \frac{TP}{TP + FP} \quad (1) \]
\[ \text{Recall} = \frac{TP}{TP + FN} \quad (2) \]
\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3) \]
\[ \text{Macro F1 Score} = \frac{1}{N} \sum_{i=1}^{N} \text{F1 Score}_i \quad (4) \]

Here, the macro F1 score is the average F1 score across the N = 8 entity classes.

4. Experimental Results

For this experiment, we employed six NVIDIA Tesla V100 GPUs with 32 GB of HBM2 VRAM each. Model training and evaluation were implemented using the PyTorch [28] library. To speed up training, the DataParallel class was used to leverage multiple GPUs simultaneously.

The overall class distribution of the dataset after BIO-tagging is illustrated in Figure 3. The bar chart shows the named entity tag distribution, where "I-FINDING" has the highest frequency, while "I-INJURY_POISONING" and "I-DEVICE" have the lowest frequencies. The frequency of "B-DISO" is higher than that of "B-ANATOMY". The distribution of "O" is excluded from the figure, as it is not an actual target class of this experiment.

Figure 3: Entity type distribution across the training and development dataset. "B-" and "I-" in the named entity tags represent the BIO-tagging scheme for named entity recognition, with "B" for the beginning of an entity and "I" for subsequent tokens.

Figure 4: Distribution of nested named entities across the training and development dataset.

Figure 4 provides insights into the behavior of the different nesting levels in recognizing various entity types from the dataset. Overall, the frequencies of the "B" (beginning) and "I" (inside) tags for the "FINDING", "DISO", and "ANATOMY" entities are higher across all layers, whereas the "DEVICE", "PHYS", and "LABPROC" entities occur rarely or not at all. Nesting level 6 hardly contained any tag other than "O". Again, the frequencies of the "O" tags are not included in the bar chart, as they are not our target. These distributions highlight the specialized capabilities of certain layers in identifying specific entity types, which can inform the selection of appropriate layers for targeted NER tasks.

We evaluated the performance of several models, namely baseline BERT, BioBERT, and PubMedBERT, each with and without UMLS knowledge-based dictionary adaptation. The results are summarized in Table 4, presenting the precision, recall, F1 score, and macro F1 score for each model. The PubMedBERT-based MultilayerNERModel achieved the highest overall performance, with an F1 score of 67.30% and a macro F1 score of 56.36%, demonstrating its effectiveness in capturing the nuances of biomedical texts. Augmenting the models with UMLS knowledge yielded mixed results. For instance, the BioBERT-based MultilayerNERModel with UMLS dictionaries showed an increase in recall (70.97%) compared to BioBERT alone (66.28%), suggesting that UMLS integration improves entity recognition coverage.

Table 4
Performance evaluation. Values in bold indicate the highest score achieved among the models for each metric.

Model                             Precision (%)   Recall (%)   F1 Score (%)   Macro F1 Score (%)
BERT                                  53.94          59.32         56.50            44.49
BERT + UMLS Dictionaries              46.39          64.40         53.93            46.15
BioBERT                               64.01          66.28         65.12            54.49
BioBERT + UMLS Dictionaries           53.58          70.97         61.06            53.58
PubMedBERT                            66.45          68.18         67.30            56.36
PubMedBERT + UMLS Dictionaries        55.10          72.55         62.63            55.46

However, this came at the cost of reduced precision (53.58% vs. 64.01%). The PubMedBERT-based MultilayerNERModel with UMLS dictionaries likewise achieved the highest recall (72.55%), improving over the PubMedBERT-based MultilayerNERModel alone (68.18%), but, similar to BioBERT, it experienced a drop in precision (55.10% vs. 66.45%). Despite this, the PubMedBERT-based MultilayerNERModel with UMLS achieved a competitive F1 score of 62.63% and a macro F1 score of 55.46%, indicating that the inclusion of UMLS provides additional benefits in recognizing more entities, albeit with some trade-offs in precision.
We also evaluated our MultilayerNERModel on the Russian and Bilingual Nested NER tracks. For the Russian Nested NER track, we used the pretrained SBERT-Large-NLU-RU [29], a BERT-based model specifically tailored for the Russian language, as the base of our MultilayerNERModel. This choice was necessary because BERT, BioBERT, and PubMedBERT are designed for English text only. Our model achieved precision, recall, and F1 scores of 68.59%, 65.34%, and 66.93%, respectively, on the Russian BioNNE dataset. However, the macro F1 score was lower (60.07%) than the F1 score, likely due to class imbalance. For the Bilingual Nested NER track, we employed BERT-Base-Multilingual-uncased [30], which is trained to understand and represent 102 different languages, as the base of our MultilayerNERModel. The precision, recall, and F1 scores were 60.27%, 57.5%, and 58.89%, with a macro F1 score of 50.53%. The performance of our MultilayerNERModel on the Bilingual track is lower than on the English and Russian tracks. Employing a dictionary-based approach may improve the performance of our MultilayerNERModel for the Russian and Bilingual tracks.

5. Conclusion

Our study demonstrates that domain-specific models, specifically the BioBERT- and PubMedBERT-based Nested NER models, significantly outperform the baseline BERT-based Nested NER model in terms of precision, recall, F1 score, and macro F1 score. This improvement underscores the advantage of using models pretrained on biomedical literature for Nested NER tasks within this specialized domain. Integrating UMLS dictionaries enhances recall, suggesting that it helps recognize a broader range of entities. However, the reduction in precision indicates a need for further optimization to balance the trade-offs between precision and recall.

Overall, PubMedBERT stands out as the most effective pretrained base for our MultilayerNERModel, with promising potential for further enhancement through the strategic incorporation of external knowledge bases such as UMLS.

The implications of this research are substantial in the field of biomedical informatics. An improved Nested NER system can facilitate more effective information extraction, aiding researchers in uncovering complex relationships within biomedical literature. This advancement can, in turn, accelerate the discovery of novel insights in biomedical science. The performance of our MultilayerNERModel emphasizes the capability of multilayer BERT-based architectures in advancing NLP applications in the biomedical domain by addressing the challenges of Nested NER.

6. Acknowledgments

The study was supported by the US National Institute of Allergy and Infectious Diseases (U24AI171008 to Y.H. and J.H.). The GEBIP Award of the Turkish Academy of Sciences (to A.Ö.) is gratefully acknowledged.

Code Availability: https://github.com/hurlab/Nested-NER-BERT

References

[1] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics 36 (2020) 1234–1240.
[2] S. Sivarajkumar, H. A. Mohammad, D. Oniani, K. Roberts, W. Hersh, H. Liu, D. He, S. Visweswaran, Y. Wang, Clinical information retrieval: A literature review, Journal of Healthcare Informatics Research (2024) 1–40.
[3] V. Davydova, N. Loukachevitch, E. Tutubalina, Overview of BioNNE task on biomedical nested named entity recognition at BioASQ 2024, 2024, pp. 1–10.
[4] A. Nentidis, G. Katsimpras, A. Krithara, S. Lima-López, E. Farré-Maduell, M. Krallinger, N. Loukachevitch, V. Davydova, E. Tutubalina, G. Paliouras, Overview of BioASQ 2024: The twelfth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering, in: L. Goeuriot, P. Mulhem, G. Quénot, D. Schwab, L. Soulier, G. Maria Di Nunzio, P. Galuščáková, A. García Seco de Herrera, G. Faggioli, N. Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), 2024, pp. 1–10.
[5] Z. Bi, S. A. Dip, D. Hajialigol, S. Kommu, H. Liu, M. Lu, X. Wang, AI for biomedicine in the era of large language models, arXiv preprint arXiv:2403.15673 (2024).
[6] S. Masoumi, H. Amirkhani, N. Sadeghian, S. Shahraz, Natural language processing (NLP) to facilitate abstract review in medical research: the application of BioBERT to exploring the 20-year use of NLP in medical research, Systematic Reviews 13 (2024) 107.
[7] Q. Jin, W. Kim, Q. Chen, D. C. Comeau, L. Yeganova, W. J. Wilbur, Z. Lu, MedCPT: Contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval, Bioinformatics 39 (2023) btad651.
[8] A. Turchin, S. Masharsky, M. Zitnik, Comparison of BERT implementations for natural language processing of narrative medical documents, Informatics in Medicine Unlocked 36 (2023) 101139.
[9] Y. Kim, J.-H. Kim, Y.-M. Kim, S. Song, H. J. Joo, Predicting medical specialty from text based on a domain-specific pre-trained BERT, International Journal of Medical Informatics 170 (2023) 104956.
[10] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, H. Poon, Domain-specific language model pretraining for biomedical natural language processing, ACM Transactions on Computing for Healthcare (HEALTH) 3 (2021) 1–23.
[11] Q. Zheng, Y. Wu, G. Wang, Y. Chen, W. Wu, Z. Zhang, B. Shi, B. Dong, Exploring interactive and contrastive relations for nested named entity recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
[12] X. Du, H. Zhao, D. Xing, Y. Jia, H. Zan, MRC-based nested medical NER with co-prediction and adaptive pre-training, arXiv preprint arXiv:2403.15800 (2024).
[13] L. Fu, Z. Weng, J. Zhang, H. Xie, Y. Cao, MMBERT: a unified framework for biomedical named entity recognition, Medical & Biological Engineering & Computing 62 (2024) 327–341. URL: https://doi.org/10.1007/s11517-023-02934-8. doi:10.1007/s11517-023-02934-8.
[14] R. Luo, L. Sun, Y. Xia, T. Qin, S. Zhang, H. Poon, T.-Y. Liu, BioGPT: generative pre-trained transformer for biomedical text generation and mining, Briefings in Bioinformatics 23 (2022) bbac409.
[15] H. Bousselham, A. Mourhir, Fine-tuning GPT on biomedical NLP tasks: An empirical evaluation, in: 2024 International Conference on Computer, Electrical & Communication Engineering (ICCECE), IEEE, 2024, pp. 1–6.
[16] J. Li, Y. Sun, R. J. Johnson, D. Sciaky, C.-H. Wei, R. Leaman, A. P. Davis, C. J. Mattingly, T. C. Wiegers, Z. Lu, BioCreative V CDR task corpus: a resource for chemical disease relation extraction, Database 2016 (2016).
[17] S. Karimi, A. Metke-Jimenez, M. Kemp, C. Wang, Cadec: A corpus of adverse drug event annotations, Journal of Biomedical Informatics 55 (2015) 73–81.
[18] C. Hiba, E. H. Nfaoui, C. Loqman, Fine-tuning transformer models for adverse drug event identification and extraction in biomedical corpora: A comparative study, in: International Conference on Digital Technologies and Applications, Springer, 2024, pp. 957–966.
[19] N. Loukachevitch, S. Manandhar, E. Baral, I. Rozhkov, P. Braslavski, V. Ivanov, T. Batura, E. Tutubalina, NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities, Bioinformatics 39 (2023) btad161.
[20] N. Loukachevitch, E. Artemova, T. Batura, P. Braslavski, V. Ivanov, S. Manandhar, A. Pugachev, I. Rozhkov, A. Shelmanov, E. Tutubalina, NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and Wikidata entity links, Language Resources and Evaluation (2023) 1–37.
[21] S. Zhang, H. Cheng, J. Gao, H. Poon, Optimizing bi-encoder for named entity recognition via contrastive learning, arXiv preprint arXiv:2208.14565 (2022).
[22] A. Krithara, J. G. Mork, A. Nentidis, G. Paliouras, The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey, Frontiers in Research Metrics and Analytics 8 (2023).
[23] A. Krithara, A. Nentidis, K. Bougiatiotis, G. Paliouras, BioASQ-QA: A manually curated corpus for biomedical question answering, Scientific Data 10 (2023) 170.
[24] G. Tsatsaronis, G. Balikas, P. Malakasiotis, I. Partalas, M. Zschunke, M. R. Alvers, D. Weissenborn, A. Krithara, S. Petridis, D. Polychronopoulos, An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition, BMC Bioinformatics 16 (2015) 1–28.
[25] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2017). URL: http://arxiv.org/abs/1412.6980. doi:10.48550/arXiv.1412.6980.
[26] A. K. Chanda, T. Bai, Z. Yang, S. Vucetic, Improving medical term embeddings using UMLS Metathesaurus, BMC Medical Informatics and Decision Making 22 (2022) 114. URL: https://doi.org/10.1186/s12911-022-01850-5. doi:10.1186/s12911-022-01850-5.
[27] V. Nguyen, H. Y. Yip, O. Bodenreider, Biomedical vocabulary alignment at scale in the UMLS Metathesaurus, in: Proceedings of the Web Conference 2021, WWW '21, Association for Computing Machinery, 2021, pp. 2672–2683. URL: https://doi.org/10.1145/3442381.3450128. doi:10.1145/3442381.3450128.
[28] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems 32 (2019).
[29] A. Kabaev, S. Khaustov, N. Gorlova, A. Kalmykov, BERT for Russian news clustering, 2021. URL: https://doi.org/10.28995/2075-7182-2021-20-385-390.
[30] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.