=Paper=
{{Paper
|id=Vol-3603/Paper9
|storemode=property
|title=Leveraging Biomedical Ontologies to Boost Performance of BERT-Based Models for Answering Medical MCQs
|pdfUrl=https://ceur-ws.org/Vol-3603/Paper9.pdf
|volume=Vol-3603
|authors=Sahil,P Sreenivasa Kumar
|dblpUrl=https://dblp.org/rec/conf/icbo/SahilK23
}}
==Leveraging Biomedical Ontologies to Boost Performance of BERT-Based Models for Answering Medical MCQs==
Sahil Sahil∗, P Sreenivasa Kumar — Department of Computer Science & Engineering, Indian Institute of Technology Madras, Chennai

'''Abstract'''

Large-scale pretrained language models like BERT have shown promising results in various natural language processing tasks. However, these models do not benefit from the rich knowledge available in domain ontologies. In this work, we propose BioOntoBERT, a BERT-based model pretrained on multiple biomedical ontologies. We also introduce the Onto2Sen system to process various ontologies to generate lexical documents, such as entity names, synonyms and definitions, and concept relationship documents. We then incorporate these knowledge-rich documents during pretraining to enhance the model's "understanding" of biomedical concepts. We evaluate our model on the MedMCQA dataset, a multiple-choice question-answering benchmark for the medical domain. Our experiments show that BioOntoBERT outperforms the baseline BERT model as well as SciBERT, BioBERT and PubMedBERT. BioOntoBERT achieves this performance improvement by incorporating only 158MB of ontology-generated data on top of the BERT model during pretraining, just 0.75% of the data used in pretraining PubMedBERT. Our results demonstrate the effectiveness of incorporating biomedical ontologies in pretraining language models for the medical domain.

'''Keywords:''' Biomedical Ontologies, BERT, Medical Multiple Choice Question Answering

==1. Introduction==

Biomedical ontology research encompasses a variety of entities (from dictionaries of names for biological products to controlled vocabularies to principled knowledge structures) and processes (i.e., acquisition of ontological relations, integration of heterogeneous databases, use of ontologies for reasoning about biological knowledge) [1].
Biomedical ontologies cover various aspects of medical terminology, such as symptoms, diagnosis and treatment.

Multiple-choice question answering (MCQA) is a challenging task in general, and in particular in the medical domain, as the relevant knowledge is not commonly available in text corpora. The success of MCQA systems relies on striking a delicate balance between language understanding, domain-specific reasoning, and the incorporation of rich knowledge sources. In the medical domain, ontology-based QA systems have very good potential to effectively capture domain-specific knowledge and provide accurate responses to medical queries. By harnessing biomedical ontologies, these systems can depict intricate relationships among medical concepts, resulting in more precise and contextually aware answers.

Ontology-based multiple-choice question-answering systems are few in number, but ontology-based QA systems in general have shown promise in capturing domain-specific knowledge and accurately answering medical questions [2] [3]. By leveraging biomedical ontologies, these systems can represent complex relationships between medical concepts, enabling more precise and contextually aware responses. A major limitation is that using these systems requires an understanding of the ontology structure in order to formulate queries.

----
''Proceedings of the International Conference on Biomedical Ontologies 2023, August 28th–September 1st, 2023, Brasilia, Brazil.''

∗ Corresponding author: cs20s017@cse.iitm.ac.in (S. Sahil); psk@cse.iitm.ac.in (P. S. Kumar). ORCID: 0009-0004-0167-5621 (S. Sahil), 0000-0003-2283-7728 (P. S. Kumar).

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
----
For example, queries may necessitate using intermediate concepts in the ontology when there is no direct relationship between the concepts in question.

Contextual word embedding models such as BERT (Bidirectional Encoder Representations from Transformers) [4] have achieved state-of-the-art results in many NLP tasks. Initially tested in the general domain, models such as BioBERT [5], UmlsBERT [6], SciBERT [7], and PubMedBERT [8] have also been successfully applied in the biomedical domain by pretraining them on biomedical corpora. However, current biomedical applications of transformer-based NLP models do not incorporate structured expert domain knowledge from a biomedical ontology into their embedding pretraining process.

To illustrate the significance of biomedical ontology knowledge, let us consider a scenario where a medical question pertains to a specific rare disease. While a pretrained language model trained on a vast corpus may have encountered related terms or phrases, it may lack the medical domain-specific knowledge required to provide accurate and nuanced answers.

In contrast, a biomedical ontology encompasses structured and domain-specific knowledge, including relationships, hierarchies, and semantic information about medical concepts. By integrating such ontology knowledge into our models, we can tap into a comprehensive and precise representation of medical domain knowledge, enabling more accurate and contextualized question answering.

In light of this research gap, our study aims to bridge the divide between ontology-based approaches and deep learning models in the context of MCQA in the medical domain. Specifically, our objectives are:

* To overcome the challenges of ontology injection, including the computational overhead and annotation burden associated with large biomedical ontologies.
* To investigate techniques for integrating biomedical ontological knowledge with pretrained BERT models in MCQA systems.
In this paper, we present a novel approach that bridges the gap between ontology-based methods and pretrained language models, harnessing the strengths of both to enhance multiple-choice question answering (MCQA) in the medical domain. Our contributions in this work can be summarized as follows:

1. '''Onto2Sen, a simple yet effective solution for ontology injection:''' We propose a solution called the Onto2Sen system to generate a comprehensive ontology-backed sentence corpus, which serves as a valuable resource for enriching pretrained models with domain-specific knowledge. By incorporating this rich semantic information from biomedical domain ontologies into the models, we anticipate enhancing their contextual understanding and reasoning abilities.

2. '''Introducing BioOntoBERT:''' We propose BioOntoBERT, a pretrained BERT model that leverages various biomedical ontologies using the Onto2Sen-generated corpus. BioOntoBERT surpasses several other biomedical BERT models, including PubMedBERT [8], SciBERT [7] and BioBERT [5], in terms of performance on multiple-choice question answering on the MedMCQA dataset.

Furthermore, BioOntoBERT demonstrates remarkable performance with just 158MB of pretraining data, significantly reducing the computational cost and carbon footprint associated with larger models. This makes our approach not only effective but also environmentally friendly, addressing the growing concerns regarding energy consumption in deep learning models and highlighting the power of knowledge.

==2. Related Work==

Biomedical multiple-choice question answering (MCQA) is a significant task in natural language processing. Various approaches have been proposed to improve the performance of MCQA systems by leveraging ontologies and pretrained language models.
As mentioned earlier, ontology-based MCQA models are relatively limited, while ontology-based question-answering systems have shown promise in capturing domain-specific knowledge and providing accurate answers to medical questions. For instance, XMQAS, proposed by Midhunlal et al. [9], utilizes natural language processing techniques and ontology-based analysis to process medical queries and extract relevant information from medical documents. Other approaches, like the one presented by Kwon et al. [10] for stroke-related knowledge retrieval, employ SPARQL templates and a medical-knowledge QA query ontology to transform queries into executable SPARQL queries for retrieving medical knowledge. However, these approaches have limitations due to their reliance on a template-based approach, which may restrict the flexibility and adaptability of the system.

In addition to ontology-based approaches, the use of pretrained models has significantly advanced MCQA systems. One notable example is PubMedBERT [8], a variant of BERT designed explicitly for biomedical text comprehension. These pretrained models, including PubMedBERT, have showcased remarkable performance in capturing medical terminology and comprehending complex medical questions. Moreover, models like BioBERT [5], SciBERT [7], and UmlsBERT [6] have been finetuned for biomedical NLP tasks, exhibiting improved performance in various medical question-answering and information-retrieval tasks. It is worth noting that these models are pretrained on extensive corpora, such as the entire PubMed abstracts dataset, which consists of over 3.1 billion words.

Less work has been done on using external knowledge with neural networks in the biomedical multiple-choice question-answering domain, whereas in other domains, such as commonsense reasoning, several different approaches have been investigated for leveraging external knowledge sources.
Sap et al. [11] introduce the ATOMIC graph, with 877k textual descriptions of inferential knowledge (e.g., if-then relations), to answer causal questions. Lv et al. [12] propose to extract evidence from both structured knowledge bases, such as ConceptNet, and Wikipedia text, and to conduct graph-based representation and inference for commonsense reasoning.

He et al. [13] proposed a training procedure to infuse disease knowledge and augment pretrained BERT models. Their experiments demonstrated improved performance in consumer health question answering, medical language inference, and disease name recognition. This motivates us to leverage the strengths of ontologies, which excel at representing complex medical concepts and terminologies. By integrating ontologies and BERT-based models, we aim to enhance the capabilities of our MCQA system and improve its accuracy and effectiveness in addressing biomedical questions.

To bridge the gap between ontology-based approaches and deep learning models, the authors of [14] [15] [16] have explored techniques for ontology injection and infusing context. These approaches aim to enhance the models' language understanding and domain-specific reasoning capabilities by injecting ontological information into the models, either by modifying or adding new BERT layers or by mapping the concepts and relationships of the ontology to the data. However, these models face various challenges in processing and incorporating large biomedical ontologies. The computational overhead required to handle and integrate the vast knowledge in such ontologies can be significantly high. Moreover, the process of mapping the ontology to the dataset and preparing annotated data demands substantial time and labour. The manual effort required for this task can be burdensome, hindering the scalability and practicality of these approaches.

==3. Biomedical Ontologies==

Biomedical ontologies play a critical role in the field of medicine by organizing and representing knowledge related to diseases, genes, anatomical structures, and medical concepts. They establish a standardized framework that captures and integrates information, promoting data sharing, interoperability, and knowledge discovery. We now briefly describe the prominent biomedical ontologies we use for our model:

1. '''Disease Ontology (DO)''' [17] (v1.2): A standardized ontology created to offer the biomedical community consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabulary disease concepts.

2. '''Gene Ontology (GO)''' [18] (v2023-04-01): A widely used ontology that focuses on representing the functional attributes of genes and gene products across different species. GO encompasses three main domains: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). BP describes the biological processes in which genes are involved, MF represents the molecular functions they perform, and CC defines their cellular locations.

3. '''Foundational Model of Anatomy Ontology (FMAO)''' [19] (v5.0.0): An ontology that aims to represent human anatomy in a detailed and structured manner. FMAO provides a hierarchical organization of anatomical structures, capturing spatial relationships and functional associations between different body parts.

4. '''Precision Medicine Ontology''' [20] (v4.0): A comprehensive ontology that represents medical concepts and their relationships in a standardized manner. It covers various medical domains, including diseases, symptoms, treatments, diagnostic procedures, and medical devices.
{| class="wikitable"
|+ Table 1: Different biomedical ontologies used
! Ontology !! Scope !! # Classes !! # Object Properties !! # Annotations !! # subClass
|-
| FMAO Ontology || Anatomy || 104721 || 139 || 51 || 262548
|-
| Bioassay Ontology || Pharmacology || 904 || 17 || 34 || 981
|-
| Dental Ontology || Dentistry || 2745 || 62 || 28 || 6507
|-
| Gene Ontology || Bioinformatics || 84108 || 297 || 60 || 192606
|-
| Precision Medicine Ontology || Medicine || 76155 || 95 || 23 || 122760
|-
| Disease Ontology || Pathology || 11033 || 2 || 53 || 11063
|-
| Paediatrics Ontology || Paediatrics || 1771 || – || 8 || 1760
|-
| HPS Ontology || Physiology || 2920 || 86 || 34 || 3143
|-
| Mental Disease Ontology || Psychiatry || 879 || 41 || 102 || 940
|}

5. '''Bioassay Ontology (BAO)''' [21] (v1.1): The BAO focuses on establishing common reference metadata terms and definitions required for describing relevant information of low- and high-throughput drug and probe screening assays and results.

6. '''Dental Ontology''' [22] (v2016-06-27): It captures dental-related concepts and relationships, providing a standardized vocabulary for representing dental conditions, procedures, materials, and anatomical structures. It facilitates the integration of dental data and knowledge, supporting research, education, and clinical practice in dentistry.

7. '''Pediatrics Ontology''' (v2.0): This ontology focuses on representing pediatric-healthcare-related concepts and their relationships. It covers various aspects of pediatric medicine, including diseases, developmental milestones, treatments, and interventions.

8. '''Human Physiology Simulation Ontology (HPSO)''' [23] (v1.1.1): HPSO captures the concepts and relationships related to the simulation and modelling of human physiology. It provides a standardized framework for representing physiological processes, organ interactions, and computational models.

9. '''Mental Disease Ontology (MDO)''' [24] (v2020-04-26): MDO represents mental disorders and related concepts. It offers a standardized vocabulary for categorizing and annotating mental diseases, symptoms, treatments, and diagnostic criteria.

==4. Methodology==

In this section, we present our approach for pretraining and fine-tuning a BERT [4] model on biomedical ontologies for multiple-choice question answering on the MedMCQA dataset. Our approach involves several key steps: data preparation, pretraining on biomedical ontologies, and fine-tuning on the MedMCQA dataset. The code implementation is publicly available on GitHub: https://github.com/sahillihas/BioOntoBERT.

===4.1. Datasets===

====4.1.1. Multiple Choice Questions Dataset====

We use the MedMCQA dataset [25], which consists of 194,000 multiple-choice questions on around 2400 healthcare topics and 21 medical subjects from some of the toughest entrance exams conducted for medical graduates in India, i.e., AIIMS and NEET PG. The diversity of questions in MedMCQA makes it a challenging dataset covering many aspects of medical knowledge; Table 2 illustrates one such example. Another distinguishing factor of this dataset is that its questions are created for and by human experts. The dataset has three parts: a training set of 182,822 questions, a validation set of 4,183 questions, and a test set of 6,150 questions, with average token lengths of 12.35, 13.91 and 9.68, respectively. The answer choices are provided in the 'labels' column, encoded as integers 0, 1, 2, and 3. The ground truth for the test set is not publicly available; hence we analyse the results on the validation set.

{| class="wikitable"
|+ Table 2: Sample MCQA question from the MedMCQA dataset, with the correct answer (A)
| Question: Dentigerous cyst is likely to cause which neoplasia? (A) Ameloblastoma (B) Adenocarcinoma (C) Fibrosarcoma (D) All of the above
|}

''Figure 1: Proposed Onto2Sen framework to generate the BERT input corpus from the ontologies''

====4.1.2. Ontology-based Sentence Generation====

We propose a system called Onto2Sen to generate sentences from the multiple ontologies, curated from public resources, described in the previous section.
It extracts concepts, annotations, and their properties from the ontology to form meaningful sentences. Onto2Sen preprocesses the ontologies and generates two types of sentences. The first type is generated from the subClass relationships. The second type is extracted from the relevant lexical annotation axioms in the ontology.

In the example shown in Figure 1, the class-hierarchy relationship sentences contain the subClass property in the Disease Ontology (DO), allowing us to identify specific disease classifications. For instance, we can state that 'SPOAN syndrome is a neurodegenerative disease' using labels and identifiers in subClass relations. In addition, the transitive nature of the subClass property is also utilized. Furthermore, annotation properties associated with diseases offer valuable insights into symptoms, synonyms and causal associations. For instance, we can describe that "SPOAN syndrome has synonym Spastic paraplegia" using the 'has_exact_synonym' annotation property.

We then used a natural language processing tool, spaCy, for preprocessing the compiled documents. We use these generated sentences as input to the model during pretraining to leverage the ontological knowledge. After a study of the ontologies mentioned in Section 3, we find that using annotation properties and the class hierarchy for sentence generation is commonly applicable across all these ontologies, and hence we adopt only these two techniques for the present work.

===4.2. Pretraining Model===

Pretraining is a crucial aspect of the BERT (Bidirectional Encoder Representations from Transformers) [4] model, which has revolutionized the field of natural language processing. In the context of BERT, pretraining refers to the initial phase where the model is trained on vast amounts of unlabeled text data, such as web documents or books.
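The Onto2Sen sentence generation described in Section 4.1.2 can be sketched as follows. This is a hypothetical minimal version operating on toy triples rather than a real OWL parse (a full implementation would read the ontology files with an OWL/RDF library); the SPOAN syndrome classes and the has_exact_synonym property mirror the Disease Ontology example above, while the identifiers themselves are illustrative:

```python
# Toy stand-in for a parsed ontology: labels, subClass edges, annotation axioms.
LABELS = {
    "DOID:0050890": "SPOAN syndrome",
    "DOID:1289": "neurodegenerative disease",
    "DOID:4": "disease",
}
SUBCLASS_OF = {"DOID:0050890": "DOID:1289", "DOID:1289": "DOID:4"}
ANNOTATIONS = {("DOID:0050890", "has_exact_synonym"): "Spastic paraplegia"}

def hierarchy_sentences(labels, subclass_of):
    """Type 1: one 'X is a Y' sentence per subClass edge, plus the
    transitive ancestors of each class (as Onto2Sen also exploits)."""
    sentences = []
    for child in subclass_of:
        parent = subclass_of[child]
        while parent is not None:  # walk up to collect transitive ancestors
            sentences.append(f"{labels[child]} is a {labels[parent]}")
            parent = subclass_of.get(parent)
    return sentences

def annotation_sentences(labels, annotations):
    """Type 2: verbalize lexical annotation axioms as sentences."""
    return [
        f"{labels[cls]} has synonym {value}"
        for (cls, prop), value in annotations.items()
        if prop == "has_exact_synonym"
    ]

corpus = hierarchy_sentences(LABELS, SUBCLASS_OF) + annotation_sentences(LABELS, ANNOTATIONS)
# corpus now contains e.g. "SPOAN syndrome is a neurodegenerative disease",
# the transitive "SPOAN syndrome is a disease", and the synonym sentence.
```

The resulting strings are exactly the kind of unlabeled sentences fed to the masked-language-model pretraining step described next.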
During this pretraining phase, BERT learns to generate contextualized representations of words and capture intricate semantic relationships by leveraging the bidirectional nature of transformers. We propose a novel approach that uses biomedical ontologies to pretrain the BERT model. As mentioned in the previous section, Onto2Sen can generate a corpus of meaningful sentences from different biomedical ontologies. We use this generated corpus, consisting of about 20M words, which is a substantial volume of unlabeled text data related to the medical domain. The corpus was preprocessed and prepared for training, ensuring it was suitable for the subsequent steps.

The BERT model's pretraining phase involves two tasks: Masked Language Modelling (MLM) and Next Sentence Prediction (NSP). For our model, which incorporates biomedical ontologies, we focus on augmenting the Masked LM task and omit the Next Sentence Prediction task. In the Masked LM task, we mask out 15 per cent of the tokens in a sentence, and the model is trained to predict the original tokens given the context of the surrounding words. This approach helps the semantic understanding of medical terminology by directly injecting biomedical ontology concepts and properties into the input sequence. As a result, the model can recognise and better understand medical concepts and terminologies.

During the pretraining process, the BERT model was trained using the Adam optimizer, a widely adopted optimization algorithm for neural networks. The optimizer iteratively adjusted the model's parameters to minimize a predefined loss function, optimizing its ability to capture language patterns. Additionally, a learning rate scheduler was employed to dynamically adjust the learning rate at specific intervals, facilitating improved convergence and optimization of the model. The scheduler strategy, such as linear or exponential decay, was carefully selected based on experimentation and optimization.
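The 15% masking scheme can be sketched as follows. This is a minimal stand-alone version of standard BERT-style MLM masking (including the usual 80/10/10 split into [MASK], random token, and unchanged token), not the authors' actual training code; the tiny vocabulary is hypothetical:

```python
import random

MASK = "[MASK]"
VOCAB = ["ameloblastoma", "cyst", "disease", "syndrome"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM masking: select ~mask_prob of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns the corrupted sequence and per-position prediction targets."""
    rng = rng or random.Random(0)
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                    # model must predict the original
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK               # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = rng.choice(VOCAB)  # 10%: random replacement
            # else: 10% keep the token unchanged
    return masked, labels
```

During pretraining, the loss is computed only at the positions where `labels` is set, so the model learns to reconstruct masked ontology terms from their context.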
These pretraining steps establish a well-built foundation for subsequent finetuning and proficient utilization of the BioOntoBERT model across diverse downstream natural language processing tasks.

===4.3. Finetuning BERT===

During the fine-tuning stage, we aim to train our BioOntoBERT model to accurately answer multiple-choice questions on the MedMCQA dataset without using any external context.

''Figure 2: BioOntoBERT for multiple choice questions''

Each multiple-choice question in the MedMCQA dataset was concatenated with its answer options to form a single input sequence of the form shown in Figure 2. Next, we performed tokenization on the dataset. Tokenization involves breaking down the questions and answer choices into smaller units called tokens, which the model can handle. This step ensures that the data is in a format suitable for the BioOntoBERT model to process.

After the dataset is properly tokenized, we train the BioOntoBERT model on this data. During training, the model learns from the dataset by adjusting its internal parameters to better capture the relationships between questions and answer choices. The goal is to enhance the model's capacity to accurately choose the right answer when presented with a question. In this case, the labels were encoded in a one-hot format derived from the integers. Throughout the training process, the model iteratively refined its understanding of the task by analyzing the patterns and context in the data. We carefully optimized the model's performance by adjusting various parameters, such as the learning rate and the number of training epochs.

Once the training was completed, we evaluated the performance of the finetuned BioOntoBERT model using the validation dataset. This evaluation allowed us to measure how well the model performed on unseen data and provided valuable insights into its ability to answer multiple-choice questions accurately.
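The input construction for a question with four options, and the subsequent selection of the highest-probability option, can be sketched as follows. The question and logit values are hypothetical (the question is the Table 2 example; in the real system the four per-option scores come from BioOntoBERT), and the [CLS]/[SEP] layout is the standard BERT pairing convention:

```python
import math

def build_mc_inputs(question, options):
    """Pair the question with each option, BERT-style: [CLS] q [SEP] opt [SEP]."""
    return [f"[CLS] {question} [SEP] {opt} [SEP]" for opt in options]

def pick_answer(logits):
    """Softmax over the per-option scores, then argmax over the probabilities."""
    exps = [math.exp(x - max(logits)) for x in logits]  # shift for stability
    probs = [e / sum(exps) for e in exps]
    return max(range(len(probs)), key=probs.__getitem__), probs

inputs = build_mc_inputs(
    "Dentigerous cyst is likely to cause which neoplasia?",
    ["Ameloblastoma", "Adenocarcinoma", "Fibrosarcoma", "All of the above"],
)
best, probs = pick_answer([2.1, 0.3, 0.2, 1.0])  # hypothetical logits
# best == 0, i.e. option (A)
```

Each of the four sequences is scored independently, and the four scores are normalized into a single distribution over the options.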
During the fine-tuning process and subsequent evaluation of the BioOntoBERT model, a probability distribution is generated over each question's answer choices. The output probability distribution is denoted by p1, p2, p3 and p4, as shown in Figure 2. We identify the most likely answer choice by choosing the index associated with the highest probability.

==5. Results==

The main objective of this paper is to investigate the impact of incorporating biomedical ontologies into the pretraining process of BERT models for the task of medical multiple-choice question answering. To achieve this objective, we developed a new pretrained model, BioOntoBERT, that is pretrained on a combination of 9 biomedical ontologies. We evaluated the performance of BioOntoBERT on the MedMCQA dataset, which contains a set of challenging medical questions curated by medical experts, and compared it to the performance of other pretrained models, such as PubMedBERT [26], SciBERT [7] and BioBERT [5].

{| class="wikitable"
|+ Table 3: Accuracy and additional corpus size for different models on the MedMCQA dataset [25]. Statistics for prior BERT models are taken from their publications [4] [5] [7] [8].
! Model !! Corpus !! Text Size !! Accuracy
|-
| BERT || Wiki + Books || – || 35%
|-
| BioBERT || PubMed || 4.5B words || 38%
|-
| SciBERT || PMC + CS || 3.2B words || 39%
|-
| PubMedBERT || PubMed || 3.1B words (21GB) || 40%
|-
| BioOntoBERT (proposed) || Biomedical Ontologies || 20M words (158 MB) || 42.72%
|}

We conducted the pretraining of our BioOntoBERT model starting from the BERT-base architecture, pretrained on English Wikipedia and BooksCorpus for 1M steps. BioOntoBERT was pretrained for 200K steps. The pretraining process used a batch size of 32 and a learning rate of 5e-5 with scheduling. The pretraining and finetuning were both performed on a Tesla V100-PCIE-32GB GPU, with a maximum sequence length of 128.
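For reference, the reported training setup can be collected in one place. The dict form below is our own framing (not the authors' actual training script); the values are the hyperparameters stated in this section:

```python
# Hyperparameters as reported in Section 5; the dict layout is illustrative.
PRETRAIN_CONFIG = {
    "base_model": "BERT-base (English Wikipedia + BooksCorpus, 1M steps)",
    "steps": 200_000,
    "batch_size": 32,
    "learning_rate": 5e-5,
    "max_seq_length": 128,
    "objective": "masked language modelling only (NSP omitted)",
}
FINETUNE_CONFIG = {
    "batch_size": 32,
    "learning_rate": 1e-5,
    "max_seq_length": 128,
}
```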
The pretraining of BioOntoBERT on the ontology-generated sentences took approximately 10 hours, whereas the pretraining times for PubMedBERT and BioBERT were reported as 5 days (120 hours) [8] and 10 days (240 hours) [5], respectively. For the finetuning process, a batch size of 32 and a learning rate of 1e-5 were selected. It took approximately 30 hours to complete the finetuning process due to the large size of the MedMCQA training data.

BioOntoBERT outperformed the baseline BERT-base, achieving a minimum accuracy of 42.72% over 10 runs. Furthermore, BioOntoBERT also outperformed PubMedBERT, which is pretrained on a huge corpus of biomedical text data. These results indicate that adding ontology data to the pretraining process can improve the performance of BERT models for medical question answering. The comparison of models in Table 3 highlights the significance of the relatively small amount of additional ontology data we used to enhance the performance of our model. This finding suggests that the biomedical ontology data we injected into the model is highly informative and beneficial, unlike much of the data in other corpora, which may be irrelevant to the task.

During the evaluation, we also conducted a comparative analysis of the performance of BioOntoBERT, BERT, and PubMedBERT on various multiple-choice questions across different medical subjects. One evaluated question is shown in Table 2. Notably, BioOntoBERT correctly predicted the answer as (A); since the keywords 'Ameloblastoma', 'Adenocarcinoma', 'Fibrosarcoma' and 'Neoplasia' are present in the DOID ontology, the BioOntoBERT model would have leveraged this knowledge. Although 'Dentigerous cyst' is not present in DOID, a dentigerous cyst is a type of 'Odontogenic Cyst', and DOID contains a reference to 'Odontogenic Epithelium'.
Odontogenic cysts and odontogenic epithelium are closely related, as the former is derived from the remnants of the latter and forms as a result of abnormal developmental processes during tooth formation. In contrast, both BERT and PubMedBERT predicted the answer as (D). This demonstrates an example instance of BioOntoBERT utilizing domain-specific knowledge.

{| class="wikitable"
|+ Table 4: Subject-wise model comparison of PubMedBERT and BioOntoBERT on the MedMCQA validation set of AIIMS MCQs. Subject-wise statistics for PubMedBERT are taken from [25].
! Subject Name !! PubMedBERT !! BioOntoBERT !! Ontology Used
|-
| Anatomy || 39% || 41% || Yes
|-
| Biochemistry || 49% || 50% || Yes
|-
| Dental || 36% || 40% || Yes
|-
| ENT || 52% || 41% || No
|-
| Medicine || 47% || 48% || Yes
|-
| Microbiology || 44% || 40% || No
|-
| Pathology || 46% || 47% || Yes
|-
| Pharmacology || 46% || 42% || Yes
|-
| Physiology || 56% || 54% || Yes
|-
| Psychiatry || 56% || 50% || Yes
|-
| Radiology || 31% || 28% || No
|}

The results presented in Table 4 demonstrate that BioOntoBERT exhibited superior performance compared to PubMedBERT across various subjects, particularly when ontology data was available during pretraining. Subjects like Anatomy, Biochemistry, Dental, Medicine, and Pathology showed notable improvements from the inclusion of ontology data. However, for subjects such as ENT, Microbiology, and Radiology, where no ontology was used in our experiments, the benefits were not as evident. Additionally, for Pharmacology, Physiology and Psychiatry, the subject ontologies were not comprehensive enough to contribute significantly to question-answering capabilities. These findings underscore the significance of incorporating subject-specific ontology information to enhance the model's understanding and performance on domain-specific questions.

Importantly, we also evaluated the impact of the size and complexity of the ontologies on the performance of the models. Surprisingly, we observed that the size or the number of concepts and properties in the ontologies did not necessarily correlate with improved question-answering performance.
This suggests that the relevance and quality of the ontology data are crucial factors in enhancing the model's understanding and reasoning capabilities, rather than the sheer quantity of information.

==6. Conclusions==

This study introduces the Onto2Sen system, which derives annotation-based and class-hierarchy sentences from ontologies to enhance the performance of a language model. It is the first instance of leveraging such knowledge in pretraining a language model for biomedical natural language processing tasks. The BioOntoBERT model, pretrained on biomedical ontologies, outperforms other models, including PubMedBERT, in multiple-choice question-answering tasks within the medical domain, effectively capturing medical terminologies. By achieving improved results with just 158MB of pretraining data, our approach not only enhances performance but also significantly reduces computational costs, making it a more sustainable approach to model training.

==7. Future work==

Firstly, the selection and incorporation of appropriate biomedical ontologies remain an ongoing challenge. While we employed several ontologies in our pretraining process, there are numerous other ontologies available that could potentially contribute to even better performance. Secondly, although BioOntoBERT exhibits impressive proficiency in language understanding and representation, it lacks advanced reasoning capabilities over ontologies. The model primarily captures contextual relationships and semantic information but does not possess explicit reasoning mechanisms to infer complex logical connections within ontologies. This limitation suggests avenues for future research, focusing on incorporating reasoning abilities into language models trained on biomedical ontologies.

==References==

[1] O. Bodenreider, A. Burgun, Biomedical ontologies, Medical Informatics: Knowledge Management and Data Mining in Biomedicine (2005) 211–236.

[2] Q. Guo, M. Zhang, Question answering based on pervasive agent ontology and semantic web, Knowledge-Based Systems 22 (2009) 443–448.

[3] A. Arbaaeen, A. Shah, Ontology-based approach to semantically enhanced question answering for closed domain: A review, Information 12 (2021) 200.

[4] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).

[5] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics 36 (2020) 1234–1240.

[6] G. Michalopoulos, Y. Wang, H. Kaka, H. Chen, A. Wong, UmlsBERT: Clinical domain knowledge augmentation of contextual embeddings using the Unified Medical Language System Metathesaurus, arXiv preprint arXiv:2010.10391 (2020).

[7] I. Beltagy, K. Lo, A. Cohan, SciBERT: A pretrained language model for scientific text, arXiv preprint arXiv:1903.10676 (2019).

[8] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, H. Poon, Domain-specific language model pretraining for biomedical natural language processing, ACM Transactions on Computing for Healthcare (HEALTH) 3 (2021) 1–23.

[9] M. Midhunlal, M. Gopika, XMQAS: an ontology based medical question answering system, International Journal of Advanced Research in Computer and Communication Engineering 5 (2016) 929–932.

[10] S. Kwon, J. Yu, S. Park, J.-A. Jun, C.-S. Pyo, Stroke medical ontology QA system for processing medical queries in natural language form, in: 2021 International Conference on Information and Communication Technology Convergence (ICTC), IEEE, 2021, pp. 1649–1654.

[11] M. Sap, R. Le Bras, E. Allaway, C. Bhagavatula, N. Lourie, H. Rashkin, B. Roof, N. A. Smith, Y. Choi, ATOMIC: An atlas of machine commonsense for if-then reasoning, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 3027–3035.

[12] S. Lv, D. Guo, J. Xu, D. Tang, N. Duan, M. Gong, L. Shou, D. Jiang, G. Cao, S. Hu, Graph-based reasoning over heterogeneous external knowledge for commonsense question answering, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 8449–8456.

[13] Y. He, Z. Zhu, Y. Zhang, Q. Chen, J. Caverlee, Infusing disease knowledge into BERT for health question answering, medical inference and disease name recognition, arXiv preprint arXiv:2010.03746 (2020).

[14] T. R. Goodwin, D. Demner-Fushman, Enhancing question answering by injecting ontological knowledge through regularization, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 2020, NIH Public Access, 2020, p. 56.

[15] L. He, S. Zheng, T. Yang, F. Zhang, KLMo: Knowledge graph enhanced pretrained language model with fine-grained relationships, in: Findings of the Association for Computational Linguistics: EMNLP 2021, 2021, pp. 4536–4542.

[16] K. Faldu, A. Sheth, P. Kikani, H. Akbari, KI-BERT: Infusing knowledge context for better language and domain understanding, arXiv preprint arXiv:2104.08145 (2021).

[17] L. M. Schriml, C. Arze, S. Nadendla, Y.-W. W. Chang, M. Mazaitis, V. Felix, G. Feng, W. A. Kibbe, Disease Ontology: a backbone for disease semantic integration, Nucleic Acids Research 40 (2012) D940–D946.

[18] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, et al., Gene Ontology: tool for the unification of biology, Nature Genetics 25 (2000) 25–29.

[19] C. Rosse, J. L. Mejino Jr, The Foundational Model of Anatomy ontology, in: Anatomy Ontologies for Bioinformatics: Principles and Practice, Springer, 2008, pp. 59–117.

[20] L. Hou, M. Wu, H. Y. Kang, S. Zheng, L. Shen, Q. Qian, J. Li, PMO: A knowledge representation model towards precision medicine, Math. Biosci. Eng. 17 (2020) 4098–4114.

[21] U. Visser, S.
Abeyruwan, U. Vempati, R. P. Smith, V. Lemmon, S. C. Schürer, Bioassay on- tology (bao): a semantic description of bioassays and high-throughput screening results, BMC bioinformatics 12 (2011) 1–16. [22] W. D. Duncan, T. Thyvalikakath, M. Haendel, C. Torniai, P. Hernandez, M. Song, A. Acharya, D. J. Caplan, T. Schleyer, A. Ruttenberg, Structuring, reuse and analysis of electronic dental data using the oral health and disease ontology, Journal of Biomedical Semantics 11 (2020) 1–19. [23] M. Gündel, E. Younesi, A. Malhotra, J. Wang, H. Li, B. Zhang, B. de Bono, H.-T. Mevissen, M. Hofmann-Apitius, Hupson: the human physiology simulation ontology, Journal of biomedical semantics 4 (2013) 1–9. [24] J. Hastings, W. Ceusters, M. Jensen, K. Mulligan, B. Smith, Representing mental function- ing: Ontologies for mental health and disease (2012). [25] A. Pal, L. K. Umapathi, M. Sankarasubbu, Medmcqa: A large-scale multi-subject multi- choice dataset for medical domain question answering, in: Conference on Health, Infer- ence, and Learning, PMLR, 2022, pp. 248–260. [26] Y. Peng, S. Yan, Z. Lu, Transfer learning in biomedical natural language process- ing: an evaluation of bert and elmo on ten benchmarking datasets, arXiv preprint arXiv:1906.05474 (2019). 106