Biomedical Relation Extraction via Domain Knowledge and Prompt Learning

Jianyuan Yuan1,†, Wei Du1,†, Xiaoxia Liu2 and Yijia Zhang1,*
1 Dalian Maritime University, Dalian 116024, Liaoning, China
2 Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23-24, 2024, Changchun, China and Online
* Corresponding author.
† These authors contributed equally.
Emails: jianyuany@dlmu.edu.cn (J. Yuan); duwei@dlmu.edu.cn (W. Du); xxliu@stanford.edu (X. Liu); zhangyijia@dlmu.edu.cn (Y. Zhang)
ORCID: 0000-0002-5843-4675 (Y. Zhang)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Biomedical relation extraction plays a crucial role in extracting key biomedical information from an extensive literature, thereby supporting disease treatment and the construction of biomedical knowledge bases. However, biomedical texts contain highly technical language and domain-specific terminology, which makes it difficult for models to fully understand their semantics. Furthermore, imbalances in the distribution of samples across categories in biomedical datasets reduce classification accuracy for categories with limited training samples. In this study, we propose a biomedical relation extraction model based on domain knowledge and prompt learning. The prompt templates guide the model to focus on key features and information, so that more knowledge can be obtained from limited data, while domain knowledge is used to acquire refined entity representations, mitigating the challenges posed by technical language and domain-specific terminology. The model is evaluated on the DDI Extraction 2013 dataset and the ChemProt dataset, and the experimental results demonstrate that it achieves state-of-the-art performance.

Keywords
biomedical relation extraction, prompt learning, biomedical literature, domain knowledge

1. Introduction

With the rapid development of the biomedical field, the amount of biomedical literature has exploded, and it contains a wealth of biomedical information [1]. Biomedical relation extraction is a natural language processing technology whose purpose is to extract the relations between entities from biomedical text [2]. This technology helps researchers quickly extract important biomedical information from the literature and provides important support for drug development and disease treatment [3].

The highly technical language and domain-specific terminology used in biomedical texts complicate this task, and traditional approaches often struggle to achieve high performance [4]. Moreover, the number of samples differs across categories in biomedical datasets, resulting in low classification accuracy for categories with fewer training samples. Meanwhile, biomedical relation extraction usually requires a large amount of labeled data to train the model effectively. However, because of the huge amount of data, the cost of manual labeling is very high, and obtaining more knowledge from limited data becomes very important [5].

The application of pre-trained language models to biomedical texts has received widespread attention and exploration [6]. Most current biomedical relation extraction methods rely on pre-trained language models. Although pre-trained language models learn general representations of language, there is a significant difference between the pre-training objective and downstream fine-tuning, which strongly affects performance on the downstream task. As shown in Fig. 1, since the unsupervised prediction objective applied to the input text sequence during pre-training is inconsistent with the supervised classification operation of the downstream task, the model cannot fully apply its prior knowledge to the downstream task.

We propose a biomedical relation extraction model based on domain knowledge and prompt learning. Domain knowledge provides entities with richer feature representations, which better reflect the essence of the entities and improve entity representation. Prompt learning is a method that can effectively bridge the gap between pre-training and fine-tuning on downstream tasks. Its core idea is to transform the traditional classification task into a cloze problem: a prompt template inserts one or more placeholders (usually represented by [MASK]) into the input text, and the model is asked to predict the corresponding label words at those positions. This forces the model to consider more contextual information when predicting, so that it better understands the semantics of the input text.
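As a concrete illustration of this cloze reformulation, the short sketch below recasts a relation-classification instance as a masked prompt. The sentence, template wording, and label words here are purely illustrative; the templates and label-word sets actually used by our model are defined in Section 3 and Tables 1-2.

```python
# Illustrative only: recasting relation classification as a cloze task.
# The sentence, template, and label words below are made up for this example.

sentence = "Aspirin increases the anticoagulant effect of warfarin."
e1, e2 = "Aspirin", "warfarin"

# Classification view: pick one label from a fixed relation set.
relation_set = ["Advice", "Mechanism", "Effect", "Int", "False"]

# Cloze view: append a template with a blank and let a masked language model fill it.
prompt = f"{sentence} {e1} [MASK] {e2}"
# A verbalizer maps the word predicted at [MASK] back to a relation label.
verbalizer = {"affects": "Effect", "metabolizes": "Mechanism", "ignores": "False"}
print(prompt)
```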
Overall, the contributions of this paper are as follows:

1) We propose a biomedical relation extraction model based on prompt learning, which guides the model to focus on key features and information by constructing multiple task-related prompts. By introducing prompt learning, more knowledge can be obtained from limited data, which effectively alleviates the problem of insufficient knowledge that the model can learn when the amount of data is small.
2) The model obtains detailed information about biomedical entities through domain knowledge and derives enhanced entity representations. In addition, special tokens are embedded around entities, enabling the entities to better integrate domain knowledge and thereby reducing the impact of highly technical language and domain-specific terminology in biomedical texts on model performance.
3) The model is evaluated on the ChemProt dataset and the DDI Extraction 2013 dataset. Experimental results demonstrate that the proposed model outperforms existing methods and achieves state-of-the-art performance in biomedical relation extraction.

Figure 1: The instances of pre-training, fine-tuning, and prompt-tuning for relation extraction in the biomedical domain.

2. Related Work

Recently, various neural network-based approaches have demonstrated commendable outcomes in diverse relation extraction tasks and have been extensively employed in biomedical research. Liu et al. [7] utilized a convolutional neural network (CNN) model for biomedical relation extraction, demonstrating its effectiveness in achieving high performance. In this model, the words in the sentences of the biomedical dataset serve as inputs to the CNN, which can effectively capture local features. Liu et al. [8] introduced a dependency convolutional neural network (DCNN) model for biomedical relation extraction tasks. By utilizing the dependency parse tree, the DCNN model can effectively capture the interdependencies between words. Sasaki et al. [9] applied an attention-based CNN model to biomedical relation extraction tasks, since each word in a biomedical sentence has a varying impact on the final classification outcome. Kavuluru et al. [10] proposed a method that employs recurrent neural networks (RNNs) at the word and character levels to extract drug-drug interaction relations. Lim et al. [11] proposed a method using recursive neural networks to automatically extract drug interactions from the literature; it decomposes the text into a syntax tree and recursively processes the tree structure to extract drug-drug interaction information.
Sahu et al. [12] used a Long Short-Term Memory (LSTM) network to automatically extract drug interaction information from biomedical texts. Mostafapour et al. [13] proposed a model that uses Bi-directional Long Short-Term Memory (BiLSTM) to model context information in text sequences and uses a hierarchical structure to consider different levels of semantic information. Wang et al. [14] used dependency parsing to model the relations between drugs in text and used an LSTM network to capture contextual information in text sequences. Huang et al. [15] employed a hybrid model consisting of a support vector machine (SVM) and an LSTM for extracting drug interaction information. Zheng et al. [16] proposed a BiLSTM model with an attention mechanism to extract the interaction relations between drugs in biomedical texts. Zhang et al. [17] utilized the shortest dependency path to determine the grammatical relations within a sentence and extracted keywords located between two entities.

Peng et al. [18] proposed a multi-model approach that combines an SVM, a CNN, and an RNN to improve the performance of biomedical relation extraction. Sun et al. [19] improved biomedical relation extraction by integrating attention and ELMo representations with bidirectional LSTM networks. A neural model for extracting CPIs was proposed by Zhang et al. [20], which utilized deep context representations and a multi-head attention mechanism. Xiong et al. [21] presented a model that combines a Graph Convolutional Neural Network (GCNN) and an LSTM network for extracting biomedical relations. Park et al. [22] utilized an attention-based GCN for the task of biomedical relation extraction.

Peng et al. [23] applied the BERT (Bidirectional Encoder Representations from Transformers) model to the task of biomedical relation extraction. Lee et al. [24] extended the BERT model by training it on a large-scale biomedical corpus, resulting in the BioBERT model. Huang et al. [25] proposed an EMSI-BERT method for drug-drug interaction extraction, which utilizes an asymmetric entity masking strategy and a symbol insertion structure. Sun et al. [26] proposed a model that uses a combination of Gaussian probability distributions and external biomedical knowledge to extract CPIs. Sun et al. [27] proposed a model (BERT-Att-capsule) that utilizes a BERT-based attention-guided capsule network to extract CPIs; it uses attention mechanisms to guide the extraction of interactions and capsule networks to capture the interactions' semantic features. Liu et al. [28] proposed a grammar-enhanced model and a category keyword-based approach, which uses graph-based grammar to build a syntactic tree and type keywords to guide the model to extract specific types of relations. Su and Vijay-Shanker [6] explored approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages.
Figure 2: The schematic overview of the proposed model. The black arrows indicate the input stream.

3. Method

3.1. Problem Definition

Given a sentence sequence S = {c1, c2, ..., cn-1, cn}, where ci is a word in the sentence and n is the length of the sentence, the subject entity e1 = {ci, ..., cj} and the object entity e2 = {cx, ..., cy} are located in the same sentence. Biomedical relation extraction aims to identify the relation r between e1 and e2, where r is either selected from a predefined relation set R or is NA.

3.2. Model Framework

Fig. 2 shows the architecture of the biomedical relation extraction model based on domain knowledge and prompt learning. The model consists of four modules: the input module, the encoding module, the knowledge enhancement module, and the prompt learning module. We design three prompt templates, namely the prompt for biomedical entity e1, the prompt for the biomedical entity relation, and the prompt for biomedical entity e2. First, the biomedical text and the prompt templates are fed into the model for encoding. Then, the enhanced entity representation is obtained through knowledge enhancement. Finally, through the prompt learning module, the model predicts the label words at the [MASK] positions and selects their corresponding labels for classification.

3.3. Input Module

The biomedical relation extraction task is represented as T = (X, Y), where X represents the input text and Y represents the category label. A sentence in the biomedical dataset is represented as x = {x1, ..., e1, ..., e2, ..., xn}, where e1 and e2 represent the two biomedical entities. A key part of prompt learning is to construct an appropriate template P and a label word set V, together with a mapping M: Y → V that connects each task label with its label words.

The model's input comprises two components: the input text, denoted as x, and the prompt template, denoted as p(x). The sentence is tokenized, and each token is encoded as a vector of d dimensions. Moreover, a [CLS] token is added at the beginning of each sentence sequence. To denote the boundaries of each biomedical entity, special symbols are introduced: the first entity is enclosed by "$" symbols on both sides, while the second entity is enclosed by "#" symbols on both sides.

In addition to retaining the original input x, multiple [MASK] tokens need to be fed into the model. Three prompts are designed in the input prompt template: the prompt p_e1(x) corresponding to the biomedical entity e1, the prompt p_r(x) corresponding to the biomedical entity relation, and the prompt p_e2(x) corresponding to the biomedical entity e2. The prompt template p(x) corresponding to the input text x is denoted as:

p(x) = \{ p_{e_1}(x), p_r(x), p_{e_2}(x) \}    (1)

The prompt p_e1(x) corresponding to biomedical entity e1 and the prompt p_e2(x) corresponding to biomedical entity e2 can be formalized as follows:

p_{e_1}(x) = \{ x, \text{the [MASK] } e_1 \}    (2)

p_{e_2}(x) = \{ x, \text{the [MASK] } e_2 \}    (3)

Then, the prompt p_r(x) for the relation between the biomedical entities is designed. For example, in the biomedical example sentence above, the relation type is CPR:4, which means that the relation between entity e1 and entity e2 is "inhibition". The prompt template for the relation type is "e1 [MASK] e2", and the prompt label word is "has curbed the". The prompt p_r(x) for the relation corresponding to the input text x can be expressed as:

p_r(x) = \{ x, e_1 \text{ [MASK] } e_2 \}    (4)

The complete input composition can be formalized as follows:

\text{Input}_x = \{ x, p(x) \}    (5)
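A minimal sketch of this input construction is shown below, assuming the HuggingFace transformers tokenizer and a publicly available BioBERT checkpoint; the function names and the example sentence are ours. The three prompts are kept as separate strings here, since Section 3.6 describes how they are aggregated into a single template.

```python
# Sketch of the input construction in Section 3.3 (entity markers + three prompts).
# Assumes the HuggingFace "transformers" tokenizer and a public BioBERT checkpoint;
# function names and the example sentence are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
MASK = tokenizer.mask_token  # "[MASK]"

def mark_entities(sentence: str, e1: str, e2: str) -> str:
    # "$" around the first entity, "#" around the second (Section 3.3).
    return sentence.replace(e1, f"$ {e1} $").replace(e2, f"# {e2} #")

def build_prompts(e1: str, e2: str) -> list[str]:
    # Eqs. (2)-(4): entity prompt for e1, relation prompt, entity prompt for e2.
    return [f"the {MASK} {e1}", f"{e1} {MASK} {e2}", f"the {MASK} {e2}"]

sentence = "Aspirin inhibits COX-1 activity."  # illustrative example
text = " ".join([mark_entities(sentence, "Aspirin", "COX-1")]
                + build_prompts("Aspirin", "COX-1"))
encoding = tokenizer(text, max_length=512, truncation=True, return_tensors="pt")
# The tokenizer prepends [CLS] automatically; the [MASK] slots can be located via
# (encoding.input_ids == tokenizer.mask_token_id).
```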
3.4. Encode Module

BioBERT is a pre-trained model based on BERT that is suitable for natural language processing tasks on biomedical texts. The BioBERT model is trained on a large corpus from the biomedical field, which improves text understanding and classification performance in this domain and has made BioBERT a widely used model in biomedical natural language processing.

The model's input consists of the biomedical text and a prompt template, wherein [MASK] denotes the portion that the model must complete. Within the input sequence, [MASK] is substituted with a special token, signifying that it is to be predicted. To ensure the model comprehends each word's position within the sentence, each word embedding vector is added to its corresponding position vector in the sequence. The Transformer architecture is employed to encode the sequence of embedding and position vectors. This architecture comprises multiple layers, each containing a multi-head attention mechanism and a feed-forward neural network, and each layer encodes the input vector sequence to extract its representation. This encoding approach effectively captures both the semantic and syntactic information present in the input sequence, thereby enhancing the model's ability to predict the content that fills the [MASK].

3.5. Knowledge Enhancement Module

Biomedical entity descriptions are sourced from Wikipedia and DrugBank using crawler technology to obtain interpretation information in the biomedical domain. This interpretation information is denoted as S_e = {E1, E2, E3, ..., EN}, where Ei represents the i-th word and N represents the length of the sentence.

The vector for biomedical entity e1 is computed as the average of the hidden layer vectors from H_i to H_j in the model. Similarly, the vector for the other biomedical entity e2 is obtained as the average of the hidden layer vectors from H_k to H_m. The calculation formulas for these vectors are as follows:

H_1' = W_1 \left[ \tanh\left( \frac{1}{j-i+1} \sum_{t=i}^{j} H_t \right) \right] + b_1    (6)

H_2' = W_2 \left[ \tanh\left( \frac{1}{m-k+1} \sum_{t=k}^{m} H_t \right) \right] + b_2    (7)

where W_1 ∈ R^{d×d} and W_2 ∈ R^{d×d} denote weight matrices, and b_1 and b_2 denote bias vectors.

The semantic feature representation of the domain knowledge is acquired by the model using BioBERT. This vector is then combined with the entity interpretation information and the corresponding entity vector to generate an improved vector representation of the biomedical entities. When a sentence S_e containing biomedical knowledge is successfully matched with entity e1, the final hidden layer vector H_{e1} of its [CLS] token can be obtained from BioBERT. The acquired enhanced representation is integrated into the model, with the calculation formulas as follows:

H_{E_1} = W_4 \left[ \mathrm{concat}\left( H_1', \; W_3 \tanh(H_{e_1}) + b_3 \right) \right] + b_4    (8)

H_{E_2} = W_6 \left[ \mathrm{concat}\left( H_2', \; W_5 \tanh(H_{e_2}) + b_5 \right) \right] + b_6    (9)

where W_3, W_4, W_5, and W_6 denote weight matrices and b_3, b_4, b_5, and b_6 denote bias vectors.
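The following PyTorch sketch mirrors Eqs. (6)-(9) for one entity. The module and variable names are ours, and the knowledge vector is assumed to be the final [CLS] hidden state obtained by encoding the entity's DrugBank/Wikipedia description with BioBERT.

```python
# Sketch of the knowledge enhancement step (Eqs. 6-9) for a single, unbatched
# example; module and variable names are ours, not the paper's.
import torch
import torch.nn as nn

class KnowledgeEnhancedEntity(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.w_span = nn.Linear(hidden, hidden)     # W1 (or W2) with bias b1 (b2)
        self.w_know = nn.Linear(hidden, hidden)     # W3 (or W5) with bias b3 (b5)
        self.w_out = nn.Linear(2 * hidden, hidden)  # W4 (or W6) with bias b4 (b6)

    def forward(self, hidden_states, span, knowledge_cls):
        # hidden_states: [seq_len, hidden] BioBERT outputs for the input sentence.
        # span: (i, j) token indices of the entity; knowledge_cls: [hidden] vector
        # from the [CLS] state of the matched description sentence.
        i, j = span
        h_span = self.w_span(torch.tanh(hidden_states[i:j + 1].mean(dim=0)))  # Eq. 6/7
        h_know = self.w_know(torch.tanh(knowledge_cls))                       # inner term of Eq. 8/9
        return self.w_out(torch.cat([h_span, h_know], dim=-1))                # Eq. 8/9
```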
3.6. Prompt Learning Module

In the prompt learning module, the multiple prompts are combined directly to form a complete prompt for the task. The complete prompt template is as follows:

p(x) = x \;\text{the}\; [\mathrm{MASK}]_1 \; e_1 \; [\mathrm{MASK}]_2 \;\text{the}\; [\mathrm{MASK}]_3 \; e_2    (10)

where [MASK]_1 and [MASK]_3 are the masks for the entity types and [MASK]_2 is the mask for the entity relation. The corresponding label word sets are as follows:

V_{[\mathrm{MASK}]_1} = \{ \text{Chemical}, \text{Gene}, \text{Drug} \}    (11)

V_{[\mathrm{MASK}]_2} = \{ \text{has activated the}, \text{has curbed the}, \ldots \}    (12)

V_{[\mathrm{MASK}]_3} = \{ \text{Chemical}, \text{Gene}, \text{Drug} \}    (13)

Because the aggregated template may contain multiple [MASK] tokens, all masked positions must be considered for prediction. Each [MASK] in the sentence is equivalent to a classification mapped to a label word. Each position yields a corresponding probability, and the probability that the entire sentence is predicted correctly is the product of the probabilities of each position. The final probability is calculated as follows:

P(y \mid x) = \prod_{j=1}^{n} P\left( [\mathrm{MASK}]_j = \varphi_j(y) \mid p(x) \right)    (14)

where n is the number of mask positions in p(x), and φ_j(y) is the label word from the set V_{[MASK]_j} that class y maps to at the j-th mask position [MASK]_j.
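A hedged sketch of this multi-mask scoring is given below, assuming the HuggingFace masked-language-model head for a public BioBERT checkpoint; the function name and the single-token treatment of label words are our simplifications.

```python
# Sketch of the scoring in Eq. (14): multiply, over the [MASK] positions, the MLM
# probability of the label word that a candidate class maps to at each position.
# Assumes the HuggingFace "transformers" MLM head; names are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
mlm = AutoModelForMaskedLM.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

def class_score(prompt: str, label_token_ids: list[int]) -> float:
    # label_token_ids: one token id per [MASK] slot for the candidate class;
    # multi-token label phrases (e.g. "has curbed the") would contribute one
    # factor per sub-token in the same way.
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero().squeeze(-1)
    with torch.no_grad():
        probs = mlm(**enc).logits.softmax(dim=-1)[0]  # [seq_len, vocab_size]
    score = 1.0
    for pos, tok in zip(mask_pos.tolist(), label_token_ids):
        score *= probs[pos, tok].item()
    return score

# The predicted relation is the candidate class with the highest score.
```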
During the training process, the model predicts the [MASK] parts of the input sequence with the masked language model (MLM) head according to the information in the context. This makes the training goal of the model consistent with the MLM objective, thus effectively reducing the gap between pre-training and the downstream task.

In our model, label words are critical for accurately classifying the relation between biomedical entities. We design a set of label words for each relation type and verify their effectiveness by using them for model training and testing. Entity label words are words that describe biomedical entity types, such as Chemical or Gene; they help the model better understand entity types and thus correctly predict the relation between entities. Relation label words are short phrases describing the relation types between biomedical entities, which are very important for the classification results. During learning, the prediction at each [MASK] is matched to the closest entry in the label word set, and the relation label words help the model better understand the relation between the biomedical entities. Table 1 and Table 2 show the biomedical entity label words and relation label words for the CPI dataset and the DDI dataset, respectively.

Table 1: The label words for the prompt of the CPI dataset.
Class label   Prompt1          Prompt2           Prompt3
False         Gene/Chemical    has nothing to    Chemical/Gene
CPR:3         Gene/Chemical    was activated by  Chemical/Gene
CPR:4         Gene/Chemical    has curbed by     Chemical/Gene
CPR:5         Gene/Chemical    is agitation of   Chemical/Gene
CPR:6         Gene/Chemical    is antagonist of  Chemical/Gene
CPR:9         Gene/Chemical    is substrate of   Chemical/Gene

Table 2: The label words for the prompt of the DDI dataset.
Class label   Prompt1   Prompt2                   Prompt3
False         DRUG      has nothing to            DRUG
Advice        DRUG      need advice with          DRUG
Mechanism     DRUG      generate mechanisms with  DRUG
Effect        DRUG      make effect with          DRUG
Int           DRUG      will interact with        DRUG

4. Experiments and Discussion

4.1. Datasets and Evaluation Metrics

The performance of the model is evaluated on the DDI Extraction 2013 dataset [29] and the ChemProt dataset [30].

DDI Extraction 2013 Dataset
The DDI Extraction 2013 dataset is a dataset for extracting drug-drug interaction relations. It contains medical texts from multiple sources such as DrugBank and MEDLINE: DrugBank provides drug names, chemical formulas, and pharmacological information, while MEDLINE provides abstracts and full-text articles containing DDI information. All drug pairs in the text are annotated as having or not having an interaction, with a total of four interaction types: Advice, Effect, Mechanism, and Int. The statistics of the dataset are shown in Table 3.

Table 3: Statistics of the DDI corpus.
                Train set                 Test set
Relation type   DrugBank   MEDLINE        DrugBank   MEDLINE
Advice          818        8              214        7
Mechanism       1257       62             278        24
Effect          1535       152            298        62
Int             178        10             94         2
Negative        22217      1555           4381       401
Total           26005      1787           5265       496

ChemProt Dataset
The ChemProt dataset is a benchmark dataset for extracting chemical-protein interactions (CPIs) from biomedical literature. It consists of documents from PubMed and PubMed Central, which are annotated with different types of CPIs, such as inhibition and activation. The dataset was originally created for the BioCreative IV challenge in 2013 and has since become a widely used benchmark in biomedical natural language processing. Detailed statistics are shown in Table 4.

Table 4: Statistics of the CPI corpus.
Relation type   Training set   Development set   Test set
CPR:3           768            550               665
CPR:4           2251           1094              1661
CPR:5           173            116               195
CPR:6           235            199               293
CPR:9           727            457               644
False           15306          9404              13485
Total           19460          11820             16943

Evaluation Metrics
To assess the efficacy of the proposed model, its performance is measured using precision, recall, micro-F1 and macro-F1. In particular, the micro-averaged metrics derive an average score by amalgamating the contributions of all classes, while the macro-F1 score more accurately reflects the performance of the model on classes with fewer samples.
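These metrics can be computed as in the brief scikit-learn sketch below. The example labels are invented, and the labels= argument is shown because benchmark practice on DDI and ChemProt usually scores only the positive relation classes; the exact convention used here is an assumption.

```python
# Sketch of the evaluation metrics in Section 4.1 using scikit-learn;
# the label lists are invented for illustration.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["Effect", "Mechanism", "Int", "False", "Effect", "Advice"]
y_pred = ["Effect", "Mechanism", "Effect", "False", "Effect", "Advice"]
positive = ["Advice", "Effect", "Mechanism", "Int"]  # assumption: negatives excluded

# Micro-averaging pools all classes into one score; macro-averaging weights every
# class equally, so rare classes such as Int affect it as much as frequent ones.
p, r, micro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=positive, average="micro")
_, _, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=positive, average="macro")
print(f"P={p:.3f}  R={r:.3f}  micro-F1={micro_f1:.3f}  macro-F1={macro_f1:.3f}")
```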
4.2. Experimental Settings

The proposed model is implemented in the Python programming language with the PyTorch framework; Python has good compatibility with existing deep learning frameworks. The batch size is set to 8. During training, an Adam optimizer is used to optimize the parameters that affect model training and output. The maximum sentence length is set to 512 and the learning rate to 2e-5. The experimental parameter settings are detailed in Table 5.

Table 5: The setting of hyper-parameters.
Parameter name                        Value
Sentence feature dimension            768
Max sentence length                   512
Number of hidden layers of BioBERT    12
Batch size                            8
Learning rate                         2e-5
Epoch                                 10
Dropout rate                          0.1
Weight decay                          1e-5
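A minimal training-loop sketch consistent with Table 5 is shown below. The data loader and loss wiring are placeholders rather than the paper's released code, torch.optim.Adam with weight_decay stands in for the Adam variant actually used, and the dropout rate of 0.1 is BioBERT's default rather than an extra setting here.

```python
# Sketch of the training setup in Table 5 / Section 4.2; the data loader and the
# exact loss wiring are placeholders, not the paper's released code.
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)
EPOCHS, BATCH_SIZE, MAX_LEN = 10, 8, 512  # Table 5

def train(train_loader):
    model.train()
    for _ in range(EPOCHS):
        for batch in train_loader:   # batches of 8 tokenized prompts (<= 512 tokens)
            optimizer.zero_grad()
            outputs = model(**batch)  # batch must carry MLM "labels" for the [MASK] slots
            outputs.loss.backward()
            optimizer.step()
```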
4.3. Experimental Results

Comparison with Other Models
The CPI dataset and the DDI dataset were employed to evaluate the performance of the model. Table 6 presents the experimental results of the model and other approaches on the DDI dataset. Precision, recall, Micro-F1 and Macro-F1 scores were used to assess the model's performance; the Micro-F1 and Macro-F1 scores provide a comprehensive evaluation, with higher values indicating better performance.

Table 6: Performance comparison on the DDI dataset. The Advice, Mechanism, Effect, and Int columns give the F1-score on each type.
Model            Advice   Mechanism   Effect   Int    P      R      Micro-F1   Macro-F1
CNN              77.7     70.2        69.3     46.4   75.7   64.7   69.8       65.9
DCNN             78.2     70.6        69.9     46.4   77.2   64.4   70.2       66.3
ACNN             -        -           -        -      76.3   63.3   69.1       -
RNN              -        -           -        -      78.6   63.8   72.1       -
LSTM             80.3     72.3        65.5     44.1   74.5   65.0   69.4       65.5
Two-stage LSTM   -        -           -        -      -      -      69.0       -
ASDP-LSTM        80.3     74.0        71.8     54.3   74.1   71.8   72.9       70.1
ATT-BLSTM        85.1     77.5        76.6     57.7   78.4   76.2   77.3       74.2
AGCN             86.2     78.7        74.2     52.6   78.2   75.6   76.9       72.9
BERT             -        -           -        -      -      -      78.8       -
BioBERT          -        -           -        -      79.9   78.1   79.0       -
EMSI-BERT        86.8     86.6        80.7     56.0   -      -      82.0       77.5
SECK [28]        -        -           -        -      83.0   81.1   82.0       -
Our model        84.3     78.4        86.3     58.2   84.2   83.4   83.8       76.8

The model achieved P, R, Micro-F1 and Macro-F1 scores of 84.2%, 83.4%, 83.8% and 76.8%, respectively, better than the baselines on the DDI dataset. Furthermore, the model achieved F1-scores of 84.3%, 78.4%, 86.3%, and 58.2% in the Advice, Mechanism, Effect, and Int categories, respectively. Notably, the F1-score on the Int type, which has limited data, surpassed those of the other methods. The comparison with alternative models suggests that the model proposed in this study effectively enhances biomedical relation extraction performance.

Table 7 shows the comparison between this model and other approaches on the CPI dataset.

Table 7: Performance comparison on the CPI dataset. The CPR:3 to CPR:9 columns give the F1-score on each type.
Model               CPR:3   CPR:4   CPR:5   CPR:6   CPR:9   P      R      Micro-F1   Macro-F1
Lung et al. [31]    49.8    66.5    56.5    69.6    28.3    63.5   51.2   56.7       54.1
LSTM                -       -       -       -       -       59.1   67.8   63.1       -
GA-BGRU [32]        -       -       -       -       -       65.4   64.8   65.1       -
Zhang et al. [20]   59.4    71.8    65.7    72.5    50.1    70.6   61.8   65.9       63.9
Bi-LSTM             64.7    75.3    68.1    79.3    55.7    67.0   72.0   69.4       68.6
BERT                -       -       -       -       -       74.5   70.6   72.5       -
Sun et al. [26]     71.5    81.3    70.9    79.9    69.9    77.1   76.1   76.6       74.7
BERT-Att-Capsule    72.9    78.6    72.7    77.9    64.4    77.8   71.7   74.7       73.3
BioBERT             -       -       -       -       -       77.0   75.9   76.5       -
Our model           74.3    81.4    77.7    82.3    69.4    80.0   81.1   80.5       77.1

The model achieved P, R, Micro-F1 and Macro-F1 scores of 80.0%, 81.1%, 80.5% and 77.1%, respectively, representing a 4% improvement in Micro-F1 over the BioBERT model. Moreover, the model obtained F1-scores of 74.3%, 81.4%, 77.7%, 82.3%, and 69.4% for the CPR:3, CPR:4, CPR:5, CPR:6, and CPR:9 types, respectively. The comparison with other models demonstrates that the model proposed in this paper effectively enhances classification performance for types with limited data.

Ablation Study
Ablation studies were conducted to assess the individual contribution of each module to the overall performance. The outcomes are presented in Table 8.

Table 8: Ablation study of the model.
                         DDI 2013                   ChemProt
Model                    P      R      Micro-F1     P      R      Micro-F1
Our Model (DMPL)         84.2   83.4   83.8         80.0   81.1   80.5
DMPL w/o DK              82.9   82.5   82.7         77.9   79.9   78.9
DMPL w/o PL              82.8   81.0   81.9         78.0   77.6   77.8
DMPL w/o DK w/o PL       81.8   80.7   81.3         77.9   76.9   77.4
BioBERT                  79.9   78.1   79.0         77.0   75.9   76.5

After eliminating domain knowledge from the model, the Micro-F1 score decreases by 1.1% and 1.6% on the DDI and CPI datasets, respectively. These findings indicate that domain knowledge plays a moderating role in mitigating the impact of domain-specific terminology on model performance. When prompt learning is removed from the model, the Micro-F1 score declines by 1.9% and 2.7% on the DDI dataset and the CPI dataset, respectively. We hypothesize that prompt learning narrows the gap between pre-training and downstream tasks, enabling the model to acquire more knowledge from limited data and thereby enhancing the effectiveness of biomedical relation extraction. Upon removing both domain knowledge and prompt learning, the Micro-F1 score decreases by 2.5% and 3.1% on the DDI dataset and the CPI dataset, respectively. The experimental results demonstrate that domain knowledge and prompt learning are crucial components of the model, contributing significantly to the improvement of biomedical relation extraction performance.

4.4. Low-resource Results

Datasets for the relation extraction task usually require manual annotation of a large amount of high-quality data, which requires the participation of domain experts. The cost of collecting such data is high, especially in the biomedical field. Therefore, in the context of resource scarcity, how to make the model fully utilize existing data to achieve better performance has become a highly relevant issue.

The relation extraction performance of the model is evaluated by simulating low-resource relation extraction when biomedical data are scarce. A K-shot support set is constructed from the training set of the biomedical dataset, where each entity type contains K samples. To simulate low-resource biomedical relation extraction, 8, 16, and 32 samples are drawn for each entity type, and each relation type is sampled at least once. Table 9 compares the biomedical relation extraction performance of our model and other pre-trained models under low resources.

Table 9: Comparative experimental results of low-resource biomedical relation extraction (F1 scores).
Dataset   Model       8-shot   16-shot   32-shot
CPI       BERT        10.37    16.85     25.01
CPI       BioBERT     17.41    24.17     31.02
CPI       SCIBERT     18.26    23.59     30.76
CPI       Our Model   29.67    35.58     41.15
DDI       BERT        12.76    20.45     29.58
DDI       BioBERT     21.57    27.14     34.82
DDI       SCIBERT     22.01    26.32     34.17
DDI       Our Model   33.90    38.26     43.29

According to the comparative findings presented in Table 9, our model exhibits commendable performance in scenarios characterized by limited resources and surpasses the other pre-trained models. Notably, even with a relatively modest data volume at K=8, our model attains desirable outcomes. When K is increased to 16, the F1 score of our model remains superior to that of the other models. As K is further raised to 32, the discrepancy between our model and the other pre-trained models gradually diminishes alongside the expansion of the sample size; nevertheless, our model's performance continues to outshine that of the other models. This evidence substantiates that our model effectively enhances the accuracy of biomedical relation extraction when confronted with limited resources.
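The K-shot support-set construction described above can be sketched as follows; grouping by relation type, the random seed, and the data format are our assumptions rather than the paper's exact procedure.

```python
# Sketch of the K-shot support-set construction in Section 4.4; the grouping key,
# seed, and example format are assumptions.
import random
from collections import defaultdict

def build_k_shot(train_examples, k, seed=42):
    """train_examples: iterable of dicts with at least a 'relation' field."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for ex in train_examples:
        by_type[ex["relation"]].append(ex)
    support = []
    for rel, examples in by_type.items():
        # K samples per type (all of them if fewer exist), so every relation
        # type appears at least once in the support set.
        support.extend(rng.sample(examples, min(k, len(examples))))
    rng.shuffle(support)
    return support

# Usage: support_sets = {k: build_k_shot(train_set, k) for k in (8, 16, 32)}
```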
4.5. Case Study

Figure 3: Examples of extraction results by different methods.

As shown in Figure 3, we selected some examples from the biomedical datasets for detailed analysis and compared the predictions of BioBERT with those of our model. In Case 1, the BioBERT model predicts Negative, which is incorrect, while our model predicts CPR:9, which is correct. The sentence contains multiple biomedical entities, which makes it difficult for the model to fully learn the semantic information of the biomedical text. By integrating the biomedical entities with the expertise found in the knowledge base, the model is strengthened in representing those entities, facilitating a better understanding of the textual information. The prediction results show that our model obtains an enhanced text representation after integrating domain knowledge and improves classification in sentences containing complex biomedical entities.

In Case 2, there are again multiple biomedical entities in the sentence, which makes it difficult for the model to fully learn the semantic information of the biomedical text, and the BioBERT model makes a wrong prediction; our model fuses domain knowledge and predicts correctly. In Case 3, the BioBERT model incorrectly predicts an Int-type instance as Mechanism. The small number of Int training samples makes it difficult for BioBERT to fully learn the characteristics of this class. By introducing prompt learning, our model can obtain more knowledge from limited data, effectively alleviating the problem of insufficient learned knowledge when the data volume is small; therefore, our model makes the correct prediction.

5. Conclusion

In this study, we propose a biomedical relation extraction model based on domain knowledge and prompt learning. The model enhances entity representations by integrating domain knowledge, thus reducing the impact of the highly technical language and domain-specific terms in biomedical texts on model performance. By introducing prompt learning, more knowledge can be obtained from limited data, effectively alleviating the problem of insufficient knowledge that models can learn when the data volume is small and thereby improving the classification of biomedical relations. The experimental results show that the model effectively improves the accuracy of biomedical relation extraction by introducing domain knowledge and prompt learning. In the future, we will continue to explore the potential of prompt learning, try different prompt methods, and apply our model to document-level relation extraction.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (No. 62072070 and 62106034).

References

[1] S. Zhao, C. Su, Z. Lu, F. Wang, Recent advances in biomedical literature mining, Briefings in Bioinformatics 22 (2021) bbaa057.
[2] T. Zhang, J. Leng, Y. Liu, Deep learning for drug–drug interaction extraction from the literature: a review, Briefings in Bioinformatics 21 (2020) 1609–1627.
[3] Y. Zhang, H. Lin, Z. Yang, J. Wang, Y. Sun, B. Xu, Z. Zhao, Neural network-based approaches for biomedical relation classification: a review, Journal of Biomedical Informatics 99 (2019) 103294.
[4] Y. Qiu, Y. Zhang, Y. Deng, S. Liu, W. Zhang, A comprehensive review of computational methods for drug-drug interaction detection, IEEE/ACM Transactions on Computational Biology and Bioinformatics 19 (2021) 1968–1985.
[5] Q. Zhao, D. Xu, J. Li, L. Zhao, F. A. Rajput, Knowledge guided distance supervision for biomedical relation extraction in Chinese electronic medical records, Expert Systems with Applications 204 (2022) 117606.
[6] P. Su, K. Vijay-Shanker, Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction, BMC Bioinformatics 23 (2022) 120.
[7] S. Liu, B. Tang, Q. Chen, X. Wang, et al., Drug-drug interaction extraction via convolutional neural networks, Computational and Mathematical Methods in Medicine 2016 (2016).
[8] S. Liu, K. Chen, Q. Chen, B. Tang, Dependency-based convolutional neural network for drug-drug interaction extraction, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2016, pp. 1074–1080.
[9] M. Asada, M. Miwa, Y. Sasaki, Extracting drug-drug interactions with attention CNNs, in: BioNLP 2017, 2017, pp. 9–18.
[10] R. Kavuluru, A. Rios, T. Tran, Extracting drug-drug interactions with word and character-level recurrent neural networks, in: 2017 IEEE International Conference on Healthcare Informatics (ICHI), IEEE, 2017, pp. 5–12.
[11] S. Lim, K. Lee, J. Kang, Drug drug interaction extraction from the literature using a recursive neural network, PLoS One 13 (2018) e0190926.
[12] S. K. Sahu, A. Anand, Drug-drug interaction extraction from biomedical texts using long short-term memory network, Journal of Biomedical Informatics 86 (2018) 15–24.
Wu, improving the classification effect of biomedical relation. Dependency-based long short term memory network The experimental results show that the model can effectively for drug-drug interaction extraction, BMC bioinfor- improve the accuracy of biomedical relation extraction by matics 18 (2017) 99–109. introducing domain knowledge and prompt learning. [15] D. Huang, Z. Jiang, L. Zou, L. Li, Drug–drug interac- In the future, we will continue to explore the potential of tion extraction from biomedical literature using sup- prompt learning, try different prompt methods, and apply port vector machine and long short term memory net- our model to document-level relation extraction. works, Information sciences 415 (2017) 100–109. [16] W. Zheng, H. Lin, L. Luo, Z. Zhao, Z. Li, Y. Zhang, Acknowledgments Z. Yang, J. Wang, An attention-based effective neural model for drug-drug interactions extraction, BMC This work is supported by grant from the Natural Science bioinformatics 18 (2017) 1–11. Foundation of China (No. 62072070 and 62106034) [17] Y. Zhang, W. Zheng, H. Lin, J. Wang, Z. Yang, M. Du- 58 montier, Drug–drug interaction extraction via hier- (2019) 61–68. archical rnns on sequence and shortest dependency paths, Bioinformatics 34 (2018) 828–835. [18] Y. Peng, A. Rios, R. Kavuluru, Z. Lu, Extracting chemical–protein relations with ensembles of svm and deep learning models, Database 2018 (2018) bay073. [19] C. Sun, Z. Yang, L. Luo, L. Wang, Y. Zhang, H. Lin, J. Wang, A deep learning approach with deep contex- tualized word representations for chemical–protein interaction extraction from biomedical literature, IEEE Access 7 (2019) 151034–151046. [20] Y. Zhang, H. Lin, Z. Yang, J. Wang, Y. Sun, Chemical– protein interaction extraction via contextualized word representations and multihead attention, Database 2019 (2019) baz054. [21] W. Xiong, F. Li, H. Yu, D. Ji, Extracting drug-drug inter- actions with a dependency-based graph convolution neural network, in: 2019 IEEE International Confer- ence on Bioinformatics and Biomedicine (BIBM), IEEE, 2019, pp. 755–759. [22] C. Park, J. Park, S. Park, Agcn: Attention-based graph convolutional networks for drug-drug interaction ex- traction, Expert Systems with Applications 159 (2020) 113538. [23] Y. Peng, S. Yan, Z. Lu, Transfer learning in biomedical natural language processing: an evaluation of bert and elmo on ten benchmarking datasets, arXiv preprint arXiv:1906.05474 (2019). [24] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, Biobert: a pre-trained biomedical language representa- tion model for biomedical text mining, Bioinformatics 36 (2020) 1234–1240. [25] Z. Huang, N. An, J. Liu, F. Ren, Emsi-bert: Asymmetri- cal entity-mask strategy and symbol-insert structure for drug–drug interaction extraction based on bert, Symmetry 15 (2023) 398. [26] C. Sun, Z. Yang, L. Su, L. Wang, Y. Zhang, H. Lin, J. Wang, Chemical–protein interaction extraction via gaussian probability distribution and external biomed- ical knowledge, Bioinformatics 36 (2020) 4323–4330. [27] C. Sun, Z. Yang, L. Wang, Y. Zhang, H. Lin, J. Wang, At- tention guided capsule networks for chemical-protein interaction extraction, Journal of Biomedical Infor- matics 103 (2020) 103392. [28] X. Liu, J. Tan, J. Fan, K. Tan, J. Hu, S. Dong, A syntax-enhanced model based on category keywords for biomedical relation extraction, Journal of Biomed- ical Informatics 132 (2022) 104135. [29] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, T. 
[25] Z. Huang, N. An, J. Liu, F. Ren, EMSI-BERT: Asymmetrical entity-mask strategy and symbol-insert structure for drug–drug interaction extraction based on BERT, Symmetry 15 (2023) 398.
[26] C. Sun, Z. Yang, L. Su, L. Wang, Y. Zhang, H. Lin, J. Wang, Chemical–protein interaction extraction via Gaussian probability distribution and external biomedical knowledge, Bioinformatics 36 (2020) 4323–4330.
[27] C. Sun, Z. Yang, L. Wang, Y. Zhang, H. Lin, J. Wang, Attention guided capsule networks for chemical-protein interaction extraction, Journal of Biomedical Informatics 103 (2020) 103392.
[28] X. Liu, J. Tan, J. Fan, K. Tan, J. Hu, S. Dong, A syntax-enhanced model based on category keywords for biomedical relation extraction, Journal of Biomedical Informatics 132 (2022) 104135.
[29] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, T. Declerck, The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions, Journal of Biomedical Informatics 46 (2013) 914–920.
[30] J. Kringelum, S. K. Kjaerulff, S. Brunak, O. Lund, T. I. Oprea, O. Taboureau, ChemProt-3.0: a global chemical biology diseases mapping, Database 2016 (2016) bav123.
[31] P.-Y. Lung, T. Zhao, Z. He, J. Zhang, Extracting chemical protein interactions from literature, in: Proceedings of the BioCreative VI Workshop, 2017, pp. 159–162.
[32] H. Lu, L. Li, X. He, Y. Liu, A. Zhou, Extracting chemical-protein interactions from biomedical literature via granular attention based recurrent neural networks, Computer Methods and Programs in Biomedicine 176 (2019) 61–68.