Biomedical Relation Extraction via Domain Knowledge and Prompt Learning

Jianyuan Yuan1,†, Wei Du1,†, Xiaoxia Liu2 and Yijia Zhang1,*
1 Dalian Maritime University, Dalian 116024, Liaoning, China
2 Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23-24, 2024, Changchun, China and Online
* Corresponding author.
† These authors contributed equally.
Emails: jianyuany@dlmu.edu.cn (J. Yuan); duwei@dlmu.edu.cn (W. Du); xxliu@stanford.edu (X. Liu); zhangyijia@dlmu.edu.cn (Y. Zhang)
ORCID: 0000-0002-5843-4675 (Y. Zhang)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Biomedical relation extraction plays a crucial role in extracting key biomedical information from an extensive literature, thereby supporting disease treatment and the construction of biomedical knowledge bases. However, biomedical texts contain highly technical language and domain-specific terminology, which makes it difficult for models to fully understand their semantics. Furthermore, imbalances in the distribution of samples across categories in biomedical datasets reduce classification accuracy for categories with limited training samples. In this study, we propose a biomedical relation extraction model based on domain knowledge and prompt learning. The prompt templates guide the model to focus on key features and information, so that more knowledge can be obtained from limited data, while domain knowledge is used to acquire refined entity representations, mitigating the challenges posed by technical language and domain-specific terminology. The model is evaluated on the DDI Extraction 2013 dataset and the ChemProt dataset, and the experimental results demonstrate that it achieves state-of-the-art performance.

Keywords
biomedical relation extraction, prompt learning, biomedical literature, domain knowledge

1. Introduction

With the rapid development of the biomedical field, the amount of biomedical literature has exploded, and it contains a wealth of biomedical information [1]. Biomedical relation extraction is a natural language processing technology whose purpose is to extract the relations between entities from biomedical text [2]. This technology helps researchers quickly extract important biomedical information from the literature and provides important support for drug development and disease treatment [3].

The highly technical language and domain-specific terminology used in biomedical texts complicate this task, and traditional approaches often struggle to achieve high performance [4]. Moreover, the number of samples differs across categories in biomedical datasets, resulting in low classification accuracy for categories with fewer training samples. Meanwhile, biomedical relation extraction usually requires a large amount of labeled data to train the model effectively. However, because of the huge amount of data, the cost of manual labeling is very high, and obtaining more knowledge from limited data becomes very important [5].

The application of pre-trained language models to biomedical texts has received widespread attention and exploration [6]. Most current biomedical relation extraction methods rely on pre-trained language models. Although pre-trained language models learn general representations of language, there is a significant difference between the pre-training objective and downstream fine-tuning, which strongly affects performance on the downstream task. As shown in Fig. 1, since the unsupervised prediction objective applied to the input text sequence during pre-training is inconsistent with the supervised classification operation of the downstream task, the model cannot fully apply its prior knowledge to the downstream task.

We propose a biomedical relation extraction model based on domain knowledge and prompt learning. Domain knowledge provides entities with richer feature representations, which better reflect the essence of the entities and improve entity representation. Prompt learning is a method that can effectively bridge the gap between pre-training and fine-tuning on downstream tasks. Its core idea is to transform the traditional classification task into a cloze problem: a prompt template inserts one or more placeholders (usually represented by [MASK]) into the input text, and the model is asked to predict the corresponding label words at those positions. This forces the model to consider more contextual information when predicting, so that it better understands the semantics of the input text.
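As a concrete illustration of this cloze reformulation, the short sketch below recasts a relation-classification instance as a masked prompt. The sentence, template wording, and label words here are purely illustrative; the templates and label-word sets actually used by our model are defined in Section 3 and Tables 1-2.

```python
# Illustrative only: recasting relation classification as a cloze task.
# The sentence, template, and label words below are made up for this example.

sentence = "Aspirin increases the anticoagulant effect of warfarin."
e1, e2 = "Aspirin", "warfarin"

# Classification view: pick one label from a fixed relation set.
relation_set = ["Advice", "Mechanism", "Effect", "Int", "False"]

# Cloze view: append a template with a blank and let a masked language model fill it.
prompt = f"{sentence} {e1} [MASK] {e2}"
# A verbalizer maps the word predicted at [MASK] back to a relation label.
verbalizer = {"affects": "Effect", "metabolizes": "Mechanism", "ignores": "False"}
print(prompt)
```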
Overall, the contributions of this paper are as follows:

1) We propose a biomedical relation extraction model based on prompt learning, which guides the model to focus on key features and information by constructing multiple task-related prompts. By introducing prompt learning, more knowledge can be obtained from limited data, which effectively alleviates the problem of insufficient knowledge that the model can learn when the amount of data is small.
2) The model obtains detailed information about biomedical entities through domain knowledge and derives enhanced entity representations. In addition, special tokens are embedded around entities, enabling the entities to better integrate domain knowledge and thereby reducing the impact of highly technical language and domain-specific terminology in biomedical texts on model performance.
3) The model is evaluated on the ChemProt dataset and the DDI Extraction 2013 dataset. Experimental results demonstrate that the proposed model outperforms existing methods and achieves state-of-the-art performance in biomedical relation extraction.

Figure 1: The instances of pre-training, fine-tuning, and prompt-tuning for relation extraction in the biomedical domain.

2. Related Work

Recently, various neural network-based approaches have demonstrated commendable outcomes in diverse relation extraction tasks and have been extensively employed in biomedical research. Liu et al. [7] utilized a convolutional neural network (CNN) model for biomedical relation extraction, demonstrating its effectiveness in achieving high performance. In this model, the words in the sentences of the biomedical dataset serve as inputs to the CNN, which can effectively capture local features. Liu et al. [8] introduced a dependency convolutional neural network (DCNN) model for biomedical relation extraction tasks. By utilizing the dependency parse tree, the DCNN model can effectively capture the interdependencies between words. Sasaki et al. [9] applied an attention-based CNN model to biomedical relation extraction tasks, since each word in a biomedical sentence has a varying impact on the final classification outcome. Kavuluru et al. [10] proposed a method that employs recurrent neural networks (RNNs) at the word and character levels to extract drug-drug interaction relations. Lim et al. [11] proposed a method using recursive neural networks to automatically extract drug interactions from the literature; it decomposes the text into a syntax tree and recursively processes the tree structure to extract drug-drug interaction information.
Sahu et al. [12] used a Long Short-Term Memory (LSTM) network to automatically extract drug interaction information from biomedical texts. Mostafapour et al. [13] proposed a model that uses Bi-directional Long Short-Term Memory (BiLSTM) to model context information in text sequences and uses a hierarchical structure to consider different levels of semantic information. Wang et al. [14] used dependency parsing to model the relations between drugs in text and used an LSTM network to capture contextual information in text sequences. Huang et al. [15] employed a hybrid model consisting of a support vector machine (SVM) and an LSTM for extracting drug interaction information. Zheng et al. [16] proposed a BiLSTM model with an attention mechanism to extract the interaction relations between drugs in biomedical texts. Zhang et al. [17] utilized the shortest dependency path to determine the grammatical relations within a sentence and extracted keywords located between two entities.

Peng et al. [18] proposed a multi-model approach that combines an SVM, a CNN, and an RNN to improve the performance of biomedical relation extraction. Sun et al. [19] improved biomedical relation extraction by integrating attention and ELMo representations with bidirectional LSTM networks. A neural model for extracting CPIs was proposed by Zhang et al. [20], which utilized deep context representations and a multi-head attention mechanism. Xiong et al. [21] presented a model that combines a Graph Convolutional Neural Network (GCNN) and an LSTM network for extracting biomedical relations. Park et al. [22] utilized an attention-based GCN for the task of biomedical relation extraction.

Peng et al. [23] applied the BERT (Bidirectional Encoder Representations from Transformers) model to the task of biomedical relation extraction. Lee et al. [24] extended the BERT model by training it on a large-scale biomedical corpus, resulting in the BioBERT model. Huang et al. [25] proposed an EMSI-BERT method for drug-drug interaction extraction, which utilizes an asymmetric entity masking strategy and a symbol insertion structure. Sun et al. [26] proposed a model that uses a combination of Gaussian probability distributions and external biomedical knowledge to extract CPIs. Sun et al. [27] proposed a model (BERT-Att-capsule) that utilizes a BERT-based attention-guided capsule network to extract CPIs; it uses attention mechanisms to guide the extraction of interactions and capsule networks to capture the interactions' semantic features. Liu et al. [28] proposed a grammar-enhanced model and a category keyword-based approach, which uses graph-based grammar to build a syntactic tree and type keywords to guide the model to extract specific types of relations. Su and Vijay-Shanker [6] explored approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages.
Figure 2: The schematic overview of the proposed model. The black arrows indicate the input stream.

3. Method

3.1. Problem Definition

Given a sentence sequence S = {c1, c2, ..., cn-1, cn}, where ci is a word in the sentence and n is the length of the sentence, the subject entity e1 = {ci, ..., cj} and the object entity e2 = {cx, ..., cy} are located in the same sentence. Biomedical relation extraction aims to identify the relation r between e1 and e2, where r is either selected from a predefined relation set R or is NA.

3.2. Model Framework

Fig. 2 shows the architecture of the biomedical relation extraction model based on domain knowledge and prompt learning. The model consists of four modules: the input module, the encoding module, the knowledge enhancement module, and the prompt learning module. We design three prompt templates, namely the prompt for biomedical entity e1, the prompt for the biomedical entity relation, and the prompt for biomedical entity e2. First, the biomedical text and the prompt templates are fed into the model for encoding. Then, the enhanced entity representation is obtained through knowledge enhancement. Finally, through the prompt learning module, the model predicts the label words at the [MASK] positions and selects their corresponding labels for classification.

3.3. Input Module

The biomedical relation extraction task is represented as T = (X, Y), where X represents the input text and Y represents the category label. A sentence in the biomedical dataset is represented as x = {x1, ..., e1, ..., e2, ..., xn}, where e1 and e2 represent the two biomedical entities. A key part of prompt learning is to construct an appropriate template P and a label word set V, together with a mapping M: Y → V that connects each task label with its label words.

The model's input comprises two components: the input text, denoted as x, and the prompt template, denoted as p(x). The sentence is tokenized, and each token is encoded as a vector of d dimensions. Moreover, a [CLS] token is added at the beginning of each sentence sequence. To denote the boundaries of each biomedical entity, special symbols are introduced: the first entity is enclosed by "$" symbols on both sides, while the second entity is enclosed by "#" symbols on both sides.

In addition to retaining the original input x, multiple [MASK] tokens need to be fed into the model. Three prompts are designed in the input prompt template: the prompt p_e1(x) corresponding to the biomedical entity e1, the prompt p_r(x) corresponding to the biomedical entity relation, and the prompt p_e2(x) corresponding to the biomedical entity e2. The prompt template p(x) corresponding to the input text x is denoted as:

p(x) = \{ p_{e_1}(x), p_r(x), p_{e_2}(x) \}    (1)

The prompt p_e1(x) corresponding to biomedical entity e1 and the prompt p_e2(x) corresponding to biomedical entity e2 can be formalized as follows:

p_{e_1}(x) = \{ x, \text{the [MASK] } e_1 \}    (2)

p_{e_2}(x) = \{ x, \text{the [MASK] } e_2 \}    (3)

Then, the prompt p_r(x) for the relation between the biomedical entities is designed. For example, in the biomedical example sentence above, the relation type is CPR:4, which means that the relation between entity e1 and entity e2 is "inhibition". The prompt template for the relation type is "e1 [MASK] e2", and the prompt label word is "has curbed the". The prompt p_r(x) for the relation corresponding to the input text x can be expressed as:

p_r(x) = \{ x, e_1 \text{ [MASK] } e_2 \}    (4)

The complete input composition can be formalized as follows:

\text{Input}_x = \{ x, p(x) \}    (5)
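A minimal sketch of this input construction is shown below, assuming the HuggingFace transformers tokenizer and a publicly available BioBERT checkpoint; the function names and the example sentence are ours. The three prompts are kept as separate strings here, since Section 3.6 describes how they are aggregated into a single template.

```python
# Sketch of the input construction in Section 3.3 (entity markers + three prompts).
# Assumes the HuggingFace "transformers" tokenizer and a public BioBERT checkpoint;
# function names and the example sentence are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
MASK = tokenizer.mask_token  # "[MASK]"

def mark_entities(sentence: str, e1: str, e2: str) -> str:
    # "$" around the first entity, "#" around the second (Section 3.3).
    return sentence.replace(e1, f"$ {e1} $").replace(e2, f"# {e2} #")

def build_prompts(e1: str, e2: str) -> list[str]:
    # Eqs. (2)-(4): entity prompt for e1, relation prompt, entity prompt for e2.
    return [f"the {MASK} {e1}", f"{e1} {MASK} {e2}", f"the {MASK} {e2}"]

sentence = "Aspirin inhibits COX-1 activity."  # illustrative example
text = " ".join([mark_entities(sentence, "Aspirin", "COX-1")]
                + build_prompts("Aspirin", "COX-1"))
encoding = tokenizer(text, max_length=512, truncation=True, return_tensors="pt")
# The tokenizer prepends [CLS] automatically; the [MASK] slots can be located via
# (encoding.input_ids == tokenizer.mask_token_id).
```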
3.4. Encode Module

BioBERT is a pre-trained model based on BERT that is suitable for natural language processing tasks on biomedical texts. The BioBERT model is trained on a large corpus from the biomedical field, which improves text understanding and classification performance in this domain and has made BioBERT a widely used model in biomedical natural language processing.

The model's input consists of the biomedical text and a prompt template, wherein [MASK] denotes the portion that the model must complete. Within the input sequence, [MASK] is substituted with a special token, signifying that it is to be predicted. To ensure the model comprehends each word's position within the sentence, each word embedding vector is added to its corresponding position vector in the sequence. The Transformer architecture is employed to encode the sequence of embedding and position vectors. This architecture comprises multiple layers, each containing a multi-head attention mechanism and a feed-forward neural network, and each layer encodes the input vector sequence to extract its representation. This encoding approach effectively captures both the semantic and syntactic information present in the input sequence, thereby enhancing the model's ability to predict the content that fills the [MASK].

3.5. Knowledge Enhancement Module

Biomedical entity descriptions are sourced from Wikipedia and DrugBank using crawler technology to obtain interpretation information in the biomedical domain. This interpretation information is denoted as S_e = {E1, E2, E3, ..., EN}, where Ei represents the i-th word and N represents the length of the sentence.

The vector for biomedical entity e1 is computed as the average of the hidden layer vectors from H_i to H_j in the model. Similarly, the vector for the other biomedical entity e2 is obtained as the average of the hidden layer vectors from H_k to H_m. The calculation formulas for these vectors are as follows:

H_1' = W_1 \left[ \tanh\left( \frac{1}{j-i+1} \sum_{t=i}^{j} H_t \right) \right] + b_1    (6)

H_2' = W_2 \left[ \tanh\left( \frac{1}{m-k+1} \sum_{t=k}^{m} H_t \right) \right] + b_2    (7)

where W_1 ∈ R^{d×d} and W_2 ∈ R^{d×d} denote weight matrices, and b_1 and b_2 denote bias vectors.

The semantic feature representation of the domain knowledge is acquired by the model using BioBERT. This vector is then combined with the entity interpretation information and the corresponding entity vector to generate an improved vector representation of the biomedical entities. When a sentence S_e containing biomedical knowledge is successfully matched with entity e1, the final hidden layer vector H_{e1} of its [CLS] token can be obtained from BioBERT. The acquired enhanced representation is integrated into the model, with the calculation formulas as follows:

H_{E_1} = W_4 \left[ \mathrm{concat}\left( H_1', \; W_3 \tanh(H_{e_1}) + b_3 \right) \right] + b_4    (8)

H_{E_2} = W_6 \left[ \mathrm{concat}\left( H_2', \; W_5 \tanh(H_{e_2}) + b_5 \right) \right] + b_6    (9)

where W_3, W_4, W_5, and W_6 denote weight matrices and b_3, b_4, b_5, and b_6 denote bias vectors.
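The following PyTorch sketch mirrors Eqs. (6)-(9) for one entity. The module and variable names are ours, and the knowledge vector is assumed to be the final [CLS] hidden state obtained by encoding the entity's DrugBank/Wikipedia description with BioBERT.

```python
# Sketch of the knowledge enhancement step (Eqs. 6-9) for a single, unbatched
# example; module and variable names are ours, not the paper's.
import torch
import torch.nn as nn

class KnowledgeEnhancedEntity(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.w_span = nn.Linear(hidden, hidden)     # W1 (or W2) with bias b1 (b2)
        self.w_know = nn.Linear(hidden, hidden)     # W3 (or W5) with bias b3 (b5)
        self.w_out = nn.Linear(2 * hidden, hidden)  # W4 (or W6) with bias b4 (b6)

    def forward(self, hidden_states, span, knowledge_cls):
        # hidden_states: [seq_len, hidden] BioBERT outputs for the input sentence.
        # span: (i, j) token indices of the entity; knowledge_cls: [hidden] vector
        # from the [CLS] state of the matched description sentence.
        i, j = span
        h_span = self.w_span(torch.tanh(hidden_states[i:j + 1].mean(dim=0)))  # Eq. 6/7
        h_know = self.w_know(torch.tanh(knowledge_cls))                       # inner term of Eq. 8/9
        return self.w_out(torch.cat([h_span, h_know], dim=-1))                # Eq. 8/9
```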
3.6. Prompt Learning Module

In the prompt learning module, the multiple prompts are combined directly to form a complete prompt for the task. The complete prompt template is as follows:

p(x) = x \;\text{the}\; [\mathrm{MASK}]_1 \; e_1 \; [\mathrm{MASK}]_2 \;\text{the}\; [\mathrm{MASK}]_3 \; e_2    (10)

where [MASK]_1 and [MASK]_3 are the masks for the entity types and [MASK]_2 is the mask for the entity relation. The corresponding label word sets are as follows:

V_{[\mathrm{MASK}]_1} = \{ \text{Chemical}, \text{Gene}, \text{Drug} \}    (11)

V_{[\mathrm{MASK}]_2} = \{ \text{has activated the}, \text{has curbed the}, \ldots \}    (12)

V_{[\mathrm{MASK}]_3} = \{ \text{Chemical}, \text{Gene}, \text{Drug} \}    (13)

Because the aggregated template may contain multiple [MASK] tokens, all masked positions must be considered for prediction. Each [MASK] in the sentence is equivalent to a classification mapped to a label word. Each position yields a corresponding probability, and the probability that the entire sentence is predicted correctly is the product of the probabilities of each position. The final probability is calculated as follows:

P(y \mid x) = \prod_{j=1}^{n} P\left( [\mathrm{MASK}]_j = \varphi_j(y) \mid p(x) \right)    (14)

where n is the number of mask positions in p(x), and φ_j(y) is the label word from the set V_{[MASK]_j} that class y maps to at the j-th mask position [MASK]_j.
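A hedged sketch of this multi-mask scoring is given below, assuming the HuggingFace masked-language-model head for a public BioBERT checkpoint; the function name and the single-token treatment of label words are our simplifications.

```python
# Sketch of the scoring in Eq. (14): multiply, over the [MASK] positions, the MLM
# probability of the label word that a candidate class maps to at each position.
# Assumes the HuggingFace "transformers" MLM head; names are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
mlm = AutoModelForMaskedLM.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

def class_score(prompt: str, label_token_ids: list[int]) -> float:
    # label_token_ids: one token id per [MASK] slot for the candidate class;
    # multi-token label phrases (e.g. "has curbed the") would contribute one
    # factor per sub-token in the same way.
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero().squeeze(-1)
    with torch.no_grad():
        probs = mlm(**enc).logits.softmax(dim=-1)[0]  # [seq_len, vocab_size]
    score = 1.0
    for pos, tok in zip(mask_pos.tolist(), label_token_ids):
        score *= probs[pos, tok].item()
    return score

# The predicted relation is the candidate class with the highest score.
```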
During the training process, the model predicts the [MASK] parts of the input sequence with the masked language model (MLM) head according to the information in the context. This makes the training goal of the model consistent with the MLM objective, thus effectively reducing the gap between pre-training and the downstream task.

In our model, label words are critical for accurately classifying the relation between biomedical entities. We design a set of label words for each relation type and verify their effectiveness by using them for model training and testing. Entity label words are words that describe biomedical entity types, such as Chemical or Gene; they help the model better understand entity types and thus correctly predict the relation between entities. Relation label words are short phrases describing the relation types between biomedical entities, which are very important for the classification results. During learning, the prediction at each [MASK] is matched to the closest entry in the label word set, and the relation label words help the model better understand the relation between the biomedical entities. Table 1 and Table 2 show the biomedical entity label words and relation label words for the CPI dataset and the DDI dataset, respectively.

Table 1: The label words for the prompt of the CPI dataset.
Class label   Prompt1          Prompt2           Prompt3
False         Gene/Chemical    has nothing to    Chemical/Gene
CPR:3         Gene/Chemical    was activated by  Chemical/Gene
CPR:4         Gene/Chemical    has curbed by     Chemical/Gene
CPR:5         Gene/Chemical    is agitation of   Chemical/Gene
CPR:6         Gene/Chemical    is antagonist of  Chemical/Gene
CPR:9         Gene/Chemical    is substrate of   Chemical/Gene

Table 2: The label words for the prompt of the DDI dataset.
Class label   Prompt1   Prompt2                   Prompt3
False         DRUG      has nothing to            DRUG
Advice        DRUG      need advice with          DRUG
Mechanism     DRUG      generate mechanisms with  DRUG
Effect        DRUG      make effect with          DRUG
Int           DRUG      will interact with        DRUG

4. Experiments and Discussion

4.1. Datasets and Evaluation Metrics

The performance of the model is evaluated on the DDI Extraction 2013 dataset [29] and the ChemProt dataset [30].

DDI Extraction 2013 Dataset
The DDI Extraction 2013 dataset is a dataset for extracting drug-drug interaction relations. It contains medical texts from multiple sources such as DrugBank and MEDLINE: DrugBank provides drug names, chemical formulas, and pharmacological information, while MEDLINE provides abstracts and full-text articles containing DDI information. All drug pairs in the text are annotated as having or not having an interaction, with a total of four interaction types: Advice, Effect, Mechanism, and Int. The statistics of the dataset are shown in Table 3.

Table 3: Statistics of the DDI corpus.
                Train set                 Test set
Relation type   DrugBank   MEDLINE        DrugBank   MEDLINE
Advice          818        8              214        7
Mechanism       1257       62             278        24
Effect          1535       152            298        62
Int             178        10             94         2
Negative        22217      1555           4381       401
Total           26005      1787           5265       496

ChemProt Dataset
The ChemProt dataset is a benchmark dataset for extracting chemical-protein interactions (CPIs) from biomedical literature. It consists of documents from PubMed and PubMed Central, which are annotated with different types of CPIs, such as inhibition and activation. The dataset was originally created for the BioCreative IV challenge in 2013 and has since become a widely used benchmark in biomedical natural language processing. Detailed statistics are shown in Table 4.

Table 4: Statistics of the CPI corpus.
Relation type   Training set   Development set   Test set
CPR:3           768            550               665
CPR:4           2251           1094              1661
CPR:5           173            116               195
CPR:6           235            199               293
CPR:9           727            457               644
False           15306          9404              13485
Total           19460          11820             16943

Evaluation Metrics
To assess the efficacy of the proposed model, its performance is measured using precision, recall, micro-F1 and macro-F1. In particular, the micro-averaged metrics derive an average score by amalgamating the contributions of all classes, while the macro-F1 score more accurately reflects the performance of the model on classes with fewer samples.
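These metrics can be computed as in the brief scikit-learn sketch below. The example labels are invented, and the labels= argument is shown because benchmark practice on DDI and ChemProt usually scores only the positive relation classes; the exact convention used here is an assumption.

```python
# Sketch of the evaluation metrics in Section 4.1 using scikit-learn;
# the label lists are invented for illustration.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["Effect", "Mechanism", "Int", "False", "Effect", "Advice"]
y_pred = ["Effect", "Mechanism", "Effect", "False", "Effect", "Advice"]
positive = ["Advice", "Effect", "Mechanism", "Int"]  # assumption: negatives excluded

# Micro-averaging pools all classes into one score; macro-averaging weights every
# class equally, so rare classes such as Int affect it as much as frequent ones.
p, r, micro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=positive, average="micro")
_, _, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=positive, average="macro")
print(f"P={p:.3f}  R={r:.3f}  micro-F1={micro_f1:.3f}  macro-F1={macro_f1:.3f}")
```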
4.2. Experimental Settings

The proposed model is implemented in the Python programming language with the PyTorch framework; Python has good compatibility with existing deep learning frameworks. The batch size is set to 8. During training, an Adam optimizer is used to optimize the parameters that affect model training and output. The maximum sentence length is set to 512 and the learning rate to 2e-5. The experimental parameter settings are detailed in Table 5.

Table 5: The setting of hyper-parameters.
Parameter name                        Value
Sentence feature dimension            768
Max sentence length                   512
Number of hidden layers of BioBERT    12
Batch size                            8
Learning rate                         2e-5
Epoch                                 10
Dropout rate                          0.1
Weight decay                          1e-5
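A minimal training-loop sketch consistent with Table 5 is shown below. The data loader and loss wiring are placeholders rather than the paper's released code, torch.optim.Adam with weight_decay stands in for the Adam variant actually used, and the dropout rate of 0.1 is BioBERT's default rather than an extra setting here.

```python
# Sketch of the training setup in Table 5 / Section 4.2; the data loader and the
# exact loss wiring are placeholders, not the paper's released code.
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)
EPOCHS, BATCH_SIZE, MAX_LEN = 10, 8, 512  # Table 5

def train(train_loader):
    model.train()
    for _ in range(EPOCHS):
        for batch in train_loader:   # batches of 8 tokenized prompts (<= 512 tokens)
            optimizer.zero_grad()
            outputs = model(**batch)  # batch must carry MLM "labels" for the [MASK] slots
            outputs.loss.backward()
            optimizer.step()
```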
4.3. Experimental Results

Comparison with Other Models
The CPI dataset and the DDI dataset were employed to evaluate the performance of the model. Table 6 presents the experimental results of the model and other approaches on the DDI dataset. Precision, recall, Micro-F1 and Macro-F1 scores were used to assess the model's performance; the Micro-F1 and Macro-F1 scores provide a comprehensive evaluation, with higher values indicating better performance.

Table 6: Performance comparison on the DDI dataset. The Advice, Mechanism, Effect, and Int columns give the F1-score on each type.
Model            Advice   Mechanism   Effect   Int    P      R      Micro-F1   Macro-F1
CNN              77.7     70.2        69.3     46.4   75.7   64.7   69.8       65.9
DCNN             78.2     70.6        69.9     46.4   77.2   64.4   70.2       66.3
ACNN             -        -           -        -      76.3   63.3   69.1       -
RNN              -        -           -        -      78.6   63.8   72.1       -
LSTM             80.3     72.3        65.5     44.1   74.5   65.0   69.4       65.5
Two-stage LSTM   -        -           -        -      -      -      69.0       -
ASDP-LSTM        80.3     74.0        71.8     54.3   74.1   71.8   72.9       70.1
ATT-BLSTM        85.1     77.5        76.6     57.7   78.4   76.2   77.3       74.2
AGCN             86.2     78.7        74.2     52.6   78.2   75.6   76.9       72.9
BERT             -        -           -        -      -      -      78.8       -
BioBERT          -        -           -        -      79.9   78.1   79.0       -
EMSI-BERT        86.8     86.6        80.7     56.0   -      -      82.0       77.5
SECK [28]        -        -           -        -      83.0   81.1   82.0       -
Our model        84.3     78.4        86.3     58.2   84.2   83.4   83.8       76.8

The model achieved P, R, Micro-F1 and Macro-F1 scores of 84.2%, 83.4%, 83.8% and 76.8%, respectively, better than the baselines on the DDI dataset. Furthermore, the model achieved F1-scores of 84.3%, 78.4%, 86.3%, and 58.2% in the Advice, Mechanism, Effect, and Int categories, respectively. Notably, the F1-score on the Int type, which has limited data, surpassed those of the other methods. The comparison with alternative models suggests that the model proposed in this study effectively enhances biomedical relation extraction performance.

Table 7 shows the comparison between this model and other approaches on the CPI dataset.

Table 7: Performance comparison on the CPI dataset. The CPR:3 to CPR:9 columns give the F1-score on each type.
Model               CPR:3   CPR:4   CPR:5   CPR:6   CPR:9   P      R      Micro-F1   Macro-F1
Lung et al. [31]    49.8    66.5    56.5    69.6    28.3    63.5   51.2   56.7       54.1
LSTM                -       -       -       -       -       59.1   67.8   63.1       -
GA-BGRU [32]        -       -       -       -       -       65.4   64.8   65.1       -
Zhang et al. [20]   59.4    71.8    65.7    72.5    50.1    70.6   61.8   65.9       63.9
Bi-LSTM             64.7    75.3    68.1    79.3    55.7    67.0   72.0   69.4       68.6
BERT                -       -       -       -       -       74.5   70.6   72.5       -
Sun et al. [26]     71.5    81.3    70.9    79.9    69.9    77.1   76.1   76.6       74.7
BERT-Att-Capsule    72.9    78.6    72.7    77.9    64.4    77.8   71.7   74.7       73.3
BioBERT             -       -       -       -       -       77.0   75.9   76.5       -
Our model           74.3    81.4    77.7    82.3    69.4    80.0   81.1   80.5       77.1

The model achieved P, R, Micro-F1 and Macro-F1 scores of 80.0%, 81.1%, 80.5% and 77.1%, respectively, representing a 4% improvement in Micro-F1 over the BioBERT model. Moreover, the model obtained F1-scores of 74.3%, 81.4%, 77.7%, 82.3%, and 69.4% for the CPR:3, CPR:4, CPR:5, CPR:6, and CPR:9 types, respectively. The comparison with other models demonstrates that the model proposed in this paper effectively enhances classification performance for types with limited data.

Ablation Study
Ablation studies were conducted to assess the individual contribution of each module to the overall performance. The outcomes are presented in Table 8.

Table 8: Ablation study of the model.
                         DDI 2013                   ChemProt
Model                    P      R      Micro-F1     P      R      Micro-F1
Our Model (DMPL)         84.2   83.4   83.8         80.0   81.1   80.5
DMPL w/o DK              82.9   82.5   82.7         77.9   79.9   78.9
DMPL w/o PL              82.8   81.0   81.9         78.0   77.6   77.8
DMPL w/o DK w/o PL       81.8   80.7   81.3         77.9   76.9   77.4
BioBERT                  79.9   78.1   79.0         77.0   75.9   76.5

After eliminating domain knowledge from the model, the Micro-F1 score decreases by 1.1% and 1.6% on the DDI and CPI datasets, respectively. These findings indicate that domain knowledge plays a moderating role in mitigating the impact of domain-specific terminology on model performance. When prompt learning is removed from the model, the Micro-F1 score declines by 1.9% and 2.7% on the DDI dataset and the CPI dataset, respectively. We hypothesize that prompt learning narrows the gap between pre-training and downstream tasks, enabling the model to acquire more knowledge from limited data and thereby enhancing the effectiveness of biomedical relation extraction. Upon removing both domain knowledge and prompt learning, the Micro-F1 score decreases by 2.5% and 3.1% on the DDI dataset and the CPI dataset, respectively. The experimental results demonstrate that domain knowledge and prompt learning are crucial components of the model, contributing significantly to the improvement of biomedical relation extraction performance.

4.4. Low-resource Results

Datasets for the relation extraction task usually require manual annotation of a large amount of high-quality data, which requires the participation of domain experts. The cost of collecting such data is high, especially in the biomedical field. Therefore, in the context of resource scarcity, how to make the model fully utilize existing data to achieve better performance has become a highly relevant issue.

The relation extraction performance of the model is evaluated by simulating low-resource relation extraction when biomedical data are scarce. A K-shot support set is constructed from the training set of the biomedical dataset, where each entity type contains K samples. To simulate low-resource biomedical relation extraction, 8, 16, and 32 samples are drawn for each entity type, and each relation type is sampled at least once. Table 9 compares the biomedical relation extraction performance of our model and other pre-trained models under low resources.

Table 9: Comparative experimental results of low-resource biomedical relation extraction (F1 scores).
Dataset   Model       8-shot   16-shot   32-shot
CPI       BERT        10.37    16.85     25.01
CPI       BioBERT     17.41    24.17     31.02
CPI       SCIBERT     18.26    23.59     30.76
CPI       Our Model   29.67    35.58     41.15
DDI       BERT        12.76    20.45     29.58
DDI       BioBERT     21.57    27.14     34.82
DDI       SCIBERT     22.01    26.32     34.17
DDI       Our Model   33.90    38.26     43.29

According to the comparative findings presented in Table 9, our model exhibits commendable performance in scenarios characterized by limited resources and surpasses the other pre-trained models. Notably, even with a relatively modest data volume at K=8, our model attains desirable outcomes. When K is increased to 16, the F1 score of our model remains superior to that of the other models. As K is further raised to 32, the discrepancy between our model and the other pre-trained models gradually diminishes alongside the expansion of the sample size; nevertheless, our model's performance continues to outshine that of the other models. This evidence substantiates that our model effectively enhances the accuracy of biomedical relation extraction when confronted with limited resources.
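The K-shot support-set construction described above can be sketched as follows; grouping by relation type, the random seed, and the data format are our assumptions rather than the paper's exact procedure.

```python
# Sketch of the K-shot support-set construction in Section 4.4; the grouping key,
# seed, and example format are assumptions.
import random
from collections import defaultdict

def build_k_shot(train_examples, k, seed=42):
    """train_examples: iterable of dicts with at least a 'relation' field."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for ex in train_examples:
        by_type[ex["relation"]].append(ex)
    support = []
    for rel, examples in by_type.items():
        # K samples per type (all of them if fewer exist), so every relation
        # type appears at least once in the support set.
        support.extend(rng.sample(examples, min(k, len(examples))))
    rng.shuffle(support)
    return support

# Usage: support_sets = {k: build_k_shot(train_set, k) for k in (8, 16, 32)}
```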
4.5. Case Study

Figure 3: Examples of extraction results by different methods.

As shown in Figure 3, we selected some examples from the biomedical datasets for detailed analysis and compared the predictions of BioBERT with those of our model. In Case 1, the BioBERT model predicts Negative, which is incorrect, while our model predicts CPR:9, which is correct. The sentence contains multiple biomedical entities, which makes it difficult for the model to fully learn the semantic information of the biomedical text. By integrating the biomedical entities with the expertise found in the knowledge base, the model is strengthened in representing those entities, facilitating a better understanding of the textual information. The prediction results show that our model obtains an enhanced text representation after integrating domain knowledge and improves classification in sentences containing complex biomedical entities.

In Case 2, there are again multiple biomedical entities in the sentence, which makes it difficult for the model to fully learn the semantic information of the biomedical text, and the BioBERT model makes a wrong prediction; our model fuses domain knowledge and predicts correctly. In Case 3, the BioBERT model incorrectly predicts an Int-type instance as Mechanism. The small number of Int training samples makes it difficult for BioBERT to fully learn the characteristics of this class. By introducing prompt learning, our model can obtain more knowledge from limited data, effectively alleviating the problem of insufficient learned knowledge when the data volume is small; therefore, our model makes the correct prediction.

5. Conclusion

In this study, we propose a biomedical relation extraction model based on domain knowledge and prompt learning. The model enhances entity representations by integrating domain knowledge, thus reducing the impact of the highly technical language and domain-specific terms in biomedical texts on model performance. By introducing prompt learning, more knowledge can be obtained from limited data, effectively alleviating the problem of insufficient knowledge that models can learn when the data volume is small and thereby improving the classification of biomedical relations. The experimental results show that the model effectively improves the accuracy of biomedical relation extraction by introducing domain knowledge and prompt learning. In the future, we will continue to explore the potential of prompt learning, try different prompt methods, and apply our model to document-level relation extraction.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (No. 62072070 and 62106034).

References

[1] S. Zhao, C. Su, Z. Lu, F. Wang, Recent advances in biomedical literature mining, Briefings in Bioinformatics 22 (2021) bbaa057.
[2] T. Zhang, J. Leng, Y. Liu, Deep learning for drug–drug interaction extraction from the literature: a review, Briefings in Bioinformatics 21 (2020) 1609–1627.
[3] Y. Zhang, H. Lin, Z. Yang, J. Wang, Y. Sun, B. Xu, Z. Zhao, Neural network-based approaches for biomedical relation classification: a review, Journal of Biomedical Informatics 99 (2019) 103294.
[4] Y. Qiu, Y. Zhang, Y. Deng, S. Liu, W. Zhang, A comprehensive review of computational methods for drug-drug interaction detection, IEEE/ACM Transactions on Computational Biology and Bioinformatics 19 (2021) 1968–1985.
[5] Q. Zhao, D. Xu, J. Li, L. Zhao, F. A. Rajput, Knowledge guided distance supervision for biomedical relation extraction in Chinese electronic medical records, Expert Systems with Applications 204 (2022) 117606.
[6] P. Su, K. Vijay-Shanker, Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction, BMC Bioinformatics 23 (2022) 120.
[7] S. Liu, B. Tang, Q. Chen, X. Wang, et al., Drug-drug interaction extraction via convolutional neural networks, Computational and Mathematical Methods in Medicine 2016 (2016).
[8] S. Liu, K. Chen, Q. Chen, B. Tang, Dependency-based convolutional neural network for drug-drug interaction extraction, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2016, pp. 1074–1080.
[9] M. Asada, M. Miwa, Y. Sasaki, Extracting drug-drug interactions with attention CNNs, in: BioNLP 2017, 2017, pp. 9–18.
[10] R. Kavuluru, A. Rios, T. Tran, Extracting drug-drug interactions with word and character-level recurrent neural networks, in: 2017 IEEE International Conference on Healthcare Informatics (ICHI), IEEE, 2017, pp. 5–12.
[11] S. Lim, K. Lee, J. Kang, Drug drug interaction extraction from the literature using a recursive neural network, PLoS One 13 (2018) e0190926.
[12] S. K. Sahu, A. Anand, Drug-drug interaction extraction from biomedical texts using long short-term memory network, Journal of Biomedical Informatics 86 (2018) 15–24.
Wu, improving the classification effect of biomedical relation. Dependency-based long short term memory network The experimental results show that the model can effectively for drug-drug interaction extraction, BMC bioinfor- improve the accuracy of biomedical relation extraction by matics 18 (2017) 99–109. introducing domain knowledge and prompt learning. [15] D. Huang, Z. Jiang, L. Zou, L. Li, Drug–drug interac- In the future, we will continue to explore the potential of tion extraction from biomedical literature using sup- prompt learning, try different prompt methods, and apply port vector machine and long short term memory net- our model to document-level relation extraction. works, Information sciences 415 (2017) 100–109. [16] W. Zheng, H. Lin, L. Luo, Z. Zhao, Z. Li, Y. Zhang, Acknowledgments Z. Yang, J. Wang, An attention-based effective neural model for drug-drug interactions extraction, BMC This work is supported by grant from the Natural Science bioinformatics 18 (2017) 1–11. Foundation of China (No. 62072070 and 62106034) [17] Y. Zhang, W. Zheng, H. Lin, J. Wang, Z. Yang, M. Du- 58 montier, Drug–drug interaction extraction via hier- (2019) 61–68. archical rnns on sequence and shortest dependency paths, Bioinformatics 34 (2018) 828–835. [18] Y. Peng, A. Rios, R. Kavuluru, Z. Lu, Extracting chemical–protein relations with ensembles of svm and deep learning models, Database 2018 (2018) bay073. [19] C. Sun, Z. Yang, L. Luo, L. Wang, Y. Zhang, H. Lin, J. Wang, A deep learning approach with deep contex- tualized word representations for chemical–protein interaction extraction from biomedical literature, IEEE Access 7 (2019) 151034–151046. [20] Y. Zhang, H. Lin, Z. Yang, J. Wang, Y. Sun, Chemical– protein interaction extraction via contextualized word representations and multihead attention, Database 2019 (2019) baz054. [21] W. Xiong, F. Li, H. Yu, D. Ji, Extracting drug-drug inter- actions with a dependency-based graph convolution neural network, in: 2019 IEEE International Confer- ence on Bioinformatics and Biomedicine (BIBM), IEEE, 2019, pp. 755–759. [22] C. Park, J. Park, S. Park, Agcn: Attention-based graph convolutional networks for drug-drug interaction ex- traction, Expert Systems with Applications 159 (2020) 113538. [23] Y. Peng, S. Yan, Z. Lu, Transfer learning in biomedical natural language processing: an evaluation of bert and elmo on ten benchmarking datasets, arXiv preprint arXiv:1906.05474 (2019). [24] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, Biobert: a pre-trained biomedical language representa- tion model for biomedical text mining, Bioinformatics 36 (2020) 1234–1240. [25] Z. Huang, N. An, J. Liu, F. Ren, Emsi-bert: Asymmetri- cal entity-mask strategy and symbol-insert structure for drug–drug interaction extraction based on bert, Symmetry 15 (2023) 398. [26] C. Sun, Z. Yang, L. Su, L. Wang, Y. Zhang, H. Lin, J. Wang, Chemical–protein interaction extraction via gaussian probability distribution and external biomed- ical knowledge, Bioinformatics 36 (2020) 4323–4330. [27] C. Sun, Z. Yang, L. Wang, Y. Zhang, H. Lin, J. Wang, At- tention guided capsule networks for chemical-protein interaction extraction, Journal of Biomedical Infor- matics 103 (2020) 103392. [28] X. Liu, J. Tan, J. Fan, K. Tan, J. Hu, S. Dong, A syntax-enhanced model based on category keywords for biomedical relation extraction, Journal of Biomed- ical Informatics 132 (2022) 104135. [29] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, T. 
[25] Z. Huang, N. An, J. Liu, F. Ren, EMSI-BERT: Asymmetrical entity-mask strategy and symbol-insert structure for drug–drug interaction extraction based on BERT, Symmetry 15 (2023) 398.
[26] C. Sun, Z. Yang, L. Su, L. Wang, Y. Zhang, H. Lin, J. Wang, Chemical–protein interaction extraction via Gaussian probability distribution and external biomedical knowledge, Bioinformatics 36 (2020) 4323–4330.
[27] C. Sun, Z. Yang, L. Wang, Y. Zhang, H. Lin, J. Wang, Attention guided capsule networks for chemical-protein interaction extraction, Journal of Biomedical Informatics 103 (2020) 103392.
[28] X. Liu, J. Tan, J. Fan, K. Tan, J. Hu, S. Dong, A syntax-enhanced model based on category keywords for biomedical relation extraction, Journal of Biomedical Informatics 132 (2022) 104135.
[29] M. Herrero-Zazo, I. Segura-Bedmar, P. Martínez, T. Declerck, The DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions, Journal of Biomedical Informatics 46 (2013) 914–920.
[30] J. Kringelum, S. K. Kjaerulff, S. Brunak, O. Lund, T. I. Oprea, O. Taboureau, ChemProt-3.0: a global chemical biology diseases mapping, Database 2016 (2016) bav123.
[31] P.-Y. Lung, T. Zhao, Z. He, J. Zhang, Extracting chemical protein interactions from literature, in: Proceedings of the BioCreative VI Workshop, 2017, pp. 159–162.
[32] H. Lu, L. Li, X. He, Y. Liu, A. Zhou, Extracting chemical-protein interactions from biomedical literature via granular attention based recurrent neural networks, Computer Methods and Programs in Biomedicine 176 (2019) 61–68.