<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Relation Extraction in Medical Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leo Xinyue Zhang</string-name>
          <email>leo.xinyue.zhang@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angus Roberts</string-name>
          <email>angus.roberts@kcl.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Zeki</string-name>
          <email>sebastian.zeki@gstt.nhs.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>King's College London</institution>
          ,
          <addr-line>Strand, London, WC2R 2LS</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>St Thomas' Hospital</institution>
          ,
          <addr-line>Westminster Bridge Road, London SE1 7EH</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Named Entity Recognition and Relation Extraction are two fundamental tasks in medical information extraction. Typically, these tasks are performed as a pipeline. However, this approach ignores the interactions between the two tasks, and training and deploying two models takes more time. There is research on modelling the two tasks jointly, but some of it considers only entities that take part in some relation, and there is little research on joint modelling in the medical domain. In this paper, we apply a promising generative joint modelling method to a medical dataset. We extend the output format to incorporate non-relation entities as a self-concept relation. As such, we are able to output all entities and relations in one step for medical extraction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>medical extraction, named entity recognition, relation extraction, joint modelling,</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Named Entity Recognition (NER) and Relation Extraction (RE) are two fundamental tasks in Information Extraction (IE) from free text. NER is the process of identifying entities in free text, and categorising them if needed, while RE is the process of identifying any existing relations between the entities. Typically, these two tasks are done in a sequential manner, i.e. named entities are extracted first before being passed on to relation extraction. However, there are two main drawbacks with this approach:
• This method disregards the interaction between the NER and RE tasks [<xref ref-type="bibr" rid="ref1">1</xref>]. Because the NER and RE modules are two separate modules, information cannot flow between the two tasks. This information can be helpful. Consider the following example, “London is the capital of the United Kingdom”: the relation “capital of” can help the NER task, as that relation indicates that the left-hand entity will be a city and the right-hand entity will be a country, province or equivalent.
• Errors from the NER task will propagate to RE [<xref ref-type="bibr" rid="ref1">1</xref>]. In the previous example, if London is wrongly identified as a person during the NER stage, this error will not be corrected during the RE stage.
      </p>
      <p>
        One way to solve this problem is to model NER and RE as one task, i.e. one model creates a single output containing both entity and relation extractions. For example, sequence-to-sequence (seq2seq) models directly output relation triplets which include the entities and the relations between them [<xref ref-type="bibr" rid="ref2">2</xref>], graph models treat entities as nodes and relations as edges [<xref ref-type="bibr" rid="ref3">3</xref>], and question answering models ask a sequence of questions whose answers contain the entities and relations [<xref ref-type="bibr" rid="ref4">4</xref>]. In this research, we focus on joint modelling, specifically using seq2seq models to output relation triplets. This method has been used before; for example, [<xref ref-type="bibr" rid="ref5">5</xref>] used a bidirectional Long Short-Term Memory (bi-LSTM) encoder and decoder to output strings formatted as relation triplets. Model performance is, however, restricted by the amount of data available. To overcome this restriction, [<xref ref-type="bibr" rid="ref2">2</xref>] proposed a novel way to construct a large dataset from Wikipedia and pre-train a BART-based model [<xref ref-type="bibr" rid="ref6">6</xref>] on it, achieving the best performance on the four tasks reported in [<xref ref-type="bibr" rid="ref2">2</xref>]. However, these end-to-end methods only generate relation triplets, which means that entities that are not part of any relationship will not be extracted. These entities can be important in real-life applications. In this work, we propose a sequence format that incorporates non-relation entities into the relation triplets, and we compare whether incorporating these non-relation entities improves the performance of relation extraction. The contributions of this work are twofold. First, it provides a method that lets researchers benefit from end-to-end relation extraction models such as REBEL without setting up a separate entity model for non-relation entity extraction. Second, it sheds some light on whether non-relation entities can help relation extraction.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <sec id="sec-3-1">
        <title>3.1. Sequence to Sequence Modelling</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2. Introduction</title>
      <p>Named Entity Recognition (NER) and Relation Extraction
(RE) are two fundamental tasks in Information Extraction
(IE) from free text. NER is the process of identifying
entities from free text, and categorise them if needed, while
RE is the process of identifying any existing relations
between the entities. Typically, these two tasks are done
in a sequential manner, i.e. named entities are extracted
ifrst before passing on to relation extraction. However,
there are two main drawbacks with this approach:
• This method disregards the interaction between</p>
      <p>
        NER and RE tasks[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Because the NER and RE Sequence to sequence (seq2seq) modelling is an
impormodule are two separated modules, the informa- tant task in NLP, generating a target sequence given a
tion cannot flow between the two tasks. These source sentence. Unlike classification tasks, where the
information can be helpful. Consider the follow- model generates a fixed-length output. Seq2seq tasks
ing example, “London is the capital of the United require a flexible length of output. Current seq2seq
modKingdom”, the information for relation extraction elling uses encoder-decoder models. These models have
“capital of” can help the NER task as that rela- two parts; an encoder model to encode the input sentence
tion indicates that the left hand entity will be a into some internal representation, and a decoder model
city and the right hand entity will be a country, to generate the output sentence from this representation.
province or equivalent. Early examples included two Recurrent Neural network
• Errors from the NER task will propagate to NE based models as the encoder and decoder for machine
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the previous example, if London is wrongly translation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and text summarisation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A recent trend
identified as a person during the NER stage, this has switched the focus to attention-based models after the
error will not be corrected during the RE stage. proposal of the transformer model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Attention-based
models have shown generally better performance, and
      </p>
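        <p>As an illustration of the encoder-decoder interface described above, the following is a minimal sketch of seq2seq generation with a pre-trained BART model via the Hugging Face transformers library. The checkpoint name and generation settings are illustrative assumptions, not choices made in this paper.</p>
        <preformat>
# Minimal seq2seq generation with a pre-trained BART model.
# The checkpoint and generation parameters are illustrative assumptions.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

source = "The patient needs to take paracetamol three times a day for a week."
# The encoder consumes the source sentence ...
inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=256)
# ... and the decoder generates a target sequence of flexible length.
output_ids = model.generate(inputs.input_ids, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
        </preformat>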
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Seq2seq model for relation extraction</title>
        <p>
          In a recent work [<xref ref-type="bibr" rid="ref2">2</xref>], a pre-trained BART-based model for generating sequences of relation triplets, named REBEL, was proposed. The model output concatenates all the relation triplets in the sentence with the help of special tokens. However, pre-training a BART-based model on a relation extraction task requires a large amount of annotated data, so the authors proposed a method to generate a silver dataset from Wikipedia. Firstly, they extracted all the Wikipedia abstracts, i.e. the section before the table of contents (this was true when REBEL was published; Wikipedia has since moved the table of contents to the left-hand side, so the abstract can now be defined as the piece of text preceding any section title). In Wikipedia, the entities are usually hyperlinks. The authors then mapped these entities to WikiData, a collectively edited knowledge graph of relations between Wikipedia entries, and extracted the relations of these entities. But a relation extracted from WikiData is not necessarily expressed in the selected text. For example, in the sentence Donald Trump visited President of Canada, there is no relation between Donald Trump and President, although this relation exists in WikiData. To alleviate this, the authors adapted an established Natural Language Inference (NLI) model. The NLI model assigns a score indicating how likely it is that the text entails the relation triplet; sequences with a score below 75% are filtered out.
        </p>
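        <p>The following is a minimal sketch of this kind of NLI filtering, assuming an off-the-shelf MNLI checkpoint and a naive verbalisation of the triplet as the hypothesis; it is not the exact pipeline used by REBEL.</p>
        <preformat>
# Sketch of NLI-based filtering of candidate relation triplets.
# The MNLI checkpoint and the triplet verbalisation are illustrative
# assumptions; REBEL's exact pipeline may differ.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def keep_triplet(text, head, relation, tail, threshold=0.75):
    """Keep a triplet only if the text entails its verbalisation."""
    hypothesis = f"{head} {relation} {tail}."
    scores = nli({"text": text, "text_pair": hypothesis}, top_k=None)
    entail = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return entail >= threshold

print(keep_triplet("London is the capital of the United Kingdom.",
                   "London", "capital of", "United Kingdom"))
        </preformat>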
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology</title>
      <sec id="sec-5-1">
        <title>4.1. REBEL Model for relation extraction</title>
        <p>− prefix. In the original REBEL formatting,
ibuproWe use the REBEL model as our base model. In essence, fen would be missed out in the output sequence because
they formatted a triplet with special tokens. &lt;   &gt; it does not exist in any relation triplets.
token marks the start of a relation triplet, tokens E-REBEL can give us the power to extract non-relation
between &lt;   &gt; and &lt;  &gt; are the head entity entities and regular relations in one model. However
in the relation triplets, tokens between &lt;  &gt; and we also ask if the non-relation incorporation could also
&lt;  &gt; are the tail entity in the relation triplets, and enhance the performance of relation extraction and vice
tokens after &lt;  &gt; are what the relation is. If a head versa. We conducted a comparison between the REBEL
entity appears in more than one relation, then the second and E-REBEL models.
tail relation just adds on to the first relation triplets. For
example, in the following sentence The patient needs
to take paracetamol three times a day for a week. The 5. Experiment Design
output sequence will be</p>
        <p>&lt;   &gt;   &lt;  &gt; ℎ     &lt;
 &gt;  −    &lt;  &gt;      &lt;  &gt;
 −  .</p>
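        <p>To make the format concrete, the following is a minimal sketch of a parser for this token format, mirroring the decoding just described. It is a simplified illustration, not REBEL's exact implementation.</p>
        <preformat>
# Parse a REBEL-style output string into (head, relation, tail) triplets.
# Simplified illustration of the token format described above.
def parse_triplets(seq):
    triplets = []
    head, tail, relation, mode = "", "", "", None
    tokens = (seq.replace("&lt;triplet&gt;", " &lt;triplet&gt; ")
                 .replace("&lt;subj&gt;", " &lt;subj&gt; ")
                 .replace("&lt;obj&gt;", " &lt;obj&gt; ")).split()
    for token in tokens:
        if token == "&lt;triplet&gt;":
            if relation:
                triplets.append((head.strip(), relation.strip(), tail.strip()))
            head, tail, relation, mode = "", "", "", "head"
        elif token == "&lt;subj&gt;":
            if relation:  # same head entity, new tail-relation pair
                triplets.append((head.strip(), relation.strip(), tail.strip()))
            tail, relation, mode = "", "", "tail"
        elif token == "&lt;obj&gt;":
            mode = "relation"
        elif mode == "head":
            head += " " + token
        elif mode == "tail":
            tail += " " + token
        elif mode == "relation":
            relation += " " + token
    if relation:
        triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets

seq = ("&lt;triplet&gt; paracetamol &lt;subj&gt; three times a day &lt;obj&gt; Drug-Frequency "
       "&lt;subj&gt; a week &lt;obj&gt; Drug-Duration")
print(parse_triplets(seq))
# [('paracetamol', 'Drug-Frequency', 'three times a day'),
#  ('paracetamol', 'Drug-Duration', 'a week')]
        </preformat>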
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Entity-incorporated REBEL Model</title>
        <p>In this work, we proposed a novel way to incorporate
entities into relation triplets, named E-REBEL. The idea
is to treat entities as entity relations to themselves. For
example, the entity paracetamol as medication would be
treated as the following triplets &lt;   &gt;   &lt;
 &gt;   &lt;  &gt;  −  . To put it
in a sentence with other entities and relations, in the
following sentence The patient needs to take paracetamol
three times a day for a week and ibuprofen, the final
relation triplets would be</p>
        <p>&lt;   &gt;   &lt;  &gt; ℎ   &lt;
 &gt;  −    &lt;  &gt;      &lt;
 &gt;  −   &lt;  &gt;   &lt;
 &gt;  −  &lt;   &gt;    &lt;  &gt;
   &lt;  &gt;  − .</p>
        <p>In this way, all the entities will be included in the
output sequence, and they can be easily extracted using the</p>
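        <p>Building on the parser sketched in Section 4.1, the following illustrates how non-relation entities could be recovered from an E-REBEL output by separating self-concept triplets from regular relations. The Self- prefix follows the reconstruction above and is an assumption about the exact label format.</p>
        <preformat>
# Split E-REBEL output into regular relations and non-relation entities.
# Assumes parse_triplets() from the sketch above and a "Self-" prefix on
# self-concept relations (our reconstruction of the label format).
def split_output(seq):
    relations, entities = [], []
    for head, relation, tail in parse_triplets(seq):
        if relation.startswith("Self-"):
            # Self-concept triplet: the tail repeats the head entity and
            # the relation label carries the entity type.
            entities.append((head, relation.removeprefix("Self-")))
        else:
            relations.append((head, relation, tail))
    return relations, entities

seq = ("&lt;triplet&gt; paracetamol &lt;subj&gt; three times a day &lt;obj&gt; Drug-Frequency "
       "&lt;subj&gt; paracetamol &lt;obj&gt; Self-Drug "
       "&lt;triplet&gt; ibuprofen &lt;subj&gt; ibuprofen &lt;obj&gt; Self-Drug")
relations, entities = split_output(seq)
print(entities)  # [('paracetamol', 'Drug'), ('ibuprofen', 'Drug')]
        </preformat>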
      </sec>
    </sec>
    <sec id="sec-5a">
      <title>5. Experiment Design</title>
      <sec id="sec-5-3">
        <title>5.1. Dataset and Evaluation</title>
        <p>
          The data used in this project is from the 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records [<xref ref-type="bibr" rid="ref10">10</xref>] (https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/). The data includes 505 discharge summaries from the MIMIC-III (Medical Information Mart for Intensive Care-III) clinical care database (https://mimic.mit.edu/). The task defines 9 drug-related concepts and 8 drug-related relations. The list of concepts and relations, and the number of samples for each, can be found in Table 1. The challenge in this task is to distinguish whether two entities form a “Reason-Drug” relation or rather a “Drug-ADE” (Adverse Drug Event) relation. The dataset is not balanced in either entity types or relations.
        </p>
        <p>The evaluation metrics used in this experiment are precision, recall and F1 score. They are defined as follows:</p>
        <p>Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 · Precision · Recall / (Precision + Recall) = 2·TP / (2·TP + FP + FN),</p>
        <p>where TP is the number of true positives, FP is the number of false positives and FN is the number of false negatives. We choose the F1 score over accuracy because the dataset is unbalanced.</p>
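        <p>As a concrete illustration, here is a minimal sketch of computing these micro-averaged scores over sets of predicted and gold relation triplets, assuming exact-match scoring of (head, relation, tail) tuples.</p>
        <preformat>
# Micro-averaged precision, recall and F1 over relation triplets.
# predicted/gold are lists of per-document triplet lists; a triplet
# counts as correct only on exact match (an assumption for this sketch).
def micro_prf(predicted, gold):
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        tp += len(pred.intersection(ref))   # true positives
        fp += len(pred.difference(ref))     # false positives
        fn += len(ref.difference(pred))     # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pred = [[("paracetamol", "Drug-Duration", "a week")]]
gold = [[("paracetamol", "Drug-Duration", "a week"),
         ("paracetamol", "Drug-Frequency", "three times a day")]]
print(micro_prf(pred, gold))  # (1.0, 0.5, 0.666...)
        </preformat>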
        <sec id="sec-5-3-1">
          <title>2https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ 3https://mimic.mit.edu/</title>
          <p>relations
Drug-Reason
Drug-Form
Drug-Strength
Drug-ADE
Drug-Dosage
Drug-Frequency
Drug-Route
Drug-Duration
Micro
Macro
precision</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>5.2. REBEL Framework</title>
        <p>
          We adopted the pre-trained REBEL model as described
in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We fine-tuned the model on our dataset. We
explored the following learning rates:
        </p>
        <p>Learning rate: 1e-5, 2.5e-5, 5e-5, 7.5e-5, 1e-4</p>
        <sec id="sec-5-4-1">
          <title>After this grid search, we found that the REBEL model</title>
          <p>with learning rate 7.5e-5, and E-REBEL model with
learning rate 2.5e-5. We use maximum sequence length of
256 for REBEL with batch size 8, and maximum sequence
length of 1024 for E-REBEL with batch size 2 because
E-REBEL sequences are two to three times longer than
the corresponding REBEL sequences. The batch size is
due to the GPU memory limit give the sequence length.</p>
        </sec>
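        <p>A minimal sketch of this fine-tuning setup with the Hugging Face Seq2SeqTrainer is shown below. The toy dataset, epoch count and output path are illustrative assumptions; the learning rate, sequence length and batch size follow the REBEL values above.</p>
        <preformat>
# Fine-tuning sketch for the REBEL configuration above (learning rate
# 7.5e-5, maximum length 256, batch size 8; E-REBEL would use 2.5e-5,
# 1024 and 2). The toy dataset, epoch count and output path are
# illustrative assumptions, not our exact training script.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

def tokenize(batch):
    # Encode source sentences and target triplet sequences.
    enc = tokenizer(batch["text"], max_length=256, truncation=True)
    enc["labels"] = tokenizer(text_target=batch["target"],
                              max_length=256, truncation=True)["input_ids"]
    return enc

train_dataset = Dataset.from_dict({
    "text": ["The patient needs to take paracetamol three times a day."],
    "target": ["&lt;triplet&gt; paracetamol &lt;subj&gt; three times a day "
               "&lt;obj&gt; Drug-Frequency"],
}).map(tokenize, batched=True, remove_columns=["text", "target"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="rebel-n2c2",        # hypothetical output path
        learning_rate=7.5e-5,           # best value from the grid search
        per_device_train_batch_size=8,
        num_train_epochs=10,            # illustrative assumption
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
        </preformat>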
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary Results</title>
      <p>The precision, recall and F1 scores of end-to-end relation extraction for the REBEL and E-REBEL models are shown in Tables 2 and 3 respectively. The precision, recall and F1 scores for entity extraction with E-REBEL are shown in Table 4. The F1 scores of end-to-end relation extraction for REBEL, E-REBEL and the top five models on the n2c2 leaderboard are shown in Table 5, including micro and macro F1 scores and the F1 scores for Drug-ADE and Drug-Reason. The F1 scores of concept extraction for E-REBEL and the top five models on the n2c2 leaderboard are shown in Table 6; these five models are not necessarily the same as the top five end-to-end relation models. Micro and macro F1 scores and the F1 scores for ADE and Reason are shown. All rankings are based on micro F1 score across all relations or across all concepts.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Analysis</title>
      <p>From Tables 2 and 5, the REBEL model achieves relatively good performance on end-to-end relation extraction. Its micro and macro F1 scores are on a par with the top models on the leaderboard. Markedly, its performance on the Drug-Reason and Drug-ADE relations is far better than that of other models. The REBEL model also has relatively balanced precision and recall scores.</p>
      <p>The E-REBEL model shows decreased performance compared to the REBEL model. The drop is across all relations, but there are some big drops in the recall of Drug-Reason, Drug-ADE, Drug-Duration and Drug-Frequency. This means the model is more conservative in generating relation triplets, and suggests that integrating entity triplets may not help relation triplet generation. A possible explanation is that the E-REBEL model has to generate sequences that are much longer than those of the REBEL model, and within each output sequence the entity parts are generally longer than the relation parts. This increases the difficulty of generating more, and more accurate, relation triplets. Additionally, we fine-tune from the REBEL pre-trained model, which is not trained on medical-specific data and does not incorporate entities. Lastly, we need to test on more datasets to reach a more convincing conclusion on whether entity incorporation decreases relation extraction performance, especially datasets with a good amount of non-relation entities.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Future work</title>
      <p>This paper is a work in progress. There are four aspects that I am working on.</p>
      <p>Data. The data used in this paper is limited. I plan to use more medical data to gain a better understanding of E-REBEL model performance, especially data that includes entities that are not always in some relation.</p>
      <p>Entity incorporating method. There are other ways to incorporate entities into the output sequence. I am currently working on some possible methods and will compare their performance.</p>
      <p>Entity incorporated retraining. In this work, we only incorporated entities at the fine-tuning stage. The pre-trained REBEL model does not have entity incorporation. This impairs the performance of the E-REBEL model and leads to an unfair comparison between the REBEL and E-REBEL models.</p>
      <p>Medical knowledge integration. The REBEL model is pre-trained on Wikipedia data, which is a collection of general language information. I plan to create a medical REBEL dataset for the model to gain domain knowledge.</p>
      <p>The main real-life application coming out of this work, if it succeeds, is a pre-trained entity-incorporated REBEL model that can serve as a general framework for downstream medical entity and relation extraction tasks, such as extracting information from endoscopy, pathology and radiology reports. For example, this pre-trained model will be used for my PhD project, which involves extracting entities and relations from pathology and endoscopy reports for Barrett’s oesophagus patients.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <article-title>End-to-end relation extraction using lstms on sequences and tree structures</article-title>
          ,
          <source>arXiv preprint arXiv:1601.00770</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.-L. H.</given-names>
            <surname>Cabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>REBEL: Relation extraction by end-to-end language generation</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>2370</fpage>
          -
          <lpage>2381</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.-J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>GraphRel: Modeling text as relational graphs for joint entity and relation extraction</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1409</fpage>
          -
          <lpage>1418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Entity-relation extraction as multi-turn question answering</article-title>
          ,
          <source>arXiv preprint arXiv:1905.05529</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>Effective modeling of encoder-decoder architecture for joint entity and relation extraction</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>8528</fpage>
          -
          <lpage>8535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <source>arXiv preprint arXiv:1910.13461</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Van</given-names>
            <surname>Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>On the properties of neural machine translation: Encoder-decoder approaches</article-title>
          ,
          <source>arXiv preprint arXiv:1409.1259</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Nallapati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Sequence-to-sequence RNNs for text summarization</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Henry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Buchan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Filannino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stubbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Uzuner</surname>
          </string-name>
          ,
          <article-title>2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>27</volume>
          (
          <year>2020</year>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>