A Joint Model for Medical Named Entity Recognition and Normalization
Ying Xiong a, Yuanhang Huang a, Qingcai Chen a,b, Xiaolong Wang a, Yuan Ni c, and Buzhou Tang a,b*
a Harbin Institute of Technology, Shenzhen, Xili University Town, Shenzhen, China
b Pengcheng Laboratory, Xili Street, Shenzhen, China
c PingAn Health Technology Ltd, Shenzhen, China

                                  Abstract
                                  Traditional pipeline models for medical named entity recognition and normalization (MER and
                                  MEN) suffer from error propagation. To tackle the error propagation problem, we propose a
                                  novel joint deep learning method for the 2020 IberLEF shared task on MER and MEN, where
                                  MER is regarded as a machine reading comprehension (MRC) problem and MEN as multiple
                                  sequence labeling problems corresponding to normalized hierarchical tumor codes. In the 2020
                                  IberLEF shared task, our proposed joint model achieves an F1 score of 0.87 on MER and an
                                  F1 score of 0.825 on MEN, and significantly outperforms the pipeline models used for comparison.

                                  Keywords
                                  Medical named entity recognition, medical entity normalization, joint deep learning, multiple
                                  sequence labeling

1. Introduction

In the past few years, researchers have taken great interest in clinical natural language processing (NLP)
and have launched a number of shared tasks on clinical NLP in Spanish. In 2019, Martin Krallinger et
al. [1] organized a shared task on Pharmacological Substances, Compounds and proteins Named Entity
Recognition (called PharmaCoNER). In the same year, the Iberian Languages Evaluation Forum
(IberLEF) launched shared tasks on Medical Document Anonymization (MEDDOCAN) [2] and
eHealth Knowledge Discovery [3]. In 2020, IberLEF organized the CANcer TExt Mining Shared Task
(CANTEMIST) for the first time, a tumor morphology task covering the recognition and normalization
of a critical type of medical concept related to cancer, i.e., medical named entity recognition (MER) and
medical named entity normalization (MEN) [4]. In this shared task, participants are required to
recognize one kind of entity, "MORFOLOGIA_NEOPLASIA". Pipeline methods are usually applied to
MER and MEN, where MEN is a follow-up task of MER [5]. However, it is inevitable that pipeline
methods suffer from error propagation, as reported by Xiong et al. [5]. To alleviate the error
propagation problem, a few joint learning methods have been proposed to solve MER and MEN
simultaneously. For example, Leaman et al. [6] proposed an ensemble method composed of two
independent machine learning models to deal with chemical named entity recognition and
normalization jointly. Lou et al. [7] proposed a transition-based model to recognize disease named
entities and map them into normalized concepts. Zhao et al. [8] proposed a multi-task joint learning
method to perform MER and MEN.
In this paper, we propose a novel joint deep learning method for MER and MEN and develop a system
based on this model for CANTEMIST 2020. The model uses different neural network layers for MER
and MEN, but shares the same word representation between the two tasks. In this model, MER is regarded
as a machine reading comprehension (MRC) problem, inspired by previous work [9,10], and MEN as multiple
sequence labeling problems corresponding to normalized hierarchical tumor codes, as shown in Figure
1, where the first four digital chars denote the tumor/cell type, the fifth digital char denotes the behavior, the
sixth digital char denotes the differentiation, and the last char denotes whether there is a relevant modifier
not included in the terminology of this concept. In the 2020 IberLEF shared task, our proposed joint
model achieves an F1 score of 0.87 on MER and an F1 score of 0.825 on MEN, and significantly
outperforms the pipeline models used for comparison.




Figure 1: Example of the hierarchical architecture of a tumor code


2. Material and method

    As shown in Figure 2, we develop a joint deep learning model for the 2020 IberLEF shared task on
MER and MEN. For the MER task, we adopt a machine reading comprehension model to detect medical
entity spans. For the MEN task, we regard it as multiple sequence labeling tasks. Each part is presented
in detail in the following subsections.




Figure 2: Architecture of the proposed joint deep learning model for MER and MEN


2.1.    Dataset

    The 2020 IberLEF shared task organizers provide a corpus, called CANTEMIST, including a total
of 6233 clinical notes in Spanish, 1301 of which are manually annotated with
“MORFOLOGIA_NEOPLASIA” entities mapped to 8410 normalized tumor codes. The annotated corpus
is further split into a training set of 501 notes, a development set (dev) of 250 notes, a supplementary
development set (sdev) of 250 notes, and a test set of 300 notes mixed with 4932 other notes as
background (bg). A tumor code consists of six digital chars plus one relevant modifier, denoted by
“ABCD/EF/H”, and can be divided into four parts with different meanings as follows: “ABCD”-tumor/cell
type, ‘E’-behavior, ‘F’-differentiation and ‘H’-relevant modifier not included in the terminology of this
concept. Table 1 lists the statistics of the corpus in detail.

Table 1
Statistics of the CANTEMIST corpus
            statistics     #training    #dev     #sdev    #test     #bg
            document          501        250      250      300      4932
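To make the code decomposition above concrete, here is a minimal Python sketch that splits a code string of the form “ABCD/EF/H” into the three groups of subcodes later predicted by our normalization heads; the helper name is hypothetical, and the second, full-length example code is made up for illustration only.

```python
def split_tumor_code(code: str):
    """Split an "ABCD/EF/H"-style tumor code into its three parts.

    Returns (type_code, behavior_and_differentiation, modifier); parts that
    are absent from the code are returned as empty strings.  Hypothetical
    helper for illustration, not part of the official CANTEMIST tooling.
    """
    parts = code.split("/")
    abcd = parts[0]                            # "ABCD": tumor/cell type
    ef = parts[1] if len(parts) > 1 else ""    # "EF": behavior (+ differentiation)
    h = parts[2] if len(parts) > 2 else ""     # "H": relevant modifier
    return abcd, ef, h


print(split_tumor_code("8000/6"))     # ('8000', '6', '')  -- code discussed in Section 3.1
print(split_tumor_code("9999/99/H"))  # ('9999', '99', 'H') -- made-up full-length code
```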

2.2.    Medical named entity recognition
Different from most existing models that regard MER as a sequence labeling problem that needs to tag
each token with an entity boundary and type, in this paper we regard MER as an MRC problem, whose
task is to answer questions regarding different types of entities based on given passages. Following
previous studies [9,10], we directly use the definition of each type of entity as the question regarding it.
That is, the definition of MORFOLOGIA_NEOPLASIA, “La morfología o histología de las neoplasias
hace referencia a la forma y estructura de las células tumorales” (the morphology or histology of
neoplasms refers to the form and structure of tumor cells), is the question regarding
MORFOLOGIA_NEOPLASIA, denoted by $q$. A sentence in any clinical record is regarded as a
passage, denoted by $p$. The task of MRC is to determine the start and end position pairs of
MORFOLOGIA_NEOPLASIA entities, given $q$ and $p$. We define a start and end position pair as $(s, e)$.
In our model, BERT [11] is first used as the backbone to represent the interactions between $q$ and $p$, and
it outputs the representation of passage $p$, denoted by $H \in \mathbb{R}^{n \times d}$, where $n$ is the length of $p$ and $d$ is the
representation dimension of each word. Then, a multi-layer perceptron (MLP) is used to compute the
probabilities of the start position $s$ and the end position $e$ as follows:
                                 $P_{s_i} = \mathrm{MLP}(W_s H_i + b_s)$,                                               (1)
                                 $P_{e_i} = \mathrm{MLP}(W_e H_i + b_e)$,                                               (2)
where $s_i = \arg\max(P_{s_i}) \in \{0,1\}$ represents whether the i-th position is the start position of an entity,
$e_i = \arg\max(P_{e_i}) \in \{0,1\}$ represents whether the i-th position is the end position of an entity, $W_s$
and $W_e$ are parameter matrices, and $b_s$ and $b_e$ are bias vectors.
   During the training phase, we adopt the cross-entropy loss to optimize the parameters of our MER
model, which is defined as follows:
                                  $L_{start} = \sum_i CE(P_{s_i}, Y_{s_i})$,                                            (3)
                                  $L_{end} = \sum_i CE(P_{e_i}, Y_{e_i})$,                                              (4)
                                  $L_{MER} = L_{start} + L_{end}$,                                                      (5)
where $CE$ is the cross-entropy loss, $P_s$ and $P_e$ are the predicted probabilities of the start position $s$ and the end
position $e$, and $Y_s$ and $Y_e$ are the gold-standard labels of the start position $s$ and the end position $e$.
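The following is a minimal sketch of this MRC-style span detector, assuming the Hugging Face transformers implementation of multilingual BERT; the class name and hyperparameters are our own, and the MLPs of equations (1) and (2) are simplified to single linear layers.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MRCSpanDetector(nn.Module):
    """Sketch of the MER head: per-token start/end classification over BERT."""

    def __init__(self, model_name="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.start_head = nn.Linear(hidden, 2)  # simplified MLP of Eq. (1)
        self.end_head = nn.Linear(hidden, 2)    # simplified MLP of Eq. (2)

    def forward(self, input_ids, attention_mask, token_type_ids):
        H = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask,
                         token_type_ids=token_type_ids).last_hidden_state
        return self.start_head(H), self.end_head(H)   # logits of shape (batch, n, 2)


tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
question = ("La morfología o histología de las neoplasias hace referencia "
            "a la forma y estructura de las células tumorales")
passage = "..."  # a sentence from a clinical note
inputs = tokenizer(question, passage, return_tensors="pt", truncation=True)

model = MRCSpanDetector()
start_logits, end_logits = model(**inputs)

# Training (Eqs. 3-5): cross-entropy of the per-token logits against gold 0/1
# start/end labels y_start, y_end of shape (batch, n), e.g.
#   loss = nn.CrossEntropyLoss()(start_logits.transpose(1, 2), y_start) \
#        + nn.CrossEntropyLoss()(end_logits.transpose(1, 2), y_end)
```

Entity spans are then read off by pairing predicted start positions with predicted end positions.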


2.3.    Medical named entity normalization
    The task of MEN is to map a medical named entity to a normalized code in a given vocabulary. In
this paper, we convert MEN into a multiple sequence labeling problem, where each token is labeled
with three normalized subcodes as shown in Figure 2, and the behavior code and differentiation code
are combined together because we believe they are strongly related to each other. The subcodes of the i-th
token in the passage are predicted by equations (6), (7) and (8), defined as follows:
                        $c_i^{abcd} = \mathrm{MLP}(W_{abcd} H_i + b_{abcd})$,                                   (6)
                        $c_i^{ef} = \mathrm{MLP}(W_{ef} H_i + b_{ef})$,                                         (7)
                        $c_i^{h} = \mathrm{MLP}(W_{h} H_i + b_{h})$,                                            (8)
                        $c_i = c_i^{abcd} / c_i^{ef} / c_i^{h}$,                                                (9)
where $c_i^{abcd}$, $c_i^{ef}$ and $c_i^{h}$ are the three subcodes, $W_{abcd}$, $W_{ef}$ and $W_{h}$ are parameter matrices, and $b_{abcd}$,
$b_{ef}$ and $b_{h}$ are bias vectors.
Similar to MER, we adopt the cross-entropy loss for model parameter optimization. The loss of MEN is
defined as follows:
        $L_{MEN} = \sum_i \left[ CE(P_i^{abcd}, Y_i^{abcd}) + CE(P_i^{ef}, Y_i^{ef}) + CE(P_i^{h}, Y_i^{h}) \right]$,               (10)
where $P_i^{*}$ is the predicted probability of each subcode $c^{*}$ and $Y_i^{*}$ is the corresponding gold-standard label.
The total loss of our joint model is the weighted sum of the MER loss and MEN loss:
                              $L_{total} = L_{MER} + \lambda L_{MEN}$,                                             (11)
where $\lambda$ is the loss weight.
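A minimal sketch of the MEN heads and the joint objective follows, assuming the shared hidden states $H$ come from the encoder in the MER sketch above; the label-set sizes and all names are illustrative assumptions rather than the actual task vocabulary.

```python
import torch.nn as nn

class MENHeads(nn.Module):
    """Sketch of the three per-token subcode classifiers of Eqs. (6)-(8)."""

    def __init__(self, hidden_size, n_abcd, n_ef, n_h):
        super().__init__()
        # n_abcd / n_ef / n_h: sizes of the subcode label sets observed in the
        # training data (illustrative; the real label sets come from the corpus).
        self.abcd_head = nn.Linear(hidden_size, n_abcd)  # "ABCD" tumor/cell type
        self.ef_head = nn.Linear(hidden_size, n_ef)      # "EF" behavior + differentiation
        self.h_head = nn.Linear(hidden_size, n_h)        # "H" relevant modifier

    def forward(self, H):  # H: (batch, n, hidden_size), shared with the MER head
        return self.abcd_head(H), self.ef_head(H), self.h_head(H)


def joint_loss(mer_loss, men_logits, men_labels, lam=1.0):
    """Per-token MEN losses of Eq. (10) plus the weighted total loss of Eq. (11)."""
    ce = nn.CrossEntropyLoss()
    men_loss = sum(ce(logits.transpose(1, 2), labels)    # (batch, C, n) vs (batch, n)
                   for logits, labels in zip(men_logits, men_labels))
    return mer_loss + lam * men_loss
```

At prediction time, the arg-max subcodes of the tokens inside a recognized entity are concatenated with “/” to rebuild the full tumor code, as in equation (9).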

2.4.    Evaluation

   The performance of all models on both the MER and MEN tasks is evaluated by concept-level
precision (P), recall (R) and F1-score (F1) under the exact-match criterion.
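A small sketch of this scoring scheme is given below, assuming predictions and gold annotations are represented as sets of tuples; it mirrors the exact-match criterion but is not the official CANTEMIST evaluation script.

```python
def exact_match_prf(pred, gold):
    """Concept-level precision/recall/F1 under the exact-match criterion.

    pred, gold: sets of hashable annotations, e.g. (doc_id, start, end) for MER
    or (doc_id, start, end, code) for MEN.  Illustrative only.
    """
    tp = len(pred & gold)                        # exact matches only
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Example: one correct span, one spurious span, one missed span.
pred = {("doc1", 10, 25), ("doc1", 40, 52)}
gold = {("doc1", 10, 25), ("doc1", 60, 70)}
print(exact_match_prf(pred, gold))  # (0.5, 0.5, 0.5)
```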

2.5.    Experiments setup

   We first investigate the effect of different values of the loss weight $\lambda$ (0.5 vs 1.0) on our
joint model and then compare the joint model with a pipeline method that uses the same method for
MER and the generation model SGM [12] for MEN. The BERT model is initialized with “BERT-Base,
Multilingual Cased”
(https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip)
and further pretrained on the CANTEMIST corpus. All other parameters are optimized
on the supplementary development set.
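A sketch of the continued pretraining step is shown below, using the Hugging Face transformers masked-language-modeling utilities as a stand-in for the original TensorFlow checkpoint workflow; the file name, sequence length, number of epochs and batch size are assumptions, not the settings actually used.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# cantemist_notes.txt: hypothetical plain-text dump of the CANTEMIST clinical notes.
dataset = load_dataset("text", data_files={"train": "cantemist_notes.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert_cantemist_mlm",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```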

3. Results
  The performance of our joint model is listed in Table 2, where the highest P, R, and F1 scores of the
model on MER and MEN are highlighted in bold. The highest F1 score on MER is 0.87 when the loss
weight $\lambda$ is set to 0.5, and the highest F1 score on MEN is 0.825 when the loss weight $\lambda$ is 1.0. Overall, our
model achieves better performance when $\lambda$ = 1.0.

Table 2
The overall performance of our model on MER and MEN
                              λ      Task       P        R        F1
                             0.5     MER      0.871    0.868     0.87
                                     MEN      0.82     0.808     0.814
                             1.0     MER      0.866    0.87      0.868
                                     MEN      0.824    0.826     0.825



3.1.    Joint learning vs pipeline

As shown in Table 3, the joint learning method yields a higher F1 score than the pipeline method. This
demonstrates that the joint model benefits from the shared word representation. For the
MER task, the joint learning method outperforms the pipeline method, achieving an F1 score of 0.868.
For the MEN task, the joint learning method brings a 4.3% F1 score improvement over the pipeline
method. The code distribution is highly imbalanced, with the code 8000/6 being ubiquitous. When 8000/6
mentions are removed (denoted as No-Metastasis), the P, R and F1 scores of the joint learning method
change only slightly, which indicates the robustness of the joint model.

Table 3
The performances of the pipeline method and joint learning method on MER and MEN
     Task    method       P       R       F1     P-No-Metastasis   R-No-Metastasis   F1-No-Metastasis
     MER     pipeline   0.862   0.857    0.86           \                 \                  \
             joint      0.866   0.87     0.868          \                 \                  \
     MEN     pipeline   0.794   0.791    0.792        0.799             0.765              0.782
             joint      0.824   0.826    0.825        0.848             0.803              0.825



4. Discussion

    Though our joint learning method shows a great improvement over the pipeline method, there are
still some errors on the MER task. 1) The long-tail problem is the main obstacle: about 60 entity
mentions contain more than 10 words, but our model can only recognize 10 of them. 2) Nested
entity mentions are difficult to recognize. Although our model tries to recognize nested entities, it rarely
succeeds because such entities are rare in the training set. 3) When a text contains “A
y (and) B”, our model finds it difficult to judge whether A and B are both entities.
The joint learning method shows a great improvement on the MEN task, but there are still some limitations. The
number of sequence labeling submodels has a large impact on the results. When we further separate the
behavior part and the differentiation part of the tumor code, the MEN F1 score on the development set
decreases from 0.794 to 0.708, indicating that the behavior and the differentiation have a strong
relationship. In the future, we plan to explore how to automatically detect the relationships among the
different parts of the tumor code within the model.

5. Conclusion

   In this study, we propose a joint learning method for medical named entity recognition and medical
named entity normalization. We utilize a machine reading comprehension model to solve the MER
task and a multiple sequence labeling model to solve the MEN task. Experimental results show the
effectiveness of our model.

6. Acknowledgements

   This paper is supported in part by grants: National Natural Science Foundation of China
(U1813215, 61876052 and 61573118), Special Foundation for Technology Research Program of
Guangdong Province (2015B010131010), Natural Science Foundation of Guangdong, China
(2019A1515011158), Guangdong Province Covid-19 Pandemic Control Research Fund
(2020KZDZX1222), Strategic Emerging Industry Development Special Funds of Shenzhen
(JCYJ20180306172232154 and JCYJ20170307150528934) and Innovation Fund of Harbin Institute of
Technology (HIT.NSRIF.2017052).


7. References
[1]   Gonzalez-Agirre A, Marimon M, Intxaurrondo A, Rabal O, Villegas M, Krallinger M.
      PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity
      Recognition track. In: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks.
      Association for Computational Linguistics; 2019:1–10. doi:10.18653/v1/D19-5701

[2]   Marimon M, Gonzalez-Agirre A, Intxaurrondo A, et al. Automatic De-identification of Medical
      Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of
      Results. In: Proceedings of the Iberian Languages Evaluation Forum Co-Located with 35th
      Conference of the Spanish Society for Natural Language Processing, IberLEF@SEPLN 2019,
      Bilbao, Spain, September 24th, 2019. ; 2019:618–638. http://ceur-ws.org/Vol-
      2421/MEDDOCAN_overview.pdf

[3]   Piad-Morffis A, Gutiérrez Y, Consuegra-Ayala JP, et al. Overview of the eHealth Knowledge
      Discovery Challenge at IberLEF 2019. In: Proceedings of the Iberian Languages Evaluation



      Forum Co-Located with 35th Conference of the Spanish Society for Natural Language
      Processing, IberLEF@SEPLN 2019, Bilbao, Spain, September 24th, 2019. ; 2019:1–16.
      http://ceur-ws.org/Vol-2421/eHealth-KD_overview.pdf

[4]   Miranda-Escalada A, Farré E, Krallinger M. Named entity recognition, concept normalization and
      clinical coding: Overview of the CANTEMIST track for cancer text mining in Spanish, Corpus,
      Guidelines, Methods and Results. In: Proceedings of the Iberian Languages Evaluation Forum
      (IberLEF 2020). CEUR Workshop Proceedings. ; 2020.

[5]   Xiong Y, Shen Y, Huang Y, et al. A Deep Learning-Based System for PharmaCoNER. In:
      Proceedings of The 5th Workshop on BioNLP Open Shared Tasks. Association for Computational
      Linguistics; 2019:33–37. doi:10.18653/v1/D19-5706

[6]   Leaman R, Lu Z. TaggerOne: joint named entity recognition and normalization with semi-Markov
      Models. Bioinform. 2016;32(18):2839–2846. doi:10.1093/bioinformatics/btw343

[7]   Lou Y, Zhang Y, Qian T, Li F, Xiong S, Ji D. A transition-based joint model for disease named
      entity    recognition     and     normalization.      Bioinform.      2017;33(15):2363–2371.
      doi:10.1093/bioinformatics/btx172

[8]   Zhao S, Liu T, Zhao S, Wang F. A Neural Multi-Task Learning Framework to Jointly Model
      Medical Named Entity Recognition and Normalization. In: The Thirty-Third AAAI Conference on
      Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial
      Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in
      Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. ;
      2019:817–824. doi:10.1609/aaai.v33i01.3301817

[9]   Li X, Feng J, Meng Y, Han Q, Wu F, Li J. A Unified MRC Framework for Named Entity
      Recognition. arXiv preprint arXiv:1910.11476. Published online 2019.

[10] Levy O, Seo M, Choi E, Zettlemoyer L. Zero-Shot Relation Extraction via Reading
     Comprehension. In: Proceedings of the 21st Conference on Computational Natural Language
     Learning (CoNLL 2017), Vancouver, Canada, August 3-4, 2017. ; 2017:333–342.
     doi:10.18653/v1/K17-1034

[11] Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional
     Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North
     American Chapter of the Association for Computational Linguistics: Human Language
     Technologies, Volume 1 (Long and Short Papers). ; 2019:4171–4186.

[12] Yang P, Sun X, Li W, Ma S, Wu W, Wang H. SGM: Sequence Generation Model for Multi-label
     Classification. In: Proceedings of the 27th International Conference on Computational
     Linguistics. ; 2018:3915–3926.



